This function is a filter query that runs in two phases:

  1. Runs a query to determine a set of IDs, taken from the field named in the field parameter. An ID is included when every where clause is satisfied by at least one event with that ID. Different where clauses can be satisfied by different events, but those events must all have the same ID.

  2. Runs as a filter function that passes through all events that have one of the determined IDs. In this second phase, events need only match the ID, not any of the where clauses, unless prefilter=true is set.
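
The two phases can be modelled in a few lines of Python. This is only a sketch of the semantics described above, not how the function is implemented: it uses an exact set of IDs where the function itself uses a probabilistic summary. The name `self_join_filter` and the sample `email_id`, `from`, `to`, and `status` fields are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Clause = Callable[[Event], bool]


def self_join_filter(events: List[Event], field: str,
                     where: List[Clause], prefilter: bool = False) -> List[Event]:
    """Illustrative model of the two-phase filter, using an exact set of IDs."""
    # Phase 1: for each where clause, collect the IDs of the events that satisfy it.
    # An ID qualifies only if every clause is satisfied by some event with that ID.
    ids_per_clause = [
        {e[field] for e in events if field in e and clause(e)}
        for clause in where
    ]
    matching_ids = set.intersection(*ids_per_clause) if ids_per_clause else set()

    # Phase 2: pass through events whose ID was found in phase 1.
    # With prefilter=True, an event must also satisfy at least one clause itself.
    return [
        e for e in events
        if e.get(field) in matching_ids
        and (not prefilter or any(clause(e) for clause in where))
    ]


# Hypothetical sample data: email 1 is from peter to anders, email 2 is not.
events = [
    {"email_id": "1", "from": "peter"},
    {"email_id": "1", "to": "anders"},
    {"email_id": "1", "status": "delivered"},
    {"email_id": "2", "from": "peter"},
    {"email_id": "2", "to": "paul"},
]
where = [lambda e: e.get("from") == "peter", lambda e: e.get("to") == "anders"]

all_for_id = self_join_filter(events, "email_id", where)
only_matching = self_join_filter(events, "email_id", where, prefilter=True)
```

In this model, `all_for_id` holds all three events for email 1, including the `status` event that matches neither clause, while `only_matching` drops that event because prefilter is enabled.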

Parameter    Type               Required      Default Value   Description
field[a]     array of strings   required                      Specifies which field in the event (log line) is the join key identifier.
prefilter    boolean            optional[b]   false           Only pass through values matching at least one of the where clauses.
where        [filter]           required                      The subquery to execute producing the values to join with.

[a] The parameter name field can be omitted.

[b] Optional parameters use their default value unless explicitly set.

The function uses a compact, fast, but imprecise summary of the keys being filtered. It is therefore useful for narrowing down the set of events and keys efficiently in cases where other aggregate functions may reach their key limit. It is most effective when used to produce a data set of events that share a common key.

A query that uses the function should follow these steps:

  • Filter the event set to find the base set of events.

  • Use selfJoinFilter() to find events with the common keys.

  • Correlate the content, for example by using groupBy() to aggregate the contents.

  • (Optionally) filter the results to exclude any correlated data not required in the output.

selfJoinFilter() is probabilistic and the result can contain false positives.

Matches   False positive rate   Number of false positives
1000      0.00000%              0.0
10000     0.00029%              0.0
20000     0.00224%              0.4
25000     0.00434%              1.1
50000     0.03289%              16.4

If, for example, the where clauses (along with any preceding filtering) limit the matching IDs to 25,000 elements, then on average 1.1 of those will be false positives.
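
The last column of the table above is the number of matches multiplied by the false positive rate. The following Python sketch reproduces that arithmetic; the rates are copied from the table, and the helper name `expected_false_positives` is hypothetical.

```python
# False positive rate in percent for each number of matches, copied from the table.
RATE_PERCENT = {
    1000: 0.00000,
    10000: 0.00029,
    20000: 0.00224,
    25000: 0.00434,
    50000: 0.03289,
}


def expected_false_positives(matches: int) -> float:
    """Expected number of false positives: matches times the rate as a fraction."""
    return matches * RATE_PERCENT[matches] / 100


for matches in RATE_PERCENT:
    print(matches, round(expected_false_positives(matches), 1))
```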

When passed the additional argument prefilter=true, the resulting output contains only those log lines that match at least one of the where clauses. With the default, prefilter=false, all log lines with a join key for which there exist events that satisfy the where clauses are passed through.

Warning

This function makes two passes over the data and therefore cannot be used in a live query unless it is combined with beta:repeating().

Note

If multiple fields are specified in the field parameter, they must all exist in an event for that event to be valid for selfJoinFilter().
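
The rule for multiple fields can be pictured with a small Python sketch. It assumes the join key is the combination of all listed field values. How the function combines the fields internally is not stated here, so the tuple is an assumption, and the name `join_key` and the `host` and `session_id` fields are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def join_key(event: Dict[str, str], fields: List[str]) -> Optional[Tuple[str, ...]]:
    """Return the composite key, or None if any key field is missing."""
    # An event is only valid for the join if every listed field exists in it.
    if not all(f in event for f in fields):
        return None
    # Assumption: the key is modelled as a tuple of the field values.
    return tuple(event[f] for f in fields)


fields = ["host", "session_id"]
complete = join_key({"host": "web-1", "session_id": "42", "msg": "login"}, fields)
partial = join_key({"host": "web-1", "msg": "heartbeat"}, fields)
```

Here `complete` is a usable key, while `partial` is `None` because the event lacks `session_id` and so would not take part in the join.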
