This function is a filter query that runs in two phases:
1. It runs a query to determine a set of IDs (taken from the field named by the field parameter) for which there exist events with that ID that satisfy all the where clauses. Each where clause can be satisfied by a distinct event, but all of those events must have the same ID.
2. It runs as a filter function that lets all events that have one of the determined IDs pass through. In this second run, the events need only match the ID, not any of the where clauses, unless prefilter=true is set.
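The two phases can be modelled outside LogScale with a few lines of Python. This is a minimal sketch for intuition only, not how LogScale implements the function: it is exact rather than probabilistic, it leaves prefilter at its default, and the name self_join_filter, the sample events, and the choice to skip events lacking the join key are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

Event = Dict[str, str]
Clause = Callable[[Event], bool]


def self_join_filter(events: List[Event], field: str, where: List[Clause]) -> List[Event]:
    """Exact in-memory model of the two phases (prefilter left at false)."""
    # Phase 1: for each ID, record which where clauses some event with that ID satisfies.
    satisfied: Dict[str, Set[int]] = {}
    for event in events:
        if field not in event:
            continue  # assumption: events without the join key contribute no ID
        hits = satisfied.setdefault(event[field], set())
        for index, clause in enumerate(where):
            if clause(event):
                hits.add(index)
    # An ID qualifies only when every clause was satisfied, possibly by different events.
    matching_ids = {key for key, hits in satisfied.items() if len(hits) == len(where)}

    # Phase 2: pass through every event carrying a qualifying ID.
    return [event for event in events if event.get(field) in matching_ids]


# Hypothetical sample data: one event per email header.
events = [
    {"email_id": "1", "from": "peter"},
    {"email_id": "1", "to": "paul"},
    {"email_id": "1", "attachment": "report.pdf"},
    {"email_id": "2", "from": "peter"},
    {"email_id": "2", "to": "mary"},
]
where = [lambda e: e.get("from") == "peter", lambda e: e.get("to") == "paul"]
result = self_join_filter(events, "email_id", where)
print(result)
```

In this model, all three events with email_id 1 pass through, including the attachment event that matches neither clause, while email_id 2 is dropped because no event with that ID satisfies the second clause.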
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
field [a] | Array of strings | required | | Specifies which field in the event (log line) is the join key identifier. |
prefilter | boolean | optional [b] | false | Only pass through values matching at least one of the where clauses. |
where | [Filter] | required | | The subquery to execute producing the values to join with. |
[a] The argument name field can be omitted.
[b] Optional parameters use their default value unless explicitly set.
Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
selfJoinFilter("field",where="value")
and:
selfJoinFilter(field="field",where="value")
These examples show basic structure only; full examples are provided below.
selfJoinFilter() is probabilistic and the result can contain false positives. The table below lists the false positive rate for different numbers of matching IDs.
matches | false positive rate | number of false positives |
---|---|---|
1000 | 0.00000% | 0.0 |
10000 | 0.00029% | 0.0 |
20000 | 0.00224% | 0.4 |
25000 | 0.00434% | 1.1 |
50000 | 0.03289% | 16.4 |
If, for example, the where clauses (along with any preceding filtering) limit the matching IDs to 25,000 elements, then on average 1.1 of those will be false positives.
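The third column of the table is the number of matches multiplied by the false positive rate. The short Python check below reproduces that arithmetic from the table's own figures; the function name expected_false_positives is chosen here for illustration.

```python
# (matches, false positive rate in percent), copied from the table above.
table = [
    (1000, 0.00000),
    (10000, 0.00029),
    (20000, 0.00224),
    (25000, 0.00434),
    (50000, 0.03289),
]


def expected_false_positives(matches: int, rate_percent: float) -> float:
    """Average number of false positives: matches times the rate as a fraction."""
    return matches * rate_percent / 100.0


expected = {matches: expected_false_positives(matches, rate) for matches, rate in table}
for matches, value in expected.items():
    print(f"{matches:>6} matches: {value:.1f} expected false positives")
```

For 25,000 matches this gives 25000 × 0.0000434 ≈ 1.085, which rounds to the 1.1 shown in the table.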
When passed the additional argument prefilter=true, the resulting output will only contain those log lines that match one of the where clauses. With prefilter set to false (the default), all log lines with a join key for which there exist events that satisfy the where clauses will be passed through.
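The effect of prefilter on the second pass can be sketched in Python as follows. This is an illustrative model under stated assumptions, not LogScale's implementation: the function second_phase, the sample events, and the already-computed set matching_ids are hypothetical.

```python
from typing import Callable, Dict, List, Set

Event = Dict[str, str]
Clause = Callable[[Event], bool]


def second_phase(events: List[Event], field: str, matching_ids: Set[str],
                 where: List[Clause], prefilter: bool = False) -> List[Event]:
    """Model of the second pass, given the IDs determined by the first pass."""
    passed = []
    for event in events:
        if event.get(field) not in matching_ids:
            continue  # the event's join key was not selected in the first pass
        if prefilter and not any(clause(event) for clause in where):
            continue  # with prefilter, the event must also match at least one clause
        passed.append(event)
    return passed


# Hypothetical events sharing one join key that the first pass selected.
events = [
    {"email_id": "1", "from": "peter"},
    {"email_id": "1", "to": "paul"},
    {"email_id": "1", "attachment": "report.pdf"},
]
where = [lambda e: e.get("from") == "peter", lambda e: e.get("to") == "paul"]

without_prefilter = second_phase(events, "email_id", {"1"}, where)
with_prefilter = second_phase(events, "email_id", {"1"}, where, prefilter=True)
print(len(without_prefilter), len(with_prefilter))
```

In this model the attachment event passes only when prefilter is false, because it carries a selected ID but matches neither where clause.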
This function does two passes over the data and therefore cannot be used in a live query unless in combination with beta:repeating().
Note
If multiple fields are specified in the field parameter, they must all exist in an event for it to be valid for selfJoinFilter().
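The rule in the note can be pictured as building a composite join key. The Python sketch below is an assumption-laden illustration: the source states only that every listed field must exist in the event, so representing the key as a tuple of field values, the helper name join_key, and the field names host and session are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Event = Dict[str, str]


def join_key(event: Event, fields: List[str]) -> Optional[Tuple[str, ...]]:
    """Return a composite key, or None when any key field is missing from the event."""
    if any(name not in event for name in fields):
        return None  # event is not valid for the join and is ignored
    # Assumption: the key is the ordered tuple of the field values.
    return tuple(event[name] for name in fields)


fields = ["host", "session"]
complete = join_key({"host": "a", "session": "7", "msg": "login"}, fields)
incomplete = join_key({"host": "a", "msg": "heartbeat"}, fields)
print(complete, incomplete)
```

Here the second event lacks the session field, so it yields no key and would take no part in either phase.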
selfJoinFilter() Examples
You have emails logged with one event for each header. To find all attachments for emails sent from Peter to Paul, first find all the email_ids that correspond to mails from Peter to Paul, then find all log messages with one of those email_ids that also have an attachment.
selfJoinFilter(field=email_id, where=[{ from=peter }, { to=paul }])
| attachment=*