This function is a filter query that runs in two phases:
Runs a query to determine a set of IDs (specified using the
field
parameter), for which there exists an event with that field ID which satisfy all thewhere
clauses. Eachwhere
clause can be satisfied by distinct events (but they must all have the same ID).Runs as a filter function that lets all events that have one of the determined IDs pass through. In the secondary run, the events need only match the ID, not any of the
where
clauses; unlessprefilter=true
is set.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
field [a] | array of strings | required | Specifies which field in the event (log line) is the join key identifier. | |
prefilter | boolean | optional[b] | false | Only pass through values matching at least one of the where clauses. |
where | [filter] | required | The subquery to execute producing the values to join with. | |
[b] Optional parameters use their default value unless explicitly set. |
Hide omitted argument names for this function
Omitted Argument NamesThe argument name for
field
can be omitted; the following forms of this function are equivalent:logscale SyntaxselfJoinFilter(["value"],where="value")
and:
logscale SyntaxselfJoinFilter(field=["value"],where="value")
These examples show basic structure only.
Hide negatable operation for this function
Negatable Function OperationThis function is negatable, implying the inverse of the result. For example:
logscale Syntax!selfJoinFilter()
Or:
logscale Syntaxnot selfJoinFilter()
For more information, see Negating the Result of Filter Functions.
The function uses a compact and fast, but imprecise, summary of the relevant keys being filtered and is therefore useful when narrowing down the set of events and keys in an efficient manner where other aggregate functions may reach their key limit. This can be used most effectively to produce a data set of events that share a common key.
When using the function, a query should use:
Filter the event set to find the base set of events.
Use
selfJoinFilter()
to find events with the common keys.Correlate the content, for example by using
groupBy()
to aggregate the contents.(Optionally) filter the results to exclude any correlated data not required in the output.
selfJoinFilter()
is probabilistic and the result can
contain false positives.
matches | false positive rate | number of false positives |
---|---|---|
1000 | 0.00000% | 0.0 |
10000 | 0.00029% | 0.0 |
20000 | 0.00224% | 0.4 |
25000 | 0.00434% | 1.1 |
50000 | 0.03289% | 16.4 |
If, for example, the where
clauses (along with any
preceding filtering) limits the matching IDs to 25,000 elements, then out
of those 1.1 will be false positives on average.
When passed the additional argument
prefilter=true
, the resulting output will
only contain those log lines that match one of the
where
clauses. With
prefilter
set to
false
by default, all log lines with a
join key for which there exists events that satisfy the
where
clauses will be passed through.
Warning
This function does two passes over the data and can therefore not be
used in a live query unless in combination with
beta:repeating()
.
Note
If multiple fields are specified in the field
parameter, they must all exist in an event, for it to be valid for
selfJoinFilter()
.
selfJoinFilter()
Examples
Click
next to an example below to get the full details.Compare and Filter Values in Same Table
Retrieves all emails with attachments sent from one given person to another given person using the selfJoinFilter()
function matching only the ID
Query
selfJoinFilter(field=email_id, where=[{ from=peter }, {to=paul}])| attachment=*
Introduction
The selfJoin()
function is a join query
that matches events across the same event sets.
selfJoin()
joins an event set to itself and
allows you to combine events from the same table based on two
fields in the same event. In order to do this, the event set
must have a common field with a unique ID, a primary field, and
a secondary (or subquery) field that will be matched to each
other. The selfJoinFilter()
function
differs from the selfJoin()
function as
this is a filter query.
Where selfJoin()
uses the
where
parameter to find the values of a
given field where the conditions are met and return them as a
result, selfJoinFilter()
passes the events
containing values that meet the condition on to a second phase,
where a filter function is run on the ID of the event itself -
the events need only match the ID, not any of the where clauses;
unless prefilter=true is set. In this example, emails are logged
with one event for each header (each email has its own ID) and
the selfJoinFilter()
function is used to
find all attachments for emails sent from one given person to
another given person. Notice, that this query does two passes
over the data and, therefore, cannot be used in a live query.
Step-by-Step
Starting with the source repository events.
- logscale
selfJoinFilter(field=email_id, where=[{ from=peter }, {to=paul}])
Finds all the values in the emails_id field that correspond to mails from
Peter
toPaul
. - logscale
| attachment=*
Finds all log messages with one of those values in the emails_ids field that has been passed on from first phase that also have an attachment.
Event Result set.
Summary and Results
The query is used to find all emails with attachments sent from
one given person to another person. In general, the
selfJoinFilter()
function is useful for
narrowing down a set of events in a fairly efficient manner, in
cases where the total set of events is too voluminous.