selfJoinFilter() Query Function

This function is a filter query that runs in two phases:

First, it runs a query to determine a set of IDs (the values of the field named by the field parameter) for which there exist events with that ID that together satisfy all of the where clauses. Each where clause can be satisfied by a different event, but all of those events must share the same ID.

Second, it runs as a filter that passes every event that has one of the determined IDs. In this second pass, an event need only match the ID, not any of the where clauses, unless prefilter=true is set.
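The two phases can be illustrated with a minimal sketch. The field names session_id and action are hypothetical and chosen only for illustration; they do not come from any real data set. No single event can have both action=login and action=purchase, so the where clauses here can only be satisfied by distinct events that share a session_id.

```humio
// Hypothetical fields: session_id, action.
// Phase 1: collect every session_id that has at least one login event
//          and at least one purchase event (possibly different events).
// Phase 2: pass every event that carries one of those session_id values,
//          whether or not that event matches a where clause.
selfJoinFilter(field=session_id, where=[{action=login}, {action=purchase}])
```

Under these assumptions, a session with only login events would be dropped entirely, while a session with both kinds of events would be returned in full, including its events with other action values.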

selfJoinFilter() is probabilistic, and the result can contain false positives.

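| matches | false positive rate | number of false positives |
|---------|---------------------|---------------------------|
| 1000    | 0.00000%            | 0.0                       |
| 10000   | 0.00029%            | 0.0                       |
| 20000   | 0.00224%            | 0.4                       |
| 25000   | 0.00434%            | 1.1                       |
| 50000   | 0.03289%            | 16.4                      |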

If, for example, the where clauses (along with any preceding filtering) limit the matching IDs to 25,000 elements, then on average 1.1 of those will be false positives.

When passed the additional argument prefilter=true, the output contains only those log lines that match at least one of the where clauses. With prefilter=false (the default), all log lines with a join key for which there exist events that satisfy the where clauses are passed through.

This function makes two passes over the data and therefore cannot be used in a live query, except in combination with beta:repeating().

Parameters

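| Name      | Type     | Required | Default | Description                                                                |
|-----------|----------|----------|---------|----------------------------------------------------------------------------|
| where     | [Filter] | Yes      |         | The subquery to execute, producing the values to join with.                |
| field     | [string] | Yes      |         | Specifies which field in the event (log line) is the join key identifier.  |
| prefilter | boolean  | No       | false   | Only pass through events matching at least one of the where clauses.       |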

The implied parameter is field.
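Because field is the implied parameter, its name can presumably be omitted when the value is given as the first argument. A minimal sketch of that shorthand, reusing the email_id, from, and to fields from the Examples section and assuming the usual positional form for implied parameters:

```humio
// Assumed shorthand: the first unnamed argument is taken as field.
// Intended to be equivalent to selfJoinFilter(field=email_id, where=[...]).
selfJoinFilter(email_id, where=[{from=peter}, {to=paul}])
```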

Examples

Suppose emails are logged with one event for each header. To find all attachments for emails sent from Peter to Paul, first find all the email_id values that correspond to mails from Peter to Paul, then find all log messages that have one of those email_id values and also have an attachment:

humio
selfJoinFilter(field=email_id, where=[{ from=peter }, {to=paul}]) | attachment=*
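For comparison, a hedged variant of the query above with prefilter=true. Going by the parameter description, the output would then contain only the events that themselves match from=peter or to=paul, limited to email_id values for which both clauses are satisfied. The attachment=* filter is left out of this sketch, since an attachment event would only survive the prefilter if it also matched one of the where clauses.

```humio
// Same join key and where clauses as above, but only the matching
// header events (from=peter or to=paul) are passed through.
selfJoinFilter(field=email_id, where=[{from=peter}, {to=paul}], prefilter=true)
```

This form may be useful for inspecting which header events caused an email_id to be selected, rather than retrieving every event for that email.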