Filter For Items Not Part of Data Set Using !join()

Find the set difference using the join() function with negation

Query

logscale
#repo=A session_id=*
| !join(query={#repo=B session_id=*}, field=session_id, key=session_id)

Introduction

The join() function can be used with a negation to filter for items that are not part of a data set, for example, items in data set A not found in data set B.

In this example, the join() function is used with a negation to search and find all session IDs from data set A that are not found in data set B.

Example incoming data from repository A might look like this:

timestampsession_iduser_nameactionstatus
2025-04-01T07:00:00Z123456john.doeloginsuccess
2025-04-01T07:05:00Z123457jane.smithdownloadsuccess
2025-04-01T07:10:00Z123458mike.jonesuploadfailed
2025-04-01T07:15:00Z123459sara.wilsonloginsuccess
2025-04-01T07:20:00Z123460bob.brownlogoutsuccess

Example incoming data from repository B might look like this:

timestampsession_iduser_nameactionstatus
2025-04-01T07:00:00Z123456john.doeloginsuccess
2025-04-01T07:05:00Z123457jane.smithdownloadsuccess
2025-04-01T07:20:00Z123460bob.brownlogoutsuccess

Step-by-Step

  1. Starting with the source repository events.

  2. flowchart LR; %%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% repo{{Events}} 0["Expression"] 1["Expression"] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 0 fill:#ff0000,stroke-width:4px,stroke:#000;
    logscale
    #repo=A session_id=*

    Filters for all events from repository A, that have a session_id field.

  3. flowchart LR; %%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% repo{{Events}} 0["Expression"] 1["Expression"] result{{Result Set}} repo --> 0 0 --> 1 1 --> result style 1 fill:#ff0000,stroke-width:4px,stroke:#000;
    logscale
    | !join(query={#repo=B session_id=*}, field=session_id, key=session_id)

    Performs a negated join with repository B, and returns sessions that exist in repository A but not in repository B. The negation operator is used to make it an anti-join operation.

    LogScale recommends using the defineTable() function with !match() instead of negated join. See example Filter For Items Not Part of Data Set Using defineTable()

  4. Event Result set.

Summary and Results

The query is used to find the set difference between two repositories. This is, for example, useful for identifying sync issues or performing data consistency checks. Or just to make a cross-repository comparison.

For more information, see also Query Joins and Lookups

Sample output from the incoming example data:

timestampsession_iduser_nameactionstatus
2025-04-01T07:10:00Z123458mike.jonesuploadfailed
2025-04-01T07:15:00Z123459sara.wilsonloginsuccess