Deduplicate Content by Field

Deduplicating content based on a specific field using the groupBy() function with tail()

Query

[Flowchart: Events --> Aggregate --> Result Set]
logscale
groupBy(field, function=tail(1))

Introduction

If you want to deduplicate events by a given field, for example to build a list containing one event per unique value for further processing, you can use an aggregate function. In this example, the groupBy() function is used with tail() to keep only the last event in each group.

Example incoming data might look like this:

@timestamp                user   status    ip_address
2025-11-06T10:00:00.000Z  alice  active    192.168.1.100
2025-11-06T10:15:00.000Z  bob    inactive  192.168.1.101
2025-11-06T10:30:00.000Z  alice  inactive  192.168.1.102
2025-11-06T10:45:00.000Z  bob    active    192.168.1.103
2025-11-06T11:00:00.000Z  alice  active    192.168.1.104

Step-by-Step

  1. Starting with the source repository events.

  2. [Flowchart: Events --> Aggregate (highlighted) --> Result Set]
    logscale
    groupBy(field, function=tail(1))

    Groups events by the value of the specified field, and reduces each group using tail() with a limit of 1, so that only the last event in each group is kept.

  3. Event Result set.

Summary and Results

The query is used to deduplicate events by a given field. This is useful if you want to create a unique list of events for further processing.

Sample output from the incoming example data where field=user:

@timestamp                user   status  ip_address
2025-11-06T11:00:00.000Z  alice  active  192.168.1.104
2025-11-06T10:45:00.000Z  bob    active  192.168.1.103

Note that only the last event for each unique value in the user field is kept in the results, while earlier events for the same user are removed.
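
To make the grouping behavior concrete, the following is a minimal Python sketch that emulates the effect of the query on the example data. It is an illustration only, not how LogScale evaluates groupBy() or tail(). The function name dedupe_keep_last is hypothetical, and the sketch assumes that events arrive in chronological order and that events without the grouping field are skipped.

```python
from typing import Any

Event = dict[str, Any]


def dedupe_keep_last(events: list[Event], field: str) -> list[Event]:
    """Keep only the last event seen for each distinct value of `field`.

    Hypothetical helper that mimics groupBy(field, function=tail(1)).
    Assumes `events` is already in chronological order.
    """
    last_by_key: dict[Any, Event] = {}
    for event in events:
        if field not in event:
            continue  # assumption: events lacking the field are ignored
        # A later event with the same key overwrites the earlier one.
        last_by_key[event[field]] = event
    return list(last_by_key.values())


# The example incoming data from this page.
events = [
    {"@timestamp": "2025-11-06T10:00:00.000Z", "user": "alice",
     "status": "active", "ip_address": "192.168.1.100"},
    {"@timestamp": "2025-11-06T10:15:00.000Z", "user": "bob",
     "status": "inactive", "ip_address": "192.168.1.101"},
    {"@timestamp": "2025-11-06T10:30:00.000Z", "user": "alice",
     "status": "inactive", "ip_address": "192.168.1.102"},
    {"@timestamp": "2025-11-06T10:45:00.000Z", "user": "bob",
     "status": "active", "ip_address": "192.168.1.103"},
    {"@timestamp": "2025-11-06T11:00:00.000Z", "user": "alice",
     "status": "active", "ip_address": "192.168.1.104"},
]

result = dedupe_keep_last(events, "user")
for event in result:
    print(event["@timestamp"], event["user"], event["status"], event["ip_address"])
```

Run against the example data, the sketch keeps the 11:00 event for alice and the 10:45 event for bob, which matches the sample output above.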