How-To: Sorting by Timestamps within groupBy()

When collating a group of field data that includes a field that you also want to sort on, use the collect() and use an embedded call to the sort() function in your aggregate function call.

For example, if you are reporting on data that you want to summarize by a field value and a formatted timestamp, you can sort the collected fields with by the timestamp using this format:

Sorting Timestamps with groupBy()

Query
flowchart LR; %%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% repo{{Events}} 0[\Add Field/] 1{{Aggregate}} 2>Augment Data] result{{Result Set}} repo --> 0 0 --> 1 1 --> 2 2 --> result

Search Repository: humio

logscale
timestamp := formatTime(format="%H:%M")
  |groupby([thread],
function=[{sort("timestamp") | collect("timestamp")}])
Introduction

When using aggregation, you may want to sort on a field that is part of the aggregated set but not the main feature of the aggregated value. For example, sorting the values byt their timestamp rather than the embedded value. To achieve this, you should use a function that sorts the field to b use as the sort field, and then use collect() so that the value from before the aggregaion can be displayed in the generated event set. This query can be executed n the humio respository.

Step-by-Step
  1. Starting with the source repository events

  2. flowchart LR; %%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% repo{{Events}} 0[\Add Field/] 1{{Aggregate}} 2>Augment Data] result{{Result Set}} repo --> 0 0 --> 1 1 --> 2 2 --> result style 0 fill:#ff0000,stroke-width:4px,stroke:#000;

    The first line creates a new field, timestamp formatted as HH:MM.

    logscale
    timestamp := formatTime(format="%H:%M")
  3. flowchart LR; %%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% repo{{Events}} 0[\Add Field/] 1{{Aggregate}} 2>Augment Data] result{{Result Set}} repo --> 0 0 --> 1 1 --> 2 2 --> result style 1 fill:#ff0000,stroke-width:4px,stroke:#000;

    Group the events, first by the name of the thread and then the formatted timestamp.

    logscale
    |groupby([thread],
  4. flowchart LR; %%{init: {"flowchart": {"defaultRenderer": "elk"}} }%% repo{{Events}} 0[\Add Field/] 1{{Aggregate}} 2>Augment Data] result{{Result Set}} repo --> 0 0 --> 1 1 --> 2 2 --> result style 2 fill:#ff0000,stroke-width:4px,stroke:#000;

    Use the sort() combined wtih collect() as the method fo aggregation. As an embedeed expression for the function, this will sort the events on the timestamp field and then retrieve the feld as it would normally be removed as part of the aggregation process.

    logscale
    function=[{sort("timestamp") | collect("timestamp")}])
  5. Event Result set

Summary and Results

The result set will contain a list of the aggregated thread names sorted by the timestamp:

threadtimestamp
BootstrapInfoJob10:09
DataSynchJob10:09
Global event loop10:10
LocalLivequeryMonitor10:09
LogCollectorManifestUpdate10:09
TransientChatter event loop10:10
aggregate-alert-job10:09
alert-job10:09
block-processing-monitor-job10:09
bloom-scheduler10:09
bucket-entity-config10:09
bucket-overcommit-metrics-job10:09
bucket-storage-download10:09
bucket-storage-prefetch10:09
chatter-runningqueries-logger10:09
chatter-runningqueries-stats10:09