groupBy() Query Function

Groups events by specified fields and executes aggregate functions on each group. This is similar to GROUP BY in SQL databases.

Returns events containing the fields specified in the field parameter and the fields returned by each aggregate function. For example, a _count field is returned if function=count().
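The shape of the returned events can be illustrated with a small Python model. This is only a sketch of the semantics described above, not Humio's implementation: the helper name group_by_count and the sample events are hypothetical, and the model assumes every event carries all of the grouping fields.

```python
from collections import Counter

def group_by_count(events, fields):
    """Model of groupBy(field=fields, function=count()).

    Returns one row per distinct combination of the grouping fields,
    holding those fields plus a _count field.
    Assumption: every event contains all grouping fields.
    """
    counts = Counter(tuple(event[f] for f in fields) for event in events)
    rows = []
    for key, n in counts.items():
        row = dict(zip(fields, key))  # the grouping fields
        row["_count"] = n             # the field produced by count()
        rows.append(row)
    return rows

# Hypothetical sample events
events = [
    {"method": "GET", "status_code": "200"},
    {"method": "GET", "status_code": "200"},
    {"method": "POST", "status_code": "500"},
]
rows = group_by_count(events, ["method", "status_code"])
```

Each row in rows holds the grouping fields and the aggregate's output field, mirroring the description of what groupBy() returns.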

For time series data, the timeChart() and bucket() functions extend groupBy() by also grouping by time. See timeChart() and bucket().

groupBy() limits the number of groups to the value configured in MAX_STATE_LIMIT, which defaults to 20000. If a search hits the limit, no additional groups are created, and the returned results are inconsistent: the results are merged together randomly, so which groups end up being kept is random as well. As a consequence, some results may show up while the search is executing, before the limit is hit, and later disappear when the results are merged. If you run into this, the top() function may work better for you.

At the moment, groupBy() is implemented to work entirely in memory and cannot spill to disk, so the limit is necessary to prevent searches from consuming too much memory. Removing this limitation is on Humio’s development roadmap.
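The effect of a group limit can be sketched with a toy single-pass model in Python. It is an illustration under stated assumptions, not Humio's implementation: the function name capped_group_count and the sample data are hypothetical, and the random merging of partial results described above is not modeled. In this toy version, which groups survive depends only on arrival order.

```python
def capped_group_count(events, field, limit):
    """Count events per value of `field`, creating at most `limit` groups.

    Toy model only: once `limit` groups exist, events that would start
    a new group are not counted. Humio's merge step is not modeled.
    """
    counts = {}
    dropped = 0
    for event in events:
        key = event[field]
        if key in counts:
            counts[key] += 1      # existing group keeps counting
        elif len(counts) < limit:
            counts[key] = 1       # room for a new group
        else:
            dropped += 1          # limit reached, no new group is created
    return counts, dropped

# Hypothetical sample: four distinct users, but only two groups allowed
events = [{"user": u} for u in ["a", "b", "a", "c", "d", "c"]]
counts, dropped = capped_group_count(events, "user", limit=2)
```

Here the groups for "c" and "d" are never created, so the result silently omits them. That is the kind of incompleteness the limit can cause, even though the real behavior differs in detail.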

Parameters

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| field | [string] | Yes | | Specifies which fields to group by. It is possible to group by multiple fields. |
| function | [Aggregate] | No | count(as=_count) | Specifies which aggregate functions to perform on each group. The default is to count the elements in each group. |
| limit | string | No | 20000 (MAX_STATE_LIMIT) | Limit on the number of group elements [0..∞]. The default is the value of the configuration parameter MAX_STATE_LIMIT, which is 20000 by default. |

The implied parameter is field, so its name can be omitted: groupBy(status_code) is equivalent to groupBy(field=status_code).

Examples

Count the different HTTP status codes

humio
groupBy(field=status_code, function=count())

Group by HTTP method and HTTP status code, and count the events in each group

humio
groupBy(field=[method, status_code], function=count())

Find the maximum response time for each device, and also count the number of requests for each device

humio
groupBy(field=device, function=[max(responsetime, as=time), count()]) | sort(time)
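
To make the result of the last query more concrete, here is a rough Python model of what it computes. It is a sketch under assumptions rather than a description of Humio's internals: the function name max_and_count_by_device and the sample events are hypothetical, and the descending sort order is an assumption about how sort(time) orders its output.

```python
def max_and_count_by_device(events):
    """Model of groupBy(field=device, function=[max(responsetime, as=time), count()]).

    Produces one row per device with the maximum response time in `time`
    and the number of events in `_count`, then sorts the rows by `time`.
    Assumption: the sort is descending.
    """
    groups = {}
    for event in events:
        device = event["device"]
        row = groups.setdefault(
            device,
            {"device": device, "time": event["responsetime"], "_count": 0},
        )
        row["time"] = max(row["time"], event["responsetime"])  # max(..., as=time)
        row["_count"] += 1                                      # count()
    return sorted(groups.values(), key=lambda r: r["time"], reverse=True)

# Hypothetical sample events
events = [
    {"device": "phone", "responsetime": 120},
    {"device": "laptop", "responsetime": 80},
    {"device": "phone", "responsetime": 300},
]
rows = max_and_count_by_device(events)
```

Each row carries the output of both aggregate functions, which is how several functions passed in the function parameter contribute fields to the same group.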