The groupBy()
query function is used to group
together events by one or more specified fields. This is similar to the
GROUP BY
method in SQL databases. Further, it can be
used to execute aggregate functions on each group. The results are
returned in the field parameter for each aggregate function. For example,
the _count
field if the count()
is
used.
When showing time series data, the timeChart()
and
bucket()
functions are an extension of
groupBy()
that groups by time.
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
field | [string] | true | This specifies which field or fields to group. | |
function | [Aggregate] | false | count(as=_count) | This specifies which aggregate functions to use with each group. The default is count() . |
limit | string | false | GroupDefaultLimit (i.e., 20,000) | This sets the limit for the number of group elements. The value, if specified, can be either a positive integer or "max", in which case the value of GroupMaxLimit (i.e. 200,000) is used. |
The groupBy()
function is limited in the number of
groups it will handle. Such limit is set in the Humio configuration using
GraphQL mutations bound by the GroupMaxLimit
dynamic
configuration.
The maximum value of the limit
parameter for this
dynamic configuration is 200,000 by default (adjustable by the dynamic
configuration), which means it is possible to create up to 200k groups.
When the limit
parameter is not supplied
explicitly, its default value is determined by the
GroupDefaultLimit
dynamic configuration, which has a
default value of 20,000.
GroupDefaultLimit
cannot be larger than
MAX_STATE_LIMIT
.
Attention
As the limit
parameter was previously bounded by
MAX_STATE_LIMIT
environment variable, if you have made
any modifications to it, please set the dynamic configurations
GroupMaxLimit
and
GroupDefaultLimit
to the same value for a seamless
upgrade.
Additionally, groupBy()
has an internal memory limit
determined by the dynamic configuration
QueryMemoryLimit
for non-live queries, and
LiveQueryMemoryLimit
for live queries. Both of these
have a default value of 100MB. If either kind of limit is reached in a
search query, additional groups won’t be created. The results returned
will then be inconsistent because they were merged together randomly;
which groups are kept will be random, as well. For such situations, you
may want to use the top()
function to narrow the list
of groups.
One of the reasons for this limitation is because
groupBy()
is implemented to work entirely in memory,
and cannot spill to disk. The limit helps to prevent searches from
consuming too much memory. This method and limitation may be changed in
future versions of Humio.
Examples
As an example of how to use this query function, suppose you have a
Humio repository that is ingesting data from one or more web servers.
Now suppose you want to get a list of HTTP status codes, the results of
users accessing the server. For instance, HTTP code 404 for Not
Found
— for web pages not found. You could get a list of
all of the HTTP codes returned and a count of each by executing a query
like this:
Count different http status codes
groupBy(field=status_code, function=count())
| sort(statuscode, order=asc)
In addition to the groupBy()
, the results here are
piped to the sort()
function to sort the results
from the lowest HTTP code to the highest. This query will look something
like in figure here below:

Figure 256. groupBy()
Example
With the groupBy()
function, you can have Humio
group by more than one field. You would just give the fields in an
array, a comma-separated list within square-brackets. It would look
something like this query in which events are grouped on HTTP method and
status code:
groupBy(field=[method, statuscode], function=count())
| sort([method, statuscode], order=asc)
The figure below shows you these results.

Figure 257. groupBy()
Example
Although this query works, you may also want to show the total for each method, not just the total for each status code of each method. To do that, you'll have to adjust the query to look like this:
[groupby(method, function=[count(as=method_total),
groupby(statuscode, function=count(as=method_status_count))])]
In this query, a groupBy()
is nested within another
groupBy()
. The results will look like in figure
here:

Figure 258. groupBy()
Example
These results still might not be as tidy as you like. You might do better with separate queries for each method or each status code — assuming you're interested in only a few specific ones. Another possibility might be to use an API to be able to assemble the results in a manner that you prefer.