The groupBy() query function groups events by one or more specified fields. This is similar to the GROUP BY clause in SQL databases.

field[a] (array of strings, required)
    Specifies which field or fields to group by.

function (array of aggregate functions, optional[b], default: count(as=_count))
    Specifies which aggregate functions to use with each group. If several aggregators are listed for the function parameter, then their outputs are combined using the rules described for stats().

limit (string, optional[b], default: 20,000, maximum: 1,000,000)
    Sets the limit for the number of group elements. The default value is controlled by the GroupDefaultLimit dynamic configuration, and the maximum value by the GroupMaxLimit dynamic configuration. If the argument is max (limit=max), then the value of GroupMaxLimit is used. The limit prioritizes the top-N series, where the top N are the series with the highest numerical value attributed to them by the subquery across all fields.

[a] The parameter name field can be omitted.

[b] Optional parameters use their default value unless explicitly set.
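
As an illustration of footnote [a], because the parameter name field can be omitted, the following two queries are equivalent (host is simply an example field name):

logscale
groupBy(field=host, function=count())

logscale
groupBy(host, function=count())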

groupBy() Function Operation

The groupBy() function has specific implementation and operational considerations, outlined below.

Series Selection in groupBy()

The selection is based on the aggregate numerical output across all specified functions and all time buckets, not the series identifiers themselves.

The limit prioritizes the top-N series, where the top N are the series with the highest numerical value attributed to them by the subquery across all fields.

Series Selection Process:

  • The selection is based on the numerical values produced by the subquery/function.

  • It is not based on the series names.

  • When multiple functions are used, the function considers all values produced.

For different examples of top N series selection, see Find Top N Value of Series - Example 1 and Find Top N Value of Series - Example 2.
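
As a sketch of how this selection works in practice (host and bytes_sent are example fields, not ones used elsewhere in this section), the following query keeps only the five hosts whose summed values are largest, regardless of the host names themselves:

logscale
groupBy(host, function=sum(bytes_sent), limit=5)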

Grouping in groupBy()

The groupBy() function can be used to execute aggregate functions on each group. Each aggregate function is applied to the events grouped by the field or fields given in the field parameter. For example, by default the count() function is used, and the count for each group is returned in the _count field.

The default is to use count() as an aggregate function. However, if the goal is to only find the unique values of a field or combinations of multiple fields, then groupBy() can be provided with an empty list as its aggregate function.

When showing time series data, the timeChart() and bucket() functions are extensions of groupBy() that also group by time.
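
As a rough illustration of that relationship (host is an example field), the following query buckets events into one-hour spans and counts them per host, which is conceptually a groupBy() on host combined with a grouping on time:

logscale
timeChart(host, function=count(), span=1h)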

Limits when using groupBy()

Warning

The groupBy() function may return incomplete results, restricted to a subset of the groups and their related events, in cases where the limit parameter is exceeded.

The groupBy() function is limited in the number of groups it will handle. This limit is set in the LogScale configuration using GraphQL mutations and is bounded by the GroupMaxLimit dynamic configuration.

Starting from version 1.127, the maximum value of the limit parameter is 1,000,000 by default (adjustable via the GroupMaxLimit dynamic configuration), which means it is possible to create up to 1,000,000 groups.

When the limit parameter is not supplied explicitly, its default value is determined by the GroupDefaultLimit dynamic configuration, which has a default value of 20,000.

GroupDefaultLimit cannot be higher than GroupMaxLimit.

Attention

The limit parameter is governed by the GroupMaxLimit and GroupDefaultLimit dynamic configurations, which are initially set to the same value for a seamless upgrade.

Additionally, like all LogScale functions, groupBy() has an internal memory limit determined by the dynamic configuration QueryCoordinatorMemoryLimit. This means that it is possible for the function to collect fewer than the specified limit number of groups, if the total amount of data collected by the function exceeds this limit. The internal memory limit is in place to protect the cluster health and ensure sufficient memory for other queries to run. Should the memory limit be reached, no more groups are added. This may lead to results that are inconsistent with the results shown prior to the limit being reached. The groups that were present in prior results may be pushed out of the result set during merges between nodes, to ensure that the memory usage is kept within bounds. If a group is shown, the results for that group will be correct. In situations where you reach the memory limit, you may want to use the top() function to narrow the list of groups.

One of the reasons for this limitation is because groupBy() is implemented to work entirely in memory, and cannot spill to disk. The limit helps to prevent searches from consuming too much memory. This method and limitation may be changed in future versions of LogScale.
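
For example, if only the most frequent groups are of interest, a sketch along these lines (status_code is an example field) keeps the number of groups bounded without relying on groupBy() alone:

logscale
top(status_code, limit=100)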

groupBy() Syntax Examples

Since the most common use-case is to count the distinct values of a field, the default behavior is to use count() to aggregate the events. The simplest use-case of groupBy() is, therefore, to count the occurrence of the distinct values of a field:

logscale
groupBy(status_code)

Given input:

status_code ip
440 1.111.111.111
500 1.111.111.111
440 1.111.111.111
440 2.222.222.222

The output is:

status_code _count
440 3
500 1

If the task is only to find the unique values, groupBy() can be provided with an empty list as the aggregate function, implying that nothing will be aggregated:

logscale
groupBy(status_code, function=[])

The output is:

status_code
440
500

Similarly, groupBy() can be used to find the unique combinations of field values present in the data:

logscale
groupBy([status_code, ip], function=[])

The output is:

status_code ip
440 1.111.111.111
500 1.111.111.111
440 2.222.222.222

A usage of the function such as the one above is also very useful for query-based parameters in dashboards, see Query Parameter for more explanations.

As an example of how to use this query function, suppose you have a LogScale repository that is ingesting data from one or more web servers. Now suppose you want to get a list of the HTTP status codes returned when users access the server (for instance, HTTP code 404 for web pages that were not found). You could get a list of all the HTTP codes returned, and a count of each, by executing a query like this:

logscale
groupBy(field=statuscode, function=count())
| sort(statuscode, order=asc)

In addition to the groupBy(), the results here are piped to the sort() function to sort the results from the lowest HTTP code to the highest.

statuscode   _count
200          7485
301          2314
304          65
400          101
404          6425
405          2
408          9

With the groupBy() function, you can have LogScale group by more than one field. You would just give the fields in an array, a comma-separated list within square-brackets. It would look something like this query in which events are grouped on HTTP method and status code:

logscale
groupBy(field=[method, statuscode], function=count())
| sort([method, statuscode], order=asc)

This will produce these results:

method    statuscode
CONNECT   301
GET       200
GET       301
GET       304
GET       400
GET       404
HEAD      200
HEAD      301
HEAD      400
HEAD      404
OPTIONS   200
OPTIONS   301
POST      200
POST      301
POST      400
POST      404

Although this query works, you may also want to show the total for each method, not just the total for each status code of each method. To do that, you'll have to adjust the query to look like this:

logscale
[groupBy(method, function=[count(as=method_total), groupBy(statuscode, function=count(as=method_status_count))])]

In this query, a groupBy() is nested within another groupBy(). The results will look like this:

method   method_total   statuscode   method_status_count
GET      14078          200          7326
GET      14078          301          1246
GET      14078          304          65
GET      14078          400          70
GET      14078          404          5371
HEAD     139            200          14
HEAD     139            301          22
HEAD     139            400          6
HEAD     139            404          97
POST     2005           200          14
POST     2005           301          1002
POST     2005           400          25
POST     2055           404          964

These results still might not be as tidy as you like. You might do better with separate queries for each method or each status code — assuming you're interested in only a few specific ones. Another possibility might be to use an API to be able to assemble the results in a manner that you prefer.

Instead of a single function, additional query functions can also be applied, allowing for calculations on the count that is obtained from groupBy(). The following query contains an embedded expression within groupBy(), as a demonstration of formatting and calculating the value within a groupBy() expression:

logscale
groupBy(host, function=[{count()
| esp:=_count/300}])

The query generates a count() of the number of events for each host in the aggregation, which creates the field _count; that value is then divided by 300 and placed into the field esp.

groupBy() Examples

Aggregate Array Content

Query
logscale
array:contains("incidents[]", value="Cozy Bear")
| groupBy(host)
Introduction

Given events containing an incidents array:

Event 1

|--------------|-------------|
| host         | v1          |
| incidents[0] | Evil Bear   |
| incidents[1] | Cozy Bear   |
|--------------|-------------|

Event 2

|--------------|-------------|
| host         | v15         |
| incidents[0] | Fancy Fly   |
| incidents[1] | Tiny Cat    |
| incidents[2] | Cozy Bears  |
|--------------|-------------|

The query finds all the events where the incidents array contains the exact value Cozy Bear and groups them by which hosts were affected, giving the output event shown below.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    array:contains("incidents[]", value="Cozy Bear")

    Filters for events where the incidents array contains an element exactly matching the text Cozy Bear.

  3. logscale
    | groupBy(host)

    Groups the result events extracted from the array by the host.

  4. Event Result set.

Summary and Results

The result is an aggregated count of the array elements matching Cozy Bear.

field    value
host     v1
_count   1

Alert Query for Parsers Issues

Reporting errors

Query
logscale
#type=humio #kind=logs
| loglevel=WARN
| class = c.h.d.ParserLimitingJob
| "Setting reject ingest for"
| groupBy(id, function=[count(), min(@timestamp), max(@timestamp)] )
| timeDiff:=_max-_min
| timeDiff > 300000 and _count > 10
Introduction

This alert query tries to balance reacting when there are problems with parsers, without being too restrictive.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    #type=humio #kind=logs

    Filters on all logs across all hosts in the cluster.

  3. logscale
    | loglevel=WARN

    Filters for all events where the loglevel is equal to WARN.

  4. logscale
    | class = c.h.d.ParserLimitingJob

    Filters for events where the class field is equal to c.h.d.ParserLimitingJob among the logs having the loglevel value WARN.

  5. logscale
    | "Setting reject ingest for"

    Filters for events containing the string Setting reject ingest for. This is the error message generated when ingested events are rejected.

  6. logscale
    | groupBy(id, function=[count(), min(@timestamp), max(@timestamp)] )

    Groups the returned result by the field id, makes a count on the events and returns the minimum timestamp and maximum timestamp. This returns a new event set, with the fields id, _count, _min, and _max.

  7. logscale
    | timeDiff:=_max-_min

    Calculates the time difference between the maximum timestamp values and the minimum timestamp values and returns the result in a new field named timeDiff.

  8. logscale
    | timeDiff > 300000 and _count > 10

    Returns all events where the value of timeDiff is greater than 300000 and where there are more than 10 occurrences.

  9. Event Result set.

Summary and Results

This query is used to set up alerts for parser issues. Setting up alerts for parser issues allows you to proactively reach out to customers whose queries are being throttled and help them.

Analyze User Sessions Based on Click Activity

Analyzes user sessions based on users click activity using the session() function

Query
logscale
groupBy(cookie_id, function=session(maxpause=15m, count(as=clicks)))
| sort(clicks)
Introduction

In this example, the session() function is used to analyze user sessions based on users click activity. The session() function groups events by a given timespan.

Example incoming data might look like this:

timestamp           | cookie_id | action_type | page_url       | user_agent
2025-05-15 05:30:00 | user123   | pageview    | /home          | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:30:15 | user123   | click       | /products      | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:30:30 | user123   | click       | /product/item1 | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:31:00 | user123   | click       | /cart          | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:31:30 | user123   | click       | /checkout      | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:35:00 | user456   | pageview    | /home          | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
2025-05-15 05:35:30 | user456   | click       | /about         | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
2025-05-15 05:36:00 | user456   | click       | /contact       | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
2025-05-15 05:38:00 | user789   | pageview    | /home          | Mozilla/5.0 (iPhone; CPU iPhone OS 14_0)
2025-05-15 05:38:30 | user789   | click       | /products      | Mozilla/5.0 (iPhone; CPU iPhone OS 14_0)
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy(cookie_id, function=session(maxpause=15m, count(as=clicks)))

    Groups events by the field cookie_id (unique user identifier) and creates sessions with 15-minute inactivity timeout (the default value of the maxpause parameter), then makes a count of each event in a session returning the result in a new field named clicks.

  3. logscale
    | sort(clicks)

    Sorts the results by number of clicks (default is descending order).

  4. Event Result set.

Summary and Results

The query is used to analyze user sessions based on the users click activity. The query is useful, for example, to identify most/least active user sessions, detect potential automated behavior or just to understand user engagement levels.

Sample output from the incoming example data:

cookie_id   clicks
user123     5
user456     3
user789     2

Note that each row represents an event (either pageview or click).

Calculate Query Cost for All Users by Repository

Search across multiple repositories to calculate query costs for all users by repository using sort() and groupBy() functions

Query
logscale
#type=humio #kind=logs class=c.h.j.RunningQueriesLoggerJob message="Highest Cost query"
 | groupBy(repo, initiatingUser, totalLiveCost, totalStaticCost)
 | sort([totalLiveCost, totalStaticCost])
Introduction

In this example, the query uses sort() and groupBy() functions to find query costs. The query filters logs in humio repository that are tagged with kind equal to logs and then returns the events where the class field has values containing c.h.j.RunningQueriesLoggerJob, searching for the specific value Highest Cost query.

Example incoming data might look like this:

#type | #kind | class                         | message            | timestamp            | dataspace   | initiatingUser | totalLiveCost | totalStaticCost | deltaTotalCost | repo
humio | logs  | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:30:00Z | production  | john.doe       | 1500          | 800             | 2300           | security-logs
humio | logs  | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:31:00Z | development | jane.smith     | 2000          | 1200            | 3200           | app-logs
humio | logs  | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:32:00Z | staging     | bob.wilson     | 1000          | 500             | 1500           | infra-logs
humio | logs  | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:33:00Z | production  | john.doe       | 1800          | 900             | 2700           | security-logs
humio | logs  | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:34:00Z | development | jane.smith     | 2500          | 1300            | 3800           | app-logs
humio | logs  | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:35:00Z | staging     | alice.cooper   | 1200          | 600             | 1800           | infra-logs
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    #type=humio #kind=logs class=c.h.j.RunningQueriesLoggerJob message="Highest Cost query"

    Filters for Humio internal logs containing c.h.j.RunningQueriesLoggerJob in the class field and where the value in the message field is equal to Highest Cost query.

  3. logscale
    | groupBy(repo, initiatingUser, totalLiveCost, totalStaticCost)

    Groups the results by the repo field, the initiatingUser field, and by both cost types (the fields totalLiveCost, totalStaticCost), and returns a count in a field named _count.

  4. logscale
    | sort([totalLiveCost, totalStaticCost])

    Sorts the results by both the totalLiveCost field and the totalStaticCost field, in descending order by default.

  5. Event Result set.

Summary and Results

The query is used to search across multiple repositories and output query costs for all users by repository. The query returns the count in a field named _count. Use this query to focus on live and static costs separately.

Sample output from the incoming example data:

repo          | initiatingUser | totalLiveCost | totalStaticCost | _count
app-logs      | jane.smith     | 2000          | 1200            | 1
security-logs | john.doe       | 1500          | 800             | 1
infra-logs    | bob.wilson     | 1000          | 500             | 1

Calculate Relationship Between X And Y Variables - Example 3

Calculate the linear relationship between server load and each of several types of request types using the linReg() function with bucket() and groupBy()

Query
logscale
bucket(function=[ avg(server_load_pct, as=y), groupBy(request_type, function=count(as=x)) ])
| groupBy(request_type, function=linReg(x=x, y=y))
Introduction

In this example, the linReg() function is used to calculate the linear relationship between request_type (x variable) and server_load_pct (y variable). The example shows the relationship between server load and each of several types of HTTP request types across time.

Example incoming data might look like this:

@timestamp               | server_load_pct | request_type
2024-01-15T09:00:00.000Z | 45.2            | GET
2024-01-15T09:00:00.000Z | 45.2            | POST
2024-01-15T09:00:00.000Z | 45.2            | GET
2024-01-15T09:05:00.000Z | 52.8            | GET
2024-01-15T09:05:00.000Z | 52.8            | PUT
2024-01-15T09:05:00.000Z | 52.8            | POST
2024-01-15T09:10:00.000Z | 48.6            | GET
2024-01-15T09:10:00.000Z | 48.6            | GET
2024-01-15T09:10:00.000Z | 48.6            | DELETE
2024-01-15T09:15:00.000Z | 65.3            | POST
2024-01-15T09:15:00.000Z | 65.3            | POST
2024-01-15T09:15:00.000Z | 65.3            | GET
2024-01-15T09:20:00.000Z | 42.1            | GET
2024-01-15T09:20:00.000Z | 42.1            | PUT
2024-01-15T09:20:00.000Z | 42.1            | GET
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    bucket(function=[ avg(server_load_pct, as=y), groupBy(request_type, function=count(as=x)) ])

    Buckets the data points by time, then calculates the average server load for each time bucket returning the result in a field named y. It also groups the request types in a field named request_type and makes a count of requests by type in each time bucket returning the result in a field named x.

  3. logscale
    | groupBy(request_type, function=linReg(x=x, y=y))

    Correlates x with y, showing the relationship between the variables x and y for each HTTP request type, and outputs the results in fields named _slope (slope value), _intercept (intercept value), _r2 (adjusted R-squared value), and _n (number of data points). These four key values indicate relationship strength and reliability.

  4. Event Result set.

Summary and Results

The query is used to analyze how different HTTP request types affect server load. The analysis helps identify which HTTP request types have the strongest impact on server performance.

Sample output from the incoming example data:

request_type | _slope              | _intercept        | _r2                | _n
DELETE       | <no value>          | <no value>        | <no value>         | <no value>
GET          | -13.749999999999941 | 72.7999999999999  | 0.5941824574313592 | 5
POST         | 16.29999999999992   | 32.70000000000012 | 0.7196207242484238 | 3
PUT          | <no value>          | <no value>        | <no value>         | <no value>

_slope is the impact rate of request volume on server load.

_intercept is the baseline server load when there are no requests of a specific type.

_r2 is the statistical accuracy of the relationship.

_n is the total number of data points analyzed.

Collect and Group Events by Specified Field - Example 1

Collect and group events by specified field using collect() as part of a groupBy() operation

Query
logscale
groupBy(client_ip, function=session(maxpause=1m, collect([url])))
Introduction

In this example, the collect() function is used to collect visitors, each visitor defined as non-active after one minute.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy(client_ip, function=session(maxpause=1m, collect([url])))

    Groups events by the field client_ip, creating a session for each visitor; a visitor is considered non-active after one minute without events. Within each session, the values of the url field are collected into an array. A count of the events in each session is returned in a _count field.

  3. Event Result set.

Summary and Results

The query is used to collect fields from multiple events into one event. This query analyzes user behavior by grouping events into sessions for each unique client IP address and then collecting all URLs accessed during each session. This analysis is valuable for understanding user engagement and identifying potential security issues based on unusual browsing patterns. Collecting should be used on smaller data sets to create a list (or set, or map) when you actually need a list object explicitly (for example, in order to pass it on to some other API). Using collect() on larger data sets may cause out-of-memory errors, as it returns the entire data set.

Collect and Group Events by Specified Field - Example 2

Collect and group events by specified field using collect() as part of a groupBy() operation

Query
logscale
LocalAddressIP4 = * RemoteAddressIP4 = * aip = *
| groupBy([LocalAddressIP4, RemoteAddressIP4], function=([count(aip, as=aipCount, distinct=true), collect([aip])]))
Introduction

In this example, the collect() function is used to collect fields from multiple events.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    LocalAddressIP4 = * RemoteAddressIP4 = * aip = *

    Filters for all events where the fields LocalAddressIP4, RemoteAddressIP4 and aip are all present. The actual values in these fields do not matter; the query just checks for their existence.

  3. logscale
    | groupBy([LocalAddressIP4, RemoteAddressIP4], function=([count(aip, as=aipCount, distinct=true), collect([aip])]))

    Groups the returned results by the fields LocalAddressIP4 and RemoteAddressIP4, collects all the AIPs (Adaptive Internet Protocol) into an array, and performs a distinct count on the field aip. The count of the distinct AIP values is returned in a new field named aipCount.

  4. Event Result set.

Summary and Results

The query is used to collect fields from multiple events into one event. Collecting should be used on smaller data sets to create a list (or set, or map) when you actually need a list object explicitly (for example, in order to pass it on to some other API). Using collect() on larger data sets may cause out-of-memory errors, as it returns the entire data set. The query is useful for network connection analysis and for identifying potential threats.

Sample output might look like this:

LocalAddressIP4 | RemoteAddressIP4 | aipCount | aip
192.168.1.100   | 203.0.113.50     | 3        | [10.0.0.1, 10.0.0.2, 10.0.0.3]
10.0.0.5        | 198.51.100.75    | 1        | [172.16.0.1]
172.16.0.10     | 8.8.8.8          | 5        | [192.0.2.1, 192.0.2.2, 192.0.2.3, 192.0.2.4, 192.0.2.5]

Count Unique Visitors Based on Client IP Addresses

Count unique visitors based on client IP addresses using the session() function

Query
logscale
groupBy(client_ip, function=session(maxpause=15m))
| count()
Introduction

In this example, the session() function is used to count the unique visitors (each visitor defined as non-active for 15 minutes) of a site based on client IP addresses. The session() function groups events by a given timespan.

Example incoming data might look like this:

timestamp           | client_ip     | url        | status_code | user_agent
2025-05-15 05:30:00 | 192.168.1.100 | /login     | 200         | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:31:15 | 192.168.1.100 | /dashboard | 200         | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:32:30 | 192.168.1.100 | /reports   | 200         | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:48:00 | 192.168.1.100 | /login     | 200         | Mozilla/5.0 (Windows NT 10.0; Win64; x64)
2025-05-15 05:30:05 | 192.168.1.101 | /login     | 200         | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
2025-05-15 05:35:10 | 192.168.1.101 | /profile   | 200         | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)
2025-05-15 05:40:00 | 192.168.1.102 | /login     | 200         | Mozilla/5.0 (iPhone; CPU iPhone OS 14_0)
2025-05-15 05:41:30 | 192.168.1.102 | /settings  | 200         | Mozilla/5.0 (iPhone; CPU iPhone OS 14_0)
2025-05-15 05:42:45 | 192.168.1.102 | /logout    | 200         | Mozilla/5.0 (iPhone; CPU iPhone OS 14_0)
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy(client_ip, function=session(maxpause=15m))

    Groups events by the field client_ip into sessions, where a session ends after 15 minutes of inactivity.

    The maxpause parameter defines the maximum pause between events within a session (15m in this example). Events further apart than the defined value become separate sessions. For example, if the same user returns to a site within 15 minutes, it is counted as part of the same session.

  3. logscale
    | count()

    Makes a count of the total number of unique sessions.

  4. Event Result set.

Summary and Results

The query is used to group events by client IP addresses into sessions of 15m, and then make a count of the total number of unique sessions (it returns the total count of sessions across all IP addresses). The query is useful, for example, for measuring unique website/application visitors and understanding real user engagement patterns, as well as for security monitoring and detecting unusual spikes in unique visitors.

Sample output from the incoming example data:

_count
4

The query counts 4 unique sessions total as the first IP address has activity that spans beyond the 15-minute session timeout, creating two distinct sessions.

If you make the count on the client_ip field: | count(client_ip), the query will return a more detailed result showing the session count per IP address:

client_ip       _count
192.168.1.100   2
192.168.1.101   1
192.168.1.102   1

Create a Pivot Table

Creating a view of LogScale activity

Query
logscale
groupBy([type,actor.user.id],function={groupBy(actor.user.id, function=max(@timestamp))})
|transpose(header=type)
|drop(column)
Introduction

The humio-audit repository contains audit events for the LogScale cluster. Reporting on this information can provide a wealth of information, but a useful summary can be created based on the activities, users and which the latest user of that particular operation.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy([type,actor.user.id],function={groupBy(actor.user.id, function=max(@timestamp))})

    The first step to creating a pivot table is the base query that creates the initial summary of the information. This fragment uses a nested groupBy() aggregation. The embedded aggregation creates a group of the maximum access time for a given user, by using max() on @timestamp against actor.user.id. This creates a table of the last event by each user. The outer groupBy() then creates an aggregation of this maximum user time against the type field, which defines the operation performed.

    The result is a table of the last user and time for a specific operation; for example, the last time a query was executed. An example of this table can be seen below:

    type actor.user.id _max
    alert.clear-error 0O7WGPBX9YbvZbKOrBMd5fgH 1700546666592
    alert.create 0O7WGPBX9YbvZbKOrBMd5fgH 1699004139419
    alert.update 0O7WGPBX9YbvZbKOrBMd5fgH 1700546666676
    dashboard.create 0O7WGPBX9YbvZbKOrBMd5fgH 1698417330709
    dataspace.query 0O7WGPBX9YbvZbKOrBMd5fgH 1700721296197
  3. logscale
    |transpose(header=type)

    The transpose() function will convert individual columns into rows, switching the orientation. For example, the type column will now become the type row. However, there are no row titles, so by default the resulting table has a header row containing the column and row numbers, like this:

    column row[1] row[2] row[3] row[4] row[5]
    _max 1700546666592 1699004139419 1700546666676 1698417330709 1700722209214
    actor.user.id 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH
    type alert.clear-error alert.create alert.update dashboard.create dataspace.query

    However, the aggregate grouping, type could be used instead as a valid header for each column. To achieve that, use the header parameter to specify type as the column. The resulting table now looks like this:

    alert.clear-error alert.create alert.update column dashboard.create dataspace.query
    1700546666592 1699004139419 1700546666676 _max 1698417330709 1700722210073
    0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH actor.user.id 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH
  4. logscale
    |drop(column)

    The table created contains the summarized information pivoted around the user ID and last event time, further summarized by the type of the event. However, there is a field in the event stream, column, which was generated from the old row names before the table was pivoted.

    That field is not needed, and it can be removed by using drop() to remove column from the events.

  5. Event Result set.

Summary and Results

Pivoting an event set of data allows for the information to be displayed and summarized in a format that may make more logical sense as a display format. The final table will look like this:

alert.clear-error alert.create alert.update dashboard.create dataspace.query
1700546666592 1699004139419 1700546666676 1698417330709 1700722210073
0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH 0O7WGPBX9YbvZbKOrBMd5fgH

Deduplicate Content by Field

Deduplicating content based on a specific field

Query
logscale
groupBy(field, function=tail(1))
Introduction

If you want to deduplicate events by a given field, for example to identify a unique list of events for further processing, you can use an aggregate function. In this example, the groupBy() function is used with tail() to use the last value in a sequence of events.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy(field, function=tail(1))

    Groups all events by the specified field and reduces the results using tail() to take only the last value.

  3. Event Result set.

Summary and Results

The query is used to deduplicate events by a given field. This is useful if you want to create a unique list of events for further processing.
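
As a concrete sketch of the generic query above (host is an example field, not one defined by this example), keeping only the most recent event per host would look like:

logscale
groupBy(host, function=tail(1))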

Detect All Occurrences of Event A Before Event B

Detect all occurrences of event A before event B (brute force attack) using the partition() function combined with groupBy()

Query
logscale
head()
| groupBy(
      key,
      function = partition(
          condition=test(status=="success"),
          split="after",
          [
              { status="failure" | count(as=failures) }, 
              range(@timestamp, as=timespan), 
              selectLast(status)
          ]
      )
  )
| failures >= 3
| status = "success"
Introduction

In this example, the partition() function is used with the groupBy() function to detect all occurrences of event A before event B (brute force attack).

The query will detect instances where there were 3 or more failed attempts followed by a successful attempt, regardless of the time between them.

Note that the partition() function must be used after an aggregator function to ensure event ordering. Also note that the events must be sorted in order by timestamp to prevent errors when running the query. It is possible to select any field to use as a timestamp.

Example incoming data might look like this:

@timestamp      key   status
1451606300200   c     failure
1451606300400   c     failure
1451606300600   c     failure
1451606301000   a     failure
1451606302000   a     failure
1451606302200   a     failure
1451606302300   a     failure
1451606302400   b     failure
1451606302500   a     failure
1451606302600   a     success
1451606303200   b     failure
1451606303300   c     success
1451606303400   b     failure
1451606304500   a     failure
1451606304600   a     failure
1451606304700   a     failure
1451606304800   a     success
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    head()

    Selects the oldest events ordered by time.

  3. logscale
    | groupBy(
          key,
          function = partition(
              condition=test(status=="success"),
              split="after",
              [
                  { status="failure" | count(as=failures) }, 
                  range(@timestamp, as=timespan), 
                  selectLast(status)
              ]
          )
      )

    Groups the events by a specified key (for example, a user ID or IP address), filters for successful events (filters for events that meet the defined condition for the field status that must contain the value success), then splits the data after each successful event. Notice how the condition is provided as a non-aggregate subquery.

    Furthermore, it filters all the failed attempts where the field status contains the value failure.

    Makes a count of all the failed attempts and returns the results in a field named failures, calculates the timespan of the failures, and selects the status of the last event. Calculating the timespan of the failure sequence is useful for analysis.

  4. logscale
    | failures >= 3

    Filters for partitions with 3 or more failures.

  5. logscale
    | status = "success"

    Filters for partitions containing the value success in the status field.

  6. Event Result set.

Summary and Results

The query is used to detect all occurrences of potential brute force attack patterns. It looks for instances where there were 3 or more failed attempts (event A) followed by a successful attempt (event B), regardless of the time between failures. The timespan between each attempt is reported, which could be used to identify brute force attacks.

Sample output from the incoming example data:

key   failures   timespan   status
a     5          1600       success
a     3          300        success
c     3          3100       success

Detect Event A Happening X Times Before Event B

Detect event A happening X times before event B (brute force attack) using the partition() function combined with groupBy()

Query
logscale
head()
| groupBy(
    key,
    function = partition(
        condition=test(status=="success"),
        split="after",
        [
            { status="failure" | count(as=failures) }, 
            range(@timestamp, as=timespan), 
            max(@timestamp), 
            selectLast(status)
        ]
    )
)
| failures >= 3
| status = "success"
Introduction

In this example, the partition() function is used with the groupBy() function to detect event A happening X times before event B (brute force attack).

The query will detect instances where there were 3 or more failed attempts followed by a successful attempt.

Note that the partition() function must be used after an aggregator function to ensure event ordering. Also note that the events must be sorted in order by timestamp to prevent errors when running the query. It is possible to select any field to use as a timestamp.

Example incoming data might look like this:

@timestamp      key   status
1451606300200   c     failure
1451606300400   c     failure
1451606300600   c     failure
1451606301000   a     failure
1451606302000   a     failure
1451606302200   a     failure
1451606302300   a     failure
1451606302400   b     failure
1451606302500   a     failure
1451606302600   a     success
1451606303200   b     failure
1451606303300   c     success
1451606303400   b     failure
1451606304500   a     <no value>
1451606304600   c     failure
1451606304700   c     failure
1451606304800   c     failure
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    head()

    Selects the oldest events ordered by time.

  3. logscale
    | groupBy(
        key,
        function = partition(
            condition=test(status=="success"),
            split="after",
            [
                { status="failure" | count(as=failures) }, 
                range(@timestamp, as=timespan), 
                max(@timestamp), 
                selectLast(status)
            ]
        )
    )

    Groups the events by a specified key (for example, a user ID or IP address), then splits the sequence of events after each successful event (where the condition status=="success").

    For each partition, it counts the events where status is failure and stores the count in the field failures, finds the range of timestamps in the partition, finds the newest timestamp, and finds the latest status to show whether the partition ended with a success.

  4. logscale
    | failures >= 3

    Filters for partitions that contained 3 or more failures.

  5. logscale
    | status = "success"

    Filters for partitions with the value success in the status field to ensure that the final status is a success.

  6. Event Result set.

Summary and Results

The query is used to detect instances where there are 3 or more failed attempts followed by a successful attempt. The query can be used to detect a brute force attack where an attacker tries multiple times before succeeding. Note that the effectiveness of this query depends on the nature of your data and the typical patterns in your system.

Sample output from the incoming example data:

key   failures   timespan   status
a     5          1600       success
a     3          300        success
c     3          3100       success

Detect Event A Happening X Times Before Event B Within a Specific Timespan

Detect event A happening X times before event B within a specific timespan using the slidingTimeWindow() function combined with groupBy()

Query
logscale
head()
| groupBy(
    key,
    function=slidingTimeWindow(
        [{status="failure" | count(as=failures)}, selectLast(status)],
        span=3s
    )
  )
| failures >= 3
| status = "success"
Introduction

In this example, the slidingTimeWindow() function is used with the groupBy() function to detect event A happening X times before event B within a specific timespan.

The query will detect instances where there are 3 or more failed attempts followed by a successful attempt, all occurring within a 3-second window.

Note that the slidingTimeWindow() function must be used after an aggregator function to ensure event ordering. Also note that the events must be sorted in order by timestamp to prevent errors when running the query. It is possible to select any field to use as a timestamp.

Example incoming data might look like this:

@timestamp      key   status
1451606300200   c     failure
1451606300400   c     failure
1451606300600   c     failure
1451606301000   a     failure
1451606302000   a     failure
1451606302200   a     failure
1451606302300   a     failure
1451606302400   b     failure
1451606302500   a     failure
1451606302600   a     success
1451606303200   b     failure
1451606303300   c     success
1451606303400   b     failure
1451606304500   a     failure
1451606304600   a     failure
1451606304700   a     failure
1451606304800   a     success
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    head()

    Selects the oldest events ordered by time.

  3. logscale
    | groupBy(
        key,
        function=slidingTimeWindow(
            [{status="failure" | count(as=failures)}, selectLast(status)],
            span=3s
        )
      )

    Groups the events by a specified key (for example, a user ID or IP address), then creates a sliding time window of 3 seconds (with a span of 3 seconds).

    Within each window, it counts the events where the field status contains the value failure, returning the result in a field named failures, and selects the status of the last event.

  4. logscale
    | failures >= 3

    Filters for windows with 3 or more failures.

  5. logscale
    | status = "success"

    Filters for partitions containing the value success in the status field.

  6. Event Result set.

Summary and Results

The query is used to detect event A happening X times before event B within a specific timespan. It looks for instances where there were 3 or more failed attempts followed by a successful attempt, all occurring within a 3-second window. Using a sliding time window of 3 seconds provides a more precise time constraint compared to the usage of partition() in Detect Event A Happening X Times Before Event B.

The query can be used to detect potential brute force attack patterns within a specific timeframe. Note that the effectiveness of this query depends on the nature of your data and the typical patterns in your system.

Sample output from the incoming example data:

key   failures   status
a     5          success
a     7          success

Filter Out Fields With No Value

Filter out fields with no values from search results

Query
logscale
method=GET
groupBy(field=[method, statuscode], function=count(as=method_total))
sort([method, statuscode], order=asc)
FieldName!=""
Introduction

It is possible to filter out fields with no values in a given returned search result. In this example, all statuscode fields containing no value are filtered out from the final search result.

Example incoming data might look like this:

method   statuscode   method_total
GET      <no value>   10
GET      200          32492
GET      301          1
GET      304          113
GET      403          9
GET      404          132

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    method=GET

    Filters for all events with methods of the type GET.

  3. logscale
    groupBy(field=[method, statuscode], function=count(as=method_total))

    Groups the returned results by the method and statuscode fields and makes a count of the events, returning the count in a new field named method_total.

  4. logscale
    sort([method, statuscode], order=asc)

    Sorts the returned results in ascending order.

  5. logscale
    FieldName!=""

    Excludes all events where the given field (statuscode in this example) does not contain a value.

  6. Event Result set.

Summary and Results

The query is used to filter out fields not containing any values from the returned search result.

Sample output from the incoming example data:

method   statuscode   method_total
GET      200          32492
GET      301          1
GET      304          113
GET      403          9
GET      404          132
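
As a concrete sketch of the generic FieldName placeholder in the query above, using the statuscode field from this example, the final filter line could be written as:

logscale
statuscode!=""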

Find Fields With Data in Class

Query

Search Repository: humio

logscale
wildcard(field=class,pattern="*Data*")
| groupBy(class)
Introduction

Find all events containing any Data string in their class, and count the occurrences for each class that is found. For example, it can be used to get a list of events that have items such as DataIngestRateMonitor, or LocalDatasource.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    wildcard(field=class,pattern="*Data*")

    Searches the incoming data and lists all events whose class field contains the string Data (with any characters before or after it).

  3. logscale
    | groupBy(class)

    Takes the events extracted from the search and groups them by the class field.

  4. Event Result set.

Summary and Results

The result is an aggregated count of all events matching anything with Data (with one or more characters before or after), in the class field.

class                                       _count
c.h.c.c.ChatterDataMemoryStatusLoggerJob$   283
c.h.d.DataIngestRateMonitor$                7504
c.h.d.LocalDatasource$                      10352
c.h.d.q.EmptyIdleDatasourcesCleaner         3
c.h.e.e.Datasource$                         3947
c.h.e.e.Datasources$                        4
c.h.e.f.DataSnapshotOps$                    662
c.h.e.f.DataWithGlobal                      7254
c.h.j.CleanupDatasourceFilesJob             141
c.h.j.DataSyncJobImpl$                      46594
c.h.j.DatasourceRehashingJob$               32
c.h.k.ChatterDataDistributionKafka$         107

Find Fields With S3Bucket in Class

Query

Search Repository: humio

logscale
wildcard(field=class, pattern="*S3Bucket*", ignoreCase=true)
| groupBy(class)
Introduction

Find all events containing any S3Bucket item (and all before and after) in their class, and count the occurrences for each class that is found.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    wildcard(field=class, pattern="*S3Bucket*", ignoreCase=true)

    Searches the incoming data and lists all events whose class field contains the string S3Bucket (with any characters before or after it, case-insensitive).

  3. logscale
    | groupBy(class)

    Takes the events extracted from the search and groups them by the class field.

  4. Event Result set.

Summary and Results

The result is an aggregated count of all events matching anything with S3Bucket, case-insensitive, in the class field.

class                                  _count
c.h.b.s.S3BucketStorageCleaningJob     197
c.h.b.s.S3BucketStorageFileUpLoader    2329
c.h.b.s.S3BucketStorageUploadJob       3869
Figure 189. Search S3Bucket With wildcard()


Find Matches in Array Given a Regular Expression - Example 1

Use regular expressions to search for and match specific patterns in flat arrays

Query
logscale
array:regex("incidents[]", regex="^Cozy Bear.*")
| groupBy(host)
Introduction

In this example, the regular expression is used to search for patterns where the value Cozy Bear appears in a certain position across arrays.

Example incoming data might look like this:

host   incidents[0]   incidents[1]   incidents[2]
v1     Evil Bear      Cozy Bear
v15    Fancy Fly      Tiny Cat       Cozy Bears
v22    Fancy Fly      Tiny Cat       Cold Bears
v4     Fancy Fly      Tiny Cat       Cozy Bearskins
v1     Evil Bear      Cozy Bears
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    array:regex("incidents[]", regex="^Cozy Bear.*")

    Searches the incidents array for values that start with Cozy Bear, finding all matches for that regular expression.

  3. logscale
    | groupBy(host)

    Groups the returned results by host.

  4. Event Result set.

Summary and Results

Queries using regular expressions like this are a quick way to search for and return results matching specific values in arrays. Regular expressions are useful when searching for different strings containing the same patterns, such as social security numbers, URLs, email addresses, and other strings that follow a specific pattern.

Sample output from the incoming example data:

host   _count
v1     2
v15    1
v4     1

Find Matches in Array Given a Regular Expression - Example 2

Use regular expressions to search for and match specific patterns ignoring case in flat arrays

Query
logscale
array:regex("responses[]", regex="bear$", flags="i")
Introduction

In this example, the regular expression is used to search for patterns where the value bear appears at the end of a value in an array element, ignoring the case.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    array:regex("responses[]", regex="bear$", flags="i")

    Searches the responses array for values that end with bear, ignoring the case (due to the i flag).

  3. Event Result set.

Summary and Results

Queries using regular expressions like this are a quick way to search for and return results matching specific values in arrays. Regular expressions are useful when searching for different strings containing the same patterns, such as social security numbers, URLs, email addresses, and other strings that follow a specific pattern.

Find Minimum And Maximum Values of any Numerical Field in Session

Find minimum and maximum values of any numerical field in a session using the session() function

Query
logscale
groupBy(cookie_id, function=session([max(bet),min(bet)]))
Introduction

In this example, the session() function is used to find minimum and maximum values of the field bet in a session. The session() function groups events by a given timespan.

Example incoming data might look like this:

timestamp           | cookie_id | bet    | action_type | category
2025-05-15 05:30:00 | user123   | 25.99  | purchase    | electronics
2025-05-15 05:32:00 | user123   | 49.99  | purchase    | electronics
2025-05-15 05:34:00 | user123   | 15.99  | purchase    | accessories
2025-05-15 05:48:00 | user123   | 99.99  | purchase    | appliances
2025-05-15 05:49:00 | user123   | 150.00 | purchase    | furniture
2025-05-15 05:35:00 | user456   | 75.50  | purchase    | clothing
2025-05-15 05:37:00 | user456   | 199.99 | purchase    | appliances
2025-05-15 05:40:00 | user456   | 89.99  | purchase    | electronics
2025-05-15 05:30:00 | user789   | 10.99  | purchase    | books
2025-05-15 05:55:00 | user789   | 20.99  | purchase    | books
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy(cookie_id, function=session([max(bet),min(bet)]))

    Groups events by the field cookie_id (unique user identifier) and creates sessions with a 15-minute inactivity timeout (the default value of the maxpause parameter), then calculates the maximum and minimum values of the field bet for each session, returning the results in new fields named _max and _min.

  3. Event Result set.

Summary and Results

The query is used to analyze the likelihood (the bet) of the behavior within user sessions. This query is, for example, useful for identifying if the event was an attempt to hack the system.

Sample output from the incoming example data:

cookie_id   _max     _min
user123     49.99    15.99    // First session
user123     150.00   99.99    // Second session
user456     199.99   75.50    // Single session
user789     10.99    10.99    // First session
user789     20.99    20.99    // Second session

Note that each session shows its own min/max values.

Get List of Status Codes

Get list of status codes returned and a count of each for a given period using the groupBy() function with count()

Query
logscale
groupBy(field=status, function=count())
Introduction

In this example, the groupBy() function is used to get a list of status codes for logged events. For instance, the status code 200 is returned when the request is successful, and 404 when the page is not found.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy(field=status, function=count())

    Groups events by the status field, and counts the number of events in each group.

    It is possible to enhance the query for more detailed analysis. For example, the following query groups by both the status and source fields, limits the result to 1000 groups, and sorts by count in descending order:

    groupBy([field=status, field=source], function=count(), limit=1000)
    | sort(_count, order=desc)

  3. Event Result set.

Summary and Results

The query is used to extract a list of status codes, each with a count of how many events have that status. The query is useful for summarizing and analyzing log data.

Sample output from the incoming example data:

status    _count
101       17
200       46183
204       3
307       1
400       2893
401       4
Failure   1
Success   8633

Hourly Data Events

Summarize events by providing a count of the number of data events per hour using the time:hour() function

Query
logscale
hr := time:hour(field="@ingesttimestamp")
|groupBy(hr)
Introduction

In this example, the time:hour() function is used with groupBy() to count data events per hour of the day.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    hr := time:hour(field="@ingesttimestamp")

    Gets the hour (24-hour clock) of the values in the @ingesttimestamp and returns the results in a new field named hr.

  3. logscale
    |groupBy(hr)

    Groups the returned results by hr field and provides a count of the number of data events per hour in a _count field.

  4. Event Result set.

Summary and Results

The query is used to count the number of data events for each hour of the day. The results can be plotted onto a bar chart.
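
As a small optional extension (not part of the original example), the hour buckets can be sorted numerically before charting:

logscale
hr := time:hour(field="@ingesttimestamp")
| groupBy(hr)
| sort(hr, order=asc)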

Preview Content in a Lookup File With readFile() and Filter With !join()

Preview content in a lookup file in the search portion of a repo and filter for specific data with the !join() function

Query
logscale
readFile("host_names.csv")
| !join(query={groupBy(host_name)}, field=host_name, key=host_name, include=[host_name, id])
Introduction

In this example, the readFile() function is used to look up a host_names.csv file, and then filter for host names that do not send any logs.

Example incoming data might look like this:

|--------------------|
| host_name, host_id |
| DESKTOP-VSKPBK8, 1 |
| FINANCE, 2         |
| homer-xubuntu, 3   |
| logger, 4          |
| DESKTOP-1, 5       |
| DESKTOP-2, 6       |
| DESKTOP-3, 7       |
|--------------------|
Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    readFile("host_names.csv")

    Displays the content of the .csv file.

  3. logscale
    | !join(query={groupBy(host_name)}, field=host_name, key=host_name, include=[host_name, id])

    Filters for host names that do not send any logs.

  4. Event Result set.

Summary and Results

The query is used to preview content in CSV Lookup Files, and then filter for host names that do not send any logs.

Sample output from the incoming example data:

host_id   host_name
5         DESKTOP-1
6         DESKTOP-2
7         DESKTOP-3

Search Across Multiple Structured Fields

Search across multiple structured fields using the transpose() function within groupBy()

Query
logscale
groupBy(@id, function=transpose())
| row[1] = /httpd/
| groupBy(column)
Introduction

By transposing the event set, the information can be viewed and summarized in a more human-readable form. In this example, the transpose() function is used within a groupBy() function to search across multiple structured fields in the HUMIO repository.

Example incoming data might look like this:

host       | @rawstring
MAIL01     | 2025-03-18T10:14:47.142Z MAIL01 httpd[61789]: 192.168.0.198 - - [2025-03-13:23:05:48 +0800] "GET /api/v2/products/search HTTP/1.1" 200 33456
LON-SRV01  | 2025-03-18T10:14:46.940Z LON-SRV01 httpd[60123]: 192.168.0.198 - - [2025-03-13:20:50:14 +0500] "GET /uploads/documents/terms.pdf HTTP/1.1" 401 36912
MAIL01     | 2025-03-18T10:14:46.691Z MAIL01 httpd[51234]: 192.168.0.198 - - [2025-03-13:12:50:16 -0300] "GET /downloads/mobile/app-v2.1.apk HTTP/1.1" 403 1234
SYD-SRV01  | 2025-03-18T10:14:46.542Z SYD-SRV01 httpd[45678]: 192.168.1.123 - - [2025-03-13:19:30:17 +0400] "GET /uploads/avatars/default.png HTTP/1.1" 404 61789
PROD-SQL01 | 2025-03-18T10:14:46.141Z PROD-SQL01 httpd[56789]: 192.168.1.245 - - [2025-03-13:17:30:38 +0200] "GET /uploads/avatars/default.png HTTP/1.1" 200 13456

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy(@id, function=transpose())

    Groups events by unique @id values, applies the transpose() function for each group, converting row values into column headers. A new row-based structure for each @id field is created.

    After using transpose(), the data might look like this:

    @id                                       | column    | row[1]
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_520_1742292886 | @timezone | Z
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_520_1742292886 | app       | httpd[56789]:
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_520_1742292886 | host      | PROD-SQL01
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_520_1742292886 | msg       | 192.168.1.245 - - [2025-03-13:17:30:38 +0200] "GET /uploads/avatars/default.png HTTP/1.1" 200 13456
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_520_1742292886 | priority  | 34
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_520_1742292886 | version   | 1

  3. logscale
    | row[1] = /httpd/

    Filters for events where the value of row[1] matches the regular expression /httpd/.

    After filtering, the data might look like this (@rawstring has been removed from the below for clarity):

    @id                                       | column | row[1]
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_520_1742292886 | app    | httpd[56789]:
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_528_1742292886 | app    | httpd[45678]:
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_531_1742292886 | app    | httpd[51234]:
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_536_1742292886 | app    | httpd[60123]:
    xhCUZ4fQzxEbr3qoyzLIQqSE_0_540_1742292887 | app    | httpd[61789]:
  4. logscale
    | groupBy(column)

    Groups results by the column field, showing which original fields contained the value httpd, and makes a count of matches per field, returning the counted results in a field named _count. The final groupBy(column) removes duplicate entries.

  5. Event Result set.

Summary and Results

The query is used to search across multiple structured fields in the HUMIO repository to show where httpd appears most often. It makes results more readable, identifies field patterns, and is very useful for statistical analysis.

Sample output from the incoming example data:

column       _count
@rawstring   5
app          5

Sort Timestamps With groupBy()

Sorting fields based on aggregated field values

Query

Search Repository: humio

logscale
timestamp := formatTime(format="%H:%M")
| groupBy([thread],
function=[{sort("timestamp")
| collect("timestamp")}])
Introduction

When using aggregation, you may want to sort on a field that is part of the aggregated set but not the main feature of the aggregated value, for example, sorting the values by their timestamp rather than the embedded value. To achieve this, use a function that sorts on the field to be used as the sort field, and then use collect() so that the value from before the aggregation can be displayed in the generated event set. This query can be executed in the humio repository.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    timestamp := formatTime(format="%H:%M")

    Creates a new field, timestamp, formatted as HH:MM.

  3. logscale
    | groupBy([thread],

    Groups the events, first by the name of the thread and then the formatted timestamp.

  4. logscale
    function=[{sort("timestamp")
    | collect("timestamp")}])

    Uses sort() combined with collect() as the method of aggregation. As an embedded expression for the function, this will sort the events on the timestamp field and then retrieve that field, which would normally be removed as part of the aggregation process.

  5. Event Result set.

Summary and Results

The result set will contain a list of the aggregated thread names sorted by the timestamp:

thread                          timestamp
BootstrapInfoJob                10:09
DataSynchJob                    10:09
Global event loop               10:10
LocalLivequeryMonitor           10:09
LogCollectorManifestUpdate      10:09
TransientChatter event loop     10:10
aggregate-alert-job             10:09
alert-job                       10:09
block-processing-monitor-job    10:09
bloom-scheduler                 10:09
bucket-entity-config            10:09
bucket-overcommit-metrics-job   10:09
bucket-storage-download         10:09
bucket-storage-prefetch         10:09
chatter-runningqueries-logger   10:09
chatter-runningqueries-stats    10:09

Transpose a Basic Table

Transposing an event set effectively switches rows (each event) into columns (an event)

Query
logscale
groupBy([loglevel])
| transpose(header=loglevel)
| drop(column)
Introduction

By transposing the event set, the information can be viewed and summarized in a more human-readable form. Transposing also allows aggregated data to be viewed in a form where the values of an aggregated field become the columns. This can be used to summarize the information further by showing multiple rows of data. For example, in the humio repository the event information contains a list of errors, warnings, or informational events for activity within the cluster.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    groupBy([loglevel])

    Aggregates the events by loglevel. This field will be either WARN, ERROR, or INFO depending on the level of the event log entry. The default function is used, which results in a count of the number of times an event of this type has been seen:

    loglevel _count
    ERROR 27424
    INFO 18840156
    WARN 2059898

    In this output, each event is a row in the table, each with two fields, loglevel and _count.

  3. logscale
    | transpose(header=loglevel)

    Transposing the events with transpose() will make each field in each event a row in the new stream of events; for example, the loglevel field with the value ERROR will become the field ERROR, swapping the rows for columns. By using the header parameter, transpose() uses the value of the aggregated field as the field name. The output will now be a table with a column (field) for each value and a single row with the count:

    ERROR INFO WARN column
    97159 63719404 5716733 _count
  4. logscale
    | drop(column)

    In the final output, the column field in the events is the one generated from the field names of the original events and it's not needed, so it can be removed by using the drop() function to remove the field from the event set.

  5. Event Result set.

Summary and Results

The transpose() function is a great way of reorganizing data into a format that is either more readable or more easily applied to other display formats as part of a widget. The final table looks like this:

ERROR INFO WARN
97159 63719404 5716733

However, the information as it is now formatted can more easily be applied to a variety of visualizations. For example, the data can be formatted as a bar chart, as we now have a list of fields and a value.

The difference is that without transpose(), the aggregate result set is a list of events with a field name and value. With transpose(), the result set is a single event with multiple fields, and this is interpreted by the bar chart as a series of values.