groupBy()

`groupBy()`

The groupBy() query function is used to group together events by one or more specified fields. This is similar to the GROUP BY method in SQL databases.

Parameter	Type	Required	Default	Description
`field`^[a]	Array of strings	required		This specifies which field or fields to group.
`function`	Array of Aggregate Functions	optional^[b]	`count(as=_count)`	This specifies which aggregate functions to use with each group.
`limit`	string	optional^[b]	`20,000`	This sets the limit for the number of group elements. Default value controlled by `GroupDefaultLimit` dynamic configuration, max value controlled by `GroupMaxLimit`.
		Maximum	`1,000,000`
^[a]The argument name `field` can be omitted. ^[b]Optional parameters use their default value unless explicitly set

Omitted Argument Names
The argument name for field can be omitted; the following forms of this function are equivalent:
logscale
groupBy("value")
and:
logscale
groupBy(field="value")

It can be used to execute aggregate functions on each group. The results are returned in the field parameter for each aggregate function. For example, the _count field if the count() function is used.

The default is to use count() as an aggregate function. However, if the goal is to only find the unique values of a field or combinations of multiple fields, then groupBy() can be provided with an empty list as its aggregate function.

When showing time series data, the timeChart() and bucket() functions are an extension of groupBy() that groups by time.

Warning

groupBy may return incomplete results limited to a restricted subset of groups and related events, in cases where the limit> parameter is exceeded.

The groupBy() function is limited in the number of groups it will handle. Such limit is set in the LogScale configuration using GraphQL mutations bound by the GroupMaxLimit dynamic configuration.

Starting from version 1.127, the maximum value of the limit parameter is 1,000,000 by default (adjustable by the corresponding GroupMaxLimit dynamic configuration), which means it is possible to create up to 1 Million groups.

When the limit parameter is not supplied explicitly, its default value is determined by the GroupDefaultLimit dynamic configuration, which has a default value of 20,000.

GroupDefaultLimit cannot be higher than GroupMaxLimit.

Attention

The limit parameter was previously bounded by MAX_STATE_LIMIT environment variable (Self-Hosted customers only). If you have previously made any modifications to this variable, please set the dynamic configurations GroupMaxLimit and GroupDefaultLimit to the same value for a seamless upgrade.

Additionally, groupBy() has an internal memory limit determined by the dynamic configuration QueryMemoryLimit for non-live queries, and LiveQueryMemoryLimit for live queries. Both of these have a default value of 100MB. If either kind of limit is reached in a search query, additional groups won't be created. The results returned will then be inconsistent because they were merged together randomly; which groups are kept will be random, as well. For such situations, you may want to use the top() function to narrow the list of groups.

One of the reasons for this limitation is because groupBy() is implemented to work entirely in memory, and cannot spill to disk. The limit helps to prevent searches from consuming too much memory. This method and limitation may be changed in future versions of LogScale.

`groupBy()` Examples

Since the most common use-case is to count the distinct values of a fieldthe default behavior is to use count() to aggregate the events. The simplest use-case of groupBy() is, therefore, to count the occurence of the distinct values of a field:

logscale

groupBy(status_code)

Given input:

status_core	ip
440	1.111.111.111
500	1.111.111.111
440	1.111.111.111
440	2.222.222.222

The output is:

status_core	_count
440	3
500	1

The task is to only find the unique values, groupBy() can be provided with an empty list as the aggregate function, implying that nothing will be aggregated:

logscale

groupBy(status_code, function=[])

The output is:

status_core
440
500

Similarly, groupBy() can be used to find the unique combinations of field values present in the data:

logscale

groupBy([status_code, ip], function=[])

The output is:

status_core	ip
440	1.111.111.111
500	1.111.111.111
440	2.222.222.222

A usage of the function such as the one above is also very useful for query-based parameters in dashboards, see Query Parameter for more explanations.

As an example of how to use this query function, suppose you have a LogScale repository that is ingesting data from one or more web servers. Now suppose you want to get a list of HTTP status codes, the results of users accessing the server. For instance, HTTP code 404 for Not Found — for web pages not found. You could get a list of all of the HTTP codes returned and a count of each by executing a query that counts the different http status codes, like this:

logscale

groupBy(field=status_code, function=count())
| sort(statuscode, order=asc)

In addition to the groupBy(), the results here are piped to the sort() function to sort the results from the lowest HTTP code to the highest.

statuscode	_count
200	7485
301	2314
304	65
400	101
404	6425
405	2
408	9

With the groupBy() function, you can have LogScale group by more than one field. You would just give the fields in an array, a comma-separated list within square-brackets. It would look something like this query in which events are grouped on HTTP method and status code:

logscale

groupBy(field=[method, statuscode], function=count())
| sort([method, statuscode], order=asc)

Will produce these results:

method	statuscode
CONNECT	301
GET	200
GET	301
GET	304
GET	400
GET	404
HEAD	200
HEAD	301
HEAD	400
HEAD	404
OPTIONS	200
OPTIONS	301
POST	200
POST	301
POST	400
POST	404

Although this query works, you may also want to show the total for each method, not just the total for each status code of each method. To do that, you'll have to adjust the query to look like this:

logscale

[groupby(method, function=[count(as=method_total),
    groupby(statuscode, function=count(as=method_status_count))])]

In this query, a groupBy() is nested within another groupBy(). The results will look like this:

method	method_total	statuscode	method_status_count
GET	14078	200	7326
GET	14078	301	1246
GET	14078	304	65
GET	14078	400	70
GET	14078	404	5371
HEAD	139	200	14
HEAD	139	301	22
HEAD	139	400	6
HEAD	139	404	97
POST	2005	200	14
POST	2005	301	1002
POST	2005	400	25
POST	2055	404	964

These results still might not be as tidy as you like. You might do better with separate queries for each method or each status code — assuming you're interested in only a few specific ones. Another possibility might be to use an API to be able to assemble the results in a manner that you prefer.

Data Analysis 1.107.0-1.112.0

Data Analysis Overview

LogScale User Interface

Repositories & Views

Parsing Data

Searching Data

Writing Queries

Dashboards & Widgets

Automation

Query Language Syntax

Query Functions

Template Language

Keyboard Shortcuts

`groupBy()`

Warning

`groupBy()` Examples

Enter search term

Data Analysis 1.107.0-1.112.0

Data Analysis Overview

LogScale User Interface

Repositories & Views

Parsing Data

Searching Data

Writing Queries

Dashboards & Widgets

Automation

Query Language Syntax

Query Functions

Template Language

Keyboard Shortcuts

Warning

groupBy() Examples

Enter search term

`groupBy()`

`groupBy()` Examples