groupBy() query function is used to group
together events by one or more specified fields. This is similar to the
GROUP BY method in SQL databases. Further,
it can be used to execute aggregate functions on each group. The results
are returned in the
field parameter for each
aggregate function. For example, the
_count field if the
count() function is used.
The default is to use
count() as an aggregate
function. However, if the goal is to only find the unique values of a
field or combinations of multiple fields, then
groupBy() can be provided with an empty list as its
|Array of strings||required||This specifies which field or fields to group.|
|Array of Aggregate Functions||optional||This specifies which aggregate functions to use with each group.|
|string||optional||This sets the limit for the number of group elements.|
|Valid Values||Use the GroupMaxLimit (200,000)|
The parameter name for
field can be omitted; the following forms are equivalent:
groupBy may return incomplete results limited to a
restricted subset of groups and related events, in cases where the
limit parameter is exceeded.
The maximum value of the
limit parameter for this
dynamic configuration is 200,000 by default (adjustable by the dynamic
configuration), which means it is possible to create up to 200k groups.
limit parameter is not supplied
explicitly, its default value is determined by the
configuration, which has a default value of 20,000.
limit parameter was previously bounded by
MAX_STATE_LIMIT environment variable, if you have made
any modifications to it, please set the dynamic configurations
the same value for a seamless upgrade.
groupBy() has an internal memory limit
determined by the dynamic configuration
non-live queries, and
live queries. Both of these have a default value of 100MB. If either kind
of limit is reached in a search query, additional groups won't be created.
The results returned will then be inconsistent because they were merged
together randomly; which groups are kept will be random, as well. For such
situations, you may want to use the
top() function to
narrow the list of groups.
One of the reasons for this limitation is because
groupBy() is implemented to work entirely in memory,
and cannot spill to disk. The limit helps to prevent searches from
consuming too much memory. This method and limitation may be changed in
future versions of LogScale.
Since the most common use-case is to count the distinct values of a
fieldthe default behavior is to use
aggregate the events. The simplest use-case of
groupBy() is, therefore, to count the occurence of
the distinct values of a field:
The output is:
The task is to only find the unique values,
groupBy() can be provided with an empty list as the
aggregate function, implying that nothing will be aggregated:
The output is:
groupBy() can be used to find the unique
combinations of field values present in the data:
groupBy([status_code, ip], function=)
The output is:
A usage of the function such as the one above is also very useful for query-based parameters in dashboards, see Query Parameter for more explanations.
As an example of how to use this query function, suppose you have a
LogScale repository that is ingesting data from one or more web servers.
Now suppose you want to get a list of HTTP status codes, the results of
users accessing the server. For instance, HTTP code 404 for
Not Found — for web pages not
found. You could get a list of all of the HTTP codes returned and a
count of each by executing a query like this:
Count different http status codes
groupBy(field=status_code, function=count()) | sort(statuscode, order=asc)
groupBy() function, you can have LogScale
group by more than one field. You would just give the fields in an
array, a comma-separated list within square-brackets. It would look
something like this query in which events are grouped on HTTP method and
groupBy(field=[method, statuscode], function=count()) | sort([method, statuscode], order=asc)
The figure below shows you these results.
Although this query works, you may also want to show the total for each method, not just the total for each status code of each method. To do that, you'll have to adjust the query to look like this:
[groupby(method, function=[count(as=method_total), groupby(statuscode, function=count(as=method_status_count))])]
These results still might not be as tidy as you like. You might do better with separate queries for each method or each status code — assuming you're interested in only a few specific ones. Another possibility might be to use an API to be able to assemble the results in a manner that you prefer.