The groupBy() query function is used to group together events by one or more specified fields. This is similar to the GROUP BY clause in SQL databases.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
field[a] | array of strings | required | | This specifies which field or fields to group by. |
function | array of aggregate functions | optional[b] | count(as=_count) | This specifies which aggregate functions to use with each group. If several aggregators are listed for the function parameter, then their outputs are combined using the rules described for stats(). |
limit | string | optional[b] | 20,000 | This sets the limit for the number of group elements (maximum: 1,000,000). The default value is controlled by the GroupDefaultLimit dynamic configuration, and the maximum value by GroupMaxLimit. If the argument is max (limit=max), then the value of GroupMaxLimit is used. It prioritizes the top-N series, the top N being the series with the highest numerical value attributed to it by the subquery across all fields. |

[b] Optional parameters use their default value unless explicitly set.
[a] Omitted Argument Names: The argument name for field can be omitted; the following forms of this function are equivalent:

logscale Syntax
groupBy([a,b])

and:

logscale Syntax
groupBy(field=[a,b])
These examples show basic structure only.
groupBy() Function Operation

The groupBy() function has specific implementation and operational considerations, outlined below.
Series Selection in groupBy()

The selection is based on the aggregate numerical output across all specified functions and all time buckets, not the series identifiers themselves. The limit parameter prioritizes the top-N series, the top N being the series with the highest numerical value attributed to it by the subquery across all fields.
Series Selection Process:

- The selection is based on the numerical values produced by the subquery/function.
- It is not based on the series names.
- When multiple functions are used, the function considers all values produced.
For different examples of top N series selection, see Find Top N Value of Series - Example 1 and Find Top N Value of Series - Example 2.
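As a minimal sketch of this selection (the host and bytes field names are assumptions, not from this page), the following query keeps only the five groups whose aggregate value is highest across the result:

logscale Syntax
groupBy(host, function=sum(bytes), limit=5)

Here the five host groups with the largest summed bytes are retained, and all other groups are dropped.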
Grouping in groupBy()

The groupBy() function can be used to execute aggregate functions on each group. The generated fields are grouped by the field or fields in the field parameter for each aggregate function. For example, by default the count() function is used, and the grouped count is returned in the _count field.
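Since count(as=_count) is the default aggregate function, the following two forms are equivalent:

logscale Syntax
groupBy(status_code)

and:

logscale Syntax
groupBy(status_code, function=count(as=_count))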
The default is to use count() as an aggregate function. However, if the goal is to only find the unique values of a field, or combinations of multiple fields, then groupBy() can be provided with an empty list as its aggregate function.
When showing time series data, the timeChart() and bucket() functions are extensions of groupBy() that group by time.
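As a minimal sketch (the host field name is an assumption), the same grouping model with a time dimension added looks like:

logscale Syntax
timeChart(host, function=count(), span=1h)

This buckets events into one-hour spans and counts them per host value, producing one series per host.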
Limits when using groupBy()

Warning

The groupBy() function may return incomplete results, limited to a restricted subset of groups and related events, in cases where the limit parameter is exceeded.
The groupBy() function is limited in the number of groups it will handle. This limit is set in the LogScale configuration using GraphQL mutations and is bound by the GroupMaxLimit dynamic configuration.
Starting from version 1.127, the maximum value of the limit parameter is 1,000,000 by default (adjustable by the corresponding GroupMaxLimit dynamic configuration), which means it is possible to create up to 1 million groups.
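For example, to let the function collect as many groups as the configuration permits, set the limit parameter to max:

logscale Syntax
groupBy(id, limit=max)

With limit=max, the effective limit is the value of the GroupMaxLimit dynamic configuration.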
When the limit parameter is not supplied explicitly, its default value is determined by the GroupDefaultLimit dynamic configuration, which has a default value of 20,000. GroupDefaultLimit cannot be higher than GroupMaxLimit.
Attention

The limit parameter is governed by the dynamic configurations GroupMaxLimit and GroupDefaultLimit, which are set to the same value for a seamless upgrade.
Additionally, like all LogScale functions,
groupBy()
has an internal memory limit
determined by the dynamic configuration
QueryCoordinatorMemoryLimit
. This means that
it is possible for the function to collect less than the
specified limit number of groups, if the total amount of
data collected by the function exceeds this limit. The
internal memory limit is in place to protect the cluster
health, and ensure sufficient memory for other queries to
run. Should the memory limit be reached, no more groups are
added. This may lead to results that are inconsistent with
the results shown prior to the limit being reached. The
groups that were present in prior results may be pushed out
of the result set during merges between nodes, to ensure
that the memory usage is kept within bounds. If a group is
shown, the results for that group will be correct. In
situations where you reach the memory limit, you may want to
use the top()
function to narrow the
list of groups.
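As a minimal sketch of that approach (the url field name is an assumption), top() returns only the most frequent values instead of every group:

logscale Syntax
top(url, limit=100)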
One reason for this limitation is that groupBy() is implemented to work entirely in memory and cannot spill to disk. The limit helps prevent searches from consuming too much memory. This method and limitation may be changed in future versions of LogScale.
groupBy() Syntax Examples

Since the most common use-case is to count the distinct values of a field, the default behavior is to use count() to aggregate the events. The simplest use-case of groupBy() is, therefore, to count the occurrences of the distinct values of a field:
groupBy(status_code)
Given input:
status_code | ip |
---|---|
440 | 1.111.111.111 |
500 | 1.111.111.111 |
440 | 1.111.111.111 |
440 | 2.222.222.222 |
The output is:
status_code | _count |
---|---|
440 | 3 |
500 | 1 |
If the task is to only find the unique values, groupBy() can be provided with an empty list as the aggregate function, implying that nothing will be aggregated:
groupBy(status_code, function=[])
The output is:
status_code |
---|
440 |
500 |
Similarly, groupBy()
can be used to find
the unique combinations of field values present in the data:
groupBy([status_code, ip], function=[])
The output is:
status_code | ip |
---|---|
440 | 1.111.111.111 |
500 | 1.111.111.111 |
440 | 2.222.222.222 |
A usage of the function such as the one above is also very useful for query-based parameters in dashboards; see Query Parameter for more explanation.
As an example of how to use this query function, suppose you have a LogScale repository that is ingesting data from one or more web servers. Now suppose you want to get a list of the HTTP status codes returned to users accessing the server, for instance HTTP code 404 (Not Found) for web pages not found. You could get a list of all of the HTTP codes returned and a count of each by executing a query that counts the different HTTP status codes, like this:

groupBy(field=statuscode, function=count())
| sort(statuscode, order=asc)
In addition to the groupBy()
, the results
here are piped to the sort()
function to
sort the results from the lowest HTTP code to the highest.
statuscode | _count |
---|---|
200 | 7485 |
301 | 2314 |
304 | 65 |
400 | 101 |
404 | 6425 |
405 | 2 |
408 | 9 |
With the groupBy() function, you can have LogScale group by more than one field. You would just give the fields in an array: a comma-separated list within square brackets. It would look something like this query, in which events are grouped on HTTP method and status code:
groupBy(field=[method, statuscode], function=count())
| sort([method, statuscode], order=asc)
This will produce these results:
method | statuscode |
---|---|
CONNECT | 301 |
GET | 200 |
GET | 301 |
GET | 304 |
GET | 400 |
GET | 404 |
HEAD | 200 |
HEAD | 301 |
HEAD | 400 |
HEAD | 404 |
OPTIONS | 200 |
OPTIONS | 301 |
POST | 200 |
POST | 301 |
POST | 400 |
POST | 404 |
Although this query works, you may also want to show the total for each method, not just the total for each status code of each method. To do that, you'll have to adjust the query to look like this:
groupBy(method, function=[count(as=method_total), groupBy(statuscode, function=count(as=method_status_count))])
In this query, a groupBy()
is nested
within another groupBy()
. The results
will look like this:
method | method_total | statuscode | method_status_count |
---|---|---|---|
GET | 14078 | 200 | 7326 |
GET | 14078 | 301 | 1246 |
GET | 14078 | 304 | 65 |
GET | 14078 | 400 | 70 |
GET | 14078 | 404 | 5371 |
HEAD | 139 | 200 | 14 |
HEAD | 139 | 301 | 22 |
HEAD | 139 | 400 | 6 |
HEAD | 139 | 404 | 97 |
POST | 2005 | 200 | 14 |
POST | 2005 | 301 | 1002 |
POST | 2005 | 400 | 25 |
POST | 2005 | 404 | 964 |
These results still might not be as tidy as you like. You might do better with separate queries for each method or each status code — assuming you're interested in only a few specific ones. Another possibility might be to use an API to be able to assemble the results in a manner that you prefer.
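For instance, a hypothetical per-method query could filter first and then group only by status code, keeping the result set small:

logscale Syntax
method=GET
| groupBy(statuscode, function=count())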
Instead of a single function, additional query functions can also be applied, allowing for calculations on the count that
is obtained from groupBy()
. The following
query contains an embedded expression within
groupBy()
, as a demonstration of
formatting/calculating the value within a
groupBy()
expression:
groupBy(host, function=[{count()
| esp:=_count/300}])
The query generates a count() of the number of events for each host in the aggregation, which creates a field _count. This value is then divided by 300 and placed into the field esp.
groupBy() Examples
Hourly Data Events
Summarize events by providing a count of the number of data events
per hour using the time:hour()
function
Query
hr := time:hour(field="@ingesttimestamp")
| groupBy(hr)
Introduction
In this example, the time:hour() function is used with groupBy() to count the number of data events per hour.
Step-by-Step
Starting with the source repository events.
- logscale
hr := time:hour(field="@ingesttimestamp")
Gets the hour (24-hour clock) of the values in the @ingesttimestamp field and returns the results in a new field named hr.
- logscale
| groupBy(hr)
Groups the returned results by the hr field and provides a count of the number of data events per hour in a _count field.
Event Result set.
Summary and Results
The query is used to count the number of data events per hour. The results can be plotted onto a bar chart.
Sort Timestamps With groupBy()
Sorting fields based on aggregated field values
Query
timestamp := formatTime(format="%H:%M")
| groupBy([thread],
function=[{sort("timestamp")
| collect("timestamp")}])
Introduction
When using aggregation, you may want to sort on a field that is part of the aggregated set but not the main feature of the aggregated value, for example sorting the values by their timestamp rather than the embedded value. To achieve this, use a function that sorts the field to be used as the sort field, and then use collect() so that the value from before the aggregation can be displayed in the generated event set. This query can be executed in the humio repository.
Step-by-Step
Starting with the source repository events.
- logscale
timestamp := formatTime(format="%H:%M")
Creates a new field, timestamp, formatted as HH:MM.
- logscale
| groupBy([thread],
Groups the events, first by the name of the thread and then the formatted timestamp.
- logscale
function=[{sort("timestamp") | collect("timestamp")}])
Uses sort() combined with collect() as the method of aggregation. As an embedded expression for the function, this sorts the events on the timestamp field and then retrieves the field, which would otherwise be removed as part of the aggregation process. Event Result set.
Summary and Results
The result set will contain a list of the aggregated thread names sorted by the timestamp:
thread | timestamp |
---|---|
BootstrapInfoJob | 10:09 |
DataSynchJob | 10:09 |
Global event loop | 10:10 |
LocalLivequeryMonitor | 10:09 |
LogCollectorManifestUpdate | 10:09 |
TransientChatter event loop | 10:10 |
aggregate-alert-job | 10:09 |
alert-job | 10:09 |
block-processing-monitor-job | 10:09 |
bloom-scheduler | 10:09 |
bucket-entity-config | 10:09 |
bucket-overcommit-metrics-job | 10:09 |
bucket-storage-download | 10:09 |
bucket-storage-prefetch | 10:09 |
chatter-runningqueries-logger | 10:09 |
chatter-runningqueries-stats | 10:09 |
Deduplicate Content by Field
Deduplicating content based on a specific field
Query
groupBy(field, function=tail(1))
Introduction
If you want to deduplicate events by a given field, for example to
identify a unique list of events for further processing, you can
use an aggregate function. In this example, the
groupBy()
function is used with
tail()
to use the last value in a sequence of
events.
Step-by-Step
Starting with the source repository events.
- logscale
groupBy(field, function=tail(1))
Groups all events by a specific field, and reduces the results using
tail()
to take only the last value. Event Result set.
Summary and Results
The query is used to deduplicate events by a given field. This is useful if you want to create a unique list of events for further processing.
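As a hypothetical instantiation (the user_id field name is an assumption, not from this page), deduplicating events per user would look like:

logscale Syntax
groupBy(user_id, function=tail(1))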
Alert Query For Parsers Issues
Reporting errors
Query
#type=humio #kind=logs
| loglevel=WARN
| class = c.h.d.ParserLimitingJob
| "Setting reject ingest for"
| groupBy(id, function=[count(), min(@timestamp), max(@timestamp)] )
| timeDiff:=_max-_min
| timeDiff > 300000 and _count > 10
Introduction
This alert query tries to balance reacting when there are problems with parsers, without being too restrictive.
Step-by-Step
Starting with the source repository events.
- logscale
#type=humio #kind=logs
Filters on all logs across all hosts in the cluster.
- logscale
| loglevel=WARN
Filters for all events where the loglevel is equal to
WARN
. - logscale
| class = c.h.d.ParserLimitingJob
Filters for events where the class field has the value c.h.d.ParserLimitingJob, applied to the logs having the loglevel value WARN.
- logscale
| "Setting reject ingest for"
Filters for events containing the string
Setting reject ingest for
. This is the error message generated when ingested events are rejected. - logscale
| groupBy(id, function=[count(), min(@timestamp), max(@timestamp)] )
Groups the returned result by the field id, makes a count on the events and returns the minimum timestamp and maximum timestamp. This returns a new event set, with the fields id, _count, _min, and _max.
- logscale
| timeDiff:=_max-_min
Calculates the time difference between the maximum timestamp values and the minimum timestamp values and returns the result in a new field named timeDiff.
- logscale
| timeDiff > 300000 and _count > 10
Returns all events where the value of timeDiff is greater than 300000 and where there are more than 10 occurrences. Event Result set.
Summary and Results
This query is used to set up alerts for parser issues. Setting up alerts for parser issues will allow you to proactively reach out to customers whose queries are being throttled and help them.
Calculate Events per Second by Host
Determine event rate for each host over a 5-minute period using an
embedded expression within the groupBy()
function
Query
groupBy(host, function=[{count() | esp:=_count/300}])
Introduction
In this example, the groupBy()
function is used
with an embedded expression to calculate the total event count and
events per second for each host over a 5-minute period.
Example incoming data might look like this:
@timestamp | host | service | status | response_time |
---|---|---|---|---|
2025-08-06T10:00:00Z | server1 | web | ok | 120 |
2025-08-06T10:00:01Z | server2 | database | ok | 85 |
2025-08-06T10:00:02Z | server1 | web | ok | 95 |
2025-08-06T10:00:03Z | server3 | cache | ok | 45 |
2025-08-06T10:00:04Z | server2 | database | error | 250 |
2025-08-06T10:00:05Z | server1 | web | ok | 110 |
2025-08-06T10:00:06Z | server3 | cache | ok | 40 |
2025-08-06T10:00:07Z | server2 | database | ok | 90 |
2025-08-06T10:00:08Z | server1 | web | error | 300 |
2025-08-06T10:00:09Z | server3 | cache | ok | 42 |
Step-by-Step
Starting with the source repository events.
- logscale
groupBy(host, function=[{count() | esp:=_count/300}])
Groups events by the host field and uses an embedded expression to count the number of events per host and calculate events per second in one operation.
The
count()
function returns the count in a field named _count by default. The embedded expression then divides this value by 300
and stores the result in a new field named esp. This calculation provides the average events per second for each host over the time period.Using an embedded expression within the
groupBy()
function performs both the count and the calculation as part of the same aggregation. Both the original _count and the calculated esp field are included in the results. Event Result set.
Summary and Results
The query is used to analyze event frequency patterns by calculating both the total event count and the average events per second for each host.
Note that the query aggregates against both the original count and the count/300 as one aggregate set. Using an embedded expression is more efficient for larger event sets.
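For comparison, a similar result can be produced without the embedded expression by performing the division after the aggregation; this sketch assumes the same host field:

logscale Syntax
groupBy(host, function=count())
| esp := _count/300

The embedded form performs the count and the calculation in the same aggregation pass, which is why it is more efficient on larger event sets.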
This query is useful, for example, to monitor system load distribution across hosts, identify hosts with unusual event rates, or establish baseline activity patterns for capacity planning.
Sample output from the incoming example data:
host | _count | esp |
---|---|---|
server1 | 4 | 0.013333 |
server2 | 3 | 0.010000 |
server3 | 3 | 0.010000 |
Note that the _count field shows the total number of events per host, and the esp field shows the calculated events per second (total events divided by 300 seconds).
This data is ideal for visualization using a Time Chart widget to show event rates over time. A Bar Chart widget could compare event rates across hosts, while a Gauge widget could show current event rates against predefined thresholds. Consider creating a dashboard that combines these visualizations with alerts for when event rates exceed normal ranges.
Calculate Query Cost for All Users by Repository
Search across multiple repositories to calculate query costs for
all users by repository using sort()
and
groupBy()
functions
Query
#type=humio #kind=logs class=c.h.j.RunningQueriesLoggerJob message="Highest Cost query"
| groupBy(repo, initiatingUser, totalLiveCost, totalStaticCost)
| sort([totalLiveCost, totalStaticCost])
Introduction
In this example, the query uses sort()
and
groupBy()
functions to find query costs. The query
filters logs in the humio repository that are tagged with
kind
equal to
logs
and then returns the events
where the class field has values
containing
c.h.j.RunningQueriesLoggerJob
,
searching for the specific value Highest Cost
query
.
Example incoming data might look like this:
#type | #kind | class | message | timestamp | dataspace | initiatingUser | totalLiveCost | totalStaticCost | deltaTotalCost | repo |
---|---|---|---|---|---|---|---|---|---|---|
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:30:00Z | production | john.doe | 1500 | 800 | 2300 | security-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:31:00Z | development | jane.smith | 2000 | 1200 | 3200 | app-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:32:00Z | staging | bob.wilson | 1000 | 500 | 1500 | infra-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:33:00Z | production | john.doe | 1800 | 900 | 2700 | security-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:34:00Z | development | jane.smith | 2500 | 1300 | 3800 | app-logs |
humio | logs | c.h.j.RunningQueriesLoggerJob | Highest Cost query | 2025-03-26T09:35:00Z | staging | alice.cooper | 1200 | 600 | 1800 | infra-logs |
Step-by-Step
Starting with the source repository events.
- logscale
#type=humio #kind=logs class=c.h.j.RunningQueriesLoggerJob message="Highest Cost query"
Filters for Humio internal logs containing
c.h.j.RunningQueriesLoggerJob
in the class field and where the value in the message field is equal toHighest Cost query
. - logscale
| groupBy(repo, initiatingUser, totalLiveCost, totalStaticCost)
Groups the results by the repo field, the initiatingUser field, and by both cost types (the fields totalLiveCost, totalStaticCost), and returns a count in a field named _count.
- logscale
| sort([totalLiveCost, totalStaticCost])
Sorts the results by both the totalLiveCost field and the totalStaticCost field, in descending order by default.
Event Result set.
Summary and Results
The query is used to search across multiple repositories and output query costs for all users by repository. The query returns the count in a field named _count. Use this query to focus on live and static costs separately.
Sample output from the incoming example data:
repo | initiatingUser | totalLiveCost | totalStaticCost | _count |
---|---|---|---|---|
app-logs | jane.smith | 2000 | 1200 | 1 |
security-logs | john.doe | 1500 | 800 | 1 |
infra-logs | bob.wilson | 1000 | 500 | 1 |
Detect All Occurrences of Event A Before Event B
Detect all occurrences of event A before event B (brute force
attack) using the partition()
function
combined with groupBy()
Query
head()
| groupBy(
key,
function = partition(
condition=test(status=="success"),
split="after",
[
{ status="failure" | count(as=failures) },
range(@timestamp, as=timespan),
selectLast(status)
]
)
)
| failures >= 3
| status = "success"
Introduction
In this example, the partition()
function is used
with the groupBy()
function to detect all
occurrences of event A before event B (brute force attack).
The query will detect instances where there were 3 or more failed attempts followed by a successful attempt, regardless of the time between them.
Note that the partition()
function must be used
after an aggregator function to ensure event ordering. Also note that
the events must be sorted in order by timestamp to prevent errors when
running the query. It is possible to select any field to use as a
timestamp.
Example incoming data might look like this:
@timestamp | key | status |
---|---|---|
1451606300200 | c | failure |
1451606300400 | c | failure |
1451606300600 | c | failure |
1451606301000 | a | failure |
1451606302000 | a | failure |
1451606302200 | a | failure |
1451606302300 | a | failure |
1451606302400 | b | failure |
1451606302500 | a | failure |
1451606302600 | a | success |
1451606303200 | b | failure |
1451606303300 | c | success |
1451606303400 | b | failure |
1451606304500 | a | failure |
1451606304600 | a | failure |
1451606304700 | a | failure |
1451606304800 | a | success |
Step-by-Step
Starting with the source repository events.
- logscale
head()
Selects the oldest events ordered by time.
- logscale
| groupBy( key, function = partition( condition=test(status=="success"), split="after", [ { status="failure" | count(as=failures) }, range(@timestamp, as=timespan), selectLast(status) ] ) )
Groups the events by a specified key (for example, a user ID or IP address), filters for successful events (filters for events that meet the defined condition for the field status that must contain the value
success
), then splits the data after each successful event. Notice how the condition is provided as a non-aggregate subquery.Furthermore, it filters all the failed attempts where the field status contains the value
failure
Makes a count of all the failed attempts and returns the results in a field named failures, calculates the timespan of the failures, and selects the status of the last event. Calculating the timespan of the failure sequence is useful for analysis.
- logscale
| failures >= 3
Filters for partitions with 3 or more failures.
- logscale
| status = "success"
Filters for partitions containing the value
success
in the status field. Event Result set.
Summary and Results
The query is used to detect all occurrences of potential brute force attack patterns. It looks for instances where there were 3 or more failed attempts (event A) followed by a successful attempt (event B), regardless of the time between failures. The timespan between each attempt is reported, which could be used to identify brute force attacks.
Sample output from the incoming example data:
key | failures | timespan | status |
---|---|---|---|
a | 5 | 1600 | success |
a | 3 | 300 | success |
c | 3 | 3100 | success |
Detect Event A Happening X Times Before Event B
Detect event A happening X times before event B (brute force
attack) using the partition()
function
combined with groupBy()
Query
head()
| groupBy(
key,
function = partition(
condition=test(status=="success"),
split="after",
[
{ status="failure" | count(as=failures) },
range(@timestamp, as=timespan),
max(@timestamp),
selectLast(status)
]
)
)
| failures >= 3
| status = "success"
Introduction
In this example, the partition()
function is used
with the groupBy()
function to detect event A
happening X times before event B (brute force attack).
The query will detect instances where there were 3 or more failed attempts followed by a successful attempt.
Note that the partition()
function must be used
after an aggregator function to ensure event ordering. Also note that
the events must be sorted in order by timestamp to prevent errors when
running the query. It is possible to select any field to use as a
timestamp.
Example incoming data might look like this:
@timestamp | key | status |
---|---|---|
1451606300200 | c | failure |
1451606300400 | c | failure |
1451606300600 | c | failure |
1451606301000 | a | failure |
1451606302000 | a | failure |
1451606302200 | a | failure |
1451606302300 | a | failure |
1451606302400 | b | failure |
1451606302500 | a | failure |
1451606302600 | a | success |
1451606303200 | b | failure |
1451606303300 | c | success |
1451606303400 | b | failure |
1451606304500 | a | <no value> |
1451606304600 | c | failure |
1451606304700 | c | failure |
1451606304800 | c | failure |
Step-by-Step
Starting with the source repository events.
- logscale
head()
Selects the oldest events ordered by time.
- logscale
| groupBy( key, function = partition( condition=test(status=="success"), split="after", [ { status="failure" | count(as=failures) }, range(@timestamp, as=timespan), max(@timestamp), selectLast(status) ] ) )
Groups the events by a specified key (for example, a user ID or IP address), then splits the sequence of events after each successful event (where the condition
status=="success"
).For each partition, it counts the number of
failure
in status and stores it in the field failures, finds the range of timestamps in the partition, finds the newest timestamp, and finds the latest status to show if the partition ended with a success. - logscale
| failures >= 3
Filters for partitions that contained 3 or more failures.
- logscale
| status = "success"
Filters for partitions with the value
success
in the status field to ensure that the final status is a success. Event Result set.
Summary and Results
The query is used to detect instances where there are 3 or more failed attempts followed by a successful attempt. The query can be used to detect a brute force attack where an attacker tries multiple times before succeeding. Note that the effectiveness of this query depends on the nature of your data and the typical patterns in your system.
Sample output from the incoming example data:
key | failures | timespan | status |
---|---|---|---|
a | 5 | 1600 | success |
c | 3 | 3100 | success |
Filter Out Fields With No Value
Filter out fields with no values from search results using groupBy() with sort()
Query
method=GET
groupBy(field=[method, statuscode], function=count(as=method_total))
sort([method, statuscode], order=asc)
FieldName!=""
Introduction
It is possible to filter out fields with no values in a given returned search result. In this example, all statuscode fields containing no value are filtered out from the final search result.
Example incoming data might look like this:
method | statuscode | method_total |
---|---|---|
GET | <no value> | 10 |
GET | 200 | 32492 |
GET | 301 | 1 |
GET | 304 | 113 |
GET | 403 | 9 |
GET | 404 | 132 |
Step-by-Step
Starting with the source repository events.
- logscale
method=GET
Filters for all events with methods of the type
GET
. - logscale
groupBy(field=[method, statuscode], function=count(as=method_total))
Groups the returned results into a method field and a statuscode field and makes a count of the events in a new field named method_total.
- logscale
sort([method, statuscode], order=asc)
Sorts the returned results in ascending order.
- logscale
FieldName!=""
Excludes all events where the given field does not contain a value.
Event Result set.
Summary and Results
The query is used to filter out fields not containing any values from the returned search result.
Sample output from the incoming example data:
method | statuscode | method_total |
---|---|---|
GET | 200 | 32492 |
GET | 301 | 1 |
GET | 304 | 113 |
GET | 403 | 9 |
GET | 404 | 132 |
Find Overlapping User Sessions
Detect when user sessions overlap in time using the
neighbor()
function within
groupBy()
Query
groupBy(user.id, function=neighbor(include=[startTime, endTime], prefix=previous))
| groupBy(user.id, function=neighbor(direction=succeeding, include=[startTime, endTime], prefix=next))
| case {
test(startTime <= previous.endTime) | overlaps := "previous";
test(endTime >= next.startTime) | overlaps := "next";
}
| select([user.id, startTime, endTime, overlaps, previous.startTime, previous.endTime, next.startTime, next.endTime])
Introduction
In this example, the neighbor()
function is used to
identify overlapping user sessions by comparing session start and end
times with both previous and next sessions for each user. The function
is implemented as part of groupBy()
operations for
optimal performance, as this approach processes the data in a single
pass.
Example incoming data might look like this:
@timestamp | user.id | startTime | endTime |
---|---|---|---|
2025-08-06T10:00:00Z | user1 | 2025-08-06T10:00:00Z | 2025-08-06T10:30:00Z |
2025-08-06T10:15:00Z | user1 | 2025-08-06T10:15:00Z | 2025-08-06T10:45:00Z |
2025-08-06T10:45:00Z | user1 | 2025-08-06T10:45:00Z | 2025-08-06T11:15:00Z |
2025-08-06T10:00:00Z | user2 | 2025-08-06T10:00:00Z | 2025-08-06T10:20:00Z |
2025-08-06T10:30:00Z | user2 | 2025-08-06T10:30:00Z | 2025-08-06T10:50:00Z |
2025-08-06T11:00:00Z | user2 | 2025-08-06T11:00:00Z | 2025-08-06T11:20:00Z |
Step-by-Step
Starting with the source repository events.
- logscale
groupBy(user.id, function=neighbor(include=[startTime, endTime], prefix=previous))
Groups events by user.id and uses
neighbor()
to access the previous session's startTime and endTime. Theprefix
parameter addsprevious.
to the field names in the output. Usingneighbor()
as part of thegroupBy()
function is significantly more efficient than using it separately, as it allows the operation to be performed during group creation, eliminating the need for additional filtering and transformation steps to manually find sessions for particular users. - logscale
| groupBy(user.id, function=neighbor(direction=succeeding, include=[startTime, endTime], prefix=next))
Groups events by user.id again and uses
neighbor()
to access the next session's startTime and endTime. Thedirection
parameter is set tosucceeding
to look at the next event, and the prefix addsnext.
to the field names. This second grouping operation maintains the performance benefits of integratingneighbor()
withingroupBy()
. - logscale
| case { test(startTime <= previous.endTime) | overlaps := "previous"; test(endTime >= next.startTime) | overlaps := "next"; }
Uses a case statement to evaluate two conditions and returns the results in a new field named overlaps. The field indicates whether a session overlaps with either its previous or next session:
First test:
test(startTime <= previous.endTime)
checks if the current session starts before or at the same time as when the previous session ends. If true, it assignsprevious
to the overlaps field.Second test:
test(endTime >= next.startTime)
checks if the current session ends after or at the same time as when the next session starts. If true, it assignsnext
to the overlaps field.
Note that if a session has no overlaps, the overlaps field will not be created for that event.
- logscale
| select([user.id, startTime, endTime, overlaps, previous.startTime, previous.endTime, next.startTime, next.endTime])
Explicitly selects and orders the output fields for clarity. This ensures consistent output formatting and removes any additional fields that might have been created during the processing steps.
Event Result set.
Summary and Results
The query is used to identify user sessions that overlap in time with either their previous or next sessions.
This query is useful, for example, to detect potential security issues where a user appears to be logged in from multiple locations simultaneously, or to identify problems with session management in applications.
Note
Note that this query is optimized for performance by incorporating the
neighbor()
function within the
groupBy()
operations. This approach is
significantly more efficient than applying
neighbor()
separately after grouping, as it
reduces the number of processed events by the
neighbor()
function and leverages
LogScale's internal optimizations for grouped operations.
Sample output from the incoming example data:
user.id | startTime | endTime | overlaps | previous.startTime | previous.endTime | next.startTime | next.endTime |
---|---|---|---|---|---|---|---|
user1 | 2025-08-06T10:00:00Z | 2025-08-06T10:30:00Z | next | <no value> | <no value> | 2025-08-06T10:15:00Z | 2025-08-06T10:45:00Z |
user1 | 2025-08-06T10:15:00Z | 2025-08-06T10:45:00Z | previous | 2025-08-06T10:00:00Z | 2025-08-06T10:30:00Z | 2025-08-06T10:45:00Z | 2025-08-06T11:15:00Z |
user1 | 2025-08-06T10:45:00Z | 2025-08-06T11:15:00Z | previous | 2025-08-06T10:15:00Z | 2025-08-06T10:45:00Z | <no value> | <no value> |
The output demonstrates the overlap detection: sessions are marked as
overlapping with either their previous session or their next session.
The overlaps field contains either
previous
or next
depending on
which neighboring session overlaps with the current session.
The results from this query would be well-suited for visualization in a time chart widget, showing the overlapping sessions across time.
Find Processes with Low Execution Count
Group processes by hash and name to identify rarely executed ones
using the groupBy()
function
Query
#event_simpleName=ProcessRollup2 OR #event_simpleName=SyntheticProcessRollup2
aid=?aid
groupBy([SHA256HashData, ImageFileName], limit=max)
_count < 5
sort(_count, limit=1000)
Introduction
In this example, the groupBy()
function is used to
identify processes that have been executed only a few times on a
specific host, which could be useful for detecting unusual or
potentially suspicious activity.
Example incoming data might look like this:
@timestamp | event_simpleName | aid | SHA256HashData | ImageFileName | CommandLine |
---|---|---|---|---|---|
2025-10-06T10:00:00Z | ProcessRollup2 | 12345abc | a1b2c3d4e5f6... | chrome.exe | C:\Program Files\Google\Chrome\Application\chrome.exe |
2025-10-06T10:05:00Z | ProcessRollup2 | 12345abc | a1b2c3d4e5f6... | chrome.exe | C:\Program Files\Google\Chrome\Application\chrome.exe |
2025-10-06T10:10:00Z | SyntheticProcessRollup2 | 12345abc | f6e5d4c3b2a1... | suspicious.exe | C:\Users\Admin\Downloads\suspicious.exe |
2025-10-06T10:15:00Z | ProcessRollup2 | 12345abc | 98765432dcba... | notepad.exe | C:\Windows\System32\notepad.exe |
2025-10-06T10:20:00Z | ProcessRollup2 | 12345abc | 98765432dcba... | notepad.exe | C:\Windows\System32\notepad.exe |
2025-10-06T10:25:00Z | ProcessRollup2 | 12345abc | 11223344aabb... | calc.exe | C:\Windows\System32\calc.exe |
Step-by-Step
Starting with the source repository events.
- logscale
#event_simpleName=ProcessRollup2 OR #event_simpleName=SyntheticProcessRollup2
Filters events to include only process execution events with event_simpleName equal to
ProcessRollup2
orSyntheticProcessRollup2
. - logscale
aid=?aid
Filters events for a specific host using the aid (agent ID) parameter.
- logscale
groupBy([SHA256HashData, ImageFileName], limit=max)
Groups events by both the SHA256HashData and ImageFileName fields. The limit parameter is set to max to ensure all groups are included. By default the count() function is used and the grouped count is returned in a field named _count.
- logscale
_count < 5
Filters the groups to show only those with fewer than 5 executions, using the built-in _count field that is automatically created by
groupBy()
. - logscale
sort(_count, limit=1000)
Sorts the results by execution count in ascending order, limiting the output to 1000 results.
Event Result set.
Summary and Results
The query is used to identify processes that have been executed infrequently on a specific host by grouping them based on their hash value and image name.
This query is useful, for example, to detect potentially suspicious or unusual processes that do not run often, which could indicate malicious activity or unauthorized software installations.
Sample output from the incoming example data:
SHA256HashData | ImageFileName | _count |
---|---|---|
f6e5d4c3b2a1... | suspicious.exe | 1 |
11223344aabb... | calc.exe | 1 |
98765432dcba... | notepad.exe | 2 |
The results are sorted by execution count, showing the least frequently executed processes first. Each row represents a unique combination of process hash and name, along with how many times it was executed.
Processes with the same name but different hashes are treated as separate entries, helping identify potentially malicious files masquerading as legitimate processes.
Group Events by Single Field
Basic grouping of events by status_code field
using the groupBy()
function
Query
groupBy(status_code)
Introduction
In this example, the groupBy()
is used to group
events by their status codes to analyze the distribution of different
response statuses.
Example incoming data might look like this:
@timestamp | status_code | endpoint | response_time |
---|---|---|---|
1686837825000 | 200 | /api/users | 145 |
1686837826000 | 404 | /api/products | 89 |
1686837827000 | 200 | /api/orders | 167 |
1686837828000 | 500 | /api/payment | 890 |
1686837829000 | 200 | /api/users | 156 |
1686837830000 | 404 | /api/items | 78 |
1686837831000 | 200 | /api/orders | 178 |
1686837832000 | 500 | /api/checkout | 923 |
1686837833000 | 200 | /api/products | 134 |
1686837834000 | 404 | /api/users | 92 |
Step-by-Step
Starting with the source repository events.
- logscale
groupBy(status_code)
Groups events by unique values in the status_code field. When used without any aggregate functions,
groupBy()
automatically creates a field named _count showing the number of events for each unique value. It is the same as:
groupBy(status_code, function=count())
Event Result set.
Summary and Results
The query is used to analyze the distribution of status codes across all events.
The _count field is automatically added to show the number of events in each group.
This query is useful, for example, to monitor system health, identify error patterns, or track the frequency of different response types in a service.
For other examples with groupBy()
, see
groupBy()
Syntax Examples.
Sample output from the incoming example data:
status_code | _count |
---|---|
200 | 5 |
404 | 3 |
500 | 2 |
Group Events by Single Field Without Count
Basic grouping of events by status_code field
with explicit empty
function
parameter
using the groupBy()
function
Query
groupBy(status_code, function=[])
Introduction
In this example, the groupBy()
is used to group
events by their status codes without calculating count.
Example incoming data might look like this:
@timestamp | status_code | endpoint | response_time |
---|---|---|---|
1686837825000 | 200 | /api/users | 145 |
1686837826000 | 404 | /api/products | 89 |
1686837827000 | 200 | /api/orders | 167 |
1686837828000 | 500 | /api/payment | 890 |
1686837829000 | 200 | /api/users | 156 |
1686837830000 | 404 | /api/items | 78 |
1686837831000 | 200 | /api/orders | 178 |
1686837832000 | 500 | /api/checkout | 923 |
1686837833000 | 200 | /api/products | 134 |
1686837834000 | 404 | /api/users | 92 |
Step-by-Step
Starting with the source repository events.
- logscale
groupBy(status_code, function=[])
Groups events by unique values in the status_code field. The empty function array (function=[]) prevents automatic counting.
This approach helps conserve memory while identifying the unique status codes in the events.
Event Result set.
Summary and Results
The query is used to identify unique field values (in this case different status codes) while minimizing memory usage.
This query is useful, for example, to quickly discover unique values in large event sets and support initial data exploration before detailed analysis.
For other examples with groupBy()
, see
groupBy()
Syntax Examples.
Sample output from the incoming example data:
status_code |
---|
200 |
404 |
500 |
Group First Events by Log Level
Limit and group events using head()
and
groupBy()
functions
Query
head(limit=10)
groupBy(loglevel)
Introduction
In this example, the head()
function is used to
limit the result set to 10 events, which are then grouped by their log
level using the groupBy()
function.
Example incoming data might look like this:
@timestamp | loglevel | service | message | status_code |
---|---|---|---|---|
2025-09-01T10:00:00Z | ERROR | authentication | Failed login attempt | 401 |
2025-09-01T10:00:05Z | INFO | authentication | Successful login | 200 |
2025-09-01T10:00:10Z | ERROR | database | Connection timeout | 503 |
2025-09-01T10:00:15Z | WARN | api | Rate limit approaching | 429 |
2025-09-01T10:00:20Z | ERROR | authentication | Invalid token | 401 |
2025-09-01T10:00:25Z | INFO | api | Request processed | 200 |
2025-09-01T10:00:30Z | DEBUG | database | Query executed | 200 |
2025-09-01T10:00:35Z | ERROR | api | Internal error | 500 |
2025-09-01T10:00:40Z | INFO | authentication | User logout | 200 |
2025-09-01T10:00:45Z | WARN | database | High CPU usage | 200 |
2025-09-01T10:00:50Z | DEBUG | api | Cache hit | 200 |
2025-09-01T10:00:55Z | ERROR | authentication | Session expired | 401 |
Step-by-Step
Starting with the source repository events.
- logscale
head(limit=10)
Returns the first 10 events from the dataset. The
limit
parameter explicitly specifies the number of events to return. The events are returned in the order they were received, starting from the oldest event in the time range. - logscale
groupBy(loglevel)
Groups the events by the values in the loglevel field. The
groupBy()
function creates buckets for each unique value and counts the number of events in each bucket. By default, it creates a field named _count containing the number of events in each group. Event Result set.
Summary and Results
The query is used to analyze the distribution of log levels across the
first 10 events in the dataset. If head(limit=100) had been used, it would have returned 100 events.
This query is useful, for example, to quickly assess the proportion of different log levels in a sample of events or to identify if there is an unusual distribution of log severities.
Sample output from the incoming example data:
loglevel | _count |
---|---|
ERROR | 4 |
INFO | 3 |
WARN | 2 |
DEBUG | 1 |
Note that the output shows the count of events for each log level found within the first 10 events, providing a quick overview of the log level distribution in the sample.
Group HTTP Methods and Count Status Codes
Analyze HTTP traffic patterns using nested
groupBy()
function
Query
groupBy(method, function=[count(as=method_total),
groupBy(statuscode, function=count(as=method_status_count))])
Introduction
In this example, the groupBy()
function is used to
analyze HTTP traffic patterns by grouping requests first by HTTP method
and then by status code, providing counts at both levels.
Example incoming data might look like this:
@timestamp | method | statuscode | path | bytes |
---|---|---|---|---|
2025-08-06T10:00:00Z | GET | 200 | /index.html | 1024 |
2025-08-06T10:00:01Z | POST | 201 | /api/users | 512 |
2025-08-06T10:00:02Z | GET | 404 | /missing.html | 256 |
2025-08-06T10:00:03Z | GET | 200 | /about.html | 768 |
2025-08-06T10:00:04Z | POST | 400 | /api/users | 128 |
2025-08-06T10:00:05Z | PUT | 200 | /api/users/1 | 896 |
2025-08-06T10:00:06Z | GET | 200 | /contact.html | 645 |
2025-08-06T10:00:07Z | POST | 201 | /api/orders | 789 |
2025-08-06T10:00:08Z | GET | 404 | /old-page.html | 234 |
2025-08-06T10:00:09Z | DELETE | 204 | /api/users/2 | 0 |
Step-by-Step
Starting with the source repository events.
- logscale
groupBy(method, function=[count(as=method_total), groupBy(statuscode, function=count(as=method_status_count))])
Groups events first by the method field and performs two functions: count(as=method_total) counts the total number of events for each method and returns it in a field named method_total, while the nested groupBy(statuscode) counts the occurrences of each status code within that method and returns them in a field named method_status_count.
Event Result set.
Summary and Results
The query is used to analyze HTTP traffic patterns by providing a hierarchical view of request methods and their associated status codes.
This query is useful, for example, to identify patterns in API usage, detect potential issues with specific HTTP methods, or monitor the distribution of success and error responses across different request types.
Sample output from the incoming example data:
method | method_total | statuscode | method_status_count |
---|---|---|---|
GET | 5 | 200 | 3 |
GET | 5 | 404 | 2 |
POST | 3 | 201 | 2 |
POST | 3 | 400 | 1 |
PUT | 1 | 200 | 1 |
DELETE | 1 | 204 | 1 |
Note that the output shows both the total count per method (method_total) and the breakdown of status codes (method_status_count) within each method, providing a comprehensive view of the HTTP traffic distribution.
This data would be effectively visualized using a Sankey diagram widget to show the flow from HTTP methods to status codes, or a nested pie chart to display the distribution.
Group HTTP Methods and Status Codes Using Nested groupBy()
Analyze HTTP traffic patterns by method and status code using the
groupBy()
function
Query
groupBy(method, function=[count(as=method_total), groupBy(statuscode, function=count(as=method_status_count))])
Introduction
In this example, the groupBy()
function is used to
analyze HTTP traffic patterns by grouping requests first by HTTP method
and then by status code within each method.
Example incoming data might look like this:
@timestamp | method | statuscode | path | bytes |
---|---|---|---|---|
2025-08-06T10:00:00Z | GET | 200 | /index.html | 1024 |
2025-08-06T10:00:01Z | POST | 201 | /api/users | 512 |
2025-08-06T10:00:02Z | GET | 404 | /missing.html | 256 |
2025-08-06T10:00:03Z | GET | 200 | /about.html | 768 |
2025-08-06T10:00:04Z | POST | 400 | /api/users | 128 |
2025-08-06T10:00:05Z | PUT | 200 | /api/users/1 | 384 |
2025-08-06T10:00:06Z | GET | 200 | /contact.html | 896 |
2025-08-06T10:00:07Z | DELETE | 204 | /api/users/2 | 0 |
2025-08-06T10:00:08Z | GET | 500 | /error.html | 1024 |
2025-08-06T10:00:09Z | POST | 201 | /api/orders | 756 |
Step-by-Step
Starting with the source repository events.
- logscale
groupBy(method, function=[count(as=method_total), groupBy(statuscode, function=count(as=method_status_count))])
Groups events by the method field and performs two aggregations: count(as=method_total) returns the total number of requests for each method in a field named method_total, and the nested groupBy(statuscode) counts each status code within that method and returns the result in a field named method_status_count.
Event Result set.
Summary and Results
The query is used to analyze HTTP traffic patterns by providing a hierarchical view of request methods and their associated status codes.
This query is useful, for example, to identify patterns in API usage, detect potential issues with specific HTTP methods, or monitor the distribution of success and error responses across different request types.
Sample output from the incoming example data:
method | method_total | statuscode | method_status_count |
---|---|---|---|
GET | 5 | 200 | 3 |
GET | 5 | 404 | 1 |
GET | 5 | 500 | 1 |
POST | 3 | 201 | 2 |
POST | 3 | 400 | 1 |
PUT | 1 | 200 | 1 |
DELETE | 1 | 204 | 1 |
Note that the output shows the total count for each HTTP method in method_total and a breakdown of status codes and their counts within each method in method_status_count.
This data is well-suited for visualization using a Sankey diagram widget, which can effectively show the flow from HTTP methods to status codes.
List All EC2 Hosts With FirstSeen Data Within 14 Days
List all the EC2 hosts with FirstSeen data within 14 days using
the groupBy()
function with
selectLast()
Query
#repo=sensor_metadata #data_source_name=aidmaster cloud.provider = "AWS_EC2_V2"
| groupBy([aid], function=(selectLast([event_platform, aid, ComputerName, AgentVersion, FirstSeen])), limit=max)
| FirstSeen := formatTime("%FT%T%z", field=FirstSeen)
| TimeDelta := now() - duration("14d")
Introduction
In this example, the groupBy()
function is used
with selectLast()
to retrieve the latest
information about AWS EC2
instances running
CrowdStrike sensors, showing their platform, hostname, agent version,
and when they were first seen, with a 14-day reference point for age
comparison.
Example incoming data (raw data in sensor_metadata) might look like this:
@timestamp | aid | cloud.provider | event_platform | ComputerName | AgentVersion | FirstSeen |
---|---|---|---|---|---|---|
2025-05-20T10:00:00Z | 1234abcd | AWS_EC2_V2 | Windows | ec2-web-01 | 6.45.15678 | 2025-01-15T08:30:00Z |
2025-05-21T11:00:00Z | 1234abcd | AWS_EC2_V2 | Windows | ec2-web-01 | 6.45.15679 | 2025-01-15T08:30:00Z |
2025-05-22T12:00:00Z | 5678efgh | AWS_EC2_V2 | Linux | ec2-app-02 | 6.45.15678 | 2025-02-01T14:45:00Z |
2025-05-23T13:00:00Z | 5678efgh | AWS_EC2_V2 | Linux | ec2-app-02 | 6.45.15679 | 2025-02-01T14:45:00Z |
2025-05-24T14:00:00Z | 90123ijk | AWS_EC2_V2 | Windows | ec2-db-03 | 6.45.15678 | 2025-03-10T09:15:00Z |
2025-05-25T15:00:00Z | 90123ijk | AWS_EC2_V2 | Windows | ec2-db-03 | 6.45.15679 | 2025-03-10T09:15:00Z |
Step-by-Step
Starting with the source repository events.
- logscale
#repo=sensor_metadata #data_source_name=aidmaster cloud.provider = "AWS_EC2_V2"
Searches in the sensor_metadata repository, and filters for #data_source_name fields containing the value
aidmaster
, looking for cloud.provider of the typeAWS_EC2_V2
only. - logscale
| groupBy([aid], function=(selectLast([event_platform, aid, ComputerName, AgentVersion, FirstSeen])), limit=max)
Groups results by the field aid (Agent ID). Then, for each unique group, selects the most recent values for the fields: event_platform, aid, ComputerName, AgentVersion, FirstSeen.
Using the
selectLast()
within thegroupBy()
is what actually selects the most recent record for each group. - logscale
| FirstSeen := formatTime("%FT%T%z", field=FirstSeen)
Formats the timestamp in the FirstSeen field into ISO 8601 format. The result is stored back in the FirstSeen field.
- logscale
| TimeDelta := now() - duration("14d")
Calculates the timestamp from 14 days ago and returns the result in a new field named TimeDelta. The calculation is done by subtracting a 14-day duration from the current time using duration().
This new TimeDelta field, representing a timestamp from 14 days ago, can be used for filtering or comparing against the FirstSeen timestamps.
Event Result set.
Summary and Results
The query is used to retrieve the latest information about AWS EC2 instances running CrowdStrike sensors, showing their platform, hostname, agent version, and when they were first seen, with a 14-day reference point for age comparison.
The query is useful, for example, for auditing EC2 instance coverage, identifying newly added EC2 instances within the last two weeks, monitoring sensor versions or identifying aging or outdated installations.
Sample output from the incoming example data:
aid | event_platform | ComputerName | AgentVersion | FirstSeen | TimeDelta |
---|---|---|---|---|---|
1234abcd | Windows | ec2-web-01 | 6.45.15679 | 2025-01-15T08:30:00+0000 | 2025-05-12T13:06:56+0000 |
5678efgh | Linux | ec2-app-02 | 6.45.15679 | 2025-02-01T14:45:00+0000 | 2025-05-12T13:06:56+0000 |
90123ijk | Windows | ec2-db-03 | 6.45.15679 | 2025-03-10T09:15:00+0000 | 2025-05-12T13:06:56+0000 |
Each aid appears only once with its most recent values. Note that the TimeDelta value is based on the current date provided (Mon, 26 May 2025 13:06:56 GMT).
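As a hypothetical extension (assuming the comparison is applied before the formatTime() step, while FirstSeen still holds an epoch timestamp), the result could be restricted to hosts first seen within the last 14 days:

logscale Syntax
| test(FirstSeen >= TimeDelta)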