Aggregate Query Functions

LogScale's aggregate query functions provide grouping and/or aggregation of event data.

Table: Aggregate Query Functions

FunctionDefault ArgumentAvailabilityDescription
array:intersection(array, [as])array  Determines the set intersection of array values over input events.
array:reduceAll(array, function, var)array  Computes a value from all events and array elements of the specified array.
array:reduceColumn(array, [as], function, var)array  Computes an aggregate value for each array element with the same index.
array:union(array, [as])array  Determines the set union of array values over input events.
avg([as], field)field  Calculates the average for a field of a set of events.
bucket([buckets], [field], [function], [limit], [minSpan], [span], [timezone], [unit])span  Extends the groupBy() function for grouping by time.
callFunction([as], field, function)function  Calls the named function on a field over a set of events.
collect(fields, [limit], [multival], [separator])fields  Collects fields from multiple events into one event.
count([as], [distinct], [field])field  Counts given events.
counterAsRate([as], field)field  Calculates the rate for a counter field.
fieldstats([limit])   Retrieves stats about fields.
groupBy(field, [function], [limit])field  Groups events by specified fields and executes aggregate functions on each group.
head([limit])limit  Returns the oldest events from an event stream.
linReg([prefix], x, y)   Computes linear relationship model between two variables using least-squares fitting.
max([as], field, [type])field  Finds the largest number for the specified field over a set of events.
min([as], field, [type])field  Finds the smallest number for the specified field over a set of events.
percentile([accuracy], [as], field, [percentiles])field  Finds one event with a field for each percentile specified.
range([as], field)field  Finds numeric range between smallest and largest numbers for field over a set of events.
rdns([as], field, [server])field  Events using RDNS lookup.
sankey(source, target, [weight])   Produces data compatible with Sankey widget.
selectFromMax(field, include)field  Selects event with the largest value for the specified field.
selectFromMin(field, include)field  Selects event with the smallest value for the specified field.
selectLast(fields)fields  Specify fields to select from events, keeping value of most recent event for each field.
series(collect, [endmatch], [maxduration], [maxpause], [memlimit], [separator], [startmatch])collect  Collects a series of values for selected fields from multiple events into one or more events.
session([function], [maxpause])function  Collects events into sessions, and aggregates them.
sort([field], [limit], [order], [reverse], [type])field  Sorts events by their fields.
stats([function])function  Used to compute multiple aggregate functions over the input.
stdDev([as], field)field  Calculates the standard deviation for a field over a set of events.
sum([as], field)field  Calculates the sum for a field over a set of events.
table(fields, [limit], [order], [reverse], [sortby], [type])fields  Used to create a widget to present the data in a table.
tail([limit])limit  Returns the newest events from an event stream.
timeChart([buckets], [function], [limit], [minSpan], [series], [span], [timezone], [unit])series  Used to draw a linechart where the x-axis is time.
top([as], [error], field, [limit], [max], [percent], [rest], [sum])field  Finds the top results based on a given field.
transpose([column], [header], [limit], [pivot])pivot  Transposes a query results set by creating an event for each attribute.
window([buckets], [function], [span])function  Computes aggregate functions over a sliding window of data.
worldMap([ip], [lat], [lon], [magnitude], [precision])   Used to produce data compatible with the World Map widget.

The functions listed in the Aggregate (for testing) Query Functions table are supported only for testing queries and parsers.

Table: Aggregate (for testing) Query Functions

FunctionDefault ArgumentAvailabilityDescription
createEvents(rawstring)rawstring  Generates temporary events as part of the query.

A query becomes an aggregation query if it uses at least one aggregate function like sum(), count() or avg().

For example, the query count() takes a stream of events as its input, and produces a single record containing a _count field.

Below are some examples:

logscale
loglevel = ERROR
| timechart()
logscale
x := y * 2
| bucket(function=sum(x))

Using Aggregate Query Functions

Aggregate query functions are used to summarize event data during a query. Aggregation simplifies an event set and provides both a shorter set of events and reduces the processing load when querying the event set. For example, when searching for specific events, the events may be summarized by their type or class. By reducing the event data later, parts of the query can be processed more efficiently.

Aggregation is also used when examining trends and finding patterns and creating sets of data suitable for displaying through a widget such as a bar chart or sankey diagram. For some functions and queries, an aggregate function must be used to summarize the data before it is processed.

Aggregate functions combine multiple event values into a summarized single value and/or grouped values, reducing many data points into key metrics that can be used for further manipulation, analyzation and visualization. For example, the groupBy() function summarizes data by one or more fields, and then provides a summary value for that group, such as count, sum, average, minimum, or maximum for all rows in a column or table as defined by the query.

Functions Requiring Prior Aggregation

Some functions (for example visualization functions or analysis functions) need summarized or grouped data to create meaningful widgets or to simplify the processing of the incoming data. Processing millions of individual events would create a significant load on the system for relatively small values. For more efficient querying, aggregation creates a smaller set of data on which to perform more intensive processing.

As an example, you must group the data first using aggregate functions like groupBy() or bucket() if you are trying to:

  • Collect events into blocks of time; for example counting the occurrences per minute or hour either in live or static queries.

  • Show summarized value data (for example: counts, sums, averages).

  • Create charts through the widgets or other visualizations.

  • Sort events or matches by count or incidence.

  • Find the top N values or bottom N values of a list.

Examples of functions used for visualization could be timeChart() and sankey().

Examples of functions used for analysis could be percentage(), top() and sort().

Sequence functions must also come after an aggregator function (for example, bucket(), head(), sort()) to provide a meaningful order to the events. Order matters in query construction.

In some cases, if a query is defined so that it requires an aggregate data set, a warning will be generated by LogScale to indicate the aggregate query requirement. For example:

The partition() function must come after an aggregator. E.g. groupBy(), bucket(), sort(), tail(), or head().

Aggregates as Arguments

For some functions used during a query, the argument used must be an aggregate function. For example, the partition() function splits a sequence of events into multiple sequences, using an aggregate function to identify each partition.

Using aggregate functions as arguments enables dynamic calculations and data-driven analysis based on your data patterns. The aggregate functions calculate a value such as count, sum, average, minimum, or maximum for all rows in a column or table as defined by the query.

In some cases, multiple aggregates can be used together and also be nested for advanced calculations.

As an example, with groupBy() you can specify multiple aggregate functions so that the grouped event set contains minimum, maximum and average values for each group.