Aggregate Query Functions
LogScale's aggregate query functions provide grouping and/or aggregation of event data.
Table: Aggregate Query Functions
Function | Default Argument | Availability | Description |
---|---|---|---|
array:intersection(array, [as]) | array | Determines the set intersection of array values over input events. | |
array:reduceAll(array, function, var) | array | Computes a value from all events and array elements of the specified array. | |
array:reduceColumn(array, [as], function, var) | array | Computes an aggregate value for each array element with the same index. | |
array:union(array, [as]) | array | Determines the set union of array values over input events. | |
avg([as], field) | field | Calculates the average for a field of a set of events. | |
bucket([buckets], [field], [function], [limit], [minSpan], [span], [timezone], [unit]) | span |
Extends the groupBy() function for grouping
by time.
| |
callFunction([as], field, function) | function | Calls the named function on a field over a set of events. | |
collect(fields, [limit], [multival], [separator]) | fields | Collects fields from multiple events into one event. | |
count([as], [distinct], [field]) | field | Counts given events. | |
counterAsRate([as], field) | field | Calculates the rate for a counter field. | |
fieldstats([limit]) | Retrieves stats about fields. | ||
groupBy(field, [function], [limit]) | field | Groups events by specified fields and executes aggregate functions on each group. | |
head([limit]) | limit | Returns the oldest events from an event stream. | |
linReg([prefix], x, y) | Computes linear relationship model between two variables using least-squares fitting. | ||
max([as], field, [type]) | field | Finds the largest number for the specified field over a set of events. | |
min([as], field, [type]) | field | Finds the smallest number for the specified field over a set of events. | |
percentile([accuracy], [as], field, [percentiles]) | field | Finds one event with a field for each percentile specified. | |
range([as], field) | field | Finds numeric range between smallest and largest numbers for field over a set of events. | |
rdns([as], field, [server]) | field | Events using RDNS lookup. | |
sankey(source, target, [weight]) | Produces data compatible with Sankey widget. | ||
selectFromMax(field, include) | field | Selects event with the largest value for the specified field. | |
selectFromMin(field, include) | field | Selects event with the smallest value for the specified field. | |
selectLast(fields) | fields | Specify fields to select from events, keeping value of most recent event for each field. | |
series(collect, [endmatch], [maxduration], [maxpause], [memlimit], [separator], [startmatch]) | collect | Collects a series of values for selected fields from multiple events into one or more events. | |
session([function], [maxpause]) | function | Collects events into sessions, and aggregates them. | |
sort([field], [limit], [order], [reverse], [type]) | field | Sorts events by their fields. | |
stats([function]) | function | Used to compute multiple aggregate functions over the input. | |
stdDev([as], field) | field | Calculates the standard deviation for a field over a set of events. | |
sum([as], field) | field | Calculates the sum for a field over a set of events. | |
table(fields, [limit], [order], [reverse], [sortby], [type]) | fields | Used to create a widget to present the data in a table. | |
tail([limit]) | limit | Returns the newest events from an event stream. | |
timeChart([buckets], [function], [limit], [minSpan], [series], [span], [timezone], [unit]) | series | Used to draw a linechart where the x-axis is time. | |
top([as], [error], field, [limit], [max], [percent], [rest], [sum]) | field | Finds the top results based on a given field. | |
transpose([column], [header], [limit], [pivot]) | pivot | Transposes a query results set by creating an event for each attribute. | |
window([buckets], [function], [span]) | function | Computes aggregate functions over a sliding window of data. | |
worldMap([ip], [lat], [lon], [magnitude], [precision]) | Used to produce data compatible with the World Map widget. |
The functions listed in the Aggregate (for testing) Query Functions table are supported only for testing queries and parsers.
Table: Aggregate (for testing) Query Functions
Function | Default Argument | Availability | Description |
---|---|---|---|
createEvents(rawstring) | rawstring | Generates temporary events as part of the query. |
A query becomes an aggregation query
if it uses at least one aggregate function like
sum()
, count()
or
avg()
.
For example, the query count()
takes a stream of
events as its input, and produces a single record containing a
_count field.
Below are some examples:
loglevel = ERROR
| timechart()
x := y * 2
| bucket(function=sum(x))
Using Aggregate Query Functions
Aggregate query functions are used to summarize event data during a query. Aggregation simplifies an event set and provides both a shorter set of events and reduces the processing load when querying the event set. For example, when searching for specific events, the events may be summarized by their type or class. By reducing the event data later, parts of the query can be processed more efficiently.
Aggregation is also used when examining trends and finding patterns and creating sets of data suitable for displaying through a widget such as a bar chart or sankey diagram. For some functions and queries, an aggregate function must be used to summarize the data before it is processed.
Aggregate functions combine multiple event values into a summarized
single value and/or grouped values, reducing many data points into key
metrics that can be used for further manipulation, analyzation and
visualization. For example, the groupBy()
function
summarizes data by one or more fields, and then provides a summary value
for that group, such as count, sum, average, minimum, or maximum for all
rows in a column or table as defined by the query.
Functions Requiring Prior Aggregation
Some functions (for example visualization functions or analysis functions) need summarized or grouped data to create meaningful widgets or to simplify the processing of the incoming data. Processing millions of individual events would create a significant load on the system for relatively small values. For more efficient querying, aggregation creates a smaller set of data on which to perform more intensive processing.
As an example, you must group the data first using aggregate functions
like groupBy()
or bucket()
if
you are trying to:
Collect events into blocks of time; for example counting the occurrences per minute or hour either in live or static queries.
Show summarized value data (for example: counts, sums, averages).
Create charts through the widgets or other visualizations.
Sort events or matches by count or incidence.
Find the top N values or bottom N values of a list.
Examples of functions used for visualization could be
timeChart()
and sankey()
.
Examples of functions used for analysis could be
percentage()
, top()
and
sort()
.
Sequence functions must also come after an aggregator function (for
example, bucket()
,
head()
, sort()
) to provide a
meaningful order to the events. Order matters in query construction.
In some cases, if a query is defined so that it requires an aggregate data set, a warning will be generated by LogScale to indicate the aggregate query requirement. For example:
The partition() function must come after an aggregator. E.g. groupBy(), bucket(), sort(), tail(), or head().
Aggregates as Arguments
For some functions used during a query, the argument used must be an
aggregate function. For example, the partition()
function splits a sequence of events into multiple sequences, using an
aggregate function to identify each partition.
Using aggregate functions as arguments enables dynamic calculations and data-driven analysis based on your data patterns. The aggregate functions calculate a value such as count, sum, average, minimum, or maximum for all rows in a column or table as defined by the query.
In some cases, multiple aggregates can be used together and also be nested for advanced calculations.
As an example, with groupBy()
you can specify
multiple aggregate functions so that the grouped event set contains
minimum, maximum and average values for each group.