FAQ: Understanding the Query State Size

The query state size, also called the state size or query state, quantifies the amount of memory a query uses during execution.

Queries mainly contain three types of operations: filters, mutators, and aggregators. Filters and mutators mainly work on a single event at a time, whereas aggregators collect the results of several events and contribute the most to the overall memory consumption. For an aggregation, the state holds the events collected at that point in the query chain.

The size of the query state depends on the number of events and the type of operation:

  • With groupBy(), LogScale uses more memory because the function collects all possible values of a field or set of fields. The overall query state size depends on the function, the algorithm used, and the number of events in each group within the query.

  • With top(), for example to find the most accessed URLs in web server logs, LogScale must keep all the different URLs in the search state and count the number of occurrences of each. The more unique URLs, the larger the state.
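
The relationship between unique values and state size can be illustrated with a minimal Python sketch. This is not how LogScale implements top() or groupBy(); the function name top_urls, the event layout, and the sample data are hypothetical and only show why an exact count needs one entry of state per distinct value.

```python
from collections import Counter


def top_urls(events, limit=3):
    """Count every distinct URL exactly, then return the most frequent ones.

    Hypothetical helper for illustration only, not a LogScale API.
    """
    counts = Counter()  # the "state": one entry per unique URL seen so far
    for event in events:
        counts[event["url"]] += 1
    # Return the top entries and the number of entries held in the state.
    return counts.most_common(limit), len(counts)


# Made-up sample events, each with a "url" field.
events = [
    {"url": "/"},
    {"url": "/login"},
    {"url": "/"},
    {"url": "/docs"},
    {"url": "/"},
    {"url": "/login"},
]

top, state_entries = top_urls(events, limit=2)
print(top)            # the two most frequent URLs with their counts
print(state_entries)  # number of distinct URLs kept in the state
```

Although only two results are returned, the state still holds an entry for every distinct URL, so it grows with the cardinality of the field rather than with the size of the result.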

LogScale uses compression and other techniques to keep this value to a minimum, but it is difficult to predict the state size in advance. Approximation algorithms are used to provide numbers and counts when computing the exact value would be too computationally expensive.
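
As an illustration of how an approximation algorithm can trade accuracy for a bounded state, the sketch below uses the Misra-Gries frequent-items summary. This is an assumption chosen for illustration; the text above does not say which algorithms LogScale uses, and the function name and sample data are hypothetical.

```python
def misra_gries(items, k):
    """Approximate the most frequent items while keeping at most k - 1 counters.

    Illustrative only; not taken from LogScale. Each reported count is an
    underestimate of the true count by at most len(items) / k.
    """
    counters = {}
    for item in items:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Made-up stream: two popular URLs followed by many URLs seen only once.
urls = ["/"] * 50 + ["/login"] * 30 + [f"/page/{i}" for i in range(20)]
summary = misra_gries(urls, k=3)
print(summary)
```

The state never exceeds k - 1 entries regardless of how many distinct URLs appear, at the cost of counts that are lower bounds rather than exact values.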

The effect of the query state size is that for some queries and event collections, the amount of memory required can be considerable. This is one of the reasons why the default query size is limited to 200 events; the limit helps to reduce the overall state size.

When a search hits the limits on state size, LogScale warns the user. For example, groupBy() on a high-cardinality field can result in millions of groups.

The state size is also related to the query cost, which is calculated by combining the memory used by the query and the CPU time required to generate it.