Best Practice for Query Writing
In order to write efficient queries, queries that can run fast and are less expensive in terms of CPU and memory usage, it is important to follow an ordered flow through the query pipeline, starting from the repository.
When possible, the ideal order when writing a complex query would be:
Set one or more tag filters to narrow the search only to certain events
Filter the fields (for example, with
Add any other filter that may add, remove or modify fields (using transformation functions such as
eval, or any function with an
Aggregate the filtered results using the aggregate functions (such as
Post process the aggregate result (using for example
The syntax looks like this:
tag-filters | field filters | transformations | aggregate function | post processing of aggregate result
For example, this query:
| eval(fStart = @timestamp - 600) | eval(fEnd = @timestamp + 600) | formattime("%F %T.%Q", as=Timestamp, field=@timestamp, timezone=UTC) | table([fStart, Timestamp, fEnd])
might be very expensive in terms of resources since it is formatting timestamps for all events hit in the time span searched. By specifying a limit of 200 events before the transformation of the output, the same query will run much faster as it displays 200 events, thus performing less format operations — like this:
| table([@timestamp],limit=200) // selects the last 200 timestamps by timestamp | eval(fStart = @timestamp - 600) | eval(fEnd = @timestamp + 600) | format("%1$TF %1$TT.%1$TQ", field=[@timestamp], as=Timestamp), timezone=UTC) | table([fStart, Timestamp, fEnd])
The maximum length of a query string is 66,000 characters. Refer to Limits for other query or general limits in Humio.
To summarize, the following generic rules apply to obtain efficient queries.
Reduce the dataset with filters, so to have results limited to a smaller pool of information.
Postpone transformations after the aggregator whenever possible.
If you're in doubt about which of two possible forms of a query is less expensive, you can try running them both on the same data set, and check Humio's measure of their cost. You can find Humio's approximate idea of how expensive a query was to run reported in the UI as Work below the query result.