Best Practice for Query Writing

In order to write efficient queries, queries that can run fast and are less expensive in terms of CPU and memory usage, it is important to follow an ordered flow through the query pipeline, starting from the repository.

When possible, the ideal order when writing a complex query would be:

  1. Set one or more tag filters to narrow the search only to certain events

  2. Filter the fields (for example, with *, "", !=)

  3. Add any other filter that may add, remove or modify fields (using transformation functions such as format(), eval(), or any function with an as parameter)

  4. Aggregate the filtered results using the aggregate functions (such as sum(), top(), or groupBy())

  5. Post process the aggregate result (using for example table() or format() functions)

The syntax looks like this:

humio
tag-filters
| field filters
| transformations
| aggregate function
| post processing of aggregate result

For example, this query:

humio
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| formattime("%F %T.%Q", as=Timestamp, field=@timestamp, timezone=UTC)
| table([fStart, Timestamp, fEnd])

might be very expensive in terms of resources since it is formatting timestamps for all events hit in the time span searched. By specifying a limit of 200 events before the transformation of the output, the same query will run much faster as it displays 200 events, thus performing less format operations — like this:

humio
| table([@timestamp],limit=200) // selects the last 200 timestamps by timestamp
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| format("%1$TF %1$TT.%1$TQ", field=[@timestamp], as=Timestamp), timezone=UTC)
| table([fStart, Timestamp, fEnd])

Note

The maximum length of a query string is 66,000 characters. Refer to Limits for other query or general limits in Humio.

To summarize, the following generic rules apply to obtain efficient queries.

  • Reduce the dataset with filters, so to have results limited to a smaller pool of information.

  • Always aggregate.

  • Postpone transformations after the aggregator whenever possible.

Tip

If you're in doubt about which of two possible forms of a query is less expensive, you can try running them both on the same data set, and check Humio's measure of their cost. You can find Humio's approximate idea of how expensive a query was to run reported in the UI as Work below the query result.