Best Practice for Query Writing
In order to write efficient queries, queries that can run fast and are less expensive in terms of CPU and memory usage, it is important to follow an ordered flow through the query pipeline, starting from the repository.
When possible, the ideal order when writing a complex query would be:
Set one or more tag filters to narrow the search only to certain events
Filter the fields (for example, with
*
,""
,!=
)Add any other filter that may add, remove or modify fields (using transformation functions such as
format()
,eval()
, or any function with anas
parameter)Aggregate the filtered results using the aggregate functions (such as
sum()
,top()
, orgroupBy()
)Post process the aggregate result (using for example
table()
orformat()
functions)
The syntax looks like this:
tag-filters
| field filters
| transformations
| aggregate function
| post processing of aggregate result
For example, this query:
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| formattime("%F %T.%Q", as=Timestamp, field=@timestamp, timezone=UTC)
| table([fStart, Timestamp, fEnd])
might be very expensive in terms of resources since it is formatting timestamps for all events hit in the time span searched. By specifying a limit of 200 events before the transformation of the output, the same query will run much faster as it displays 200 events, thus performing less format operations — like this:
| table([@timestamp],limit=200) // selects the last 200 timestamps by timestamp
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| format("%1$TF %1$TT.%1$TQ", field=[@timestamp], as=Timestamp), timezone=UTC)
| table([fStart, Timestamp, fEnd])
Note
The maximum length of a query string is 66,000 characters. Refer to Limits for other query or general limits in LogScale.
To summarize, the following generic rules apply to obtain efficient queries.
Reduce the dataset with filters, so to have results limited to a smaller pool of information.
Always aggregate.
Postpone transformations after the aggregator whenever possible.
Tip
If you're in doubt about which of two possible forms of a query is less expensive, you can try running them both on the same data set, and check LogScale's measure of their cost. You can find LogScale's approximate idea of how expensive a query was to run reported in the UI as Work below the query result.