Writing Better Queries

Specificity matters when writing LogScale queries. The more specific you can be when writing a query, the fewer results you will have to sort through and the faster the query will run. Writing queries that run efficiently and reduce the cost of CPU and memory usage must follow an ordered flow through the query pipeline, starting with the repository.

Find a reduced set of events by filtering first, and then only perform the formatting and modification of the found data as the formatting has to be performed on every event, which can be a CPU intensive process.

When Writing Queries it is best to follow the following process in order:

  1. Narrow down the search timeframe as much as possible

  2. Narrow down the result set starting with tagged fields (those starting with #)

  3. Continue to filter the dataset with remaining Field Values that exist

  4. After you have filtered what you want to see, filter what you don't want to see

  5. Utilize any regex needed to filter down the data further, or for unparsed fields

  6. Transform your data how you would like with functions like Math, Evaluation, format, etc

  7. Aggregate your data utilizing any aggregate functions, such as a sum(), top(), or groupBy()

  8. Perform any final visualization processing such as sorting or table functions.

Note

In some situations, it may be necessary to perform steps in a different order. For example, transforming data after it has been aggregated, or transforming and visualizing in the same step.

When optimizing queries, you want to lower the total number of hits and events returned, then focus on lowering the total work cost of the query, and then handle the formatting and output. Monitoring the work and performance and making small changes as you optimized the query will help to get the best overall performance.

This process in code format looks like this:

logscale
tag-filters
| field filters
| transformations
| aggregate function
| post processing of aggregate result

Remember that the order of operations for queries is important. Let's look at an example of a resource-costly query:

logscale
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| formattime("%F %T.%Q", as=Timestamp, field=@timestamp, timezone=UTC)
| table([fStart, Timestamp, fEnd])

This query is particularly costly because it formats timestamps for all events in the defined timespan.

Specifying a limit of 200 events prior to output transformation will make the query run much faster because it displays only 200 events, thus performing fewer format operations:

logscale
| table([@timestamp],limit=200) // selects the last 200 timestamps by timestamp
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| format("%1$TF %1$TT.%1$TQ", field=[@timestamp], as=Timestamp), timezone=UTC)
| table([fStart, Timestamp, fEnd])

To check which query format is less expensive, run them both using the same data set. Then check LogScale's measure of their cost by looking at the Work report, found below the query result. The lower the number, the better the result!