Query Writing Best Practices

Specificity matters when writing LogScale queries. The more specific you can be when writing a query, the fewer results you will have to sort through and the faster the query will run. Writing queries that run efficiently and reduce the cost of CPU and memory usage must follow an ordered flow through the query pipeline, starting with the repository.

Find a reduced set of events by filtering first, and then only perform the formatting and modification of the found data as the formatting has to be performed on every event, which can be a CPU intensive process.

When Writing Queries it is best to follow the following process in order:

  1. Narrow down the search timeframe as much as possible

  2. Narrow down the result set starting with tagged fields (those starting with #)

  3. Continue to filter the dataset with remaining Field Values that exist

  4. After you have filtered what you want to see, filter what you don't want to see

  5. Utilize any regex needed to filter down the data further, or for unparsed fields

  6. Transform your data how you would like with functions like Math, Evaluation, format, etc

  7. Aggregate your data utilizing any aggregate functions, such as a sum(), top(), or groupBy()

  8. Perform any final visualization processing such as sorting or table functions.

Note

In some situations, it may be necessary to perform steps in a different order. For example, transforming data after it has been aggregated, or transforming and visualizing in the same step.

When optimizing queries, you want to lower the total number of hits and events returned, then focus on lowering the total work cost of the query, and then handle the formatting and output. Monitoring the work and performance and making small changes as you optimized the query will help to get the best overall performance.

This process in code format looks like this:

logscale
tag-filters
| field filters
| transformations
| aggregate function
| post processing of aggregate result

Remember that the order of operations for queries is important. Let's look at an example of a resource-costly query:

logscale
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| formattime("%F %T.%Q", as=Timestamp, field=@timestamp, timezone=UTC)
| table([fStart, Timestamp, fEnd])

This query is particularly costly because it formats timestamps for all events in the defined timespan.

Specifying a limit of 200 events prior to output transformation will make the query run much faster because it displays only 200 events, thus performing fewer format operations:

logscale
| table([@timestamp],limit=200) // selects the last 200 timestamps by timestamp
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| format("%1$TF %1$TT.%1$TQ", field=[@timestamp], as=Timestamp), timezone=UTC)
| table([fStart, Timestamp, fEnd])

Tip: To check which query format is less expensive, run them both using the same data set. Then check LogScale's measure of their cost by looking at the 'Work' report, found below the query result. The lower the number, the better the result!

Using Tags in Queries

Tags are essential when developing queries, particularly at scale. Tags tell LogScale where to search for data. The more specific the tag, the better the search will perform.

You can also apply tags to fields. Doing so allows LogScale to quickly and efficiently organize, include, and/or exclude large collections of events as you search. Applying tags to raw telemetry is done for you transparently via the parser when dealing with Falcon LTR data. Two important fields are already tagged in LogScale: event_simpleName and cid. To ensure you make best use of the tags, be sure to place a # in front of each of them. This allows LogScale to efficiently and ruthlessly discard large swaths of events that you are not interested in.

For example, in the following queries we're looking for events that map an aid to a ComputerName. These events have multiple associated tags, primarily #kind and #event_simpleName. The #kind tag has two values: Primary and Secondary. With few exceptions, almost every Falcon event will be a Primary or Secondary data type. with #kind, we're dealing with very large "buckets of data, while #event_simpleName is specific to certain data types.

Using the larger "bucket" of #kind events:

logscale
// This is our tag.
#kind=Secondary
// This is a filter within that tag looking for a specific data type.
| SecondaryEventType=aidmaster
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)

The net result of that query- 124k work units. Remember: we're going through very large amounts of data when using the #kind tag.

Instead, let's look at another version of the query by narrowing it down to two very specific tags: #event_simpleName=AgentOnline or #event_simpleName=HostnameChanged.

logscale
// Look at these extremely specific event types that have similar data. 
#event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)

The result? Only 4K work units, versus the previous 124K work units. That's a 31x reduction in the amount of resources used by the query. This translates to significant time savings when dealing with extremely large data sets.

For a more in-depth discussion on Event Tags and how they may impact your search capabilities, visit our documentation here:Event Tags

Capitalization and Queries

When specifying fields and values, LogScale is case-sensitive- be aware of case-sensitivity in commonly used fields like event_platform. However, this feature can be overridden using regex delimiters in the form of forward slashes and a case insensitivity modifier i.

For example:

logscale
// This is case-sensitive                  
event_platform=Lin
logscale
// This is case-insensitive                  
event_platform=/Lin/i

Without the flag, only exact match entries will be returned. With it, entries containing all variations of capitalization are returned. This work-around resolves the problem for a single query. For a more permanent solution:

  1. Using the query panel, complete a search similar to the example above. Note that the results of the query are not important here.

  2. Click Save as near the top right corner of the user interface.

  3. Name the query (preferably something like Text Insensitive) and click Save.

  4. Go to the Settings tab. In the Basic Information section, set the 'Default Query' to your newly saved query.

Now when you open the Search tab the query panel will be set with your default query, which is set to case-insensitive. Double click the default text within the query and change it to your needs. To remove case-sensitivity, simply remove the case-sensitivity modifier (the letter 'i'). To reset your default query, go back to the 'Settings' tab and select another query, or no query by selecting the 'blank' choice.