Query Writing Best Practices

Specificity matters when writing LogScale queries. The more specific a query is, the fewer results you will have to sort through and the faster the query will run. To run efficiently and keep CPU and memory costs down, a query should follow an ordered flow through the query pipeline, starting with the repository.

Following these rules in order provides the best results:

  • Reduce the dataset with filters so that results are limited to a smaller pool of information.

  • Postpone transformations until after the aggregation whenever possible.

  • Always aggregate.

With more detail, the process should look like this:

  1. Set one or more tag filters to narrow the search to only certain events.

  2. Filter the fields (for example, with *, "", !=).

  3. Apply any transformations that add, remove, or modify fields, using transformation functions such as format() or eval() (or any function with an 'as' parameter).

  4. Aggregate the filtered results using the aggregate functions (such as sum(), top(), or groupBy()).

  5. Post-process the aggregated result, using functions such as table() or format(), among others.

The syntax in code format looks like this:

tag-filters
| field filters
| transformations
| aggregate function
| post processing of aggregate result
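As an illustrative sketch of that flow (the event type and field names here, such as ProcessRollup2 and ImageFileName, are hypothetical examples used for structure, not a prescription):

```
// 1. Tag filter: narrow the search to one event type.
#event_simpleName=ProcessRollup2
// 2. Field filter: keep only Windows events.
| event_platform=Win
// 3. Transformation: derive a normalized field.
| FileName := lower(ImageFileName)
// 4. Aggregate: count events per file name.
| groupBy(FileName, function=count())
// 5. Post-process the aggregate result.
| sort(_count, order=desc)
```

Each stage hands a smaller or more structured set of events to the next, which is exactly why the ordering matters.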

Remember that the order of operations for queries is important. Let's look at an example of a resource-costly query:

| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| formatTime("%F %T.%Q", as=Timestamp, field=@timestamp, timezone=UTC)
| table([fStart, Timestamp, fEnd])

This query is particularly costly because it formats timestamps for all events in the defined timespan.

Specifying a limit of 200 events prior to output transformation will make the query run much faster because it displays only 200 events, thus performing fewer format operations:

| table([@timestamp], limit=200) // select the 200 most recent events by @timestamp
| eval(fStart = @timestamp - 600)
| eval(fEnd = @timestamp + 600)
| format("%1$TF %1$TT.%1$TQ", field=[@timestamp], as=Timestamp, timezone=UTC)
| table([fStart, Timestamp, fEnd])

Tip: To check which version of a query is less expensive, run both against the same data set, then compare LogScale's measure of their cost in the 'Work' report, found below the query result. The lower the number, the better the result!

Using Tags in Queries

Tags are essential when developing queries, particularly at scale. Tags tell LogScale where to search for data. The more specific the tag, the better the search will perform.

You can also apply tags to fields. Doing so allows LogScale to quickly and efficiently organize, include, or exclude large collections of events as you search. When dealing with Falcon LTR data, tags are applied to the raw telemetry transparently by the parser. Two important fields are already tagged in LogScale: event_simpleName and cid. To make the best use of these tags, be sure to place a # in front of each of them. This allows LogScale to efficiently and ruthlessly discard large swaths of events that you are not interested in.
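To make that difference concrete, here is a minimal sketch contrasting the two forms of the same filter (AgentOnline is used as an example event type):

```
// With #, event_simpleName is treated as a tag: LogScale can discard
// entire non-matching collections of events before scanning anything.
#event_simpleName=AgentOnline

// Without #, the same text is a plain field filter: every event in
// the timespan must be read and tested individually.
// event_simpleName=AgentOnline
```

The two lines look nearly identical, but only the tagged version lets LogScale skip data up front.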

For example, in the following queries we're looking for events that map an aid to a ComputerName. These events have multiple associated tags, primarily #kind and #event_simpleName. The #kind tag has two values: Primary and Secondary. With few exceptions, almost every Falcon event will be a Primary or Secondary data type. With #kind, we're dealing with very large "buckets" of data, while #event_simpleName is specific to certain data types.

Using the larger "bucket" of #kind events:

// This is our tag.
#kind=Secondary
// This is a filter within that tag looking for a specific data type.
| SecondaryEventType=aidmaster
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)

The net result of that query: 124K work units. Remember: we're going through very large amounts of data when using the #kind tag.

Instead, let's look at another version of the query by narrowing it down to two very specific tags: #event_simpleName=AgentOnline or #event_simpleName=HostnameChanged.

// Look at these extremely specific event types that have similar data. 
#event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)

The result? Only 4K work units, versus the previous 124K work units. That's a 31x reduction in the amount of resources used by the query. This translates to significant time savings when dealing with extremely large data sets.

For a more in-depth discussion of Event Tags and how they may impact your search capabilities, see our documentation: Event Tags

Capitalization and Queries

When specifying fields and values, LogScale is case-sensitive, so be aware of case sensitivity in commonly used fields like event_platform. However, this behavior can be overridden using regex delimiters, in the form of forward slashes, together with the case-insensitivity modifier, the letter 'i'.

For example:

// This is case-sensitive                  
event_platform=Lin
// This is case-insensitive                  
event_platform=/Lin/i
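One caveat worth noting in a short sketch (same field as above): the regex form matches substrings, so if you need a case-insensitive match against the whole value, anchor the pattern:

```
// Case-insensitive substring match: would also match values
// such as "Linux" or "MyLin".
event_platform=/Lin/i
// Case-insensitive whole-value match, anchored with ^ and $.
event_platform=/^Lin$/i
```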

Without the flag, only exact-match entries will be returned. With it, entries containing any variation of capitalization are returned. This workaround resolves the problem for a single query. For a more permanent solution:

  1. Using the query panel, complete a search similar to the example above. Note that the results of the query are not important here.

  2. Click 'Save as' near the top right corner of the user interface.

  3. Name the query (preferably something like 'Text Insensitive') and click 'Save'.

  4. Go to the 'Settings' tab. In the 'Basic Information' section, set the 'Default Query' to your newly saved query.

Now when you open the 'Search' tab, the query panel will be set with your default query, which is case-insensitive. Double-click the default text within the query and change it to your needs. To restore case sensitivity, simply remove the case-insensitivity modifier (the letter 'i'). To reset your default query, go back to the 'Settings' tab and select another query, or no query by selecting the 'blank' choice.