Best Practice: Using Tags in Queries

Tags are particularly important when using queries, especially when operating at scale. Tags tell LogScale where to search for data; think of them as bookmarks that point to particular buckets of data. For a bit more context, see CrowdStrike's blog post on the topic: https://www.crowdstrike.com/blog/what-makes-crowdstrike-falcon-logscale-so-fast

The more specific the tag, the more optimized the search will be. The efficiency of the query can be measured by looking at the Work: value at the bottom of the query. Work: values are scored in reverse, where lower numbers are better.

Let's look at an example. In the following queries, we're looking for events that map an aid to a ComputerName. These events have multiple tags associated with them, primarily #kind and #event_simpleName. The #kind tag has two values: Primary and Secondary. Almost every Falcon event will be a Primary or Secondary data type. With #kind, we're dealing with extremely large buckets of data. On the other hand, #event_simpleName is extremely specific to certain data types.
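
To make the bucket idea concrete, here is a minimal Python sketch of a toy model. It is not how LogScale stores data or computes its Work: value; the `scan` helper, the bucket layout, and the event counts are all made up for illustration. It only shows why reading a small, specific bucket touches fewer events than reading a large bucket and filtering afterwards.

```python
# Toy model only: the bucket layout and event counts are hypothetical and do
# not reflect LogScale internals or how its Work: value is calculated.

def scan(buckets, tag_values, predicate=lambda event: True):
    """Read only the buckets named by tag_values; return (matches, events_read)."""
    events_read = 0
    matches = []
    for value in tag_values:
        for event in buckets.get(value, []):
            events_read += 1  # every event in a selected bucket must be read
            if predicate(event):
                matches.append(event)
    return matches, events_read


# Broad tag: one large bucket that mixes many event types together.
by_kind = {
    "Secondary": [{"type": "aidmaster", "aid": "a1"}]
    + [{"type": "other", "aid": "a1"}] * 99,
}

# Specific tag: small buckets that each hold a single event type.
by_event_name = {
    "AgentOnline": [{"aid": "a1"}],
    "HostnameChanged": [{"aid": "a2"}],
}

broad_matches, broad_read = scan(
    by_kind, ["Secondary"], lambda e: e["type"] == "aidmaster"
)
narrow_matches, narrow_read = scan(by_event_name, ["AgentOnline", "HostnameChanged"])

print(broad_read, narrow_read)  # 100 2
```

In this toy setup the broad tag forces 100 reads to find a single matching event, while the specific tags read only the 2 events that are wanted.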

Using the larger bucket of #kind events:

// This is our tag.
#kind=Secondary
// This is a filter within that tag looking for a specific data type.
| SecondaryEventType=aidmaster
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)

The query results? 124,000 work units. Remember, we're digging through extremely large buckets of data with the #kind tag.

Now let's try another version of that query, but instead of using a broad tag like #kind=Secondary, we use much more specific tags to narrow down the results: #event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged.

// Look at these extremely specific event types that have similar data. 
#event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)

The result? Only 4K work units, versus the previous 124K work units. That's a 31x reduction in the amount of resources used. This translates to a ton of time savings when you're dealing with extremely large data sets.
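
The 31x figure can be checked with a few lines of Python. This is only a worked calculation using the rounded work-unit values quoted above; the variable names are illustrative.

```python
# Work-unit values as quoted in the text (rounded figures).
broad_work = 124_000   # query using #kind=Secondary
narrow_work = 4_000    # query using specific #event_simpleName tags

reduction_factor = broad_work / narrow_work
percent_saved = (1 - narrow_work / broad_work) * 100

print(f"{reduction_factor:.0f}x less work, {percent_saved:.1f}% saved")
# prints "31x less work, 96.8% saved"
```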

Remember, when developing queries:

  1. Always use tags.

  2. The more specific the tag, the more optimized the query will likely be.

  3. Pay attention to the work units at the bottom. Lower is better.