Best Practice: Using Tags in Queries

Tags are particularly important in queries, especially when operating at scale. Tags tell LogScale where to search for data. Think of them as bookmarks that point to particular buckets of data. For a bit more context, see CrowdStrike's blog post on the topic.

The more specific the tag, the more optimized the search will be. You can measure a query's efficiency by looking at the Work: value at the bottom of the query. Work: values are scored in reverse: lower numbers are better.

Let's look at an example. In the following queries, we're looking for events that map an aid to a ComputerName. These events have multiple tags associated with them, primarily #kind and #event_simpleName. The #kind tag has two values: Primary and Secondary. Almost every Falcon event will be a Primary or Secondary data type. With #kind, we're dealing with extremely large buckets of data. On the other hand, #event_simpleName is extremely specific to certain data types.

Using the larger bucket of #kind events:

// This is our tag.
#kind=Secondary
// This is a filter within that tag looking for a specific data type.
| SecondaryEventType=aidmaster
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)

The query results? 124,000 work units. Remember, we're digging through extremely large buckets of data with the #kind tag.

Now let's try another version of that query, but instead of using a broad tag like #kind=Secondary, we use much more specific tags to narrow down the results: #event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged.

// Look at these extremely specific event types that have similar data. 
#event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)

The result? Only 4,000 work units, versus the previous 124,000. That's a 31x reduction in the amount of resources used (124,000 / 4,000 = 31). This translates to a ton of time savings when you're dealing with extremely large data sets.

Remember, when developing queries:

  1. Always use tags.

  2. The more specific the tag, the more optimized the query will likely be.

  3. Pay attention to the work units at the bottom. Lower is better.
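
As a rough sketch of how these rules combine, the query below keeps the specific tags from the example above as its first line and only then applies an ordinary field filter. The ComputerName pattern (hosts whose names start with WEB-) is a hypothetical placeholder, not something from the example data, so substitute a value that exists in your environment. No work figure is given because none was measured for this variant.

```
// Start with the most specific tags available (same tags as the earlier example).
#event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged
// Assumption: an illustrative, non-tag filter; the WEB- prefix is a made-up placeholder.
| ComputerName=/^WEB-/i
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)
```

The idea is that the tags should narrow the search to the relevant buckets first, so the field filter runs against a much smaller set of events. Compare the Work: value with and without the tag line to confirm the effect on your own data.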