Best Practice: Using Tags in Queries
Tags are particularly important in queries, especially when operating at scale. Tags tell LogScale where to search for data; think of them as bookmarks that point to particular buckets of data. For a bit more context, see CrowdStrike's blog post on the topic (https://www.crowdstrike.com/blog/what-makes-crowdstrike-falcon-logscale-so-fast).
The more specific the tag, the more optimized the search will be. The efficiency of the query can be measured by looking at the Work: value at the bottom of the query. Work: values are scored in reverse, where lower numbers are better.
Let's look at an example. In the following queries, we're looking for events that map an aid to a ComputerName. These events have multiple tags associated with them, primarily #kind and #event_simpleName. The #kind tag has two values: Primary and Secondary. Almost every Falcon event will be a Primary or Secondary data type. With #kind, we're dealing with extremely large buckets of data. On the other hand, #event_simpleName is extremely specific to certain data types.
Using the larger bucket of #kind events:
// This is our tag.
#kind=Secondary
// This is a filter within that tag looking for a specific data type.
| SecondaryEventType=aidmaster
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)
The query result? 124,000 work units. Remember, we're digging through extremely large buckets of data with the #kind tag.
Now let's try another version of that query, but instead of using a broad tag like #kind=Secondary, we use much more specific tags to narrow down the results: #event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged.
// Look at these extremely specific event types that have similar data.
#event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged
// Group the results by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)
The result? Only 4K work units, versus the previous 124K work units. That's a 31x reduction in the amount of resources used. This translates to a ton of time savings when you're dealing with extremely large data sets.
Remember, when developing queries:
Always use tags.
The more specific the tag, the more optimized the query will likely be.
Pay attention to the work units at the bottom. Lower is better.
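As a rough sketch of how these guidelines fit together, the query below reuses the specific tags from the example above and orders the steps as tag first, field filter second, aggregation last. The ComputerName=* filter is an illustrative addition that is not part of the original queries; it is assumed here only to show where a field filter would go, and it keeps events that actually carry a ComputerName value. No work figure is quoted for this variant because it was not measured.

```
// 1. Lead with the most specific tags that cover the events you need.
#event_simpleName=AgentOnline OR #event_simpleName=HostnameChanged
// 2. Apply field filters after the tags (illustrative: keep events that have a ComputerName).
| ComputerName=*
// 3. Aggregate last: group by aid and show the last ComputerName value for each.
| groupBy(aid, function=selectLast(ComputerName), limit=max)
```

When adapting this pattern, compare the Work: value before and after each change to confirm that the more specific tag is actually reducing the cost of the query.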