Common Query Examples

This section will provide you with several common query examples for your use. It's important to note that because every user's data, repository, and setup is different, these examples may need modification in order to work effectively. For more information on LogScale's query language and best practices beyond this tutorial, refer to our documentation here: Writing Queries

groupBy() Examples

groupBy() groups together events by one or more specified fields, which is similar to the GROUP BY method in SQL databases. It can also be used to execute aggregate functions on individual groups, and results are returned in the field parameter for each aggregate function. A good example of this is the _count field if the count() function is used.

Adding a Single Field to groupBy() Results

Let's say you need to use groupBy() for aggregation, but also need to include fields such as the aid, aip, event_simpleName, etc. This is where the stats() and collect() functions come into play.

Here's a simple example of adding DomainName to the results of grouping by ComputerName and FileName. This says "group everything by unique pairs of ComputerName and FileName, and then collect all of the DomainName values from each of those unique pairings:

logscale
groupBy([ComputerName, FileName], function=collect(DomainName))

The results look like this:

Adding Additional Fields to groupBy() Results

Maybe instead of a single field, you'd like to collect multiple fields. The correct way of doing this is by using an array:

logscale
groupBy([ComputerName, FileName], function=collect([DomainName, MD5HashData, SHA256HashData]))

Notice the [ ... ]? That says "pass all of these into the function, and display the output". This is the result:

Adding a count() to groupBy() Results When Using collect()

By default, groupBy() without a function argument includes a count() function. The result is the groupBy() results have a _count field that displays the total count for each aggregation. However, once you add a function to that, the count() is no longer in the results. The solution? Simply add the count() function into an array.

logscale
groupBy(ComputerName, function=[collect(DomainName), count()])

The results now include count() and look like this:

Passing a groupBy() Result to timeChart()

The groupBy() function removes the @timestamp field by default. This generally doesn't matter unless you're trying to use something like timeChart() which requires @timestamp to work correctly.

logscale
groupBy(ComputerName, function=selectLast(@timestamp))
| timechart(ComputerName)

If you need to include other fields along with the timestamp, you can pass them all in an array:

logscale
groupBy(ComputerName, function=[collect([UserName, DomainName]), selectLast(@timestamp)])
| timechart(ComputerName)
Getting the First and Last Event of a groupBy()

Let's say you're doing a groupBy() for a particular event, but you'd like to see the time of the first and last occurrence in the results. You'd do this:

logscale
groupBy(UserName, function=[max(@timestamp, as=lastSeen), min(@timestamp, as=firstSeen)])
| firstSeen:=formatTime(field=firstSeen, format="%Y/%m/%d %H:%M:%S") 
| lastSeen:=formatTime(field=lastSeen, format="%Y/%m/%d %H:%M:%S")

Keep in mind that @timestamp is epoch, which means you can basically search for the "smallest" which is the oldest, or the "largest" which is the most recent. That query says "group the results by UserName, find the smallest/oldest timestamp, find the largest/newest timestamp, and then reformat the epoch times into something readable."