Best Practice: Format query output using groupBy()

One of the more powerful aggregate functions in LogScale is groupBy(), which is akin to the stats command in Event Search. The key to using groupBy() is knowing when to use parentheses and when to use square brackets. Parentheses invoke a function, such as an aggregate function. Square brackets enclose a list: use them whenever a parameter takes more than one value, such as multiple fields or multiple aggregate functions.

logscale
#event_simpleName=ProcessRollup2 event_platform=Win ImageFileName=/\\powershell\.exe/i
| groupBy(SHA256HashData, function=([count(aid, distinct=true, as=uniqueEndpoints), count(aid, as=totalExecutions), collect(CommandLine)]))
Example of an aggregated query using groupBy()

If we were to isolate the groupBy() statement above to make the clustering a little easier to understand, it would look like this:

logscale
| groupBy(SHA256HashData, function=([count(aid, distinct=true, as=uniqueEndpoints), count(aid, as=totalExecutions), collect(CommandLine)]))

Note the square brackets that follow the function parameter. They are required because this groupBy() query passes multiple aggregations (two count() calls and one collect() call) to function.
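For contrast, here is a minimal sketch of the single-aggregation case. It reuses the field names from the query above and assumes you only want the total execution count per hash. With a single aggregate function, the list brackets are not needed, so function takes the call directly:

```logscale
| groupBy(SHA256HashData, function=count(aid, as=totalExecutions))
```

The parentheses still invoke count(); only the square brackets drop away, because there is no longer a list of aggregations to enclose.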

If you wanted to use groupBy() for multiple fields, you would also use square brackets. As an example:

logscale
| groupBy([SHA256HashData, FileName], function=([count(aid, distinct=true, as=uniqueEndpoints), count(aid, as=totalExecutions), collect(CommandLine)]))

Note the two fields, SHA256HashData and FileName, enclosed in square brackets immediately after groupBy(.

The same principle applies if we want to collect multiple fields.

logscale
| groupBy([SHA256HashData, FileName], function=([count(aid, distinct=true, as=uniqueEndpoints), count(aid, as=totalExecutions), collect([CommandLine, UserSid])]))

Note how:

logscale
collect(CommandLine)

Becomes:

logscale
collect([CommandLine, UserSid])

This takes a little practice, but once mastered, the syntax is logical and easy to interpret. To assist, LogScale will insert the matching closing parenthesis or closing square bracket when you open one.
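While practicing, it can help to spread the query over several lines so that each level of nesting sits at its own indentation. The sketch below is the final query from this section, unchanged apart from whitespace and comments. The layout and the // comments are illustrative additions, on the assumption that line breaks and comments inside the argument list do not affect how the query is evaluated:

```logscale
#event_simpleName=ProcessRollup2 event_platform=Win ImageFileName=/\\powershell\.exe/i
| groupBy(
    [SHA256HashData, FileName],                        // square brackets: more than one grouping field
    function=([                                        // square brackets: more than one aggregation
      count(aid, distinct=true, as=uniqueEndpoints),   // parentheses: invoke an aggregate function
      count(aid, as=totalExecutions),
      collect([CommandLine, UserSid])                  // square brackets: more than one collected field
    ])
  )
```

Read from the inside out, every opening parenthesis belongs to a function call and every opening square bracket starts a list of fields or functions.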