Determines the set union of array values over input events.
Used to compute the values that occur in any of the events supplied to
this function. The output order of the values is not defined. If no arrays
are found, the output is empty.
Deduplicating a field that contains multiple occurrences of a
value, perhaps separated by a single character, can be achieved
in a variety of ways. This solution uses array:union() to
create an array of unique values and then split() to turn
that array into a list of individual events.
For example, when examining the humio repository and looking
for the browsers or user agents that have used your instance,
the userAgent field will
contain the browser and the toolkits used to support it, for
example:
Raw Events
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
The actual names are the
Name/Version pairs showing
compatibility with different browser standards. Resolving this
into a simplified list requires splitting up the list,
simplifying (to remove duplicates), filtering, and then
summarizing the final list.
Step-by-Step
Starting with the source repository events.
logscale
splitString(field=userAgent,by=" ",as=agents)
First we split up the
userAgent field using
a call to splitString() and place the
output into the array field
agents.
This creates individual entries in the
agents array for each
event.
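To make the effect of this step concrete, here is a minimal Python sketch (not LogScale) of what splitting on a single space does to the sample user agent shown earlier. The helper name `split_string` and the flattened `agents[i]` key layout are assumptions made for illustration; they only mimic how an array field appears on an event.

```python
def split_string(event, field, by, as_):
    """Return a copy of `event` with event[field] split on `by`
    and stored as as_[0], as_[1], ... (hypothetical helper)."""
    out = dict(event)
    for i, part in enumerate(event[field].split(by)):
        out[f"{as_}[{i}]"] = part
    return out


event = {
    "userAgent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                 "AppleWebKit/537.36 (KHTML, like Gecko) "
                 "Chrome/116.0.0.0 Safari/537.36"
}
result = split_string(event, "userAgent", " ", "agents")
```

In this sketch the first entry is `Mozilla/5.0`, but tokens such as `(Macintosh;` also end up in the array. They are not Name/Version pairs, which is why the overview lists filtering as a separate step.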
Using array:union() we aggregate the user agent entries
across all the events into a single array of unique entries.
An entry whose value occurs in more than one event appears
only once in the result.
The event data now looks like this:
| browsers[0] | browsers[1] | browsers[2] |
|---|---|---|
| Gecko/20100101 | Safari/537.36 | AppleWebKit/605.1.15 |
An array of the individual values.
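The aggregation can be pictured as an ordinary set union taken across events. The following Python sketch is only a model of that behaviour, not the LogScale implementation; the three input lists are hypothetical events built from the values in the table, and how the values are distributed across events is an assumption.

```python
# Hypothetical per-event arrays (already filtered to Name/Version pairs).
events = [
    ["Gecko/20100101", "Safari/537.36"],
    ["Safari/537.36", "AppleWebKit/605.1.15"],
    ["Gecko/20100101"],
]

# Union over all events: each distinct value is kept exactly once.
# A set is used on purpose, because the output order is not defined.
browsers = set().union(*events)

# With no input arrays at all, the union is empty.
no_arrays = set().union(*[])
```

Five input values collapse to three distinct ones, matching the three array entries shown in the table.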
logscale
|split(browsers)
Using split() divides the array
into individual events, turning:

| browsers[0] | browsers[1] | browsers[2] |
|---|---|---|
| Gecko/20100101 | Safari/537.36 | AppleWebKit/605.1.15 |

into:

| _index | row[1] |
|---|---|
| 0 | Gecko/20100101 |
| 1 | Safari/537.36 |
| 2 | AppleWebKit/605.1.15 |
Event Result set.
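As a rough model of this step, the Python sketch below turns one event holding a flattened array into one event per element, each carrying its position in `_index`. The helper `split_array` is hypothetical, and the name of the value field on the output events (`browser` here, following the summary below) is an assumption rather than documented behaviour.

```python
def split_array(event, field, value_field):
    """Return one event per element of field[0], field[1], ...
    Each output event gets `_index` plus the element value
    stored under `value_field` (hypothetical helper)."""
    rows = []
    i = 0
    while f"{field}[{i}]" in event:
        rows.append({"_index": i, value_field: event[f"{field}[{i}]"]})
        i += 1
    return rows


event = {
    "browsers[0]": "Gecko/20100101",
    "browsers[1]": "Safari/537.36",
    "browsers[2]": "AppleWebKit/605.1.15",
}
rows = split_array(event, "browsers", "browser")
```

Each resulting row can then be filtered or counted like any other event, which is the point of splitting the array.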
Summary and Results
The resulting output from the query is a list of events with
each event containing a matching _index and
browser. This can be useful if you want to perform further
processing on a list of events rather than an array of values.
Find Union of Array Over Multiple Events
Find union of an array over multiple events using the array:union() function
Query
logscale
array:union(mailto,as=unique_mails)
Introduction
Arrays are handy when you want to work with multiple values of
the same data type. The array:union()
function is used to find the distinct values of an array over
multiple events. An important property of a union is that it
removes duplicates: if a value is repeated, within one event or
across several events, it appears only once in
the union.
Example incoming data might look like this:
| mailto[0] | mailto[1] |
|---|---|
| foo@example.com | bar@example.com |
| bar@example.com | |
Step-by-Step
Starting with the source repository events.
logscale
array:union(mailto,as=unique_mails)
Searches the mailto
array across multiple events and returns the union of the element
values in a new array, unique_mails, in which each e-mail address
appears only once. For the sample data above, the result contains
foo@example.com and bar@example.com; the order of the values is not defined.
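The following Python sketch models this behaviour on the sample data; it is an illustration under stated assumptions, not the LogScale implementation. The helper `array_union` is hypothetical, events are represented as dictionaries with flattened `mailto[i]` keys, and the values are sorted only to make the sketch reproducible, since the real output order is not defined.

```python
def array_union(events, array, as_):
    """Collect the distinct values of array[0], array[1], ... over all
    events and return them as as_[0], as_[1], ... (hypothetical helper)."""
    prefix = array + "["
    distinct = set()
    for event in events:
        for key, value in event.items():
            if key.startswith(prefix) and key.endswith("]"):
                distinct.add(value)
    # Sorting is an assumption for reproducibility only.
    return {f"{as_}[{i}]": v for i, v in enumerate(sorted(distinct))}


events = [
    {"mailto[0]": "foo@example.com", "mailto[1]": "bar@example.com"},
    {"mailto[0]": "bar@example.com"},
]
result = array_union(events, "mailto", "unique_mails")
```

Three input values produce two entries in `unique_mails`, and an input without any `mailto` entries produces an empty result.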
Event Result set.
Summary and Results
The query collects the e-mail addresses found in the mailto
arrays of all input events and returns them as a single array
with duplicates removed.