Matches or joins data from query results with a table. The table can be provided either as a LookUp file — CSV file or through a limited form of JSON file, uploaded using Lookup Files — or, from LogScale 1.163, as an ad-hoc table Using Ad-hoc Tables.
If you are looking for match expressions, see Match Statements.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
column | string or array | optional[a] | field parameter | Which column in the file to use for the match. A single column or an array or columns can be specified. |
field | string or array | required | Which field in the event (log line) must match the given column value. A single field or an array of fields can be specified. Field and column must have the same length, are matched in order and must all match. | |
file [b] | string | required | Specifies the source file (when using Look Up files) or the name of the ad-hoc table. The file name should be specified with .csv or .json suffix. | |
glob (deprecated) | boolean | optional[a] | false | This parameter is deprecated. Use mode parameter with the glob option instead. (deprecated in 1.23.0) |
ignoreCase | boolean | optional[a] | false | If true, ignore case when matching against the CSV data. |
include | string or array | optional[a] | The columns to include. If no argument is given, include all columns from the corresponding row in the output event. | |
mode | string | optional[a] | string | The function to use when matching against keys. |
Valid Values | ||||
cidr | The key is interpreted as a CIDR subnet and the event is matched if the field contains an IP within the subnet. If multiple subnets match, the most specific one is selected or an arbitrary one if there are multiple equally specific subnets. | |||
glob | The key is interpreted as a globbing pattern with
* and matched accordingly, for
example, a CSV key value of
*thisMatch* would match the
field value of
123thisMatch456 . | |||
string | The matching is done using exact string matching. | |||
nrows | string | optional[a] | 1 | The maximum number of rows an event can match with in a CSV file. Valid inputs are any positive number up to 500 or max where max is equivalent to 500. |
strict | boolean | optional[a] | true | If true (the default) selects only the fields that match a key in the file; if false lets all events through (works like the deprecated lookup() ). |
[a] Optional parameters use their default value unless explicitly set. |
When lookup information from files are loaded from a package, the package name should be specified in addition to the filename. For example:
match("falcon/investigate/logoninfo.csv",field="loookupname")
For more information on referring to package resources, see Referencing Package Assets.
The default behavior of this function — when
strict
is set to true
— works like an INNER
JOIN
. When
strict
is set to
false
the function enriches
events.
When using mode=glob
,
the underlying CSV is limited to 20,000 rows/lines.
For self-hosted customers, the maximum value for glob matches is
configurable using GLOB_MATCH_LIMIT
.
Using match()
with Ad-hoc Tables
match()
can be used to perform a join
using ad-hoc tables. See
Using Ad-hoc Tables for more
information.
Using match()
with Lookup Files
When using match()
for joining with
Look Up files, two file formats are supported:
CSV. The recommended format, ensures better performance, allows for additional functionalities (using parameters) and is compatible with related functions, like
readFile()
. See CSV File Formats for more information.JSON. Supports object and array-based formats. See JSON File Formats for more information.
CSV File Formats
For Comma Separated Values (CSV) files, whitespace gets
included in the keys and values. To include the separator
","
in a value,
quote using the "
character. The following file is a valid CSV file:
userid,name
1,chr
2,krab
"4","p,m"
7,mgr
The first line is interpreted as the column title. When querying, the column in the field should be used to identify which column to match against.
When using match()
with a single
column:
match(test.csv, field=somefield, column=column1)
the last matching row in
test.csv
is used.
Since the function supports the selection of an array of columns, you can match multiple pairs of fields and columns against a CSV file. For example, with these events and fields:
{
field1: c,
field2: f
},
{
field1: c,
field2: e
}
and the following
test.csv
match
file:
column1, column2, column3
a, b, d
c, d, a
c, e, f
The example query:
match(test.csv, field=[field1, field2], column=[column1, column2])
will produce the following output:
item | value |
---|---|
field1 | c |
field2 | e |
column3 | f |
Similar to when matching against a single column, in the case of multiple columns, the function will use the last matching row in the file. This behavior is applied starting from version 1.145.
match()
supports matching on multiple
rows, meaning that you can match a single event with
multiple rows. Each matching row will generate a separate
event.
Example 1 with
nrows=2
. With event:
{
field1: c
}
and the following
test.csv
match
file:
column1, column2, column3
c, b, a
a, b, d
c, d, a
c, e, f
The example query:
match(test.csv, field=field1, column=column1, nrows=2)
will generate the following output:
item | value |
---|---|
field1 | c |
column2 | e |
column3 | f |
field1 | c |
column2 | d |
column3 | a |
Example 2 with
nrows=max
. With event:
{
field1: c
}
and the following
test.csv
match
file:
column1, column2, column3
c, b, a
a, b, d
c, d, a
c, e, f
The example query:
match(some.csv, field=field1, column=column1, nrows=max)
will generate the following output:
item | value |
---|---|
field1 | c |
column2 | e |
column3 | f |
field1 | c |
column2 | d |
column3 | a |
field1 | c |
column2 | b |
column3 | a |
Example 3 with
nrows=2
. Given weblog
data, HTTP methods could be matched to multiple rows for
the type of output:
method=POST
| match(file="methods.csv",nrows=2,field=method,column="method")
The following
methods.csv
match
file:
method,description
POST,Send Data
POST,Transfer Data in
GET,Retrieve Data
GET,Get Data
could be matched with a longer description, showing two different rows for each matching source row:
method | description | url |
---|---|---|
POST | Send Data | /humio/api/v1/ingest/elastic-bulk |
POST | Transfer Data in | /humio/api/v1/ingest/elastic-bulk |
POST | Transfer Data in | /humio/api/v1/ingest/elastic-bulk |
POST | Send Data | /humio/api/v1/ingest/elastic-bulk |
POST | Transfer Data in | /humio/api/v1/ingest/elastic-bulk |
POST | Send Data | /humio/api/v1/ingest/elastic-bulk |
When matching multiple rows, multiple events, matching the
number of matched
nrows
, will be
generated for each corresponding event and matched lookup
entry.
JSON File Formats
For JSON files, two formats are supported:
Object-based, where the lookup field does not have an explicit name
Array-based, where the information is an array of objects
In the Object-based variant, the lookup values are declared as an object with a key and embedded fields, the key field does not have a name.
{
"1": { "name": "chr" },
"2": { "name": "krab" },
"4": { "name": "pmm" },
"7": { "name": "mgr" }
}
When matching against a file in this case, the name of the field in the JSON object does not need to be used; the key for each value is used instead. For example:
groupBy(@timezone)
| count(@timezone)
| match(file="short.json",field=_count)
In the above, the value of _count will be matched, outputting the match value:
_count | name |
---|---|
2 | krab |
In the array-based variant, the lookup values are declared
as an array of objects, you select which field is the key
using the field
parameter in
match()
.
[
{ "userid": "1", "name": "chr" },
{ "userid": "2", "name": "krab" },
{ "userid": "4", "name": "pmm" },
{ "userid": "7", "name": "mgr" }
]
When using this version, the name of the column to be
matched must be specified using the
column
argument
to match()
:
groupBy(@timezone)
| count(@timezone)
| match(file="long.json",field=_count,column="userid")
This behavior also means that any field in the JSON file can be used as the match value. For example:
...
| match(file="long.json",field=codename,column="name")
This can be useful if you have a JSON file that contains multiple possible lookup values for given records.
For
Important
The match()
does not report an
error if the file format cannot be parsed.
match()
Examples
Matches events for which the id field matches the value of the column in the table "users.csv". Does not add any columns.
match(file="users.csv", column=userid, field=id, include=[])
Matches events for which the
id field is matched
case-insensitive by the glob-pattern in the column
userid
in the table
users.csv
, and add
all other columns of the first matching row to those events.
id =~ match(file="users.csv", column=userid, mode=glob, ignoreCase=true)
Let all events pass through, but events for which the
id field matches the
value of the userid
column in the table
users.csv
will be
enriched with all other columns of the matching row.
id =~ match(file="users.csv", column=userid, strict=false)
Matches events for which the
ip field matches the
CIDR subnet of the
cidr-block column in
the table
cidr-file.csv
. Only
adds the columns
info and
type from the first
matching row.
match(file="cidr-file.csv", column="cidr-block", field=ip, mode=cidr, include=["info","type"])
The function allows for matching multiple pairs of fields and columns against a CSV file.
Given an event with the following fields:
[
{
"field1" : "c",
"field2" : "f"
},
{
"field2" : "e",
"field1" : "c"
}
]
and a test.csv
file:
column1, column2, column3
a, b, d
c, d, a
c, e, f
The query:
match(test.csv, field=[field1, field2], column=[column1, column2])
will produce the following output:
column3 | field1 | field2 |
---|---|---|
f | c | e |