This function works both as a filter and as a way to extract new fields using a regular expression. The regular expression can contain one or more named capturing groups; fields with the names of the groups are added to the events. Using " in an already quoted string requires escaping, which is sometimes necessary when writing regular expressions. LogScale uses JitRex, which closely follows the syntax of re2j regular expressions, which in turn is very close to Java's regular expression syntax. See the regular expression syntax documentation for details.
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
field | string | false | @rawstring | Specifies the field to run the regular expression against. |
flags | string | false | m | Specifies other regex flags. Valid values: d (period (.) includes newline characters), i (ignore case for matched values), m (multi-line parsing of regular expressions). |
limit | number | false | 100 | Defines the maximum number of events to produce (defaults to 100). A warning is produced if this limit is exceeded, unless the parameter is specified explicitly. |
regex | string | true | | Specifies a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events. [a] |
repeat | boolean | false | false | If set to true, multiple matches yield multiple events. Valid values: false (match at most one event), true (match multiple events). |
strict | boolean | false | true | Specifies whether events not matching the regular expression are filtered out of the result set. Strict is the default. |
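As a minimal sketch of how these parameters combine, the query below runs the expression against a single field and keeps non-matching events by setting strict to false. The field name message and the group name code are hypothetical and chosen only for illustration.

```logscale
// "message" and "code" are hypothetical names used only for illustration
regex("status=(?<code>\\d+)", field=message, strict=false)
```

With strict=false, events that do not match should stay in the result set without a code field, rather than being filtered out.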
Important
The regex() function provides similar functionality to the /regex/ syntax, but regex() searches only the specified field (@rawstring by default). In contrast, the /regex/ syntax searches @rawstring and all sent and parsed fields. Because of this difference in search scope, regex() can be significantly more performant than the /regex/ syntax, depending on the number of fields in the dataset.
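To illustrate the difference in scope, the sketch below restricts the search to one field. The field name message is a hypothetical example and not something defined by this page.

```logscale
// Searches only the hypothetical field "message"
regex("timeout", field=message)
```

The corresponding /timeout/ query would look for the same text in @rawstring and in every other field of each event, which is where the extra cost comes from on events with many fields.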
When performing queries, the g option (used for global, as in repeating) is allowed in a query, but is not an acceptable value for the flags parameter. To get multiple matches with regex(), set the repeat parameter to true instead.
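A minimal sketch of repeated matching with the repeat parameter follows. The field name message and the group name word are hypothetical and used only for illustration.

```logscale
// "message" and "word" are hypothetical names used only for illustration
regex("(?<word>\\w+)", field=message, repeat=true)
```

For an event whose message field contains hello world, this should yield one event per match, each with its own value of word. The number of events produced is subject to the limit parameter.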
Examples
Extract the domain name from the HTTP referrer field. This field often contains a full URL, so there can be many different URLs from the same site. In this case we want to count all referrals from the same domain. This adds a field named refdomain to events matching the regular expression.
regex("https?://(www\\.)?(?<refdomain>.+?)(/|$)", field=referrer)
| groupby(refdomain, function=count()) | sort(field=_count, type=number, reverse=true)
Extract the user ID from the url field. The extracted value is stored in a field named userid.
regex(regex="/user/(?<userid>\\S+)/pay", field=url)
Show how to escape " in the regular expression. This is necessary because the regular expression is itself in quotes. Extract the user and message from events like Peter: "hello" and Bob: "good morning".
regex("(?<name>\\S+): \"(?<msg>[^\"]+)\"")
Note that the default for a regular expression is no flags, so that:
@rawstring=/expression/
is equivalent to:
regex("expression")
or:
regex("expression", flags="")
When using flags:
@rawstring=/expression/m
is equivalent to:
regex("expression", flags="m")