This function works both as a filter and as a way to extract new fields using a regular expression. The regular expression can contain one or more named capturing groups; a field named after each group is added to matching events. Using " inside an already quoted string requires escaping it as \", which is sometimes necessary when writing regular expressions.
LogScale uses JitRex, which closely follows, but does not entirely replicate, the syntax of RE2J regular expressions; RE2J syntax is in turn very close to that of Java regular expressions. See Regular Expression Syntax for more information.
Note
To ensure compatibility, always test your regular expressions inside LogScale rather than in a third-party regex tool.
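The filter-and-extract behavior of regex() can be modeled with a short Java sketch. It is not LogScale's implementation. It uses java.util.regex rather than JitRex, and it represents an event as a Map from field names to values. The class and method names (RegexExtractSketch, apply, groupNames) are hypothetical. As the note above says, results from another engine are only an approximation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of regex() as a filter plus field extractor (java.util.regex, not JitRex). */
final class RegexExtractSketch {

    // Finds the names of (?<name>...) groups; lookbehinds like (?<= are skipped
    // because a letter must follow "<". Simplification: escaped "(" is not handled.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = GROUP_NAME.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /**
     * Searches one field of each event. A matching event is copied with one new
     * field per named group; with strict=true, non-matching events are dropped.
     */
    static List<Map<String, String>> apply(List<Map<String, String>> events,
                                           String field, String regex, boolean strict) {
        Pattern pattern = Pattern.compile(regex);
        List<String> names = groupNames(regex);
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> event : events) {
            String value = event.get(field);
            Matcher m = value == null ? null : pattern.matcher(value);
            if (m != null && m.find()) {              // unanchored search, not a full match
                Map<String, String> copy = new LinkedHashMap<>(event);
                for (String name : names) {
                    String captured = m.group(name);
                    if (captured != null) {           // this sketch skips groups that did not participate
                        copy.put(name, captured);
                    }
                }
                out.add(copy);
            } else if (!strict) {
                out.add(event);                       // kept unchanged when strict=false
            }
        }
        return out;
    }
}
```

For example, applying "/user/(?<userid>\\S+)/pay" to a url field of /user/alice/pay yields a copy of the event with userid set to alice.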
Function Traits: Negatable, Transformation
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
field | string | optional | @rawstring | Specifies the field to run the regular expression against. |
flags | string | optional | m | Specifies regex modifier flags. Valid values: d (period (.) also matches newline characters); i (ignore case for matched values); m (multi-line parsing of regular expressions). |
limit | number | optional | 100 | Defines the maximum number of events to produce. A warning is produced if this limit is exceeded, unless the parameter is specified explicitly. |
regex[a] | string | required | | Specifies a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events. |
repeat | boolean | optional | false | If set to true, multiple matches yield multiple events. Valid values: false (match at most one event); true (match multiple events). |
strict | boolean | optional | true | Specifies whether events not matching the regular expression should be filtered out of the result set. Valid values: false (events not matching the regular expression are kept in the result set, without any extracted fields); true (events not matching the regular expression are filtered out of the result set). |
[a] The parameter name regex can be omitted; the following forms are equivalent:
regex("value")
and:
regex(regex="value")
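The flags parameter can be pictured as a set of compile-time options. The Java sketch below maps each documented flag onto the closest java.util.regex constant. That mapping, in particular m to Pattern.MULTILINE, is an assumption for illustration, and JitRex may differ in edge cases. The class name RegexFlagsSketch is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical mapping of regex() flags onto java.util.regex options. */
final class RegexFlagsSketch {

    static int toJavaFlags(String flags) {
        int result = 0;
        for (char c : flags.toCharArray()) {
            switch (c) {
                case 'd':
                    result |= Pattern.DOTALL;           // period (.) also matches newlines
                    break;
                case 'i':
                    result |= Pattern.CASE_INSENSITIVE; // ignore case
                    break;
                case 'm':
                    result |= Pattern.MULTILINE;        // assumption: ^ and $ match at line breaks
                    break;
                default:
                    // Includes 'g', which the flags parameter does not accept;
                    // repeated matching is requested with repeat=true instead.
                    throw new IllegalArgumentException("unsupported flag: " + c);
            }
        }
        return result;
    }

    /** Unanchored search of input using the given flag string. */
    static boolean find(String regex, String flags, String input) {
        return Pattern.compile(regex, toJavaFlags(flags)).matcher(input).find();
    }
}
```

With these options, "^b" finds a match in "a\nb" only when m is set, and "a.b" finds a match there only when d is set.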
Important
The regex() function provides similar functionality to the /regex/ syntax; however, the regex() function searches only specific fields (@rawstring by default). In contrast, the /regex/ syntax searches all sent and parsed fields, as well as @rawstring.
If you specify a field with the /regex/ syntax, the search is limited to that field, for example:
| sessionid = /sess/
This limits the search to the sessionid field.
This difference in search scope also introduces a significant performance difference. Because regex() searches only the specified field (@rawstring by default), it can be significantly more performant than the /regex/ syntax, depending on the number of fields in the dataset.
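The difference in scope can be sketched in Java, treating an event as a hypothetical Map from field names to values. In this sketch, regexFunction tests a single field, while slashSyntax loops over every field. That loop is why the cost of /regex/ grows with the number of fields. Both method names are illustrative only, not LogScale internals.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical comparison of regex() and /regex/ search scope. */
final class RegexScopeSketch {

    // regex(): test only the named field (@rawstring unless another field is given).
    static boolean regexFunction(Map<String, String> event, String field, Pattern p) {
        String value = event.get(field);
        return value != null && p.matcher(value).find();
    }

    // /regex/: test every field of the event, @rawstring included,
    // so the work grows with the number of fields.
    static boolean slashSyntax(Map<String, String> event, Pattern p) {
        for (String value : event.values()) {
            if (p.matcher(value).find()) {
                return true;
            }
        }
        return false;
    }
}
```

For an event whose @rawstring is "login ok" and whose sessionid is "sess-42", the pattern sess is found by slashSyntax and by regexFunction on sessionid, but not by regexFunction on @rawstring.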
Attention
When writing queries, the g option (for global, that is, repeating matches) is allowed in a query, but it is not an acceptable value for the flags parameter. To get multiple matches with regex(), set the repeat parameter to true instead.
For more information, see Global (Repeating) Matches.
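A sketch of how repeat and limit interact, using a Java Matcher.find() loop as the counterpart of global matching. Each list entry stands for one produced event. The names are hypothetical, and LogScale's warning when limit is exceeded is not modeled; the sketch simply stops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of repeat and limit, one list entry per produced event. */
final class RegexRepeatSketch {

    /**
     * repeat=false: first match only. repeat=true: one entry per match,
     * stopping once limit entries have been produced.
     */
    static List<String> extract(String input, String regex, String group,
                                boolean repeat, int limit) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (out.size() < limit && m.find()) {
            out.add(m.group(group));
            if (!repeat) {
                break;               // without repeat, at most one event
            }
        }
        return out;
    }
}
```

On "id=1 id=22 id=333" with id=(?<id>\d+), repeat=false gives one entry ("1"), while repeat=true gives three ("1", "22", "333"), or two when limit is 2.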
regex() Examples
Extract the domain name from the HTTP referrer field. This field often contains a full URL, so there can be many different URLs from the same site; in this case, we want to count all referrals from the same domain. The query adds a field named refdomain to events matching the regular expression.
regex("https?://(www.)?(?<refdomain>.+?)(/|$)", field=referrer)
| groupby(refdomain, function=count())
| sort(field=_count, type=number, reverse=true)
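To check what the refdomain pattern captures, the same regular expression can be run through java.util.regex. This hypothetical sketch counts referrers per extracted domain, mirroring the groupby(refdomain, function=count()) step, skips non-matching values as the default strict=true would, and leaves out the final sort. The class name RefDomainSketch is illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical replay of the refdomain example with java.util.regex. */
final class RefDomainSketch {

    // Same pattern as the query; the lazy .+? stops at the first "/" or end of input.
    private static final Pattern REFERRER =
            Pattern.compile("https?://(www.)?(?<refdomain>.+?)(/|$)");

    /** Counts referrers per extracted domain; values that do not match are skipped. */
    static Map<String, Integer> countByDomain(List<String> referrers) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String referrer : referrers) {
            Matcher m = REFERRER.matcher(referrer);
            if (m.find()) {
                counts.merge(m.group("refdomain"), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

With this pattern, https://www.example.com/a, http://example.com and https://example.com/b?x=1 all count toward example.com.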
Extract the user ID from the url field. The extracted value is stored in a new field named userid.
regex(regex="/user/(?<userid>\\S+)/pay", field=url)
Show how to escape " in the regular expression. This is necessary because the regular expression is itself enclosed in quotes. Extract the user name and message from events like:
Peter: "hello"
and
Bob: "good morning"
regex("(?<name>\\S+): \"(?<msg>[^\"]+)\"")
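Java string literals use the same backslash escapes as the query string, so "\\S" becomes \S and \" becomes a literal quote in the regular expression. The sketch compares a \S+ message group with a [^"]+ group. Because \S+ stops at whitespace, only the second accepts a message containing spaces, such as good morning. The class and constant names are hypothetical, and java.util.regex stands in for JitRex.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check of the quote-escaping example with java.util.regex. */
final class QuoteEscapeSketch {

    // \S+ message: rejects messages that contain spaces.
    static final String NON_SPACE_MSG = "(?<name>\\S+): \"(?<msg>\\S+)\"";
    // [^"]+ message: anything up to the closing quote.
    static final String ANY_MSG = "(?<name>\\S+): \"(?<msg>[^\"]+)\"";

    /** Returns the name and msg groups, or null when the line does not match. */
    static Map<String, String> parse(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        if (!m.find()) {
            return null;
        }
        return Map.of("name", m.group("name"), "msg", m.group("msg"));
    }
}
```

Both patterns parse Peter: "hello", but only ANY_MSG parses Bob: "good morning".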
Note
The default for a regular expression is no flags, so that:
@rawstring=/expression/
is syntactically equivalent to:
regex("expression")
Or:
regex("expression", flags="")
When using flags:
@rawstring=/expression/m
is syntactically equivalent to:
regex("expression", flags="m")