The regex() function works both as a filter and as a way to extract new fields using a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events.
Parameter | Type | Required | Default Value | Description |
---|---|---|---|---|
field | string | optional[a] | @rawstring | Specifies the field to run the regular expression against. |
flags | string | optional[a] | m | Specifies regex modifier flags. Valid values: F (use the LogScale Regex Engine v2), d (period (.) also matches newline characters), i (ignore case for matched values), m (multi-line parsing of regular expressions). |
limit | integer | optional[a] | 100 | Defines the maximum number of events to produce. A warning is produced if this limit is exceeded, unless the parameter is specified explicitly. |
regex[b] | string | required | | Specifies a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events. |
repeat | boolean | optional[a] | false | If set to true, multiple matches yield multiple events. Valid values: false (match at most one event), true (match multiple events). |
strict | boolean | optional[a] | true | Specifies whether events not matching the regular expression are filtered out of the result set. Valid values: false (events not matching the regular expression are kept in the result set), true (events not matching the regular expression are filtered out of the result set). |
[a] Optional parameters use their default value unless explicitly set.
[b] The argument name regex can be omitted.
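The effect of the d and i flags listed in the table can be illustrated with a rough analogy in Python's re module. This is only a sketch: the FLAG_MAP dictionary and the compile_with_flags helper are hypothetical names made up for this illustration, the mapping to re.DOTALL and re.IGNORECASE is an assumption based on the flag descriptions above, and the F and m flags are left out because their exact behaviour is not described here.

```python
import re

# Hypothetical mapping from the flag letters in the table to Python re flags.
# Assumption: "d" behaves like DOTALL and "i" like IGNORECASE.
FLAG_MAP = {"d": re.DOTALL, "i": re.IGNORECASE}


def compile_with_flags(pattern, flags=""):
    """Compile a pattern, combining the Python flag for each flag letter."""
    combined = 0
    for letter in flags:
        combined |= FLAG_MAP[letter]
    return re.compile(pattern, combined)


# Without "d", a period does not match a newline character.
print(compile_with_flags("a.b").search("a\nb") is None)
# With "d", the period also matches the newline.
print(compile_with_flags("a.b", "d").search("a\nb") is not None)
# With "i", the match ignores case.
print(compile_with_flags("error", "i").search("ERROR") is not None)
```

As the note below recommends, the behaviour of a flag should still be confirmed inside LogScale itself.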
Omitted Argument Names
The argument name for regex can be omitted; the following forms of this function are equivalent:
regex("value")
and:
regex(regex="value")
These examples show basic structure only.
Negatable Function Operation
This function is negatable, implying the inverse of the result. For example:
!regex()
Or:
not regex()
For more information, see Negating the Result of Filter Functions.
Regular expressions in LogScale allow you to search (filter) and extract information, and are a very common part of the LogScale language and syntax.
LogScale uses JitRex, which closely follows, but does not entirely replicate, the syntax of RE2J regular expressions, which is itself very close to Java's regular expressions. See Regular Expression Syntax for more information.
Note
To ensure compatibility, it is recommended to always test your regular expressions inside LogScale, instead of in a third-party regex tool.
Escaping Characters
Care needs to be taken when escaping characters in the regular expression submitted to the regex() function. The function uses the backslash character (\) to escape an individual character, which is commonly needed to match that character literally. This works for all characters except the backslash itself. Within regex() you must double-escape the backslash: it needs to be escaped once for the string literal, and then again when the regular expression is parsed.
This can cause complexities when looking for filenames that use the backslash (for example, the Windows filename \Windows\tmp\myfile.txt). The following regular expression will not work as expected:
regex("\\(?<file_name>[^\\]+$)")
The regular expression is trying to capture the text that follows the final \ character. However, because we are submitting a string to regex(), the regular expression will be expanded to:
\(?<file_name>[^\]+$)
Because the backslash is only escaped once, the expression will fail. Instead, escape the backslash twice:
regex("\\\\(?<file_name>[^\\\\]+$)")
Two alternatives exist to avoid this:
Use the ASCII character code (\x5c) to specify the backslash:
regex("\x5c\x5c(?<file_name>[^\x5c\x5c]+$)")
Use the /regex/ syntax, which is only parsed once and so only needs to be escaped once:
/\\(?<file_name>[^\\]+$)/
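The two parsing passes described in this section can be modeled outside LogScale. The Python sketch below is illustrative only and is not LogScale's parser: unescape_string_literal, to_python_named_groups and try_extract are hypothetical helpers, the string pass is assumed to resolve only \\ and \" escapes, and Python's re module (which writes named groups as (?P<name>...)) stands in for the regex engine.

```python
import re


def unescape_string_literal(body):
    # First pass (assumed behaviour): resolve \\ and \" inside the quoted
    # string; every other character is passed through unchanged.
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in ("\\", '"'):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def to_python_named_groups(pattern):
    # Python spells named groups (?P<name>...) rather than (?<name>...).
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


def try_extract(query_text, event):
    # Second pass: hand the unescaped text to the regex engine.
    pattern = to_python_named_groups(unescape_string_literal(query_text))
    try:
        match = re.search(pattern, event)
    except re.error:
        return None  # the pattern is no longer a valid regular expression
    return match.group("file_name") if match else None


path = r"\Windows\tmp\myfile.txt"
single_escaped = r"\\(?<file_name>[^\\]+$)"      # text between the quotes
double_escaped = r"\\\\(?<file_name>[^\\\\]+$)"  # text between the quotes

print(unescape_string_literal(single_escaped))  # \(?<file_name>[^\]+$)
print(try_extract(single_escaped, path))        # None
print(try_extract(double_escaped, path))        # myfile.txt
```

In this model the single-escaped form loses its backslashes during the string pass and leaves the character class unterminated, while the double-escaped form reaches the regex engine as a pattern that matches a literal backslash.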
The operation of regex() and /regex/ is summarized in the table below:
Operation | regex() | /regex/ |
---|---|---|
Default search | @rawstring | All defined or parsed fields and @rawstring (not tags, @id or timestamp fields) |
Specific field search | Using field parameter | Using field = /regex/ |
Note that:
foo = /regex/ and regex("regex", field=foo) are equivalent; the latter has the benefit that more parameters can be used to refine the search. Specifically, it allows for specifying strict=false. The former has the benefit that the regular expression is not written as a string, so some elements do not need escaping.
/regex/ specifies a free-text search, which searches all fields. When used in a query, it searches the fields exactly as they were in the original event, and it works only before the first aggregator.
The difference in search scope between the two regex syntax operations introduces a significant performance difference between the two. Using regex() searches only the specified field (@rawstring by default) and can be significantly more performant than the /regex/ syntax, depending on the number of fields in the dataset.
Using g in flags
The g option (used for global, as in repeating) is allowed in a query, but is not an acceptable value for the flags parameter. To obtain multiple matches with regex(), set the repeat parameter to true instead.
For more information, see Global (Repeating) Matches.
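The combined effect of the repeat and strict parameters can be sketched as a small model. This is a Python illustration under stated assumptions, not LogScale's implementation: regex_events is a hypothetical helper, events are modeled as plain dictionaries, the limit parameter is not modeled, and named groups use Python's (?P<name>...) spelling.

```python
import re


def regex_events(event, pattern, field="@rawstring", repeat=False, strict=True):
    """Return the events produced for one input event (illustrative model)."""
    text = event.get(field)
    matches = list(re.finditer(pattern, text)) if isinstance(text, str) else []
    if not matches:
        # strict=True drops the non-matching event; strict=False keeps it as-is.
        return [] if strict else [dict(event)]
    if not repeat:
        # repeat=False uses at most one match.
        matches = matches[:1]
    # Each match yields a copy of the event with the named groups added as fields.
    return [{**event, **m.groupdict()} for m in matches]


event = {"@rawstring": "user=alice user=bob"}
pattern = r"user=(?P<user>\w+)"

print(regex_events(event, pattern))               # one event, user=alice
print(regex_events(event, pattern, repeat=True))  # two events, alice then bob
print(regex_events(event, r"host=(?P<host>\w+)", strict=False))  # event kept unchanged
```

The model makes the distinction visible: repeat controls how many events a matching input produces, while strict only decides what happens to inputs that do not match at all.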
regex() Examples
Extract the domain name from the HTTP referrer field. This field often contains a full URL, so many different URLs can come from the same site. In this case we want to count all referrals from the same domain. This will add a field named refdomain to events matching the regular expression.
regex("https?://(www.)?(?<refdomain>.+?)(/|$)", field=referrer)
| groupby(refdomain, function=count())
| sort(field=_count, type=number, reverse=true)
Extract the user ID from the url field. The extracted value is stored in a field named userid.
regex(regex="/user/(?<userid>\\S+)/pay", field=url)
The following example shows how to escape " in the regular expression. This is necessary because the regular expression is itself in quotes. Extract the user and message from events like Peter: "hello" and Bob: "good morning".
regex("(?<name>\\S+): \"(?<msg>[^\"]+)\"")
Note
There are no default flags for a regular expression. For example:
@rawstring=/expression/
Is syntactically equivalent to:
regex("expression")
Or:
regex("expression", flags="")
When using flags:
@rawstring=/expression/m
Is syntactically equivalent to:
regex("expression", flags="m")