The regex() function works both as a filter and as a way to extract new fields using a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events.

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| field | string | optional[a] | @rawstring | Specifies the field to run the regular expression against. |
| flags | string | optional[a] | m | Specifies regex modifier flags. Valid values: d (period (.) also includes newline characters), i (ignore case for matched values), m (multi-line parsing of regular expressions). |
| limit | integer | optional[a] | 100 | Defines the maximum number of events to produce. A warning is produced if this limit is exceeded, unless the parameter is specified explicitly. |
| regex[b] | string | required | | Specifies a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events. |
| repeat | boolean | optional[a] | false | If set to true, multiple matches yield multiple events. Valid values: false (match at most one event), true (match multiple events). |
| strict | boolean | optional[a] | true | Specifies if events not matching the regular expression should be filtered out of the result set. Valid values: false (events not matching the regular expression are not filtered out of the result set), true (events not matching the regular expression are filtered out of the result set). |

[a] Optional parameters use their default value unless explicitly set.

[b] The argument name regex can be omitted.
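
As a rough illustration of how these parameters combine, the sketch below runs a case-insensitive expression against a field and keeps non-matching events by setting strict to false. The field name message and the user= log format are assumptions made for this example only.

logscale
regex("user=(?<user>\\S+)", field=message, flags="i", strict=false)

With strict=false, events that do not match are kept in the result set and simply lack the user field. With the default strict=true, those events would be filtered out.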

Omitted Argument Names

The argument name for regex can be omitted; the following forms of this function are equivalent:

logscale
regex("value")

and:

logscale
regex(regex="value")

Regular expressions in LogScale allow you to search (filter) and extract information, and are a very common part of the LogScale language and syntax.

LogScale uses JitRex, which closely follows (but does not entirely replicate) the syntax of RE2J regular expressions, which is very close to Java's regular expressions. See Regular Expression Syntax for more information.

Note

To ensure compatibility, it is recommended to always test your regular expressions inside LogScale, instead of a third-party regex tool.

Escaping Characters

Care needs to be taken when escaping characters in the regular expression submitted to the regex() function. The function uses the backslash character (\) to escape an individual character, which in many common situations indicates that the character should be treated literally. This works for all characters except the backslash itself. Within regex() you must double-escape the backslash, because it needs to be escaped once for the string definition and then again when the regular expression is parsed.

This can cause complexities when looking for filenames that use the backslash (e.g. Windows filename \Windows\tmp\myfile.txt). The following regular expression will not work as expected:

logscale
regex("\\(?<file_name>[^\\]+$)")

The regular expression is trying to capture the text after the final \ character. However, because a string is submitted to regex(), the regular expression will be expanded to:

logscale
\(?<file_name>[^\]+$)

Because the backslash is only escaped once, the expression will fail. Instead, escape the backslash twice:

logscale
regex("\\\\(?<file_name>[^\\\\]+$)")

Two alternatives exist to avoid this:

  • Use the ASCII character code (\x5c) to specify the backslash:

    logscale
    regex("\x5c\x5c(?<file_name>[^\x5c\x5c]+$)")
  • Use the /regex/ syntax, which is only parsed once and so only needs to be escaped once:

    logscale
    /\\(?<file_name>[^\\]+$)/

Comparing regex() and /regex/ Syntax

The regex() function provides similar functionality to the /regex/ syntax. However, the regex() function searches specific fields (and only @rawstring by default). In contrast, the /regex/ syntax searches all sent and parsed fields and @rawstring.

If you specify a field with the /regex/ syntax, the search is limited to that field, for example:

| sessionid = /sess/

Limits the search to only the specified field.

The difference in search scope between the two regex syntax operations introduces a significant performance difference between the two. Using regex() searches only the specified field (@rawstring by default) and can be significantly more performant than the /regex/ syntax depending on the number of fields in the dataset.
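
To make the scope difference concrete, the two queries below are sketches that both restrict the match to a single field. The field sessionid is taken from the example above and the pattern is illustrative. By analogy with the @rawstring equivalence described in the note at the end of this page, the two forms are expected to behave the same, but this is an assumption worth verifying inside LogScale.

logscale
sessionid = /sess/

logscale
regex("sess", field=sessionid)

An unscoped /sess/, by contrast, would be checked against every field of each event, which is where the performance difference comes from.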

Using g in flags

When performing queries, the g option (used for global, as in repeating) is allowed in a query, but is not an acceptable option for the flags parameter. To get multiple matches with regex(), set the repeat parameter to true instead.

For more information, see Global (Repeating) Matches.
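
A minimal sketch of a repeating match, assuming events whose @rawstring contains several key=value pairs such as a=1 b=2 (a hypothetical format chosen for illustration):

logscale
regex("(?<key>\\w+)=(?<value>\\d+)", repeat=true)

With repeat=true, each match should yield its own event, so an event containing two pairs would produce two events, one per pair. The number of events produced is bounded by the limit parameter.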

regex() Examples

Extract the domain name from the HTTP referrer field. Often this field contains a full URL, so there can be many different URLs from the same site. In this case we want to count all referrals from the same domain. This will add a field named refdomain to events matching the regular expression.

logscale
regex("https?://(www.)?(?<refdomain>.+?)(/|$)", field=referrer)
| groupby(refdomain, function=count())
| sort(field=_count, type=number, reverse=true)

Extract the user ID from the url field. The extracted value is stored in a new field named userid.

logscale
regex(regex="/user/(?<userid>\\S+)/pay", field=url)

This example shows how to escape " in the regular expression. This is necessary because the regular expression is itself in quotes. It extracts the user and message from events like: Peter: "hello" and Bob: "good morning".

logscale
regex("(?<name>\\S+): \"(?<msg>[^\"]+)\"")

Note

There are no default flags for a regular expression. For example:

logscale
@rawstring=/expression/

Is syntactically equivalent to:

logscale
regex("expression")

Or:

logscale
regex("expression", flags="")

When using flags:

logscale
@rawstring=/expression/m

Is syntactically equivalent to:

logscale
regex("expression", flags="m")