The regex() function works both as a filter and as a way to extract new fields using a regular expression. The regular expression can contain one or more named capturing groups; for each group, a field with the group's name is added to matching events.

Parameter | Type | Required | Default Value | Description
--- | --- | --- | --- | ---
field | string | optional[a] | @rawstring | Specifies the field to run the regular expression against.
flags | string | optional[a] | m | Specifies regex modifier flags. Values: d: period (.) also includes newline characters; i: ignore case for matched values; m: multi-line parsing of regular expressions.
limit | integer | optional[a] | 100 | Defines the maximum number of events to produce. A warning is produced if this limit is exceeded, unless the parameter is specified explicitly.
regex[b] | string | required | | Specifies a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events.
repeat | boolean | optional[a] | false | If set to true, multiple matches yield multiple events. Values: false: match at most once per event; true: match multiple times, producing multiple events.
strict | boolean | optional[a] | true | Specifies whether events not matching the regular expression should be filtered out of the result set. Values: false: events not matching the regular expression are kept in the result set, and fields are only added to events where the regex matches; true: events not matching the regular expression are filtered out of the result set.

[a] Optional parameters use their default value unless explicitly set.

[b] The parameter name regex can be omitted.
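As a minimal sketch of how these parameters combine, the query below omits the regex parameter name and sets strict=false, so events whose url field does not match are kept rather than filtered out. The url field and the userid group are borrowed from the examples later on this page; adjust them to your own data.

```logscale
// The regex parameter name is omitted (see [b] above).
// strict=false keeps non-matching events; only matching events get a userid field.
regex("/user/(?<userid>\\S+)/pay", field=url, strict=false)
```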

Regular expressions in LogScale allow you to search (filter) and extract information, and they are a very common part of the LogScale language and syntax.

LogScale uses JitRex, which closely follows, but does not entirely replicate, the syntax of RE2J regular expressions, which in turn is very close to Java's regular expressions. See Regular Expression Syntax for more information.

Note

To ensure compatibility, always test your regular expressions inside LogScale rather than in a third-party regex tool.

Escaping Characters

Take care when escaping characters in the regular expression submitted to the regex() function. The function uses the \ backslash character to escape an individual character so that it is treated as the literal character, which is needed in many common situations. This works for all characters except the backslash itself. Within regex() you must double-escape the backslash, because it needs to be escaped once for the definition of the string and then again when the regular expression is parsed.

This can cause complexities when looking for filenames that use the backslash (for example, Windows filename \Windows\tmp\myfile.txt). The following regular expression will not work as expected:

logscale Syntax
regex("\\(?<file_name>[^\\]+$)")

The regular expression is intended to capture the text that follows the last \ character (the filename). However, because the expression is submitted to regex() as a string, the string escaping is processed first, and the regular expression that is actually parsed is:

logscale Syntax
\(?<file_name>[^\]+$)

Because the backslash is escaped only once, the expression fails. Instead, escape the backslash twice:

logscale
regex("\\\\(?<file_name>[^\\\\]+$)")

Two alternatives exist to avoid this:

  • Use the ASCII character code (\x5c) to specify the backslash:

    logscale
    regex("\x5c\x5c(?<file_name>[^\x5c\x5c]+$)")
  • Use the /regex/ syntax, which is parsed only once and so only needs to be escaped once:

    logscale Syntax
    /\\(?<file_name>[^\\]+$)/
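Because foo = /regex/ and regex("regex", field=foo) are equivalent, the /regex/ alternative can also be restricted to a single field instead of searching every field. A sketch, assuming a hypothetical field named file_path that holds Windows paths:

```logscale
// file_path is a hypothetical field name used for illustration.
// In /.../ syntax the backslash is escaped only once.
file_path = /\\(?<file_name>[^\\]+$)/
```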

Comparing regex() and /regex/ Syntax

The operation of regex() and /regex/ are summarized in the table below:

Operation | regex() | /regex/
--- | --- | ---
Default search | @rawstring | All defined or parsed fields and @rawstring (not tags, @id or timestamp fields)
Specific field search | Using the field parameter | Using field = /regex/

Note that:

  • foo = /regex/ and regex("regex", field=foo) are equivalent; the latter has the benefit that more parameters can be used to refine the search. Specifically, it allows for specifying strict=false. The former has the benefit that the regular expression is not written as a string and therefore there are elements that don't need escaping.

  • /regex/ specifies a free-text search, which searches all fields. When used in a query, it searches exactly the fields as they were in the original event, and it works only before the first aggregator.

The difference in search scope between the two regex syntax operations introduces a significant performance difference between the two. Using regex() searches only the specified field (@rawstring by default) and can be significantly more performant than the /regex/ syntax depending on the number of fields in the dataset.
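To illustrate the equivalence between foo = /regex/ and regex("regex", field=foo), the query below is a sketch of the /regex/ form of the userid example shown later on this page (regex(regex="/user/(?<userid>\\S+)/pay", field=url)). Because the pattern is not written as a string, \S needs only one backslash, while forward slashes inside /.../ are escaped as \/.

```logscale
// Searches only the url field, like regex(..., field=url).
// Non-matching events are filtered out; this form cannot set strict=false.
url = /\/user\/(?<userid>\S+)\/pay/
```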

Using g in flags

When performing queries, the g option (global, as in repeating) is allowed in a query, but it is not an acceptable value for the flags parameter. To get multiple matches from regex(), set the repeat parameter to true instead.

For more information, see Global (Repeating) Matches.
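A sketch of repeating matches with regex(), assuming a hypothetical field named message that can contain several IPv4 addresses. With repeat=true, each match produces its own event carrying an ip field, up to the limit parameter (100 by default).

```logscale
// message and ip are hypothetical names used for illustration.
// repeat=true takes the place of the g flag, which flags does not accept.
regex("(?<ip>\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})", field=message, repeat=true)
// Count how often each address occurs across the produced events
| groupBy(ip, function=count())
```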

regex() Syntax Examples

Extract the domain name from the HTTP referrer field. This field often contains a full URL, so there can be many different URLs from the same site; in this case, we want to count all referrals from the same domain. This adds a field named refdomain to events matching the regular expression.

logscale
regex("https?://(www.)?(?<refdomain>.+?)(/|$)", field=referrer)
| groupBy(refdomain, function=count())
| sort(field=_count, type=number, reverse=true)

Extract the user ID from the url field. The extracted value is stored in a new field named userid.

logscale
regex(regex="/user/(?<userid>\\S+)/pay", field=url)

This example shows how to escape the double quote character (") in the regular expression, which is necessary because the regular expression is itself enclosed in quotes. It extracts the user and message from events such as Peter: "hello" and Bob: "good morning".

logscale Syntax
regex("(?<name>\\S+): \"(?<msg>[^\"]+)\"")
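For comparison, a sketch of the same name and message extraction using the /regex/ syntax against @rawstring. Because the pattern is not written as a string, the double quotes need no escaping and \S needs only one backslash. The [^"]+ group allows messages that contain spaces, such as "good morning".

```logscale
// No string escaping: quotes are literal and \S uses a single backslash
@rawstring = /(?<name>\S+): "(?<msg>[^"]+)"/
```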

Note

There are no default flags for a regular expression. For example:

logscale Syntax
@rawstring=/expression/

Is syntactically equivalent to:

logscale Syntax
regex("expression")

Or:

logscale Syntax
regex("expression", flags="")

When using flags:

logscale Syntax
@rawstring=/expression/m

Is syntactically equivalent to:

logscale Syntax
regex("expression", flags="m")
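Following the same pattern, a case-insensitive search uses the i flag from the parameter table. The query below is a sketch; the search term error is only an example.

```logscale
// Intended to be equivalent to: @rawstring=/error/i
regex("error", flags="i")
```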

regex() Examples


Extract the Top Most Viewed Pages of a Website

Query
logscale
regex(regex="/.*/(?<url_page>\S+\.page)", field=url)
| top(url_page, limit=12, rest=others)
Introduction

Your LogScale repository is ingesting log entries from a web server for a photography site. The site has several articles about photography, and the URLs for these articles end with the extension .page instead of .html.

You want to extract the page users viewed and then list the top most viewed pages.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    regex(regex="/.*/(?<url_page>\S+\.page)", field=url)

    Extracts the page viewed by users by returning the name of the file from the url field and storing that result in a field named url_page.

  3. logscale
    | top(url_page, limit=12, rest=others)

    Lists the most viewed pages. The first parameter is the url_page field extracted in the first line of the query. The second parameter limits the results to the top twelve, instead of the default limit of ten. Because we want to know how many views during the selected period were for pages not listed in the top twelve, the rest parameter is specified with the label to use for them.

  4. Event Result set.

Summary and Results

The table displays the matches from the most viewed pages during the selected period to the least, limited to the top twelve.

url_page | _count
--- | ---
home.page | 51
index.page | 21
home-studio.page | 10
a-better-digital-camera.page | 7
is-film-better.page | 6
leica-q-customized.page | 6
student-kit.page | 4
focusing-screens.page | 4
changing-images-identity.page | 2
others | 27

Filter Out Based on a Non-Matching Regular Expression (Function Format)

Query
logscale
responsesize > 2000
| not regex("/falcon-logscale-.*/",field=url)
Introduction

This example searches weblog data for log entries that are larger than a specified size but are not in a specific directory.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    responsesize > 2000

    Filters for events where the responsesize field is greater than 2000.

  3. logscale
    | not regex("/falcon-logscale-.*/",field=url)

    Negates the regular expression match on the url field, filtering out any URL that contains falcon-logscale- and returning all other matching events.

  4. Event Result set.
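The same filter can be sketched with the negated field != /regex/ syntax, as used for the method field in the next example. Inside /.../ the forward slashes in the pattern must be escaped as \/.

```logscale
responsesize > 2000
// Negated match against the url field only
| url != /\/falcon-logscale-.*\//
```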

Summary and Results

For example, given the following events:

@timestamp#repo#type@id@ingesttimestamp@rawstring@timestamp.nanos@timezoneclienthttpversionmethodresponsesizestatuscodeurluserid
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_0_6401_17199827432024-07-03T04:59:41192.168.1.240 - - [03/07/2024:04:59:03 +0000] "GET /js/htmllinkhelp.js HTTP/1.1" 200 230Z192.168.1.240HTTP/1.1GET23200/js/htmllinkhelp.js-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_0_6400_17199827432024-07-03T04:59:41192.168.1.24 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis-1.100/css-images/external-link.svg HTTP/1.1" 200 10720Z192.168.1.24HTTP/1.1GET1072200/data-analysis-1.100/css-images/external-link.svg-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_0_6399_17199827432024-07-03T04:59:41192.168.1.209 - - [03/07/2024:04:59:03 +0000] "GET /js/htmllinkhelp.js HTTP/1.1" 304 -0Z192.168.1.209HTTP/1.1GET-304/js/htmllinkhelp.js-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_0_6398_17199827432024-07-03T04:59:41192.168.1.39 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis/js/java.min.js HTTP/1.1" 304 -0Z192.168.1.39HTTP/1.1GET-304/data-analysis/js/java.min.js-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_0_6397_17199827432024-07-03T04:59:41192.168.1.62 - - [03/07/2024:04:59:03 +0000] "GET /falcon-logscale-cloud/js/php.min.js HTTP/1.1" 200 63970Z192.168.1.62HTTP/1.1GET6397200/falcon-logscale-cloud/js/php.min.js-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_0_6396_17199827432024-07-03T04:59:41192.168.1.206 - - [03/07/2024:04:59:03 +0000] "GET /integrations/js/theme.js HTTP/1.1" 200 148450Z192.168.1.206HTTP/1.1GET14845200/integrations/js/theme.js-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_0_6395_17199827432024-07-03T04:59:41192.168.1.1 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis/js/json.min.js HTTP/1.1" 200 4960Z192.168.1.1HTTP/1.1GET496200/data-analysis/js/json.min.js-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_0_6394_17199827432024-07-03T04:59:41192.168.1.252 - - [03/07/2024:04:59:03 +0000] "GET /falcon-logscale-cloud/js/java.min.js HTTP/1.1" 200 27390Z192.168.1.252HTTP/1.1GET2739200/falcon-logscale-cloud/js/java.min.js-

Might return the following values:

@timestamp#repo#type@id@ingesttimestamp@rawstring@timestamp.nanos@timezoneclienthttpversionmethodresponsesizestatuscodeurluserid
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_2_6541_17199827432024-07-03T05:03:48192.168.1.231 - - [03/07/2024:04:59:03 +0000] "GET /logscale-repo-schema/js/corp.js HTTP/1.1" 200 186450Z192.168.1.231HTTP/1.1GET18645200/logscale-repo-schema/js/corp.js-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_2_6538_17199827432024-07-03T05:03:48192.168.1.69 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis-1.100/images/dashboards.png HTTP/1.1" 200 1525900Z192.168.1.69HTTP/1.1GET152590200/data-analysis-1.100/images/dashboards.png-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_2_6535_17199827432024-07-03T05:03:47192.168.1.154 - - [03/07/2024:04:59:03 +0000] "GET /integrations/js/theme.js HTTP/1.1" 200 148450Z192.168.1.154HTTP/1.1GET14845200/integrations/js/theme.js-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_2_6534_17199827432024-07-03T05:03:47192.168.1.58 - - [03/07/2024:04:59:03 +0000] "GET /integrations/images/extrahop.png HTTP/1.1" 200 102610Z192.168.1.58HTTP/1.1GET10261200/integrations/images/extrahop.png-
2024-07-03T04:59:03weblogshttpsimpMqHKxw2QoBPZyNqbJRRs4ECC_2_6527_17199827432024-07-03T05:03:47192.168.1.164 - - [03/07/2024:04:59:03 +0000] "GET /integrations/images/zeek.png HTTP/1.1" 200 43920Z192.168.1.164HTTP/1.1GET4392200/integrations/images/zeek.png-

Filter Out Based on a Non-Matching Regular Expression (Syntax)

Query
logscale
method != /(PUT|POST)/
Introduction

This example searches weblog data looking for events where the method does not match a specified value.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    method != /(PUT|POST)/

    This line performs a negative regular expression match, returning only the events where the method does not match either PUT or POST.

  3. Event Result set.
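Conversely, this match can be sketched in the function format used in the previous example, applying not to regex() on the method field. Like the query above, the pattern is unanchored; add ^ and $ if only the exact values PUT and POST should be excluded.

```logscale
// Function form of: method != /(PUT|POST)/
not regex("(PUT|POST)", field=method)
```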

Summary and Results

This format of the query is a simple way to perform a negative regular expression match, returning the events that do not match the given regular expression.

Search for Command Line String

Search for command line string after / and before @ using a regular expression

Query
logscale
#event_simpleName=ProcessRollup2
| CommandLine=/@/
| CommandLine=/\/.*@/
Introduction

A regular expression can be used to run a query that looks for command line strings containing any characters after / and before @. It is important to filter as much as possible to avoid exceeding resource limits.

In this example, a regular expression is used to filter and search for specific process events in the CrowdStrike Falcon platform. Note that the query filters on the @ alone first to perform as much filtering as possible.

Step-by-Step
  1. Starting with the source repository events.

  2. logscale
    #event_simpleName=ProcessRollup2

    Filters for events of the type ProcessRollup2 in the #event_simpleName field.

  3. logscale
    | CommandLine=/@/

    Filters for any command line containing the @ symbol.

  4. logscale
    | CommandLine=/\/.*@/

    Uses a regular expression to search the returned results for command lines that contain a forward slash (/) followed by any number of characters, and then a @ symbol.

  5. Event Result set.
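The final filter can also be sketched with regex() restricted to the CommandLine field. Inside a string the forward slash needs no escaping, whereas the /regex/ form above escapes it as \/.

```logscale
#event_simpleName=ProcessRollup2
| CommandLine=/@/
// Function form of: CommandLine=/\/.*@/
| regex("/.*@", field=CommandLine)
```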

Summary and Results

The query is used to search for command line strings that contain any characters after / and before @. The query could, for example, be used to help security analysts identify potentially suspicious processes that might be interacting with email addresses or using email-like syntax in their command lines.