regex()


The regex() function works both as a filter and as a way to extract new fields using a regular expression. The regular expression can contain one or more named capturing groups; a field named after each group is added to matching events.

Parameter	Type	Required	Default Value	Description
field	string	optional^[a]	`@rawstring`	Specifies the field to run the regular expression against.
flags	string	optional^[a]	`m`	Specifies regex modifier flags.
			Values
			`F`	Use the LogScale Regex Engine v2
			`d`	Period (.) also includes newline characters
			`i`	Ignore case for matched values
			`m`	Multi-line parsing of regular expressions
limit	integer	optional^[a]	`100`	Defines the maximum number of events to produce. A warning is produced if this limit is exceeded, unless the parameter is specified explicitly.
regex^[b]	string	required		Specifies a regular expression. The regular expression can contain one or more named capturing groups. Fields with the names of the groups will be added to the events.
repeat	boolean	optional^[a]	`false`	If set to true, multiple matches yield multiple events.
			Values
			`false`	Match at most once per event
			`true`	Produce one event per match
strict	boolean	optional^[a]	`true`	Specifies if events not matching the regular expression should be filtered out of the result set.
			Values
			`false`	Events not matching the regular expression are kept in the result set; fields are extracted only from events where the regular expression matches.
			`true`	Events not matching the regular expression are filtered out of the result set.
^[a]Optional parameters use their default value unless explicitly set. ^[b]The parameter name `regex` can be omitted.
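As a sketch of how the optional parameters combine, the query below assumes a hypothetical field named message containing text such as ERROR 503. Setting flags="i" makes the match case-insensitive, so error 503 and Error 503 would also match. Because strict is left at its default of true, events without a match are filtered out.

logscale
```
// Hypothetical field "message"; flags="i" ignores case, so "ERROR 503",
// "error 503" and "Error 503" all match and the digits go into "code".
// strict is left at its default (true), so non-matching events are dropped.
regex("error (?<code>\\d+)", field=message, flags="i")
```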


Regular expressions in LogScale allow you to search (filter) and extract information, and they are a very common part of the LogScale language and syntax.

LogScale uses JitRex, which closely follows (but does not entirely replicate) the syntax of RE2J regular expressions, which in turn is very close to Java's regular expressions. See Regular Expression Syntax for more information.

Note

To ensure compatibility, we recommend always testing your regular expressions inside LogScale rather than in a third-party regex tool.

Escaping Characters

Care needs to be taken when escaping characters in the regular expression submitted to the regex() function. The function uses the \ backslash character to indicate that the following character should be treated literally rather than as a special regex character. This works for all characters except the backslash itself. Within regex() you must double-escape the backslash: it must be escaped once for the definition within the string, and again when the regular expression is parsed.

This can cause complexities when looking for filenames that use the backslash (for example, Windows filename \Windows\tmp\myfile.txt). The following regular expression will not work as expected:

logscale Syntax

regex("\\(?<file_name>[^\\]+$)")

The regular expression is trying to capture all the text after the final \ character. However, because we are submitting a string to regex(), the regular expression will be expanded to:

logscale Syntax

\(?<file_name>[^\]+$)

Because the backslash is only escaped once, the expression will fail. Instead, escape the backslash twice:

logscale

regex("\\\\(?<file_name>[^\\\\]+$)")

Two alternatives exist to avoid this:

Use the ASCII character code (\x5c) to specify the backslash:
logscale
```
regex("\x5c\x5c(?<file_name>[^\x5c\x5c]+$)")
```
Use the /regex/ syntax, which is only parsed once and so only needs to be escaped once:
logscale Syntax
```
/\\(?<file_name>[^\\]+$)/
```

Comparing `regex()` and `/regex/` Syntax

The operation of regex() and /regex/ is summarized in the table below:

Operation	`regex()`	`/regex/`
Default search	@rawstring	All defined or parsed fields and @rawstring (not tags, @id or timestamp fields)
Specific field search	Using `field` parameter	Using `field = /regex/`

Note that:

foo = /regex/ and regex("regex", field=foo) are equivalent. The latter has the benefit that more parameters can be used to refine the search; specifically, it allows you to specify strict=false. The former has the benefit that the regular expression is not written as a string, so some elements (such as backslashes) do not need to be double-escaped.
/regex/ on its own specifies a free-text search, which searches all fields. When used in a query, it searches exactly the fields as they were in the original event, and it works only before the first aggregator.
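The following sketch contrasts the two forms on the url field. They are two separate queries, not one pipeline. In the regex literal, forward slashes are escaped as \/ while \S needs only one backslash; in the string form the slashes need no escaping but \S becomes \\S. Only the function form accepts strict=false, which here keeps events whose url does not match instead of filtering them out.

logscale
```
// Query 1: regex literal applied to the url field.
// "/" is escaped as "\/" inside /.../, but \S needs a single backslash.
url = /\/user\/(?<userid>\S+)\/pay/

// Query 2 (run separately): the function form on the same field.
// The pattern is a string, so \S is written \\S; strict=false keeps
// events whose url does not match instead of filtering them out.
regex("/user/(?<userid>\\S+)/pay", field=url, strict=false)
```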

The difference in search scope between the two syntaxes can also make a significant performance difference. regex() searches only the specified field (@rawstring by default), so it can be significantly faster than the /regex/ syntax, depending on the number of fields in the dataset.
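A minimal sketch of the scope difference, assuming a hypothetical search term timeout. The first query is a free-text search tried against @rawstring and every defined or parsed field; the second is tried against @rawstring only. These are separate queries, and the actual speed difference depends on the dataset.

logscale
```
// Query 1: free-text search; the pattern is tried against @rawstring
// and all defined or parsed fields (not tags, @id or timestamp fields).
/timeout/

// Query 2 (run separately): regex() only inspects @rawstring by default.
regex("timeout")
```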

Using `g` in `flags`

When performing queries, the g option (for global, that is, repeating matches) is allowed in a query, but it is not an acceptable value for the flags parameter. To return multiple matches with regex(), set the repeat parameter to true instead.

For more information, see Global (Repeating) Matches.
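As a sketch of repeat, suppose a hypothetical @rawstring value such as user=alice role=admin user=bob role=viewer. With repeat=true, each match would be expected to produce its own event (subject to limit, which defaults to 100), so the groupBy() below counts one event per user/role pair rather than one per original event. Without repeat, only the first pair would be extracted.

logscale
```
// Hypothetical input in @rawstring:
//   user=alice role=admin user=bob role=viewer
// repeat=true: every match yields a separate event carrying user and role.
// limit (default 100) caps how many events can be produced.
regex("user=(?<user>\\w+) role=(?<role>\\w+)", repeat=true)
| groupBy(role, function=count())
```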

`regex()` Syntax Examples

Extract the domain name from the HTTP referrer field. This field often contains a full URL, so there can be many different URLs from the same site; in this case we want to count all referrals from the same domain. This adds a field named refdomain to events matching the regular expression.

logscale

regex("https?://(www\\.)?(?<refdomain>.+?)(/|$)", field=referrer)
| groupBy(refdomain, function=count())
| sort(field=_count, type=number, reverse=true)

Extract the user ID from the url field. The extracted value is stored in a new field named userid.

logscale

regex(regex="/user/(?<userid>\\S+)/pay", field=url)

This example shows how to escape the double quote (") in the regular expression, which is necessary because the regular expression is itself in quotes. It extracts the user and message from events such as Peter: "hello" and Bob: "good morning".

logscale Syntax

regex("(?<name>\\S+): \"(?<msg>[^\"]+)\"")

Note

There are no default flags for a regular expression. For example:

logscale Syntax

@rawstring=/expression/

Is syntactically equivalent to:

logscale Syntax

regex("expression")

Or:

logscale Syntax

regex("expression", flags="")

When using flags:

logscale Syntax

@rawstring=/expression/m

Is syntactically equivalent to:

logscale Syntax

regex("expression", flags="m")

`regex()` Examples


Extract the Top Most Viewed Pages of a Website

url_page	_count
home.page	51
index.page	21
home-studio.page	10
a-better-digital-camera.page	7
is-film-better.page	6
leica-q-customized.page	6
student-kit.page	4
focusing-screens.page	4
changing-images-identity.page	2
others	27

Filter Out Based on a Non-Matching Regular Expression (Function Format)

@timestamp	#repo	#type	@id	@ingesttimestamp	@rawstring	@timezone	client	httpversion	method	responsesize	statuscode	url	userid
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_0_6401_1719982743	2024-07-03T04:59:41	192.168.1.240 - - [03/07/2024:04:59:03 +0000] "GET /js/htmllinkhelp.js HTTP/1.1" 200 23	Z	192.168.1.240	HTTP/1.1	GET	23	200	/js/htmllinkhelp.js	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_0_6400_1719982743	2024-07-03T04:59:41	192.168.1.24 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis-1.100/css-images/external-link.svg HTTP/1.1" 200 1072	Z	192.168.1.24	HTTP/1.1	GET	1072	200	/data-analysis-1.100/css-images/external-link.svg	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_0_6399_1719982743	2024-07-03T04:59:41	192.168.1.209 - - [03/07/2024:04:59:03 +0000] "GET /js/htmllinkhelp.js HTTP/1.1" 304 -	Z	192.168.1.209	HTTP/1.1	GET	-	304	/js/htmllinkhelp.js	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_0_6398_1719982743	2024-07-03T04:59:41	192.168.1.39 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis/js/java.min.js HTTP/1.1" 304 -	Z	192.168.1.39	HTTP/1.1	GET	-	304	/data-analysis/js/java.min.js	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_0_6397_1719982743	2024-07-03T04:59:41	192.168.1.62 - - [03/07/2024:04:59:03 +0000] "GET /falcon-logscale-cloud/js/php.min.js HTTP/1.1" 200 6397	Z	192.168.1.62	HTTP/1.1	GET	6397	200	/falcon-logscale-cloud/js/php.min.js	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_0_6396_1719982743	2024-07-03T04:59:41	192.168.1.206 - - [03/07/2024:04:59:03 +0000] "GET /integrations/js/theme.js HTTP/1.1" 200 14845	Z	192.168.1.206	HTTP/1.1	GET	14845	200	/integrations/js/theme.js	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_0_6395_1719982743	2024-07-03T04:59:41	192.168.1.1 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis/js/json.min.js HTTP/1.1" 200 496	Z	192.168.1.1	HTTP/1.1	GET	496	200	/data-analysis/js/json.min.js	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_0_6394_1719982743	2024-07-03T04:59:41	192.168.1.252 - - [03/07/2024:04:59:03 +0000] "GET /falcon-logscale-cloud/js/java.min.js HTTP/1.1" 200 2739	Z	192.168.1.252	HTTP/1.1	GET	2739	200	/falcon-logscale-cloud/js/java.min.js	-

@timestamp	#repo	#type	@id	@ingesttimestamp	@rawstring	@timezone	client	httpversion	method	responsesize	statuscode	url	userid
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_2_6541_1719982743	2024-07-03T05:03:48	192.168.1.231 - - [03/07/2024:04:59:03 +0000] "GET /logscale-repo-schema/js/corp.js HTTP/1.1" 200 18645	Z	192.168.1.231	HTTP/1.1	GET	18645	200	/logscale-repo-schema/js/corp.js	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_2_6538_1719982743	2024-07-03T05:03:48	192.168.1.69 - - [03/07/2024:04:59:03 +0000] "GET /data-analysis-1.100/images/dashboards.png HTTP/1.1" 200 152590	Z	192.168.1.69	HTTP/1.1	GET	152590	200	/data-analysis-1.100/images/dashboards.png	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_2_6535_1719982743	2024-07-03T05:03:47	192.168.1.154 - - [03/07/2024:04:59:03 +0000] "GET /integrations/js/theme.js HTTP/1.1" 200 14845	Z	192.168.1.154	HTTP/1.1	GET	14845	200	/integrations/js/theme.js	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_2_6534_1719982743	2024-07-03T05:03:47	192.168.1.58 - - [03/07/2024:04:59:03 +0000] "GET /integrations/images/extrahop.png HTTP/1.1" 200 10261	Z	192.168.1.58	HTTP/1.1	GET	10261	200	/integrations/images/extrahop.png	-
2024-07-03T04:59:03	weblogs	httpsimp	MqHKxw2QoBPZyNqbJRRs4ECC_2_6527_1719982743	2024-07-03T05:03:47	192.168.1.164 - - [03/07/2024:04:59:03 +0000] "GET /integrations/images/zeek.png HTTP/1.1" 200 4392	Z	192.168.1.164	HTTP/1.1	GET	4392	200	/integrations/images/zeek.png	-

Filter Out Based on a Non-Matching Regular Expression (Syntax)

Search for Command Line String

Search for command line string after / and before @ using a regular expression
