Regular Expression Engine V2 Syntax Patterns

Available: Regular Expression Engine V2 v1.227

Regular Expression Engine V2 is the default regex engine from version 1.227.

The following tables detail the syntax that LogScale supports when using Regular Expression Engine V2.

The following syntax constructions match a specific character.

Table: Single Character Constructs

Syntax Description
x The literal character x
\\ The backslash character
\xnn The character with hexadecimal value nn, where 0 <= n <= F. E.g. \x21 matches the ! character.
\x{nnnn} The character with hexadecimal value nnnn, where 0 <= n <= F. Supports up to four digits, for use with Unicode characters. E.g. \x{123} matches the ģ character.
\unnnn The character with hexadecimal value nnnn, where 0 <= n <= F. E.g. \u0023 matches the # character.
\a The alert/bell character (BEL) with hexadecimal value 07.
\t The horizontal tab character (HT) with hexadecimal value 09.
\n The newline character (LF) with hexadecimal value 0A.
\f The form feed character (FF) with hexadecimal value 0C.
\r The carriage return character (CR) with hexadecimal value 0D.
\e The escape character (ESC) with hexadecimal value 1B.
\cX The control character ^X. E.g. \cH matches the backspace character (BS).
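
As a rough illustration of the escapes in the table above, the following sketch uses Python's re module as a stand-in for LogScale's engine. This is an assumption made for the sake of a runnable example: the two engines are not identical, and Python does not accept \x{nnnn}, \e, or \cX, so only the escapes both share are checked.

```python
import re

# Python's re module is a stand-in here, not LogScale's engine.
# Each pair is (pattern, the single character it is expected to match).
cases = [
    (r"\x21", "!"),      # two-digit hexadecimal escape
    (r"\u0023", "#"),    # four-digit hexadecimal escape
    (r"\a", "\x07"),     # alert/bell (BEL)
    (r"\t", "\x09"),     # horizontal tab (HT)
    (r"\n", "\x0a"),     # newline (LF)
    (r"\f", "\x0c"),     # form feed (FF)
    (r"\r", "\x0d"),     # carriage return (CR)
    (r"\\", "\\"),       # literal backslash
]

# True for every pattern that matches exactly its expected character.
results = {pattern: re.fullmatch(pattern, char) is not None
           for pattern, char in cases}
```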

The following syntax constructions match a single character from a set of possible characters.

Table: Character Class Constructs

Syntax Description
[abc] Matches either a, b, or c.
[^abc] Matches any character that is not a, b, or c (negated class).
[a-z] Matches any character in the range from a through z.
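
The three class forms can be compared side by side. The sketch below assumes Python's re module behaves like LogScale's engine for these basic classes; the sample text is made up for illustration.

```python
import re

# Hypothetical sample text; Python's re stands in for LogScale's engine.
text = "cabbage-42"

in_set = re.findall(r"[abc]", text)       # characters that are a, b, or c
not_in_set = re.findall(r"[^abc]", text)  # every character outside the set
in_range = re.findall(r"[a-z]", text)     # characters from a through z
```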

There are several predefined character classes available.

Table: Predefined Character Classes

Syntax Description
. Matches any character except newline (unless the single-line flag is given).
\d Matches a digit character, 0 through 9. Equivalent to [0-9].
\D Matches any character that is not a digit character. Equivalent to [^0-9].
\w Matches an ASCII word character. Equivalent to [a-zA-Z0-9_].
\W Matches any character that is not an ASCII word character. Equivalent to [^\w].
\h Matches a horizontal whitespace character. Equivalent to [\t\x20\xA0\u180e\u2000-\u200a\u202f\u205f\u3000].
\H Matches any character that is not a horizontal whitespace character. Equivalent to [^\h].
\v Matches a vertical whitespace character. Equivalent to [\u000a-\u000d\u0085\u2028\u2029].
\V Matches any character that is not a vertical whitespace character. Equivalent to [^\v].
\s Matches any whitespace character, as defined by the Unicode White_Space property. Equivalent to [\h\v].
\S Matches any non-whitespace character. Equivalent to [^\s].
\p{X} Matches a character in the Unicode General Category abbreviated X. Supported categories are Letters (L), Symbols (S), Punctuation (P), and Control Characters (Cc). Case-insensitivity is not supported for Unicode general category matches.
\P{X} Matches any character that is not in the Unicode General Category abbreviated X. Case-insensitivity is not supported for Unicode general category matches.
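
A minimal sketch of a few predefined classes, again using Python's re module as a stand-in (an assumption, not LogScale's engine). Python has no \h, \H, or \p{X}, and its \v means only the vertical tab character, so those are left out. The re.ASCII flag is passed because Python's \d and \w are Unicode-aware by default, whereas the table defines them over ASCII.

```python
import re

# Hypothetical sample text.
text = "id=42 user_name\n"

# re.ASCII keeps \d, \w and \W to the ASCII definitions used in the table.
digits = re.findall(r"\d", text, flags=re.ASCII)
words = re.findall(r"\w+", text, flags=re.ASCII)
non_words = re.findall(r"\W", text, flags=re.ASCII)

# Without the single-line flag, "." stops at a newline.
dot_default = re.match(r".*", "ab\ncd").group(0)
# With the s flag (re.DOTALL in Python), "." also matches the newline.
dot_single_line = re.match(r"(?s).*", "ab\ncd").group(0)
```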

There are two primitive operations in LogScale's regex syntax. These are used to express more complicated patterns than single character matches.

Table: Primitive Operations

Syntax Description
XY Concatenation. Joins the regex patterns X and Y end-to-end. E.g. ab matches a followed by b.
X|Y Alternation. Matches either regex pattern X or pattern Y. E.g. ab|cd matches either ab or cd.
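
Concatenation binds more tightly than alternation, which is why ab|cd reads as "ab or cd" rather than "a, then b or c, then d". The sketch below shows this with Python's re module, assumed here to agree with LogScale on these two operations.

```python
import re

# ab|cd groups as (ab)|(cd), not as a(b|c)d.
pattern = re.compile(r"ab|cd")

matches_ab = pattern.fullmatch("ab") is not None
matches_cd = pattern.fullmatch("cd") is not None
matches_acd = pattern.fullmatch("acd") is not None

# An explicit group changes what the alternation applies to.
grouped = re.compile(r"a(?:b|c)d")
grouped_matches_acd = grouped.fullmatch("acd") is not None
grouped_matches_ab = grouped.fullmatch("ab") is not None
```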

Anchors and boundary syntax constructions match boundaries of text in-place and not characters.

Table: Anchors / Boundaries

Syntax Description
^ When used outside of character classes, ^ matches the beginning of a line. See the m flag for what constitutes a line.
$ Matches the end of a line. See the m flag for what constitutes a line.
\b Matches an ASCII word (\w) boundary in-place. For instance, \bKingdom matches Kingdom only if Kingdom is preceded by a character that is not in \w. For example, this regex pattern matches Kingdom in the text The Feathered Kingdom, but not Kingdom in the text 007Kingdom.
\B Matches in-place where there is no ASCII word (\w) boundary. For instance, \Bher matches her only if her is preceded by a character that is in \w. For example, \Bher matches her in dispatcher.
\A Matches the start of the input. E.g. \AKingdom matches Kingdom in the text Kingdom Come.
\Z Matches the end of the input except for the final terminator if one exists. For example, Kingdom\Z matches Kingdom in the text The Feathered Kingdom\n.
\z Matches the end of the input. Like \Z but it does not match if a final terminator exists. E.g. Kingdom\z matches Kingdom in the text The Feathered Kingdom, but not in the text The Feathered Kingdom\n.
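
The boundary examples from the table can be tried out with Python's re module, used here as an assumed stand-in for LogScale's engine. \Z and \z are left out on purpose: Python's \Z matches only at the very end of the input, which differs from the \Z described above. The two-line sample string is made up for illustration.

```python
import re

# \b: Kingdom must not be preceded by a word character.
boundary_hit = re.search(r"\bKingdom", "The Feathered Kingdom", flags=re.ASCII)
boundary_miss = re.search(r"\bKingdom", "007Kingdom", flags=re.ASCII)

# \B: her must be preceded by a word character.
non_boundary_hit = re.search(r"\Bher", "dispatcher", flags=re.ASCII)
non_boundary_miss = re.search(r"\Bher", "call her", flags=re.ASCII)

# In Python, ^ and $ match at every line only when the m flag is set.
lines = "first\nsecond"
without_m = re.findall(r"^\w+$", lines)
with_m = re.findall(r"(?m)^\w+$", lines)

# \A matches only at the start of the input, even with the m flag.
start_only = re.findall(r"(?m)\A\w+", lines)
```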

Quantifiers allow for matching the preceding pattern a number of times. Quantifiers fall into three categories (greedy, non-greedy, and possessive):

  • Greedy quantifiers try to match the given pattern as many times as possible.

  • Non-greedy quantifiers try to match as few times as possible.

  • Possessive quantifiers try to match as many times as possible, but upon finding the longest possible match, do not try shorter matches if the rest of the regex does not match.

Table: Quantifiers

Syntax Description Category
X? Makes X optional. Greedy, so it will prefer a match containing X. Greedy
X* Matches X zero or more times. Greedy, so it will prefer the match with the most repetitions of X. Greedy
X+ Matches X one or more times. Greedy
X{n} Matches X exactly n times, where n is a number between 0 and 14748364.  
X{n,} Matches X n or more times, where n is a number between 0 and 14748364. Greedy
X{n,m} Matches X at least n times and at most m times, where 0 <= n,m <= 14748364. Greedy
X?? Makes X optional. Non-greedy, so it will prefer a match that does not contain X. Non-greedy
X*? Matches X zero or more times. Non-greedy, so it will prefer the match with the fewest repetitions of X. Non-greedy
X+? Matches X one or more times. Non-greedy
X{n}? Matches X exactly n times.  
X{n,}? Matches X at least n times. Non-greedy
X{n,m}? Matches X at least n times and at most m times. Non-greedy
X?+ Makes X optional. Possessive, so it will prefer a match containing X, but if X matched, and the rest of the regex after X?+ did not match, it will not try again without X. Possessive
X*+ Matches X zero or more times. Possessive, so it will prefer the match with the most repetitions of X, and will not try the rest of the regex with fewer repetitions of X. Possessive
X++ Matches X one or more times. Possessive
X{n}+ Matches X exactly n times.  
X{n,}+ Matches X at least n times. Possessive
X{n,m}+ Matches X at least n times and at most m times. Possessive
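
The difference between the three categories is easiest to see on small inputs. This sketch uses Python's re module as an assumed stand-in for LogScale's engine. Python accepts the possessive *+ syntax only from version 3.11, so on older versions the sketch falls back to a lookahead-plus-backreference emulation of the same behaviour; first_match is a hypothetical helper introduced for this example.

```python
import re
import sys

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

text = "<a><b>"
greedy = first_match(r"<.+>", text)   # .+ takes as much as it can
lazy = first_match(r"<.+?>", text)    # .+? takes as little as it can

# Greedy a* first takes every "a", then gives one back so "ab" can match.
backtracking = first_match(r"a*ab", "aaab")

# Possessive a*+ never gives an "a" back, so the trailing "ab" cannot match.
if sys.version_info >= (3, 11):
    possessive = first_match(r"a*+ab", "aaab")
else:
    # Emulation: a lookahead captures a*, then \1 consumes it atomically.
    possessive = first_match(r"(?=(a*))\1ab", "aaab")
```

Here the greedy pattern returns the whole string, the non-greedy pattern stops at the first closing bracket, and the possessive pattern finds no match at all.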

Groups and backreferences allow you to treat a given pattern as one unit and to apply operators to the entire grouped pattern. They also allow you to capture the text matched by the regex inside the group, and to control the behaviour of the regex engine when matching the grouped pattern. Special groups also allow for more advanced behaviour, such as lookarounds.

Table: Groups and Backreferences

Syntax Description
(X) A numbered capture group of X. Parentheses group the regex X between them, and capture the text that the pattern X matches. It also allows you to repeat the entire group, e.g. (abc){3} matches "abcabcabc". The group captures the final occurrence of abc. Numbered capture groups are numbered from left-to-right by their opening parentheses.
(?<name>X) A named capture group of X. Named capture groups are also numbered. Named capture groups perform field extraction in LogScale.
(?P<name>X) A named capture group of X.
(?:X) A non-capturing group of X.
(?flags:X) Sets regex flags flags for the group (non-capturing). Supported flags are i, m, and s.
(?flags)X Sets the regex flags flags for X. Applies across concatenations and alternation branches, but does not escape groups.
(?=X) Zero-width positive lookahead for X. E.g. abc(?=d) matches abc only if it is followed by d.
(?!X) Zero-width negative lookahead for X. E.g. abc(?!d) matches abc only if it is not followed by d.
(?<=X) Zero-width positive lookbehind for X. E.g. (?<=d)abc matches abc only if it is preceded by d.
(?<!X) Zero-width negative lookbehind for X. E.g. (?<!d)abc matches abc only if it is not preceded by d.
(?>X) Atomic group for X. Atomic groups prevent the regex engine from backtracking into the group after a match has been found for the group. The engine can backtrack over the group or to something prior to the atomic group, but it cannot backtrack into the group and try other permutations.
\n Backreference. Matches what was captured by the nth group, where 1 <= n <= number of groups. For example, (abc)\1 matches abcabc.
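
Several of the group constructs can be exercised together. The sketch below uses Python's re module as an assumed stand-in for LogScale's engine; Python spells named groups with the P form, and reading a named group with group() is only a rough analogue of LogScale's field extraction. The sample strings and the field name user are hypothetical.

```python
import re

# A repeated group keeps only the text of its final repetition.
repeated = re.fullmatch(r"(\d){3}", "123")
last_capture = repeated.group(1)

# Named capture group, read back by name.
named = re.search(r"user=(?P<user>\w+)", "user=alice action=login")
user = named.group("user")

# Backreference: \1 must repeat exactly what group 1 captured.
backref_hit = re.fullmatch(r"(abc)\1", "abcabc") is not None
backref_miss = re.fullmatch(r"(abc)\1", "abcabd") is not None

# Lookarounds check the surrounding text without consuming it.
sample = "abcd abce"
ahead = [m.start() for m in re.finditer(r"abc(?=d)", sample)]
not_ahead = [m.start() for m in re.finditer(r"abc(?!d)", sample)]
behind = re.search(r"(?<=d)abc", "dabc")
not_behind = re.search(r"(?<!d)abc", "dabc")

# A scoped flag applies only inside its group.
scoped_hit = re.fullmatch(r"(?i:abc)def", "ABCdef") is not None
scoped_miss = re.fullmatch(r"(?i:abc)def", "ABCDEF") is not None
```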

Certain constructions allow you to quote regex meta-characters.

Table: Quotation

Syntax Description
\X Quotes X, where X is a regex meta-character. E.g. \(a\) matches (a) and is not a capturing group over a.
\Q Quotes all characters following it until reaching \E.
\E Ends quotation started with \Q.
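
Quoting matters most when a literal string contains meta-characters. Python's re module, used here as an assumed stand-in, has no \Q and \E; its re.escape function plays a similar role by quoting every meta-character in a string, so the sketch illustrates the idea rather than LogScale's exact syntax. The sample strings are made up.

```python
import re

# Escaped parentheses are literal characters, not a capture group.
literal = re.fullmatch(r"\(a\)", "(a)")
group_count = literal.re.groups  # number of capture groups in the pattern

# A literal string that contains the meta-characters +, (, ) and .
raw = "1+1=2 (approx.)"
haystack = "note: 1+1=2 (approx.)"

# Unquoted, the meta-characters keep their regex meaning and the search fails.
unquoted_match = re.search(raw, haystack)

# Quoted with re.escape, the whole string is matched literally.
quoted_match = re.search(re.escape(raw), haystack)
```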