Regular Expression Engine V2 Syntax Patterns
Available: Regular Expression Engine V2 v1.227
Regular expression engine V2 is the default regex engine from 1.227.
The following tables detail the functionality supported in LogScale by the LogScale Regular Expression Engine V2.
The following syntax constructions match a specific character.
Table: Single Character Constructs
| Syntax | Description |
|---|---|
| `x` | The literal character `x` |
| `\\` | The backslash character |
| `\xnn` | The character with hexadecimal value `nn`, where `0 <= n <= F`. E.g. `\x21` matches the `!` character. |
| `\x{nnnn}` | The character with hexadecimal value `nnnn`, where `0 <= n <= F`. Supports up to four digits for use with Unicode characters. E.g. `\x{123}` matches the `ģ` character. |
| `\unnnn` | The character with hexadecimal value `nnnn`, where `0 <= n <= F`. E.g. `\u0023` matches the `#` character. |
| `\a` | The alert/bell character (BEL) with hexadecimal value `07`. |
| `\t` | The horizontal tab character (HT) with hexadecimal value `09`. |
| `\n` | The newline character (LF) with hexadecimal value `0A`. |
| `\f` | The form feed character (FF) with hexadecimal value `0C`. |
| `\r` | The carriage return character (CR) with hexadecimal value `0D`. |
| `\e` | The escape character (ESC) with hexadecimal value `1B`. |
| `\cX` | The control character `^X`. E.g. `\cH` matches the backspace character (BS). |
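As a rough illustration of a few of the escapes above, the sketch below checks them with Python's `re` module. Python is used only as a stand-in engine and is an assumption of this example, not part of LogScale; it does not accept `\x{nnnn}`, `\e`, or `\cX`, so those are left out.

```python
import re

# Each pair is (pattern, text the pattern is expected to match in full).
# Python's `re` is a stand-in here; LogScale's engine may differ in detail.
cases = [
    (r"x", "x"),        # literal character
    (r"\x21", "!"),     # two-digit hexadecimal escape
    (r"\u0023", "#"),   # four-digit Unicode escape
    (r"\a", "\x07"),    # alert/bell (BEL)
    (r"\t", "\x09"),    # horizontal tab (HT)
    (r"\n", "\x0a"),    # newline (LF)
    (r"\f", "\x0c"),    # form feed (FF)
    (r"\r", "\x0d"),    # carriage return (CR)
]

# True for every case in which the pattern matches the whole text.
results = [re.fullmatch(pattern, text) is not None for pattern, text in cases]
print(all(results))
```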
The following syntax constructions match a single character from a set of possible characters.
Table: Character Class Constructs
| Syntax | Description |
|---|---|
| `[abc]` | Matches either `a`, `b`, or `c`. |
| `[^abc]` | Matches any character that is not `a`, `b`, or `c` (negated class). |
| `[a-z]` | Matches any character in the range from `a` through `z`. |
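The three class forms can be tried out with a small sketch. It uses Python's `re` module as a stand-in engine (an assumption of this example), and the sample text is made up for illustration.

```python
import re

text = "cabbage-xyz"  # hypothetical sample input

# Characters that are a, b, or c.
in_set = re.findall(r"[abc]", text)

# Characters that are anything except a, b, or c (negated class).
not_in_set = re.findall(r"[^abc]", text)

# Characters in the range a through z; the hyphen in the text is excluded.
in_range = re.findall(r"[a-z]", text)

print(in_set, not_in_set, in_range)
```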
There are several predefined character classes available.
Table: Predefined Character Classes
| Syntax | Description |
|---|---|
| `.` | Matches any character except newline (unless the single-line flag is given). |
| `\d` | Matches digit character `0` through `9`. Equivalent to `[0-9]`. |
| `\D` | Matches any character that is not a digit character. Equivalent to `[^0-9]`. |
| `\w` | Matches an ASCII word character. Equivalent to `[a-zA-Z0-9_]`. |
| `\W` | Matches any character that is not an ASCII word character. Equivalent to `[^\w]`. |
| `\h` | Matches a horizontal whitespace character. Equivalent to `[\t\x20\xA0\u180e\u2000-\u200a\u202f\u205f\u3000]`. |
| `\H` | Matches any character that is not a horizontal whitespace character. Equivalent to `[^\h]`. |
| `\v` | Matches a vertical whitespace character. Equivalent to `[\u000a-\u000d\u0085\u2028\u2029]`. |
| `\V` | Matches any character that is not a vertical whitespace character. Equivalent to `[^\v]`. |
| `\s` | Matches any whitespace character, as defined by the Unicode White_Space property. Equivalent to `[\h\v]`. |
| `\S` | Matches any non-whitespace character. Equivalent to `[^\s]`. |
| `\p{X}` | Matches a character in the Unicode General Category abbreviated `X`. Supported categories are Letters (`L`), Symbols (`S`), Punctuation (`P`), and Control Characters (`Cc`). Case-insensitivity is not supported for Unicode General Category matches. |
| `\P{X}` | Matches any character that is not in the Unicode General Category abbreviated `X`. Case-insensitivity is not supported for Unicode General Category matches. |
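A minimal sketch of some predefined classes, using Python's `re` module as a stand-in engine (an assumption of this example). The `re.ASCII` flag is passed so that `\d` and `\w` are ASCII-only, as in the table. Python has no `\h`, so the example spells out the equivalent class listed in the table; the sample line is hypothetical.

```python
import re

# Explicit class copied from the \h row of the table (Python has no \h).
H_SPACE = r"[\t\x20\xA0\u180e\u2000-\u200a\u202f\u205f\u3000]"

# Hypothetical input containing a tab and a no-break space (U+00A0).
line = "id=42\tuser=jane_doe\u00a0ok"

digits = re.findall(r"\d+", line, flags=re.ASCII)   # runs of 0-9
words = re.findall(r"\w+", line, flags=re.ASCII)    # runs of [a-zA-Z0-9_]
h_spaces = re.findall(H_SPACE, line)                # horizontal whitespace

# The dot skips newlines unless the single-line flag (re.DOTALL here) is set.
dot_default = re.findall(r".", "a\nb")
dot_single_line = re.findall(r".", "a\nb", flags=re.DOTALL)

print(digits, words, h_spaces, dot_default, dot_single_line)
```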
There are two primitive operations in LogScale's regex syntax. These are used to express more complicated patterns than single character matches.
Table: Primitive Operations
| Syntax | Description |
|---|---|
| `XY` | Concatenation. Joins the regex patterns X and Y end-to-end. E.g. `ab` matches `a` followed by `b`. |
| `X\|Y` | Alternation. Matches either regex pattern X or pattern Y. E.g. `ab\|cd` matches either `ab` or `cd`. |
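The two primitives can be sketched as follows, with Python's `re` module standing in for the engine (an assumption of this example). The second part shows that `ab|cd` groups as "ab or cd"; a non-capturing group is needed to alternate over only the middle character.

```python
import re

# Alternation between two concatenations: "ab" or "cd".
pattern = re.compile(r"ab|cd")
matches = pattern.findall("ab cd ad abcd")

# "abd" is neither "ab" nor "cd", so the whole-string match fails.
whole_abd = re.fullmatch(r"ab|cd", "abd")

# Grouping changes the scope of the alternation: a, then b or c, then d.
grouped_abd = re.fullmatch(r"a(?:b|c)d", "abd")

print(matches, whole_abd is None, grouped_abd is not None)
```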
Anchor and boundary syntax constructions match positions in the text in-place rather than characters.
Table: Anchors / Boundaries
| Syntax | Description |
|---|---|
| `^` | When used outside of character classes, `^` matches the beginning of a line. See the `m` flag for what constitutes a line. |
| `$` | Matches the end of a line. See the `m` flag for what constitutes a line. |
| `\b` | Matches an ASCII word (`\w`) boundary in-place. For instance, `\bKingdom` matches `Kingdom` only if `Kingdom` is preceded by a character that is not in `\w`. For example, this pattern matches `Kingdom` in the text `The Feathered Kingdom`, but not `Kingdom` in the text `007Kingdom`. |
| `\B` | Matches in-place where there is no ASCII word (`\w`) boundary. Explicitly, `\Bher` matches `her` only if `her` is preceded by a character that is in `\w`. For example, `\Bher` matches `her` in `dispatcher`. |
| `\A` | Matches the start of the input. E.g. `\AKingdom` matches `Kingdom` in the text `Kingdom Come`. |
| `\Z` | Matches the end of the input except for the final terminator if one exists. For example, `Kingdom\Z` matches `Kingdom` in the text `The Feathered Kingdom\n`. |
| `\z` | Matches the end of the input. Like `\Z` but it does not match if a final terminator exists. E.g. `Kingdom\z` matches `Kingdom` in the text `The Feathered Kingdom`, but not in the text `The Feathered Kingdom\n`. |
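The boundary examples from the table can be reproduced in a short sketch using Python's `re` module as a stand-in engine (an assumption of this example). Note that the end-of-input anchors are spelled differently in Python: its `$` (without the multiline flag) also matches just before a final newline, which resembles the `\Z` described above, while Python's own `\Z` matches only at the very end, which resembles the `\z` described above.

```python
import re

# \b: Kingdom must start at a word boundary.
after_space = re.search(r"\bKingdom", "The Feathered Kingdom", flags=re.ASCII)
after_digit = re.search(r"\bKingdom", "007Kingdom", flags=re.ASCII)

# \B: her must NOT start at a word boundary.
inside_word = re.search(r"\Bher", "dispatcher", flags=re.ASCII)

# \A: start of the input.
at_start = re.search(r"\AKingdom", "Kingdom Come")

# End of input with a trailing newline (Python spellings, see the note above).
text = "The Feathered Kingdom\n"
before_final_newline = re.search(r"Kingdom$", text)   # tolerates the final \n
at_very_end = re.search(r"Kingdom\Z", text)           # does not tolerate it

print(after_space is not None, after_digit is None, inside_word.start())
```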
Quantifiers allow for matching the preceding pattern a number of times. Quantifiers fall into three categories: greedy, non-greedy, and possessive.
Greedy quantifiers try to match the given pattern as many times as possible.
Non-greedy quantifiers try to match as few times as possible.
Possessive quantifiers try to match as many times as possible and, having found the longest possible match, do not try shorter matches if the rest of the regex does not match.
Table: Quantifiers
| Syntax | Description | Category |
|---|---|---|
| `X?` | Makes X optional. Greedy, so it will prefer a match containing X. | Greedy |
| `X*` | Matches X zero or more times. Greedy, so it will prefer the match with the most repetitions of X. | Greedy |
| `X+` | Matches X one or more times. | Greedy |
| `X{n}` | Matches X exactly n times, where n is a number between 0 and 14748364. | |
| `X{n,}` | Matches X n or more times, where n is a number between 0 and 14748364. | Greedy |
| `X{n,m}` | Matches X at least n times and at most m times, where 0 <= n,m <= 14748364. | Greedy |
| `X??` | Makes X optional. Non-greedy, so it will prefer a match that does not contain X. | Non-greedy |
| `X*?` | Matches X zero or more times. Non-greedy, so it will prefer the match with the fewest repetitions of X. | Non-greedy |
| `X+?` | Matches X one or more times. | Non-greedy |
| `X{n}?` | Matches X exactly n times. | |
| `X{n,}?` | Matches X at least n times. | Non-greedy |
| `X{n,m}?` | Matches X at least n times and at most m times. | Non-greedy |
| `X?+` | Makes X optional. Possessive, so it will prefer a match containing X, but if X matched, and the rest of the regex after `X?+` did not match, it will not try again without X. | Possessive |
| `X*+` | Matches X zero or more times. Possessive, so it will prefer the match with the most repetitions of X, and will not try the rest of the regex with fewer repetitions of X. | Possessive |
| `X++` | Matches X one or more times. | Possessive |
| `X{n}+` | Matches X exactly n times. | |
| `X{n,}+` | Matches X at least n times. | Possessive |
| `X{n,m}+` | Matches X at least n times and at most m times. | Possessive |
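The difference between the three categories is easiest to see side by side. The sketch below uses Python's `re` module as a stand-in engine (an assumption of this example). Because native possessive syntax such as `a*+` is only accepted by Python 3.11 and later, the possessive case is emulated here with the common lookahead-plus-backreference idiom `(?=(a*))\1`, which is a substitute technique and not LogScale syntax.

```python
import re

text = "<a><b>"  # hypothetical sample input

# Greedy: .+ takes as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.+>", text).group()

# Non-greedy: .+? takes as little as it can, so the match stops at the first '>'.
non_greedy = re.search(r"<.+?>", text).group()

# Optional X: greedy prefers including b, non-greedy prefers leaving it out.
optional_greedy = re.search(r"ab?", "ab").group()
optional_non_greedy = re.search(r"ab??", "ab").group()

# Bounded repetition: between 2 and 3 repetitions of a.
three_ok = re.fullmatch(r"a{2,3}", "aaa") is not None
four_ok = re.fullmatch(r"a{2,3}", "aaaa") is not None

# Greedy a* gives one 'a' back so the trailing 'a' can match.
backtracking = re.search(r"a*a", "aaa")

# Emulated possessive a*: every 'a' is kept, so the trailing 'a' never matches.
possessive_like = re.search(r"(?=(a*))\1a", "aaa")

print(greedy, non_greedy, backtracking.group(), possessive_like)
```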
Groups and backreferences allow you to treat a given pattern as one unit and to apply operators to the entire grouped pattern. They also allow you to capture the text matched by the regex inside the group, and to control the behaviour of the regex engine when matching the grouped pattern. Special groups also allow for more advanced behaviour, such as lookarounds.
Table: Groups and Backreferences
| Syntax | Description |
|---|---|
| `(X)` | A numbered capture group of X. Parentheses group the regex X between them, and capture the text that the pattern X matches. It also allows you to repeat the entire group, e.g. `(abc){3}` matches "abcabcabc". The group captures the final occurrence of `abc`. Numbered capture groups are numbered from left-to-right by their opening parentheses. |
| `(?<name>X)` | A named capture group of X. Named capture groups are also numbered. Named capture groups perform field extraction in LogScale. |
| `(?P<name>X)` | A named capture group of X. |
| `(?:X)` | A non-capturing group of X. |
| `(?flags:X)` | Sets regex flags `flags` for the group (non-capturing). Supported flags are `i`, `m`, and `s`. |
| `(?flags)X` | Sets the regex flags `flags` for X. Applies across concatenations and alternation branches, but does not escape groups. |
| `(?=X)` | Zero-width positive lookahead for X. E.g. `abc(?=d)` matches `abc` only if it is followed by `d`. |
| `(?!X)` | Zero-width negative lookahead for X. E.g. `abc(?!d)` matches `abc` only if it is not followed by `d`. |
| `(?<=X)` | Zero-width positive lookbehind for X. E.g. `(?<=d)abc` matches `abc` only if it is preceded by `d`. |
| `(?<!X)` | Zero-width negative lookbehind for X. E.g. `(?<!d)abc` matches `abc` only if it is not preceded by `d`. |
| `(?>X)` | Atomic group for X. Atomic groups prevent the regex engine from backtracking into the group after a match has been found for the group. The engine can backtrack over the group or to something prior to the atomic group, but it cannot backtrack into the group and try other permutations. |
| `\n` | Backreference. Matches what was captured by the nth group, where 1 <= n <= number of groups. For example, `(abc)\1` matches `abcabc`. |
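Several of the group constructs can be sketched with Python's `re` module as a stand-in engine (an assumption of this example). The named groups use Python's own `(?P<...>)` spelling, and the group names `user` and `host` and all sample strings are hypothetical. Field extraction from named groups is a LogScale behaviour and is not modelled here.

```python
import re

# A repeated capture group keeps only its final occurrence.
repeated = re.fullmatch(r"(a|b|c){3}", "abc")
last_capture = repeated.group(1)

# Named groups are also numbered, left to right.
named = re.search(r"(?P<user>\w+)@(?P<host>\w+)", "mail from jane@example")
by_name = named.groupdict()
by_number = (named.group(1), named.group(2))

# Backreference: \1 must repeat exactly what group 1 captured.
backref_ok = re.fullmatch(r"(abc)\1", "abcabc") is not None
backref_bad = re.fullmatch(r"(abc)\1", "abcabd") is not None

# Lookarounds check context without consuming it.
ahead_pos = re.search(r"abc(?=d)", "abcd abce").start()
ahead_neg = re.search(r"abc(?!d)", "abcd abce").start()
behind_pos = re.search(r"(?<=d)abc", "dabc xabc").start()
behind_neg = re.search(r"(?<!d)abc", "dabc xabc").start()

# Scoped flag: case-insensitivity applies only inside the group.
scoped_ok = re.fullmatch(r"(?i:abc)def", "ABCdef") is not None
scoped_bad = re.fullmatch(r"(?i:abc)def", "ABCDEF") is not None

print(last_capture, by_name, ahead_pos, ahead_neg, behind_pos, behind_neg)
```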
Certain constructions allow you to quote regex meta-characters.
Table: Quotation
| Syntax | Description |
|---|---|
| `\X` | Quotes X, where X is a regex meta-character. E.g. `\(a\)` matches `(a)` and is not a capturing group over `a`. |
| `\Q` | Quotes all characters following it until reaching `\E`. |
| `\E` | Ends quotation started with `\Q`. |
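The effect of quoting can be sketched with Python's `re` module as a stand-in engine (an assumption of this example). Python does not accept `\Q...\E`, so the example uses `re.escape` as a rough analogue that quotes every meta-character in a string; the sample strings are hypothetical.

```python
import re

# Escaped parentheses are literal characters, not a capture group.
quoted = re.compile(r"\(a\)")
quoted_match = quoted.fullmatch("(a)") is not None
quoted_groups = quoted.groups          # number of capture groups in the pattern

# Unescaped parentheses form a capture group and match just "a".
unquoted = re.compile(r"(a)")
unquoted_groups = unquoted.groups

# Quoting a whole string, similar in spirit to wrapping it in \Q...\E.
raw = "1+1=2 (approx.)"
literal_match = re.fullmatch(re.escape(raw), raw) is not None

# Without quoting, '+' is a quantifier, so the literal text does not match.
unquoted_match = re.search(r"1+1=2", "1+1=2") is not None

print(quoted_match, quoted_groups, unquoted_groups, literal_match, unquoted_match)
```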