Regular Expression Syntax Patterns
The following tables list regular expression syntax from the standard JitRex or RE2J implementations and show whether LogScale supports each construct.
The following table lists the single characters supported:
Table: Supported Characters
Syntax | Description | Supported |
---|---|---|
x
| The character x | Yes |
\\
| The backslash character | Yes |
\0n
| The character with octal value 0n (0 <= n <= 7) | Yes |
\0nn
| The character with octal value 0nn (0 <= n <= 7) | Yes |
\0mnn
| The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) | Yes |
\xhh
| The character with hexadecimal value 0xhh | Yes |
\uhhhh
| The character with hexadecimal value 0xhhhh | Yes |
\u{hh..hh}
| The character with hexadecimal value 0xhh..hh | No |
\N{name}
| The Unicode character named name | No |
\t
| The tab character | Yes |
\n
| The newline character | Yes |
\r
| The carriage-return character | Yes |
\f
| The form-feed character | Yes |
\a
| The alert (bell) character | Yes |
\e
| The escape character | No |
\cK
| The control character ^K | Yes |
\C
| A single byte even in UTF-8 mode | No |
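As a rough illustration of the escapes in this table, the sketch below uses Python's `re` module as a stand-in. It is not LogScale's engine, so treat the results as indicative only. Only escapes on which Python and the table agree are shown; octal escapes are left out because Python parses them differently.

```python
import re

# Python's re module is only a stand-in here; LogScale's engine may differ.
text = "A\tB\\C"  # the characters A, tab, B, backslash, C

hex_match = re.search(r"\x41", text)        # \xhh: hexadecimal 0x41 is "A"
unicode_match = re.search(r"\u0042", text)  # \uhhhh: hexadecimal 0x0042 is "B"
tab_match = re.search(r"\t", text)          # \t: the tab character
backslash_match = re.search(r"\\", text)    # \\: a literal backslash

for name, m in [("hex", hex_match), ("unicode", unicode_match),
                ("tab", tab_match), ("backslash", backslash_match)]:
    print(name, "found at index", m.start())
```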
The following table lists the character classes and ranges supported for matching one character out of a set of characters.
Table: Character Classes
Syntax | Description | Supported |
---|---|---|
[abc]
| a, b, or c (simple class) | Yes |
[^abc]
| Not a, b, or c (negated class) | Yes |
[a-zA-Z]
| a through z or A through Z, inclusive (range) | Yes |
[a-d[m-p]]
| a through d or m through p (union) | No |
[[:name:]]
| Named ASCII class inside character class | No |
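The supported class forms can be tried out with a short sketch. Python's `re` module is used here purely as a stand-in engine, and the sample string is made up for illustration.

```python
import re

text = "cab-42-Zed"  # hypothetical sample input

simple = re.findall(r"[abc]", text)      # any one of a, b, or c
negated = re.findall(r"[^abc]", text)    # any character other than a, b, or c
ranged = re.findall(r"[a-zA-Z]+", text)  # runs of ASCII letters

print(simple)   # ['c', 'a', 'b']
print(negated)  # ['-', '4', '2', '-', 'Z', 'e', 'd']
print(ranged)   # ['cab', 'Zed']
```

Because the union form `[a-d[m-p]]` is listed as unsupported, the same set would have to be written as a single class with two ranges, `[a-dm-p]`.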
Predefined character classes are shorthand for common sets of characters, such as whitespace, word characters, or non-digit characters.
Table: Predefined character classes
Syntax | Description | Supported |
---|---|---|
.
| Any character except newline (unless given flag DOT_ALL) | Yes |
\d
| A digit: [0-9] | Yes |
\D
| A non-digit: [^0-9] | Yes |
\h
| A horizontal whitespace character | No |
\H
| A non-horizontal whitespace character: [^\h] | No |
\s
| A whitespace character | Yes |
\S
| A non-whitespace character | Yes |
\v
| A vertical whitespace character | No |
\V
| A non-vertical whitespace character | No |
\w
| A word character: [a-zA-Z_0-9] | Yes |
\W
| A non-word character (the inverse of \w, equivalent to [^a-zA-Z_0-9]) | Yes |
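A minimal sketch of the supported shorthand classes follows, again using Python's `re` module as a stand-in rather than LogScale itself. The `re.ASCII` flag is an assumption made so that `\d`, `\w`, and `\s` follow the ASCII definitions given in the table; without it, Python also matches non-ASCII digits and letters.

```python
import re

line = "user_42 logged in\tat 10:05"  # hypothetical log line

# re.ASCII restricts \d, \w and \s to their ASCII definitions.
digits = re.findall(r"\d+", line, flags=re.ASCII)
words = re.findall(r"\w+", line, flags=re.ASCII)
fields = re.split(r"\s+", line, flags=re.ASCII)
non_word = re.findall(r"\W", line, flags=re.ASCII)

print(digits)    # ['42', '10', '05']
print(words)     # ['user_42', 'logged', 'in', 'at', '10', '05']
print(fields)    # ['user_42', 'logged', 'in', 'at', '10:05']
print(non_word)  # [' ', ' ', '\t', ' ', ':']
```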
POSIX character classes are not supported in LogScale's regex implementation.
Table: POSIX character classes
Syntax | Description | Supported |
---|---|---|
\p{Lower}
| A lowercase alphabetic character | No |
\p{Upper}
| An uppercase alphabetic character | No |
\p{ASCII}
| All ASCII | No |
\p{Alpha}
| An alphabetic character | No |
\p{Digit}
| A decimal digit | No |
\p{Alnum}
| An alphanumeric character | No |
\p{Punct}
| A punctuation character | No |
\p{Graph}
| A visible character | No |
\p{Print}
| A printable character | No |
\p{Blank}
| A space or tab | No |
\p{Xdigit}
| A hexadecimal digit | No |
\p{Space}
| A whitespace character | No |
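Since these classes are unavailable, one possible workaround is to spell them out as explicit ranges, which the character class table lists as supported. The sketch below rewrites a pattern that way. The mapping follows the ASCII definitions in Java's `Pattern` documentation and covers only a subset; `POSIX_EQUIVALENTS` and `expand_posix` are hypothetical names introduced for illustration, not part of LogScale.

```python
import re

# Explicit ASCII ranges for a subset of the POSIX classes, following the
# definitions in Java's Pattern documentation (assumed equivalent here).
POSIX_EQUIVALENTS = {
    "Lower": "[a-z]",
    "Upper": "[A-Z]",
    "Alpha": "[a-zA-Z]",
    "Digit": "[0-9]",
    "Alnum": "[a-zA-Z0-9]",
    "Blank": r"[ \t]",
    "Xdigit": "[0-9a-fA-F]",
}

def expand_posix(pattern: str) -> str:
    # Hypothetical helper: swap each \p{Name} for its bracket expression.
    # Names missing from the mapping are left untouched.
    return re.sub(
        r"\\p\{(\w+)\}",
        lambda m: POSIX_EQUIVALENTS.get(m.group(1), m.group(0)),
        pattern,
    )

expanded = expand_posix(r"\p{Upper}\p{Lower}+ \p{Xdigit}{4}")
print(expanded)  # [A-Z][a-z]+ [0-9a-fA-F]{4}
```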
Unicode character classes are not supported in LogScale's regex implementation.
Table: Classes for Unicode scripts, blocks, categories and binary properties
Syntax | Description | Supported |
---|---|---|
\p{IsLatin}
| A Latin script character | No |
\p{InGreek}
| A character in the Greek block | No |
\p{Lu}
| An uppercase letter | No |
\p{IsAlphabetic}
| An alphabetic character (binary property) | No |
\p{Sc}
| A currency symbol | No |
\P{InGreek}
| Any character except one in the Greek block (negation) | No |
[\p{L}&&[^\p{Lu}]]
| Any letter except an uppercase letter (subtraction) | No |
The following table lists the supported boundary matchers, such as word or line boundaries.
Table: Boundary matchers
Syntax | Description | Supported |
---|---|---|
^
| The beginning of a line | Yes |
$
| The end of a line | Yes |
\b
| A word boundary | Yes |
\b{g}
| A Unicode extended grapheme cluster boundary | Yes |
\B
| A non-word boundary | Yes |
\A
| The beginning of the input | Yes |
\G
| The end of the previous match | No |
\Z
| The end of the input but for the final terminator, if any | Yes |
\z
| The end of the input | Yes |
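The sketch below shows how some of these boundary matchers behave, with Python's `re` module as a stand-in. In Python, `^` and `$` match at line boundaries only when `re.MULTILINE` is set; how LogScale treats them depends on its own flags (see Regular Expression Flags). `\Z` and `\z` are left out because Python defines them differently.

```python
import re

log = "error: disk full\nwarning: error rate high"  # hypothetical two-line input

# With re.MULTILINE, ^ and $ match at the start and end of every line.
line_starts = re.findall(r"^\w+", log, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", log, flags=re.MULTILINE)

# \b requires a word boundary on each side, so "errors" and "terror" are skipped.
whole_word = re.findall(r"\berror\b", "error errors terror")

# \A matches only at the very start of the input, even with re.MULTILINE.
input_start = re.findall(r"\A\w+", log, flags=re.MULTILINE)

print(line_starts)  # ['error', 'warning']
print(line_ends)    # ['full', 'high']
print(whole_word)   # ['error']
print(input_start)  # ['error']
```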
The following table lists matchers for extended Unicode sequences.
Table: Other matchers
Syntax | Description | Supported |
---|---|---|
\R
| Any Unicode linebreak sequence | No |
\X
| Any Unicode extended grapheme cluster | No |
Quantifiers specify how many times the preceding character, character class, or group must occur.
Table: Quantifiers
Syntax | Description | Supported |
---|---|---|
X?
| X, once or not at all | Yes |
X*
| X, zero or more times | Yes |
X+
| X, one or more times | Yes |
X{n}
| X, exactly n times | Yes |
X{n,}
| X, at least n times | Yes |
X{n,m}
| X, at least n times but not more than m times | Yes |
X??
| X, once or not at all, prefer less | Yes |
X*?
| X, zero or more times, prefer less | Yes |
X+?
| X, one or more times, prefer less | Yes |
X{n}?
| X, exactly n times | Yes |
X{n,}?
| X, at least n times, prefer less | Yes |
X{n,m}?
| X, at least n times but not more than m times, prefer less | Yes |
X?+
| X, once or not at all, possessive | No |
X*+
| X, zero or more times, possessive | No |
X++
| X, one or more times, possessive | No |
X{n}+
| X, exactly n times, possessive | Yes |
X{n,}+
| X, at least n times, possessive | Yes |
X{n,m}+
| X, at least n times but not more than m times, possessive | Yes |
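The difference between the default (greedy) quantifiers and the "prefer less" forms is easiest to see side by side. This is a small sketch using Python's `re` module as a stand-in engine, with made-up sample strings.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # hypothetical sample input

greedy = re.findall(r"<.+>", html)  # X+ consumes as much as it can
lazy = re.findall(r"<.+?>", html)   # X+? consumes as little as it can

# X{n,m}: between two and three digits, delimited by word boundaries.
bounded = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(bounded)  # ['42', '123']
```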
The following table lists logical matching operators.
Table: Logical operators
Syntax | Description | Supported |
---|---|---|
XY
| X followed by Y | Yes |
X|Y
| Either X or Y | Yes |
Groups treat part of a pattern as a single unit that can be captured, named, or given its own flags. Backreferences match again whatever text an earlier group captured.
Table: Groups and backreferences
Syntax | Description | Supported |
---|---|---|
(X)
| X, as a numbered capturing group | Yes |
(?<name>X)
| X, as a named capturing group |
Yes. Note that LogScale supports a broader set of names than
is usual (e.g. containing |
(?P<name>X)
| X, as a named capturing group | Yes |
\n
| Numbered backreference: whatever the nth capturing group matched (n >= 1; groups are numbered starting at 1) | Yes |
\k<name>
| Named backreference: whatever the named capturing group name matched | No |
(?:X)
| X, as a non-capturing group | Yes |
(?flags)
| Sets flags in the group | Yes (see Regular Expression Flags) |
(?flags:X)
| Sets flags in X | Yes, but the available flags differ from the standard implementations (see Regular Expression Flags) |
(?=X)
| X, as a zero-width positive lookahead | Yes |
(?!X)
| X, as a zero-width negative lookahead | Yes |
(?<=X)
| X, as a zero-width positive lookbehind | No |
(?<!X)
| X, as a zero-width negative lookbehind | No |
(?>X)
| X, as an independent, non-capturing (atomic) group. | No |
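The supported group forms can be sketched as follows, using Python's `re` module as a stand-in. The `(?P<name>X)` spelling is used because Python does not accept `(?<name>X)`; the log line is made up for illustration.

```python
import re

line = "2024-05-17 host=web01 status=500 status=500"  # hypothetical log line

# (?P<name>X) captures under a name; (?:X) groups without capturing.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?:\d{2})", line)
date_parts = m.groupdict()

# \1 matches again whatever the first capturing group matched.
repeated = re.search(r"(status=\d+) \1", line)

# (?=X) and (?!X) check what follows without consuming it.
keys = re.findall(r"\w+(?==)", line)            # words directly followed by "="
not_web = re.findall(r"host=(?!web)\w+", line)  # hosts not starting with "web"

print(date_parts)         # {'year': '2024', 'month': '05'}
print(repeated.group(1))  # status=500
print(keys)               # ['host', 'status', 'status']
print(not_web)            # []
```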
The following table lists the methods for quoting special characters and regex metacharacters within a regular expression.
Table: Quotation
Syntax | Description | Supported |
---|---|---|
\
| Quotes the following character when it is a regex metacharacter | Yes |
\Q
| Quotes all characters until \E | Yes |
\E
| Ends quoting started by \Q | Yes |
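The effect of quoting can be sketched with Python's `re` module as a stand-in. Python has no `\Q...\E`, so `re.escape()` is swapped in below to show the same idea of treating a whole string literally; this is not how LogScale implements quoting.

```python
import re

text = "price: 3.50 (approx.)"  # hypothetical sample input

unquoted = re.findall(r"3.50", "3.50 3x50")  # . matches any character
quoted = re.findall(r"3\.50", "3.50 3x50")   # \. matches only a literal dot

# Stand-in for \Q...\E: re.escape() backslash-quotes every metacharacter
# in the literal so that it matches itself exactly.
literal = "(approx.)"
found = re.search(re.escape(literal), text)

print(unquoted)       # ['3.50', '3x50']
print(quoted)         # ['3.50']
print(found.group())  # (approx.)
```

In an engine that supports `\Q` and `\E`, the corresponding pattern would be `\Q(approx.)\E`.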