Regular Expression Syntax Patterns

The following tables compare the regular expression functionality supported in LogScale with the standard JitRex or RE2J implementations. For each syntax element, the tables give a description and state whether LogScale supports it.

The following table lists the syntax for matching single characters, including escape sequences:

Table: Supported Characters

Syntax Description Supported
x The character x Yes
\\ The backslash character Yes
\0n The character with octal value 0n (0 <= n <= 7) Yes
\0nn The character with octal value 0nn (0 <= n <= 7) Yes
\0mnn The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7) Yes
\xhh The character with hexadecimal value 0xhh Yes
\uhhhh The character with hexadecimal value 0xhhhh Yes
\u{hh..hh} The character with hexadecimal value 0xhh..hh No
\N{name} The Unicode character named name No
\t The tab character Yes
\n The newline character Yes
\r The carriage-return character Yes
\f The form-feed character Yes
\a The alert (bell) character Yes
\e The escape character No
\cK The control character ^K Yes
\C A single byte even in UTF-8 mode No
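As a quick check of how the supported escape sequences resolve, the sketch below matches each escaped pattern against the literal character it stands for. It uses Python's `re` module only as a stand-in, not LogScale's engine; Python has no `\cK` escape and treats octal escapes differently, so those rows are not exercised here.

```python
import re

# Each pattern spells a character with an escape sequence; the subject
# string holds the literal character it should match.
cases = [
    (r"\x41", "A"),         # \xhh: hexadecimal value 0x41
    (r"\u00e9", "\u00e9"),  # \uhhhh: hexadecimal value 0x00e9 (é)
    (r"\t", "\t"),          # tab
    (r"\n", "\n"),          # newline
    (r"\a", "\a"),          # alert (bell), 0x07
    (r"\\", "\\"),          # escaped backslash matches one literal backslash
]

for pattern, subject in cases:
    assert re.fullmatch(pattern, subject), f"{pattern} did not match"
print("all escapes matched")
```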

The following table lists character classes and ranges, which match any single character from a set of characters.

Table: Character Classes

Syntax Description Supported
[abc] a, b, or c (simple class) Yes
[^abc] Not a, b, or c (negated class) Yes
[a-zA-Z] a through z or A through Z, inclusive (range) Yes
[a-d[m-p]] a through d or m through p (union) No
[[:name:]] Named ASCII class inside character class No
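Because the union form `[a-d[m-p]]` is not supported, the same set can usually be written as one class that lists both ranges. A minimal sketch, using Python's `re` as a stand-in for LogScale's engine:

```python
import re

# Union [a-d[m-p]] is unsupported; list both ranges inside one class instead.
union = re.compile(r"[a-dm-p]")
negated = re.compile(r"[^abc]")     # any character except a, b, or c
letters = re.compile(r"[a-zA-Z]+")  # runs of ASCII letters

print(union.findall("abcxmnoz"))      # ['a', 'b', 'c', 'm', 'n', 'o']
print(negated.findall("abcd"))        # ['d']
print(letters.findall("id=42 Name"))  # ['id', 'Name']
```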

Predefined character classes are shorthands for common sets of characters, such as digits, whitespace, and word characters.

Table: Predefined character classes

Syntax Description Supported
. Any character except newline (unless given flag DOT_ALL) Yes
\d A digit: [0-9] Yes
\D A non-digit: [^0-9] Yes
\h A horizontal whitespace character No
\H A non-horizontal whitespace character: [^\h] No
\s A whitespace character Yes
\S A non-whitespace character Yes
\v A vertical whitespace character No
\V A non-vertical whitespace character No
\w A word character: [a-zA-Z_0-9] Yes
\W A non-word character (inverse of \w, equivalent to [^a-zA-Z_0-9]) Yes
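The sketch below shows the predefined classes pulling digits, words, and separators out of a log-like string. It uses Python's `re` as a stand-in; Python's `\d`, `\w`, and `\s` are Unicode-aware by default, so `re.ASCII` is passed to match the ASCII bracket definitions in the table. Whether LogScale's classes also match non-ASCII characters is not stated here.

```python
import re

text = "user_42\tlogged in at 09:15"

# re.ASCII restricts \d, \w and \W to the ASCII definitions in the table.
digits = re.findall(r"\d+", text, flags=re.ASCII)
words = re.findall(r"\w+", text, flags=re.ASCII)
non_word = re.findall(r"\W", text, flags=re.ASCII)

print(digits)    # ['42', '09', '15']
print(words)     # ['user_42', 'logged', 'in', 'at', '09', '15']
print(non_word)  # ['\t', ' ', ' ', ' ', ':']

# \W selects the same characters as the explicit class [^a-zA-Z_0-9].
assert non_word == re.findall(r"[^a-zA-Z_0-9]", text)
```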

POSIX character classes (written as \p{name}) are not supported in LogScale's regex implementation.

Table: POSIX character classes

Syntax Description Supported
\p{Lower} A lowercase alphabetic character No
\p{Upper} An uppercase alphabetic character No
\p{ASCII} All ASCII No
\p{Alpha} An alphabetic character No
\p{Digit} A decimal digit No
\p{Alnum} An alphanumeric character No
\p{Punct} A punctuation character No
\p{Graph} A visible character No
\p{Print} A printable character No
\p{Blank} A space or tab No
\p{Xdigit} A hexadecimal digit No
\p{Space} A whitespace character No
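Since these `\p{...}` classes are rejected, a common workaround is to spell out an equivalent ASCII bracket expression. The mapping below follows the US-ASCII definitions that Java's `java.util.regex.Pattern` documents for these class names; it is a sketch, and the Python dictionary (tested with Python's `re`, not LogScale) is only a convenient container for the expressions.

```python
import re
import string

# ASCII bracket expressions for the POSIX classes in the table, following the
# US-ASCII definitions in Java's java.util.regex.Pattern documentation.
POSIX_EQUIVALENTS = {
    r"\p{Lower}": r"[a-z]",
    r"\p{Upper}": r"[A-Z]",
    r"\p{ASCII}": r"[\x00-\x7F]",
    r"\p{Alpha}": r"[a-zA-Z]",
    r"\p{Digit}": r"[0-9]",
    r"\p{Alnum}": r"[a-zA-Z0-9]",
    r"\p{Punct}": r"[!-/:-@\[-`{-~]",  # the four ASCII punctuation ranges
    r"\p{Graph}": r"[!-~]",            # Alnum plus Punct: 0x21 to 0x7E
    r"\p{Print}": r"[ -~]",            # Graph plus space: 0x20 to 0x7E
    r"\p{Blank}": r"[ \t]",
    r"\p{Xdigit}": r"[0-9a-fA-F]",
    r"\p{Space}": r"[ \t\n\x0B\f\r]",
}

# Sanity check: the \p{Punct} replacement selects exactly ASCII punctuation.
punct = re.compile(POSIX_EQUIVALENTS[r"\p{Punct}"])
matched = "".join(chr(i) for i in range(128) if punct.fullmatch(chr(i)))
assert matched == string.punctuation

xdigits = re.compile(POSIX_EQUIVALENTS[r"\p{Xdigit}"] + "+")
print(bool(xdigits.fullmatch("1F3a")))  # True
print(bool(xdigits.fullmatch("1G")))    # False
```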

Unicode character classes are not supported in LogScale's regex implementation.

Table: Classes for Unicode scripts, blocks, categories and binary properties

Syntax Description Supported
\p{IsLatin} A Latin script character No
\p{InGreek} A character in the Greek block No
\p{Lu} An uppercase letter No
\p{IsAlphabetic} An alphabetic character (binary property) No
\p{Sc} A currency symbol No
\P{InGreek} Any character except one in the Greek block (negation) No
[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction) No
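Where a Unicode block is needed, an explicit code-point range can stand in for it. The Greek and Coptic block spans U+0370 to U+03FF, so `\p{InGreek}` and `\P{InGreek}` can be approximated as below. This sketch assumes LogScale accepts `\uhhhh` escapes inside a character class, and it is tested with Python's `re` rather than LogScale; scripts and categories such as `\p{Lu}` or `\p{Sc}` have no comparably compact range.

```python
import re

# Hypothetical stand-ins for \p{InGreek} and \P{InGreek}:
# the Greek and Coptic block covers U+0370 to U+03FF.
in_greek = re.compile(r"[\u0370-\u03FF]")
not_greek = re.compile(r"[^\u0370-\u03FF]")

text = "alpha=\u03b1 beta=\u03b2"
greek_chars = in_greek.findall(text)
other_chars = "".join(not_greek.findall(text))
print(greek_chars)  # ['α', 'β']
print(other_chars)  # alpha= beta=
```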

The following table lists boundary matchers, such as word and line boundaries.

Table: Boundary matchers

Syntax Description Supported
^ The beginning of a line Yes
$ The end of a line Yes
\b A word boundary Yes
\b{g} A Unicode extended grapheme cluster boundary Yes
\B A non-word boundary Yes
\A The beginning of the input Yes
\G The end of the previous match No
\Z The end of the input but for the final terminator, if any Yes
\z The end of the input Yes
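The sketch below contrasts line anchors, the input anchor `\A`, and word boundaries. It uses Python's `re` as a stand-in: in Python, `^` matches at every line start only when `re.MULTILINE` is set, so check Regular Expression Flags for how LogScale controls this. Python's `\Z` behaves like the table's `\z`, so end-of-input anchors are left out to avoid confusion.

```python
import re

log = "start: ok\nstop: ok"

# With re.MULTILINE, ^ matches at the start of every line;
# \A still matches only at the start of the whole input.
line_starts = re.findall(r"^\w+", log, flags=re.MULTILINE)
input_start = re.findall(r"\A\w+", log, flags=re.MULTILINE)
print(line_starts)  # ['start', 'stop']
print(input_start)  # ['start']

# \b requires a word boundary, so the "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
# \B requires a non-boundary, so only the "cat" inside "concat" matches.
inner = re.findall(r"\Bcat", "cat concat")
print(whole_words)  # ['cat', 'cat']
print(inner)        # ['cat']
```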

The following table lists other matchers, for Unicode linebreak sequences and extended grapheme clusters.

Table: Other matchers

Syntax Description Supported
\R Any Unicode linebreak sequence No
\X Any Unicode extended grapheme cluster No
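Since `\R` is not supported, a linebreak can be matched with an explicit alternation instead. The sketch below uses the expansion that Java documents for `\R`, tested with Python's `re` as a stand-in. Putting `\r\n` first ensures a CRLF pair is consumed as one linebreak rather than two.

```python
import re

# Explicit expansion of \R: CRLF, or any single line terminator
# (LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR).
LINEBREAK = r"\r\n|[\n\x0B\x0C\r\x85\u2028\u2029]"

text = "a\r\nb\nc\u2028d"
parts = re.split(LINEBREAK, text)
print(parts)  # ['a', 'b', 'c', 'd']
```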

Quantifiers specify how many times the preceding character, character class, or group must occur for a match.

Table: Quantifiers

Syntax Description Supported
X? X, once or not at all Yes
X* X, zero or more times Yes
X+ X, one or more times Yes
X{n} X, exactly n times Yes
X{n,} X, at least n times Yes
X{n,m} X, at least n times but not more than m times Yes
X?? X, once or not at all, prefer less Yes
X*? X, zero or more times, prefer less Yes
X+? X, one or more times, prefer less Yes
X{n}? X, exactly n times Yes
X{n,}? X, at least n times, prefer less Yes
X{n,m}? X, at least n times but not more than m times, prefer less Yes
X?+ X, once or not at all, possessive No
X*+ X, zero or more times, possessive No
X++ X, one or more times, possessive No
X{n}+ X, exactly n times, possessive Yes
X{n,}+ X, at least n times, possessive Yes
X{n,m}+ X, at least n times but not more than m times, possessive Yes
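The difference between greedy and "prefer less" (reluctant) quantifiers shows up clearly on nested delimiters. The sketch uses Python's `re` as a stand-in for LogScale's engine; possessive forms are left out because the table marks the common ones unsupported, and Python only accepts them from version 3.11.

```python
import re

html = "<b>bold</b>"

# Greedy .+ consumes as much as it can, then backtracks to the last '>'.
greedy = re.search(r"<.+>", html).group()
# Reluctant .+? ("prefer less") stops at the first '>' that allows a match.
lazy = re.search(r"<.+?>", html).group()
print(greedy)  # <b>bold</b>
print(lazy)    # <b>

# Bounded counts: exactly four digits, a hyphen, then one to three digits.
counted = re.compile(r"\d{4}-\d{1,3}")
print(bool(counted.fullmatch("2024-7")))     # True
print(bool(counted.fullmatch("2024-7777")))  # False
```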

The following table lists logical matching operators.

Table: Logical operators

Syntax Description Supported
XY X followed by Y Yes
X|Y Either X or Y Yes
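Alternation binds more loosely than concatenation, so an anchor on one side of `|` applies only to that alternative. A minimal sketch, using Python's `re` as a stand-in, of the difference a non-capturing group makes:

```python
import re

# Alternation has the lowest precedence: this reads as (^error) OR (warn$).
loose = re.compile(r"^error|warn$")
# A non-capturing group applies both anchors to both alternatives.
strict = re.compile(r"^(?:error|warn)$")

for line in ["error: disk full", "disk warn", "warn"]:
    print(line, bool(loose.search(line)), bool(strict.search(line)))
# error: disk full True False
# disk warn True False
# warn True True
```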

Groups and backreferences let you treat part of a regular expression as a unit, capture what it matched for later reference, or test what follows a position without consuming it.

Table: Groups and backreferences

Syntax Description Supported
(X) X, as a numbered capturing group Yes
(?<name>X) X, as a named capturing group Yes. Note that LogScale supports a broader set of names than is usual (e.g. names containing # and @).

(?P<name>X) X, as a named capturing group Yes
\n Numbered backreference: whatever the nth capturing group matched (groups are numbered from 1) Yes
\k<name> Named backreference: whatever the named capturing group name matched No
(?:X) X, as a non-capturing group Yes
(?flags) Sets flags in the group Yes (see Regular Expression Flags)
(?flags:X) Sets flags in X Yes, but the available flags differ from LogScale's regex flags (see Regular Expression Flags)
(?=X) X, as a zero-width positive lookahead Yes
(?!X) X, as a zero-width negative lookahead Yes
(?<=X) X, as a zero-width positive lookbehind No
(?<!X) X, as a zero-width negative lookbehind No
(?>X) X, as an independent, non-capturing (atomic) group. No
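The sketch below combines the supported group features on a single log line: a named group, a non-capturing group, a numbered backreference, and both lookaheads. It uses Python's `re` as a stand-in and the `(?P<name>X)` form, which both Python and the table accept. It avoids `\k<name>`, lookbehind, and atomic groups because the table marks them unsupported in LogScale.

```python
import re

line = "src=10.0.0.1 dst=10.0.0.2 status=denied denied"

# (?P<name>X) is a named capturing group; (?:X) groups without capturing.
m = re.search(r"src=(?P<src>\d+(?:\.\d+){3})", line)
print(m.group("src"))  # 10.0.0.1

# \1 refers back to whatever group 1 matched; here it finds a repeated word.
repeated = re.search(r"\b(\w+) \1\b", line).group(1)
print(repeated)  # denied

# (?=X) checks that '=' follows the key without including it in the match.
keys = re.findall(r"\w+(?==)", line)
print(keys)  # ['src', 'dst', 'status']

# (?!X) rejects key=value pairs whose value starts with "denied".
allowed = re.findall(r"\w+=(?!denied)\S+", line)
print(allowed)  # ['src=10.0.0.1', 'dst=10.0.0.2']
```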

The following table lists methods for quoting special characters (regex metacharacters) so that they match literally within a regular expression.

Table: Quotation

Syntax Description Supported
\ Quotes the following character, so that a regex metacharacter such as . or * matches literally Yes
\Q Quotes all characters until \E Yes
\E Ends quoting started by \Q Yes
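Quoting matters whenever a literal value contains metacharacters, such as the dots in a version number. Python's `re` does not accept `\Q...\E`, so this sketch (a stand-in, not LogScale) uses `re.escape`, which backslash-quotes each metacharacter individually; the resulting pattern expresses what `\Q1.2.3\E` expresses in one step.

```python
import re

version = "1.2.3"

# Unescaped, each '.' is a metacharacter that matches any character.
unquoted_hit = bool(re.fullmatch(version, "1a2b3"))
print(unquoted_hit)  # True

# Escaping each metacharacter with '\' makes it match literally.
literal = re.escape(version)
print(literal)                                # 1\.2\.3
print(bool(re.fullmatch(literal, "1a2b3")))   # False
print(bool(re.fullmatch(literal, "1.2.3")))   # True
```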