Differences from Other Regex Implementations

The regular expression implementation within LogScale differs from other environments in several ways. These differences fall into two major groups: differences in how the regular expression is applied and how matching values are returned, and differences in the regular expression syntax:

  • Match String

    In other regular expression environments the string or variable to be searched must be explicitly named. Within LogScale the string used when matching a regular expression depends on the syntax or function being used:

    • @rawstring

      The raw string of an event is used by default when using /regex/ or regex().

    • All fields

      The /regex/ form searches all fields in an event.

    • Single field

      A single field can be specified when using regex() with the field parameter:

      logscale
      regex("fatal", field=errormsg)

      This will explicitly search only the errormsg field within an event.

      Alternatively, you can use a regex match against a single field using:

      logscale
      errormsg = /fatal/
    • Array

      If you have an array, the array:regex() function can be used to search using a regular expression across each element of the array.

  • Return Values

    As there is no implied language or procedure within the LogScale event search system, the way regular expressions are applied and their results returned is adapted to fit the LogScale Query Language (LQL).

    • Event Filtering

      The simplest way to use a regular expression is as a search mechanism, where the regular expression is used to include or exclude events in the filter results. Within LogScale, the incoming stream of events is used as the basis of the comparison to the regular expression, with matching events included in the result set and non-matching events dropped. This is equivalent to an if or other logical comparison in a procedural environment.

      For example, within Perl you might use:

      perl
      if ($string =~ /orgName/)
      ...

      Or within Java:

      java
      Pattern pattern = Pattern.compile("orgName");
      Matcher matcher = pattern.matcher("{\"actor\":{\"ip\":\"172.17.0.1\",\"orgRoot\"...");
      boolean matchFound = matcher.find();
      if (matchFound) ...

      In both cases the match produces a boolean (true or false) result.

      Within LogScale, matching against the incoming stream of events is implied, so the LQL would simply be:

      logscale
      /orgName/

      Note that the above searches the @rawstring of an event by default. To apply a regular expression to a specific field you would use:

      logscale
      LogScale = /orgName/

      The output will only include events that match the supplied regular expression.

    • Data Extraction

      When extracting data with a regular expression the basic method is to identify a group within the regular expression to capture/extract the corresponding text string. Within LogScale the output must be a field in the event stream, since this is the only method of sharing a named value. Within a procedural language you would assign the value to a variable, like this expression within JavaScript:

      javascript
      const myRe = new RegExp("orgName=(\\w+)", "g");
      const myArray = myRe.exec(rawstring);

      The myArray variable now contains the matched text and the captured value of the orgName=value key/value pair, or null if there is no match.

      Within LogScale the name of the field for the capture group must be defined within the regular expression, since the regex() function and the /regex/ syntax do not return a value. Hence the equivalent expression within LQL is:

      logscale
      /orgName=(?<myorgName>\w+)/

      This will extract the data into a new field from the original @rawstring.

      The resultant event stream now contains a new named field myorgName.

      Note that the literal text orgName= is placed before the named group expression, which captures the value after the = sign.

  • Regex Flags

    LQL regular expressions support these flags: d, g, i, and m. Many other regular expression implementations support additional flags.

    Where a flag from another implementation is not available, a different or more explicit approach is required.
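The match-string and event-filtering behaviour described above can be approximated in a procedural language. The following is a minimal JavaScript sketch, not LogScale code: the event objects, the field values, and the matchTargets and filterEvents helpers are all hypothetical, and the sketch only mimics the choice of which string a regular expression is applied to.

```javascript
// Hypothetical events; field names and values are illustrative only.
const events = [
  { "@rawstring": "level=info orgName=acme", errormsg: "", tags: ["web", "prod"] },
  { "@rawstring": "level=error code=500", errormsg: "fatal: disk full", tags: ["db"] },
];

// Return the strings a pattern would be tested against for one event.
// `target` is "*" (every field) or a single field name such as "@rawstring".
function matchTargets(event, target) {
  if (target === "*") {
    // Flatten array-valued fields so each element is tested on its own.
    return Object.values(event).flat().map(String);
  }
  const value = event[target];
  if (value === undefined) return [];
  return Array.isArray(value) ? value.map(String) : [String(value)];
}

// Keep only events where at least one target string matches the pattern.
// Pass patterns without the g flag so that test() stays stateless.
function filterEvents(events, pattern, target = "@rawstring") {
  return events.filter((event) =>
    matchTargets(event, target).some((text) => pattern.test(text))
  );
}

const byRawstring = filterEvents(events, /orgName/);       // raw string only
const byField = filterEvents(events, /fatal/, "errormsg"); // one named field
const byArray = filterEvents(events, /^prod$/, "tags");    // each array element
const byAnyField = filterEvents(events, /disk/, "*");      // every field
```

In the procedural version the target string has to be passed explicitly on every call, whereas in LogScale it follows from the syntax or function used.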

Perl Compatible Regular Expressions (PCRE) Differences

Because Perl is an interpreted language, the main differences between regular expressions in Perl and LogScale concern how the string to be searched is specified when applying the regular expression, and how matched elements are extracted. These differences are listed below:

  • Group Matching

    In scripting language environments, data is extracted into named variables by defining groups within the regular expression and then capturing those groups when the regex is executed. For example, in Perl:

    perl
    my ($orgid, $orgname) = ($event =~ m{orgId=([^\s]+) orgName=([^\s]+)});

    This code fragment would extract the orgId and orgName by matching the non-whitespace characters after the =, returning the two matching groups in order, which are then assigned to the two variables $orgid and $orgname.

    Within LogScale, field names are specified within the regular expression as part of a named capture group. The equivalent in LQL would be:

    logscale
    /orgId=(?<orgId>[^\s]+) orgName=(?<orgName>[^\s]+)/

    The name of each field is given in the angle brackets; the (?<name>) form is the group specification, like the () syntax in Perl.

  • Global (Repeating) Matches

    Perl regular expressions support the notion of 'global' or repeating matches where a regular expression can match multiple times returning every occurrence by using the g flag. For example:

    perl
    my (@errors) = ($string =~ m/error=([a-z]+)/g);

    Here the g indicates that Perl should match multiple times against the given expression, returning each match in the original string to the array.

    The g flag is supported within LogScale, but only when used against specific fields, or when extracting a value that creates a field. It cannot be used to match multiple times in the same event.

    Within LQL you can also use the repeat argument:

    logscale
    regex("error=(?<errortext>[a-z]+)",repeat=true)

    This will create multiple error fields in the event.

    Important

    The regex() function does not support the /g regular expression flag; instead, the repeat parameter must be used for global or repeating matches.
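For comparison with the Perl g flag above, a repeating match can be sketched in JavaScript using String.prototype.matchAll, which requires the g flag. The input string below is a made-up example, and the snippet illustrates the procedural behaviour only, not how LogScale implements repeat.

```javascript
// Hypothetical input; the error names are illustrative only.
const line = "error=timeout retry=1 error=refused retry=2 error=reset";

// matchAll() requires the g flag and returns an iterator of match objects;
// the named group holds the text captured after "error=".
const errors = Array.from(
  line.matchAll(/error=(?<errortext>[a-z]+)/g),
  (match) => match.groups.errortext
);

console.log(errors); // [ 'timeout', 'refused', 'reset' ]
```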

JavaScript Regular Expression Differences

  • Within JavaScript, the same group extraction shown for Perl could be achieved with the following code:

    javascript
    // matchAll() requires the g flag and returns an iterator, so spread it into an array
    const regexp = /orgId=([^\s]+) orgName=([^\s]+)/g;
    const matches = [...event.matchAll(regexp)];

    Here the matches variable is an array with each matching item.

    Within LQL we can explicitly name the group matches:

    logscale
    /orgId=(?<orgId>[^\s]+) orgName=(?<orgName>[^\s]+)/

    The name of each field is given in the angle brackets; the (?<name>) form is the group specification, like the () syntax in JavaScript.
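JavaScript itself also accepts the (?<name>...) named group syntax, which gives a closer procedural analogue to the LQL form. The following is a minimal sketch, assuming a made-up event string; the extractFields helper is hypothetical.

```javascript
// Hypothetical event text; the values are illustrative only.
const eventText = "ts=1700000000 orgId=a1b2c3 orgName=acme status=ok";

// Return the named groups as an object of new fields, or {} when nothing matches.
function extractFields(text) {
  const match = text.match(/orgId=(?<orgId>[^\s]+) orgName=(?<orgName>[^\s]+)/);
  return match ? { ...match.groups } : {};
}

const fields = extractFields(eventText);
console.log(fields.orgId, fields.orgName); // a1b2c3 acme
```

The difference from LQL is that the JavaScript code must still assign the result to a variable, while LogScale adds the named groups to the event as fields.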

re2 Regular Expressions Differences

The main difference between Google RE2 regular expression syntax and LogScale syntax is:

  • Named Group Matching

    Google RE2 uses this syntax for named group matches:

    (?P<orgName>)

    LogScale does not support this notation, but does support the simpler form:

    logscale
    (?<fieldname>)
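When porting patterns written for RE2, the (?P<name> prefix can be rewritten mechanically. The following is a minimal JavaScript sketch; toLogScaleGroups is a hypothetical helper, and the plain text substitution assumes that the sequence (?P< never appears escaped or inside a character class in the pattern.

```javascript
// Rewrite RE2-style named groups, (?P<name>...), to the (?<name>...) form.
// Naive text substitution: assumes "(?P<" only ever starts a named group.
function toLogScaleGroups(pattern) {
  return pattern.replace(/\(\?P</g, "(?<");
}

const re2Pattern = "orgId=(?P<orgId>[^\\s]+) orgName=(?P<orgName>[^\\s]+)";
console.log(toLogScaleGroups(re2Pattern));
// orgId=(?<orgId>[^\s]+) orgName=(?<orgName>[^\s]+)
```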