Creating a Parser

A parser consists of a script, plus a few related settings. The parser script is the main part of the parser, as this defines how a single incoming event is transformed before it becomes a searchable event. LogScale has built-in parsers for common log formats like accesslog. The goal for a parser script is to extract the correct timestamp from the incoming event and set fields that you want to use frequently in your searches.

The following diagram provides an overview of where parsers fit in the configuration flow to ingest data using LogScale.

Install and Configure LogScale -> Create a Repository -> Configure Data Ingest -> Parse and Filter Data (highlighted step) -> Enrich Data -> Query Data

Figure 42. Flow


If none of the available parsers fits your data and you want to create your own, or edit an existing one, this guide explains how to do so.

Creating a New Parser

In this section we will go through the steps of creating a parser from scratch.

Figure 43. Parser Overview


  1. Go to the Repository and Views page.

  2. Select a Repository.

  3. Click Parsers to reach the parser overview and then click + New Parser, see Figure 43, “Parser Overview”.

  4. Enter a name for your parser: only alphanumeric characters, underscores and hyphens are allowed, and the name must be unique within the repository.

  5. Select how to create the parser:

    • Empty Parser – Select Empty Parser and click Create.

    • Clone Existing – Select Clone Existing, choose a parser from the drop-down menu and click Create.

    • From Template – Select From Template, browse for or drag and drop a parser and click Create.

    • From Package – Select From Package and click Create.

Writing a Parser

Once you have created your parser, you will be presented with a code editor.

Figure 44. Writing a Parser


Parser Editor - a simple parser and two test cases.

Parsers are written in the same language that you use to write queries on the Search page.

Important

The main difference between writing a parser and writing a search query is that you cannot use aggregate functions like groupBy(), as the parser acts on one event at a time.

The input data is usually log lines or JSON objects, but could be any text format like a stack trace or CSV.

When data is sent to LogScale, the input text is stored in the field @rawstring. Depending on how the data is shipped to LogScale, other fields can be set as well. For example, when sending data with Filebeat, the fields @host and @source are also set, and Filebeat can be configured to add more fields.

Using the Parser Code Editor

The editor allows you to create and edit parser code and run tests for your parsers.

  1. To access the editor go to Parsers and select an existing parser from the list or click + New parser to create a new parser. The code editor is displayed.

  2. Write the script for your parser, or edit an existing parser, in the Parser script area. See the examples later on this page.

  3. Click Save to save your changes.

  4. Optionally, you can export, duplicate or add a test.

When editing a parser script, you can use the same autocompletion that is available on the Search page to complete function and field names. The suggested field names are taken from the test cases of the parser, so any field that is output for a test case is available for autocompletion:

Figure 45. Autocompletion in Parser Script


Creating an Event from Incoming Data

The parser converts the data in @rawstring into an event. That means the parser should:

  • Assign the special @timestamp and @timezone fields.

  • Extract additional fields that should be stored along with your event.

Let's take a look at a couple of parsers to understand how they work.

Example: Parsing Log Lines

Assume we have a system producing logs like the following two lines:

ini
2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. id=123 fruit=banana
2018-10-15T12:52:42+01:30 [ERROR] Here is an error log entry. class=c.o.StringUtil fruit=pineapple

We want the parser to produce two events (one per line) and use the timestamp of each line as the time at which the event occurred; that is, assign it to the field @timestamp, and then extract the "fields" which exist in the logs to actual LogScale fields.

To do this, we will write a parser, and we'll start by setting the correct timestamp. To extract the timestamp, we need to write a regular expression like the following:

logscale
@rawstring = /^(?<temp_timestamp>\S+)/ 
| parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=temp_timestamp)
| drop(temp_timestamp)

This creates a field named temp_timestamp using a "named group" in the regular expression, which contains every character from the original event up until the first space, i.e. the original timestamp. The regular expression reads from the @rawstring field, but it doesn't modify it; it only copies information out.

With the timestamp extracted into a field of its own, we can call parseTimestamp() on it, specifying the format of the original timestamp, and it will convert that to a UNIX timestamp and assign it to @timestamp for us. With @timestamp now set up, we can drop temp_timestamp again, as we have no further need for it.
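To make the conversion concrete, the following Python sketch mimics the two steps outside LogScale: it captures the leading non-whitespace token and converts it to milliseconds since the UNIX epoch. It is an illustration only, not how LogScale implements parseTimestamp(); the function name extract_timestamp_ms is hypothetical, and the use of milliseconds is an assumption based on the millisecond ts values shown in the JSON example on this page.

```python
import re
from datetime import datetime


def extract_timestamp_ms(rawstring: str) -> int:
    """Return the leading ISO 8601 timestamp of a log line as epoch milliseconds.

    Hypothetical helper: it only imitates the regex + parseTimestamp() steps.
    """
    # Same idea as /^(?<temp_timestamp>\S+)/ (Python spells named groups (?P<name>...))
    match = re.match(r"^(?P<temp_timestamp>\S+)", rawstring)
    if match is None:
        raise ValueError("no leading timestamp found")
    # fromisoformat() honors UTC offsets such as +00:00 and +01:30
    parsed = datetime.fromisoformat(match.group("temp_timestamp"))
    return int(parsed.timestamp() * 1000)


line = "2018-10-15T12:52:42+01:30 [ERROR] Here is an error log entry. class=c.o.StringUtil fruit=pineapple"
print(extract_timestamp_ms(line))  # 1539602562000
```

Note that the +01:30 offset is taken into account, so the result refers to 11:22:42 UTC rather than 12:52:42.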

In addition to the timestamp, the logs contain more information. Looking at the message

ini
2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. id=123 fruit=banana

We can see:

  • The log level INFO

  • The message This is an example log entry

  • The id 123

  • The fruit banana

To extract all of this, we can expand our regular expression to something like:

logscale
/^(?<temp_timestamp>\S+) \[(?<logLevel>\w+)\] (?<message>.*?)\. (?<temp_kvPairs>.*)/

The events will now have additional fields called logLevel (with value INFO) and message (with value This is an example log entry), which we can use as is. The event also has a temp_kvPairs field, containing the additional fields which are present after the message i.e. id=123 fruit=banana. So we still need to extract more fields from temp_kvPairs, and we can use the kvParse() function for that, and drop temp_kvPairs once we are finished.

As a result, our final parser will look like this:

logscale
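@rawstring = /^(?<temp_timestamp>\S+) \[(?<logLevel>\w+)\] (?<message>.*?)\. (?<temp_kvPairs>.*)/
// Convert the extracted text to @timestamp, then discard the helper field
| parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=temp_timestamp)
| drop(temp_timestamp)
// Turn the trailing key=value pairs into fields, then discard the helper field
| kvParse(temp_kvPairs)
| drop(temp_kvPairs)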
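If you want to reason about what the final parser produces before trying it in the editor, the Python sketch below imitates its stages on the two sample lines. It is only an approximation under stated assumptions: parse_line is a hypothetical name, the key=value splitting is a simplified stand-in for kvParse() (it ignores quoting, for example), and returning only @rawstring for a non-matching line is a choice made for this sketch, not documented LogScale behavior.

```python
import re
from datetime import datetime

# Python spells named groups (?P<name>...) instead of (?<name>...)
LINE_PATTERN = re.compile(
    r"^(?P<temp_timestamp>\S+) \[(?P<logLevel>\w+)\] "
    r"(?P<message>.*?)\. (?P<temp_kvPairs>.*)"
)


def parse_line(rawstring: str) -> dict:
    """Imitate the parser stages on one log line and return the resulting fields."""
    event = {"@rawstring": rawstring}
    match = LINE_PATTERN.match(rawstring)
    if match is None:
        return event  # assumption for this sketch: nothing is extracted
    fields = match.groupdict()

    # Stand-in for parseTimestamp(...) followed by drop(temp_timestamp)
    parsed = datetime.fromisoformat(fields.pop("temp_timestamp"))
    event["@timestamp"] = int(parsed.timestamp() * 1000)

    event["logLevel"] = fields["logLevel"]
    event["message"] = fields["message"]

    # Simplified stand-in for kvParse(temp_kvPairs) followed by drop(temp_kvPairs)
    for pair in fields.pop("temp_kvPairs").split():
        key, separator, value = pair.partition("=")
        if separator:
            event[key] = value
    return event


sample = "2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. id=123 fruit=banana"
for name, value in parse_line(sample).items():
    print(f"{name} = {value}")
```

The temporary fields never reach the returned event, which mirrors the two drop() calls in the parser.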

Example: Parsing JSON

We've seen how to create a parser for unstructured log lines. Now let's create a parser for JSON logs based on the following example input:

javascript
{
  "ts": 1539602562000,
  "message": "An error occurred.",
  "host": "webserver-1"
}
{
  "ts": 1539602572100,
  "message": "User logged in.",
  "username": "sleepy",
  "host": "webserver-1"
}

Each object is a separate event and will be parsed separately, as with unstructured logs.

The JSON is accessible as a string in the field @rawstring. We can extract fields from the JSON by using the parseJson() function. It takes a field containing a JSON string (in this case @rawstring) and extracts fields automatically, like this:

logscale
parseJson(field=@rawstring) 
| @timestamp := ts

This will result in events with a field for each property in the input JSON, like username and host, and will use the value of ts as the timestamp. As ts already holds a UNIX timestamp (here in milliseconds since the epoch), we don't need to call parseTimestamp() on it.
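The effect of these two steps can be imitated in a few lines of Python. This is a rough sketch rather than a description of LogScale internals: parse_json_event is a hypothetical name, and the sketch assumes every input is a flat JSON object whose ts property is already a UNIX timestamp in milliseconds.

```python
import json


def parse_json_event(rawstring: str) -> dict:
    """Imitate parseJson() followed by '@timestamp := ts' for one JSON object."""
    event = {"@rawstring": rawstring}
    # Stand-in for parseJson(): every top-level property becomes a field
    event.update(json.loads(rawstring))
    # Stand-in for '@timestamp := ts': ts is copied, not removed
    event["@timestamp"] = event["ts"]
    return event


raw = '{"ts": 1539602572100, "message": "User logged in.", "username": "sleepy", "host": "webserver-1"}'
event = parse_json_event(raw)
print(event["@timestamp"], event["username"], event["host"])
```

As in the parser, the original ts field is still present on the event after the assignment.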

Named Capture Groups

LogScale extracts fields using named capture groups — a feature of regular expressions that allows you to name sub-matches, for example:

logscale
/(?<firstname>\S+)\s(?<lastname>\S+)/

This defines a regex that expects the input to contain a first name and a last name. It then extracts the names into two fields, firstname and lastname. Here \S matches any non-whitespace character and \s matches any whitespace character.
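If you prototype such expressions outside LogScale, be aware that some regex engines spell named groups differently. The Python sketch below shows the same pattern with Python's (?P<name>...) syntax; extract_names and the sample input are hypothetical and only illustrate how each named group maps to a field name.

```python
import re

# Same pattern as above, written with Python's (?P<name>...) group syntax
NAME_PATTERN = re.compile(r"(?P<firstname>\S+)\s(?P<lastname>\S+)")


def extract_names(text: str) -> dict:
    """Return the named groups as a field dictionary, or {} if nothing matches."""
    match = NAME_PATTERN.search(text)
    return match.groupdict() if match else {}


print(extract_names("Ada Lovelace"))  # {'firstname': 'Ada', 'lastname': 'Lovelace'}
```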

Next Steps

Once you have created your parser script, you can start using it by assigning it to Ingest Tokens.

You can also learn how parsers can help speed up queries by using Event Tags.