Creating a Parser

A parser is a piece of code that transforms incoming data into events. LogScale has built-in parsers for common log formats such as accesslog. If none of the built-in parsers fits your data format, or if you want to extract more fields, transform the data, or assign datasources, you can build your own parser.

The following diagram provides an overview of where parsers fit in the configuration flow to ingest data using LogScale.

graph LR
  A["Install and Configure LogScale"] --> B["Create a Repository"]
  B --> C["Configure Data Ingest"]
  C --> D["Parse and Filter Data"]
  D --> E["Enrich Data"]
  E --> F["Query Data"]
  style D fill:#A6A0D2

Figure 41. Flow


In this guide we will go through the steps of creating a parser from scratch.

Figure 42. Creating a Parser


Creating a New Parser

Security Requirements and Controls
  • Change parsers permission

  1. Go to the Repository and Views page.

  2. Select a Repository.

  3. Click Parsers and then click + New Parser.

  4. Enter a name for your parser (only alphanumeric characters, underscores, and hyphens are allowed). The name is important because the API uses it to uniquely identify the parser.

  5. Select how to create the Parser:

    • Empty Parser – Select Empty Parser and click Create.

    • Clone Existing – Select Clone Existing, choose a parser from the drop-down menu, and click Create.

    • From Template – Select From Template, browse for or drag and drop a parser, and click Create.

    • From Package – Select From Package and click Create.

Writing a Parser

Once you have created your parser, you will be presented with a code editor.

Figure 43. Writing a Parser


Parser Editor - a simple parser and two test cases.

Parsers are written in the same language you use to write queries on the search page. The main difference between writing a parser and writing a search query is that you cannot use aggregate functions like groupBy(), because the parser acts on one event at a time.
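The one-event-at-a-time restriction can be pictured with a small Python sketch. This is only an analogy written outside LogScale, not how LogScale works internally. The function names and the `length` field are hypothetical.

```python
from collections import Counter

def parser(event):
    """Per-event step: sees exactly one event and returns it with an added field."""
    enriched = dict(event)
    # Hypothetical field, used only to give the aggregate something to count.
    enriched["length"] = len(event["@rawstring"])
    return enriched

def count_by_length(events):
    """Aggregate step: needs the whole collection, so it cannot run inside a parser."""
    return Counter(e["length"] for e in events)
```

The first function can run as each event arrives. The second only makes sense once many events are available, which is why aggregation belongs in a search query rather than in a parser.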

The input data is usually log lines or JSON objects, but could be any text format like a stack trace or CSV.

When sending data to LogScale, the text string for the input is put in the field @rawstring. Depending on how data is shipped to LogScale, other fields can be set as well. For example, when sending data with Filebeat, the fields @host and @source are also set. You can also add more fields using Filebeat.

Using the Parser Code Editor

The editor allows you to create and edit parser code and run tests for your parsers.

  1. To access the editor, go to Parsers and select an existing parser from the list, or click + New parser to create a new one. The code editor is displayed.

  2. Write the script for your parser, or edit an existing one, in the Parser script area. The sections that follow contain example scripts.

  3. Click Save to save your changes.

  4. Optionally, you can export, duplicate or add a test.

Creating an Event from Incoming Data

The parser converts the data in @rawstring into an event. That means the parser should:

  • Assign the special @timestamp and @timezone fields.

  • Extract additional fields that should be stored along with your event.

Let's take a look at a couple of parsers to understand how they work.

Example: Parsing Log Lines

Assume we have a system producing logs like the following two lines:

ini
2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. id=123 fruit=banana
2018-10-15T12:52:42+01:30 [ERROR] Here is an error log entry. class=c.o.StringUtil fruit=pineapple

We want the parser to produce two events (one per line) and use the timestamp of each line as the time at which the event occurred; that is, assign it to the fields @timestamp and @timezone.

To do this, we can write a parser that creates a field ts by extracting the first part of each log line with a regular expression (see regex()). The syntax (?<ts>...) is called a named group: whatever it matches produces a field with that name, in this case a field named ts.

logscale
/^(?<ts>\S+)/
| parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=ts)

To set the timestamp for the event, use the function parseTimestamp(). It uses the field ts we just extracted and parses the string value into a timestamp. It sets the timestamp for the event by setting the field @timestamp. Note the timezone is also parsed and set using the field @timezone.

This parser assigns the @timestamp and @timezone fields, which is the minimum you can do to create events from the examples above. At this point we have a fully valid parser.
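To check what these two steps should yield for the sample lines, the following Python sketch emulates them outside LogScale. It is an illustration under stated assumptions, not LogScale's implementation:

- Python spells named groups `(?P<ts>...)` rather than `(?<ts>...)`.
- `datetime.fromisoformat` stands in for parseTimestamp().
- The helper name `parse_line` and the dictionary layout are hypothetical.
- @timestamp is assumed to hold milliseconds since the Unix epoch.
- @timezone is assumed to hold the UTC offset as written in the line.

```python
import re
from datetime import datetime

# Python writes named groups as (?P<name>...); the parser uses (?<name>...).
TS_PATTERN = re.compile(r"^(?P<ts>\S+)")

def parse_line(rawstring):
    """Hypothetical helper: emulate the two parser steps for one log line."""
    event = {"@rawstring": rawstring}
    match = TS_PATTERN.search(rawstring)
    if match is None:
        return event  # no leading token found; leave the event unchanged
    event["ts"] = match.group("ts")
    parsed = datetime.fromisoformat(event["ts"])
    # Assumption: @timestamp is stored as milliseconds since the Unix epoch.
    event["@timestamp"] = int(parsed.timestamp() * 1000)
    # Assumption: the sample timestamps always end with a +HH:MM or -HH:MM offset.
    event["@timezone"] = event["ts"][-6:]
    return event
```

For the second sample line, the offset +01:30 is taken into account, so the resulting @timestamp corresponds to 11:22:42 UTC rather than 12:52:42.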

The two log lines contain more useful information, like the INFO and ERROR log levels. We can extract those by extending the regular expression:

logscale
// First the timestamp is extracted. Then the regex matches the log level, for example [INFO] or [ERROR].
/^(?<ts>\S+) \[(?<loglevel>[^\]]+)\]/
| @timestamp := parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=ts)
// The next line finds key-value pairs and creates a field for each.
| kvParse()

The events will now have a field called loglevel.

At the bottom of the parser we also added the function kvParse(). This function will look for key-value pairs in the log line and extract them into fields, like id=123 and fruit=banana.
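As a rough picture of the fields this parser should produce, the Python sketch below emulates the extended regular expression and a simplified key-value scan. It is an approximation written outside LogScale. The helper name `extract_fields` is hypothetical, and the key-value pattern assumes plain `key=value` pairs with no spaces or quoting, which may be narrower than what kvParse() accepts.

```python
import re

# Python named groups use (?P<name>...) instead of (?<name>...).
HEADER = re.compile(r"^(?P<ts>\S+) \[(?P<loglevel>[^\]]+)\]")
# Assumption: a pair is key=value with no spaces or quotes on either side.
KEY_VALUE = re.compile(r"(?P<key>\w+)=(?P<value>\S+)")

def extract_fields(rawstring):
    """Hypothetical helper: collect ts, loglevel and key=value fields from one line."""
    fields = {}
    header = HEADER.search(rawstring)
    if header:
        fields.update(header.groupdict())
    for pair in KEY_VALUE.finditer(rawstring):
        fields[pair.group("key")] = pair.group("value")
    return fields
```

Applied to the first sample line, the sketch returns the fields ts, loglevel, id and fruit. All values are strings, so id is "123" rather than a number.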

Parsing JSON

We've seen how to create a parser for unstructured log lines. Now let's create a parser for JSON logs based on the following example input:

javascript
{
  "ts": 1539602562000,
  "message": "An error occurred.",
  "host": "webserver-1"
}
{
  "ts": 1539602572100,
  "message": "User logged in.",
  "username": "sleepy",
  "host": "webserver-1"
}

Each object is a separate event and will be parsed separately, as with unstructured logs.

The JSON is accessible as a string in the field @rawstring. We can extract fields from the JSON by using the parseJson() function. It takes a field containing a JSON string (in this case @rawstring) and extracts fields automatically, like this:

logscale
parseJson(field=@rawstring)
| @timestamp := ts
| @timezone := "Z"

This will result in events with a field for each property in the input JSON, like username and host, and will use the value of ts as the timestamp. If the timestamp is a string it can be parsed using the parseTimestamp() function.
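The same three steps can be emulated in Python to see which fields one input object should yield. This is a sketch written outside LogScale. The helper name `parse_json_event` is hypothetical, and it handles only flat objects like the samples, so it says nothing about how parseJson() treats nested structures.

```python
import json

def parse_json_event(rawstring):
    """Hypothetical helper: emulate the JSON parser for a single event."""
    event = {"@rawstring": rawstring}
    # Each top-level property becomes a field; nested objects are not flattened here.
    event.update(json.loads(rawstring))
    # ts in the sample input is already a number of epoch milliseconds.
    event["@timestamp"] = event["ts"]
    event["@timezone"] = "Z"
    return event
```

Because ts is copied as-is, this only works when the producer already emits epoch milliseconds, as the sample input does.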

Named Capture Groups

LogScale extracts fields using named capture groups, a feature of regular expressions that lets you name sub-matches. For example:

logscale
/(?<firstname>\S+)\s(?<lastname>\S+)/

This defines a regex that expects the input to contain a first name and a last name. It then extracts the names into two fields, firstname and lastname. \S matches any character that is not whitespace, and \s matches any whitespace character.
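Named groups behave similarly in most regex engines, so the pattern can be tried outside LogScale. The Python sketch below uses Python's `(?P<name>...)` spelling, since Python does not accept `(?<name>...)`. The helper name `extract_names` and the sample input are hypothetical.

```python
import re

# Same pattern as above, written with Python's (?P<name>...) spelling.
NAME = re.compile(r"(?P<firstname>\S+)\s(?P<lastname>\S+)")

def extract_names(text):
    """Hypothetical helper: return the named groups as a dict, or {} if nothing matches."""
    match = NAME.search(text)
    return match.groupdict() if match else {}
```

Calling `extract_names("Ada Lovelace")` yields one entry per named group. An input without any whitespace yields an empty dict, because the pattern requires two tokens separated by a whitespace character.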

Next Steps

Once you have created your parser script, you can start using it by assigning it to an ingest token; see Ingest Tokens.

You can also learn how parsers can help speed up queries by tagging events; see Event Tags.