Creating a Parser
A parser consists of a script and parser settings like Event Tags and Fields to Remove. The parser script, written in the LogScale Query Language, defines how a single incoming event is transformed before it becomes one or more searchable events. LogScale has built-in parsers for common log formats like accesslog.
The goal for a parser script is to:
Extract the correct timestamp from the event
Set the fields you want to use frequently in your searches
The following diagram provides an overview of where parsers fit in the configuration flow to ingest data using LogScale.
Figure 43. Flow
The main text of the ingested event is present in the field @rawstring, and many functions used for parsing will default to using @rawstring if no field is specified, so a parser may easily parse the incoming data without ever referring explicitly to @rawstring in the script.
Other fields may also be present, depending on how logs are sent to LogScale. For example, Falcon Log Collector adds a few fields, such as @collect.timestamp, which are present and usable in the parser. In other words, an input event for a parser is really a collection of key-value pairs. The main key is @rawstring, but others can be present from the beginning as well, and the parser can use those as it would any other field.
The contents of @rawstring can be any kind of text value. It's common to see JSON objects or single log lines, for example, but @rawstring doesn't require any specific format, and you can send whatever data you like.
Setting the correct timestamp is important, as LogScale relies on this field to find the right results when you search in a given time interval. You do this by assigning the timestamp to the @timestamp field, formatted as a UNIX timestamp. Functions such as parseTimestamp() are designed to make this easy. See https://library.humio.com/data-analysis/parsers-parsing-timestamps.html.
Setting fields in the parser is optional, because fields can also be extracted at search time, so the parser does not need to meticulously set every field you might want to use. It is still highly recommended: searching on fields that the parser has set is generally easier when writing queries, and it also performs better in terms of search speed.
If you have checked the list of preconfigured parsers and found that nothing quite matches what you need, this guide shows you how to create your own parser or edit an existing one.
Creating a New Parser
Security Requirements and Controls
Change parsers permission
This section describes how to create a parser from scratch.
Figure 44. Parser Overview
Select the repository where you want to create a parser.
Open the parser overview for that repository and choose to create a new parser; see Figure 44, “Parser Overview”.
In the New parser dialog box, enter a name for the parser. Only alphanumeric characters, underscores, and hyphens are allowed, and the name must be unique within the repository.
Select how to create the parser:
Empty Parser – Select Empty parser and confirm.
Clone Existing – Select Duplicate existing, select a parser from the Duplicate Template list, and confirm.
From Template – Select From template, browse for or drag and drop a parser, and confirm.
From Package – Select From package and confirm.
Confirming opens a code editor where you can write a script for the parser.
Writing a Parser
Once you have created your parser, you will be presented with a code editor.
Figure 45. Writing a Parser
Parser Editor - a simple parser and two test cases.
The programming language used for creating a parser is the same as you use to write queries on the search page.
Important
The main difference between writing a parser and writing a search query is that you cannot use aggregate functions like groupBy(), as the parser acts on one event at a time.
The input data is usually log lines or JSON objects, but could be any text format like a stack trace or CSV.
When data is sent to LogScale, the input text is put in the field @rawstring. Depending on how data is shipped to LogScale, other fields can be set as well. For example, when sending data with Filebeat, the fields @host and @source are also set, and it is possible to add more fields using this log shipper.
Using the Parser Code Editor
The editor allows you to create and edit parser code and run tests for your parsers.
To access the editor, either select an existing parser from the list of parsers or create a new parser. The code editor is displayed.
Write the script for your parser, or edit an existing parser, in the Parser script area. The examples later in this guide show what such scripts look like.
When editing a parser script, you can use the same autocompletion that is available when writing queries. This allows you to autocomplete function and field names. The suggested field names are taken from the test cases of the parser, so any field that is output for a test case is available for autocompletion.
Save your changes when you are done.
You can also add test cases to your parser, or use the ellipsis button to export or duplicate the parser.
You can validate your test data against the CPS schema to ensure consistency.
Creating an Event from Incoming Data
The parser converts the data in @rawstring into an event. That means the parser should:
Assign the special @timestamp and @timezone fields.
Extract additional fields that should be stored along with your event.
Let's take a look at a couple of parsers to understand how they work.
Example: Parsing Log Lines
Assume we have a system producing logs like the following two lines:
2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. server_id=123 fruit=banana
2018-10-15T12:52:42+01:30 [ERROR] Here is an error log entry. class=c.o.StringUtil fruit=pineapple
We want the parser to produce two events (one per line) and use the timestamp of each line as the time at which the event occurred; that is, assign it to the field @timestamp, and then extract the "fields" which exist in the logs to actual LogScale fields.
To do this, we will write a parser, and we'll start by setting the correct timestamp. To extract the timestamp, we need to write a regular expression like the following:
@rawstring = /^(?<temp_timestamp>\S+)/
| parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=temp_timestamp)
| drop(temp_timestamp)
This creates a field named temp_timestamp using a "named group" in the regular expression, which contains every character from the original event up until the first space, i.e. the original timestamp. The regular expression reads from the @rawstring field, but it doesn't modify it; it only copies information out.
With the timestamp extracted into a field of its own, we can call parseTimestamp() on it, specifying the format of the original timestamp, and it will convert that to a UNIX timestamp and assign it to @timestamp for us. With @timestamp now set up, we can drop temp_timestamp again, as we have no further need for it.
In addition to the timestamp, the logs contain more information. Looking at the message
2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. server_id=123 fruit=banana
We can see:
The log level: INFO
The message: This is an example log entry
The server_id: 123
The fruit: banana
To extract all of this, we can expand our regular expression to something like:
/^(?<temp_timestamp>\S+) \[(?<logLevel>\w+)\] (?<message>.*?)\. (?<temp_kvPairs>.*)/
The events will now have additional fields called logLevel (with value INFO) and message (with value This is an example log entry), which we can use as is. The event also has a temp_kvPairs field, containing the additional fields which are present after the message, i.e. server_id=123 fruit=banana. So we still need to extract more fields from temp_kvPairs. We can use the kvParse() function for that, and drop temp_kvPairs once we are finished.
As a result, our final parser will look like this:
@rawstring = /^(?<temp_timestamp>\S+) \[(?<logLevel>\w+)\] (?<message>.*?)\. (?<temp_kvPairs>.*)/
| parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=temp_timestamp)
| drop(temp_timestamp)
| kvParse(temp_kvPairs)
| drop(temp_kvPairs)
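As a rough illustration of the result, the first sample line should end up with approximately the fields below once this parser has run. This is a hand-worked sketch rather than captured output: the @timestamp value is 12:51:40 UTC on 2018-10-15 expressed as UNIX time in milliseconds, and any fields added by the log shipper or by LogScale itself are left out.

```
@rawstring  = 2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. server_id=123 fruit=banana
@timestamp  = 1539607900000
logLevel    = INFO
message     = This is an example log entry
server_id   = 123
fruit       = banana
```

The temporary fields temp_timestamp and temp_kvPairs do not appear, because the parser drops them after use.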
Example: Parsing JSON
We've seen how to create a parser for unstructured log lines. Now let's create a parser for JSON logs based on the following example input:
{
"ts": 1539602562000,
"message": "An error occurred.",
"host": "webserver-1"
}
{
"ts": 1539602572100,
"message": "User logged in.",
"username": "sleepy",
"host": "webserver-1"
}
Each object is a separate event and will be parsed separately, as with unstructured logs.
The JSON is accessible as a string in the field @rawstring. We can extract fields from the JSON by using the parseJson() function. It takes a field containing a JSON string (in this case @rawstring) and extracts fields automatically, like this:
parseJson(field=@rawstring)
| @timestamp := ts
This will result in events with a field for each property in the input JSON, like username and host, and will use the value of ts as the timestamp. As ts already holds a timestamp in the UNIX format (milliseconds since the epoch), we don't need to call parseTimestamp() on it.
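If a source reported ts in seconds rather than milliseconds, the value would need scaling before it is assigned. The sketch below assumes such a source, which is a hypothetical variation and not the sample input above. It also drops ts afterwards, which is optional.

```
// Hypothetical variant: ts holds seconds since the epoch, not milliseconds
parseJson(field=@rawstring)
| @timestamp := ts * 1000
| drop(ts)
```

Keeping ts instead of dropping it is equally valid if you want the original value to remain searchable.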
Named Capture Groups
LogScale extracts fields using named capture groups, a feature of regular expressions that allows you to name sub-matches, for example:
/(?<firstname>\S+)\s(?<lastname>\S+)/
This defines a regex that expects the input to contain a first name and a last name. It then extracts the names into two fields firstname and lastname. The \S means any character that is not a whitespace and \s is any whitespace character.
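Named groups are not limited to @rawstring. As a small sketch, assuming the event already has a field called fullname (a hypothetical field used only for illustration), the same pattern can be applied to that field by putting its name on the left-hand side:

```
// fullname is a hypothetical field, for example "Jane Doe"
// On a match, firstname would be "Jane" and lastname would be "Doe"
fullname = /(?<firstname>\S+)\s(?<lastname>\S+)/
```

This is the same form as the @rawstring = /.../ lines used in the log line example.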
Next Steps
Once you have created your parser script, you can start using it by assigning the parser to an ingest token; see Ingest Tokens.
You can also learn how parsers can help speed up queries by using event tags; see Event Tags.
Parsers Validation Errors
The errors listed here do not represent ingest errors. They relate to the content of your parser and serve as guidelines for how to write it, so that it is consistent and works as intended. In most cases, the errors you encounter here will not be visible on events that are actually ingested; the exception is when the problem is a parser error.
Static Output validation
These validations are run against all test cases and cannot be disabled. Their purpose is to make sure that queries against your event will work.
Message | Description | Solution |
myFieldName is not searchable. When searching for this field, LogScale will not find this event, because fields starting with # are searched as tags. | The field has been falsely tagged. The symbol # has been manually prefixed to the field's name. This does not create a tagged field; it creates an ordinary field with that name. Queries will search for a tagged field, which cannot be found, as no such tag exists. | Remove the # prefix from your field, changing it from #myFieldName to myFieldName. Then go to Parser > Settings > Fields to Tag and add myFieldName to the list if you intend to tag the field. The parser will then produce a field named #myFieldName and remove myFieldName from the event. |
The array myArrayName has gaps, which affects searching with array functions. There are elements missing at indexes: indexnumber, and larger gaps between: gap | There are missing indices and larger gaps in the array. Arrays must be without any gaps and start at index 0 for array functions to work correctly. For more information, see Array Syntax. | Remove the gaps in your array. The recommended approach is to use the array:append() function to construct arrays. This ensures that there are no gaps in the output. |
The array myArrayName has gaps, which affects searching with array functions. There are gaps between these indexes: indexnumber. | There are gaps in the array. Arrays must be without any gaps and start at index 0 for array functions to work correctly. For more information, see Array Syntax. | Remove the gaps in your array. The recommended approach is to use the array:append() function to construct arrays. This ensures that there are no gaps in the output. |
The array myArrayName has gaps, which affects searching with array functions. There are elements missing at indexes: indexnumber. | There are some indices missing in the array. Arrays must be without any gaps and start at index 0 for array functions to work correctly. For more information, see Array Syntax. | Remove the gaps in your array. The recommended approach is to use the array:append() function to construct arrays. This ensures that there are no gaps in the output. |
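To illustrate the array messages in the table, here is a minimal sketch that contrasts manual index assignment with array:append(). The names users[], user.name and user.target.name are hypothetical, and the array and values parameter names are an assumption about the function's signature, so check the function reference before relying on them.

```
// Manual indexing can leave a gap: index 1 is never set here,
// which is the kind of array the validation messages describe.
// users[0] := user.name
// | users[2] := user.target.name

// Appending avoids choosing indexes by hand
// (users[] and the user.* fields are hypothetical names).
array:append(array="users[]", values=[user.name, user.target.name])
```

Because the indexes are not chosen by hand, the resulting array should start at index 0 and stay contiguous, which is what the array functions expect.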