Example: Parsing Log Lines

Assume we have a system producing logs like the following two lines:

ini
2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. server_id=123 fruit=banana
2018-10-15T12:52:42+01:30 [ERROR] Here is an error log entry. class=c.o.StringUtil fruit=pineapple

We want the parser to produce two events (one per line) and use the timestamp of each line as the time at which the event occurred; that is, assign it to the field @timestamp, and then extract the "fields" which exist in the logs to actual LogScale fields.

To do this, we will write a parser, and we'll start by setting the correct timestamp. To extract the timestamp, we need to write a regular expression like the following:

logscale
@rawstring = /^(?<temp_timestamp>\S+)/ 
| parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=temp_timestamp)
| drop(temp_timestamp)

This creates a field named temp_timestamp using a "named group" in the regular expression, which contains every character from the original event up until the first space, i.e. the original timestamp. The regular expression reads from the @rawstring field, but it doesn't modify it; it only copies information out.

With the timestamp extracted into a field of its own, we can call parseTimestamp() on it, specifying the format of the original timestamp, and it will convert that to a UNIX timestamp and assign it to @timestamp for us. With @timestamp now set up, we can drop temp_timestamp again, as we have no further need for it.

In addition to the timestamp, the logs contain more information. Looking at the message

ini
2018-10-15T12:51:40+00:00 [INFO] This is an example log entry. server_id=123 fruit=banana

We can see:

  • The log level INFO

  • The message This is an example log entry

  • The server_id 123

  • The fruit banana

To extract all of this, we can expand our regular expression to something like:

logscale
/^(?<temp_timestamp>\S+) \[(?<logLevel>\w+)\] (?<message>.*?)\. (?<temp_kvPairs>.*)/

The events will now have additional fields called logLevel (with value INFO) and message (with value This is an example log entry), which we can use as is. The event also has a temp_kvPairs field, containing the additional fields which are present after the message i.e. id=123 fruit=banana. So we still need to extract more fields from temp_kvPairs, and we can use the kvParse() function for that, and drop temp_kvPairs once we are finished.

As a result, our final parser will look like this:

logscale
@rawstring = /^(?<temp_timestamp>\S+) \[(?<logLevel>\w+)\] (?<message>.*?)\. (?<temp_kvPairs>.*)/
| parseTimestamp("yyyy-MM-dd'T'HH:mm:ss[.SSS]XXX", field=temp_timestamp)
| drop(temp_timestamp)
| kvParse(temp_kvPairs)
| drop(temp_kvPairs)