Ingestion: Ingest Phase
Data sent to LogScale is received by the ingest layer, which handles the incoming request: the data is first matched and validated against the given ingest protocol, then turned into log events. Typically, users create parsers that are applied to structure and enrich the incoming data. Once the data is parsed, it is placed on a Kafka ingest queue and an acknowledgement is returned to the client. In short, the ingest phase involves:
- Validating the input
- Extracting timestamps, or adding them if not available
- Parsing the data using a user-defined parser to extract fields or reformat it
- Placing the completed events on the Kafka ingest queue
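As a minimal sketch of the second and third steps, a parser might detect a timestamp and then extract fields. The example below assumes the incoming events are JSON and uses the standard findTimestamp() and parseJson() functions:

// Minimal parser sketch (assumes JSON input)
findTimestamp()   // detect and assign @timestamp where possible
| parseJson()     // extract the JSON properties as fields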
Parsers During Ingestion
Whether you are ingesting structured data in which the fields have already been identified, or raw text log lines from which the information still needs to be extracted, the role of the parser is to extract, format, and enrich the incoming data stream for storage.
Parsers within LogScale allow for the following operations during ingestion:
- Identify specific fields according to the source data type
- Identify metadata fields, such as the timestamp, and translate them to the LogScale standard
- Augment the information, for example by formatting fields into a standard format or by resolving IP addresses
- Assign key fields to a standardized format so that data from different source formats can be queried using the same field names
The process of parsing is one of enriching the data. Most log data is free text; storing the information in fixed fields improves the ability to query and process it during search. LogScale always stores the original raw text alongside any extracted field data.
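As a brief illustration of the benefit, once a parser has extracted fields (such as the status and url fields in the web server example below), searches can filter and aggregate on them directly instead of scanning free text:

// Query structured fields produced by a parser
status = 500
| groupBy(url)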
For a more detailed example, let's look at the output from an HTTP web server. Each line represents a request/response pair and will initially be turned into an event consisting only of the raw string. The raw string for a web server is typically structured in a well-known format and contains information such as the HTTP status code, HTTP method, response time, URL, and user agent. It is possible to create parsers for any given structure. For a web server, a log line could look like this:
47.29.201.179 - - [28/Feb/2019:13:17:10 +0000] "GET /?p=1 HTTP/2.0" 200 5316 "https://domain1.com/?p=1" "Mozilla/5.0 (Windows NT 6.1)"
The structure of the data contains a lot of information that we can parse and extract. Before parsing, the event consists only of the raw string and the time of ingestion:
Field | Value
@rawstring | 47.29.201.179 - - [28/Feb/2019:13:17:10 +0000] ... (full text)
@timestamp | cluster ingest time (not event time)
Creating a parser for this format extracts the following fields and gives the event structure:
Field | Value
@rawstring | 47.29.201.179 - - [28/Feb/2019:13:17:10 +0000] ... (full text)
@timestamp | 28/Feb/2019:13:17:10 +0000
clientIP | 47.29.201.179
method | GET
url | /?p=1
version | 2.0
status | 200
size | 5316
referrer | https://domain1.com/?p=1
user-agent | Mozilla/5.0 (Windows NT 6.1)
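A parser producing these fields could be written in LQL as a single regular expression with named capture groups, followed by timestamp parsing. The following is only a sketch, assuming every line follows the format above:

// Extract fields with named capture groups; group names cannot
// contain hyphens, so the user-agent field is renamed afterwards
/^(?<clientIP>\S+) \S+ \S+ \[(?<ts>[^\]]+)\] "(?<method>\S+) (?<url>\S+) HTTP\/(?<version>[^"]*)" (?<status>\d+) (?<size>\d+) "(?<referrer>[^"]*)" "(?<useragent>[^"]*)"/
| parseTimestamp(format="dd/MMM/yyyy:HH:mm:ss Z", field=ts)
| rename(field=useragent, as="user-agent")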
Parsers are written using the LogScale Query Language (LQL). Using the same language for parsing as for querying means the same functions and constructs are available in both contexts. In addition, because you can always parse and extract information during a query by re-examining the original @rawstring, the principles will already be familiar.
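For example, even without a parser, fields can be extracted at search time by re-examining @rawstring with the same regular-expression syntax listed below:

// Ad-hoc extraction at query time from the raw web server line
/"(?<method>\S+) (?<path>\S+)/
| groupBy(method)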
Typically, a parser makes use of the following functions and syntax:
- Parsing functions for specific data types, such as parseJson(), parseXml(), parseCsv() and parseTimestamp()
- Regular expressions, using the /regex/ syntax or the regex() function, to identify key information and place it into fields
- Statements for selecting the processing method, such as case or if
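These constructs can be combined. As a sketch, a parser fragment could choose a parsing method per event, assuming a mixed feed in which some lines are JSON:

// Hypothetical: pick a parsing method based on the shape of the event
case {
  @rawstring = /^\{/ | parseJson();   // JSON lines
  * | /^(?<clientIP>\S+) /            // otherwise assume an access-log line
}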
With the full suite of LQL tools available, you can also enrich the data. In the example above, the parser could geocode the client IP address, enriching the event with the country, area, and city of the request:
...| ipLocation(clientIP)
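Assuming ipLocation()'s documented output fields (it adds subfields such as clientIP.country and clientIP.city to the event), the stored location data can then be aggregated at search time:

// Count requests per country using the enriched field
groupBy(clientIP.country)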
For more information on writing and creating parsers, see Parsing Data.
In the parser configuration it is also possible to specify which fields in the event should be tags. Tags are discussed in Tag Fields and Datasources.
Ingest Tokens
LogScale requires clients to provide an Ingest Token with each ingest request. Ingest Tokens are created in LogScale and provide:
- Authorization and authentication; data can only be ingested if a valid Ingest Token has been used.
- Repository scoping; Ingest Tokens are unique to the repository where the data will be stored, so a token for repository A cannot be used to ingest data into repository B.
- Association with a specific parser.
Using an Ingest Token therefore means having a unique string that identifies both a specific type of data and the processing (through the parser) applied to that data. To limit ingestion to specific hosts, create a unique Ingest Token for each host and parser configuration.
Alternatively, the parser can choose how the incoming data is parsed and processed based on a field within the source log file or data.
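A sketch of that approach, using a hypothetical logtype field supplied by the log shipper:

// Hypothetical: route parsing based on a field set by the shipper
case {
  logtype = "accesslog" | /^(?<clientIP>\S+) /;   // plain-text access logs
  logtype = "json" | parseJson();                 // JSON payloads
  * | findTimestamp()                             // fall back to timestamp detection
}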