Ingest API

While Falcon Log Collector is the recommended approach for ingesting data into Falcon LogScale from Linux, macOS, and Windows systems, it is also possible to use the Ingest APIs to ingest data into LogScale. These are useful in the following use cases:

  • On platforms and devices where Falcon Log Collector is not currently supported, for example, resource-constrained systems such as IoT devices and embedded operating systems.

  • In situations where you do not control the event message format. This is usually the case when interfacing with systems that provide data to be ingested using a callback, or where cloud services use a webhook to supply data.

  • If you want backward compatibility with your Splunk tools, scripts, and collectors, you can use the HEC endpoint.

  • If you have existing OpenTelemetry-compatible data feeds that you want to ingest into Falcon LogScale, you can use the OpenTelemetry endpoint.

  • For compatibility with tools that use the Elastic Bulk API, such as the Elastic Beats range of open source log shippers.

This documentation explains how to ingest data into Falcon LogScale using the HTTP API endpoints. The use cases for these different endpoints are explained in later sections.

Note

The Ingest APIs can only be used with LogScale; they cannot be used with NG-SIEM. NG-SIEM uses Connectors for data ingestion, and the API endpoints described here are not exposed.

Ingest Tokens and Parsers

When using the Ingest APIs, all requests need to be authenticated. This is achieved using a Bearer Token, called an Ingest Token, which is created for your client in LogScale. There are two key points to note here:

  1. By default, there is no parser attached to an ingest token.

  2. When you create an ingest token, you can specify an associated parser to be used when requests are authenticated using that token. You can also change the assigned parser at a later stage by editing the token.

These points are especially important when dealing with unstructured data. Because no parser is attached by default, the data would be ingested but not parsed: it would be present in LogScale and could be searched using the LogScale querying functionality, but it would not be automatically parsed into fields. To make querying more efficient, and to leverage its more powerful features, the data ideally needs to be parsed into fields using a parser.

When using the structured data endpoints, this is less important, as the data is organized in the request payload in a way that enables LogScale to parse the data into appropriate fields. The different endpoints are covered in more detail in later sections.
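Both points are visible on the client side. The following is a minimal sketch in Python of the authentication mechanism: the host and token values are placeholders to be replaced with your own, and LogScale applies whatever parser is assigned to the token, so no parser is named in the request itself.

python
LOGSCALE_URL = "https://logscale.example.com"  # placeholder host
INGEST_TOKEN = "your-ingest-token"             # placeholder ingest token

# Every request to the Ingest APIs is authenticated by passing the
# ingest token as a Bearer token in the Authorization header; LogScale
# then applies whatever parser (if any) is assigned to that token.
HEADERS = {"Authorization": f"Bearer {INGEST_TOKEN}"}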

Parsers can be of three main types:

  1. Built-in - provided by LogScale, these are available by default.

  2. Marketplace - provided by third parties to support specialized parsing requirements.

  3. Custom - create your own parser where one is not available for your data format.

Endpoints

There are three main classes of endpoint in the Ingest APIs. These are summarized in the following table:

Type Description
Structured These endpoints are used when you have structured data such as JSON, and have control over building the data to be ingested. In this case you can format your data into a structure that can be parsed by the system, without needing to specify a parser. Examples include the humio-structured and HEC endpoints.
Unstructured These endpoints are used when your client has some control over the data format, and you are parsing unstructured text logs such as syslog, access logs, or logs from applications. When you create the ingest token, you associate a suitable parser with it, and LogScale then applies this parser to the data your HTTP client posts. Examples include the humio-unstructured and HEC endpoints.
Raw The "raw" endpoints are typically used when you have no control over the structure of the message. This happens when the endpoint is invoked by callback from another system, or when a webhook for a cloud service is invoked on the endpoint. Examples include the raw, HEC raw, and JSON raw endpoints.

The available endpoints for ingesting data are shown in the following table:

Name Endpoint Use Case
Unstructured ${HOST}:${PORT}/api/v1/ingest/humio-unstructured Use where the data is unstructured, and you have control over the format of the message to be ingested. Typically used for unstructured text logs.
Structured ${HOST}:${PORT}/api/v1/ingest/humio-structured Use where the data is structured, and you have control over the format of the message to be ingested. Typically used for structured data, such as JSON.
Raw ${HOST}:${PORT}/api/v1/ingest/raw Use where the data is structured or unstructured, but you have no control over the format of the message to be ingested. You can specify a suitable parser and associate it with the ingest token. Typically used when the inbound message is generated by a callback or webhook.
JSON Raw ${HOST}:${PORT}/api/v1/ingest/json Use where the data is JSON, but you have no control over the format of the message to be ingested. Typically used when the inbound message is generated by a callback or webhook, but is in JSON format.
Elastic Bulk Ingest ${HOST}:${PORT}/api/v1/ingest/elastic-bulk Provided primarily for compatibility with log shippers that use the Elastic Bulk API, of which Falcon LogScale supports a subset. Further information is provided in the documentation on Elastic Beats.
OpenTelemetry ${HOST}:${PORT}/api/v1/ingest/otlp Provided to support software and devices that use the OpenTelemetry Protocol (OTLP). For example, OpenTelemetry Collector can be used to ship log and other data to LogScale.
HEC - HTTP Event Collector ${HOST}:${PORT}/api/v1/ingest/hec Handles both structured and unstructured data. Provided primarily for Splunk compatibility.
HEC Raw ${HOST}:${PORT}/api/v1/ingest/hec/raw Provides a simple line-delimited ingest endpoint for unstructured log data. Retains some Splunk compatibility, for example X-Splunk-Request-Channel is supported.

More information on endpoints in general can be found in LogScale URLs & Endpoints.
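To make the distinction between the endpoint classes concrete, the following is a hedged sketch that posts one event to the structured endpoint and equivalent text to the raw endpoint, using Python's requests library. The host and token are placeholders, and the tags/events payload shape shown for humio-structured is an assumption consistent with the endpoint descriptions above, not a complete specification.

python
import requests

BASE_URL = "https://logscale.example.com"                # placeholder host
HEADERS = {"Authorization": "Bearer your-ingest-token"}  # placeholder token

# Structured: the client controls the payload, so events arrive already
# organized into metadata ("tags") and per-event fields - no parser needed.
requests.post(
    f"{BASE_URL}/api/v1/ingest/humio-structured",
    headers={**HEADERS, "Content-Type": "application/json"},
    json=[
        {
            "tags": {"host": "server1", "source": "application.log"},
            "events": [
                {
                    "timestamp": "2023-11-02T13:48:26+00:00",
                    "attributes": {"status": 200, "path": "/index.html"},
                }
            ],
        }
    ],
)

# Raw: the request body is the event itself; the entire payload becomes
# @rawstring, and parsing is left to the parser attached to the token.
requests.post(
    f"{BASE_URL}/api/v1/ingest/raw",
    headers={**HEADERS, "Content-Type": "text/plain"},
    data="status=200 path=/index.html",
)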

Request Format

A request to one of the endpoints listed previously has the basic structure of a POST to the endpoint, as shown in the following example:

http
POST /api/v1/ingest/humio-unstructured

A practical request also contains the bearer token (an ingest token), any additional headers, and the request payload. The request payload typically consists of the following:

  1. Metadata - which is optional unless otherwise stated. Typically translated to tags or user fields on ingestion.

  2. The raw event message - the actual message such as a log line or structured data. Typically translated to @rawstring on ingestion.

See later sections for examples.
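As a preview, the sketch below posts a complete unstructured request containing both parts of the payload: a fields block of metadata and a messages list of raw log lines. The payload shape and the host, token, and log-line values are illustrative assumptions in line with the humio-unstructured endpoint described above.

python
import requests

url = "https://logscale.example.com/api/v1/ingest/humio-unstructured"  # placeholder
headers = {
    "Authorization": "Bearer your-ingest-token",  # placeholder ingest token
    "Content-Type": "application/json",
}

payload = [
    {
        # 1. Metadata (optional) - typically becomes tags or user fields.
        "fields": {"host": "webhost1", "source": "apache"},
        # 2. Raw event messages - each typically becomes @rawstring.
        "messages": [
            '192.168.0.14 - - [02/Nov/2023:13:48:26 +0000] "GET /index.html HTTP/1.1" 200 1234',
            '192.168.0.15 - - [02/Nov/2023:13:48:27 +0000] "GET /style.css HTTP/1.1" 200 421',
        ],
    }
]

response = requests.post(url, headers=headers, json=payload)
print(response.status_code)  # 200 means the data was received and committed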

Response Codes

The Ingest API responds with standard HTTP response codes:

Code Description
200 Data has been received and committed.
201-399 An error has occurred; check the error text to confirm the error, then retry if possible.
401 or 403 Indicates that the authorization token is incorrect. The operation can be retried with a new ingest token.
4xx (excluding 401 and 403) Indicates a client error; the request cannot be retried.
5xx Errors in the 5xx range can be retried as it may be a temporary error.
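The retry guidance in this table can be encoded directly in a client. The following is a minimal sketch; post_with_retry is a hypothetical helper name, and the backoff policy is an illustrative choice rather than a documented requirement.

python
import time
import requests

def post_with_retry(url, headers, payload, attempts=3, backoff_seconds=2.0):
    """Post one ingest request, retrying only where the response code allows it."""
    for attempt in range(1, attempts + 1):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == 200:
            return response                      # data received and committed
        if response.status_code in (401, 403):
            # Authorization problem: retrying only helps with a new token.
            raise PermissionError("invalid or expired ingest token")
        if 400 <= response.status_code < 500:
            response.raise_for_status()         # other 4xx: do not retry
        # Remaining codes (201-399, 5xx) may be temporary: back off, retry.
        time.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"ingest failed after {attempts} attempts")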

Removing Fields for Ingest Cost Optimization

For most endpoints, the metadata that can be associated with an event is entirely optional, but is available to help optimize later querying in LogScale. The exceptions to this are the raw endpoints, where the entire request payload is translated into @rawstring on ingestion.

Note that fields translated into @rawstring on ingestion cannot be removed with the remove-field functionality in a parser. For each endpoint, the optional metadata fields are clearly indicated, as is the element of the request payload that is translated into @rawstring.

Best Practice

When sending POST requests with logs for ingestion, CrowdStrike recommends batching logs together into single requests, as sending one request per log message does not scale well.

A good starting strategy is to batch log messages in five-second windows, and send all log messages from that time frame in one request.

However, requests can also grow too large. CrowdStrike recommends that ingest requests contain no more than 5,000 events and take up no more than 5 MB of space (uncompressed). If your requests grow larger than this during your batching time frame, it is better to break the logs into multiple requests, as sketched below. Refer to Limits & Standards when ingesting large batches of events.
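A minimal sketch of that batching strategy follows. The window length and caps mirror the recommendations above; batch_messages and flush_batch are hypothetical names, where flush_batch stands in for whatever function sends one ingest request.

python
import time

MAX_EVENTS = 5000              # recommended per-request event cap
MAX_BYTES = 5 * 1024 * 1024    # recommended per-request size cap (uncompressed)
WINDOW_SECONDS = 5             # batch messages in five-second windows

def batch_messages(message_iter, flush_batch):
    """Accumulate messages, flushing one request per window/count/size limit.

    flush_batch(list_of_messages) is assumed to send a single POST to an
    ingest endpoint. The window is only checked when a new message arrives,
    which is sufficient for a steady stream of logs.
    """
    batch, batch_bytes, window_start = [], 0, time.monotonic()
    for message in message_iter:
        message_bytes = len(message.encode("utf-8"))
        window_closed = time.monotonic() - window_start >= WINDOW_SECONDS
        too_big = len(batch) >= MAX_EVENTS or batch_bytes + message_bytes > MAX_BYTES
        if batch and (window_closed or too_big):
            flush_batch(batch)             # one request per batch
            batch, batch_bytes = [], 0
            window_start = time.monotonic()
        batch.append(message)
        batch_bytes += message_bytes
    if batch:
        flush_batch(batch)                 # flush whatever remains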

Important

When the configured maximum event size is reached, whether in @rawstring or in other fields, the overall data will be truncated. Fields will be removed entirely, and @rawstring will be truncated down to the allowed maximum size with ... appended at the end, such that the size of all other fields plus the size of @rawstring is less than the configured maximum event size. Only @rawstring, @timestamp, and @timezone are retained when truncation occurs. For more information, see Limits & Standards.