Ingest API
While Falcon Log Collector is the recommended approach for ingesting data into Falcon LogScale from Linux, macOS, and Windows systems, it is also possible to use the Ingest APIs. These are useful in the following use cases:
- On platforms and devices where Falcon Log Collector is not currently supported, for example, resource-constrained systems such as IoT devices and embedded operating systems.
- In situations where you do not control the event message format. This is usually the case when interfacing with systems that provide data to be ingested using a callback, or where cloud services use a webhook to supply data.
- If you want backward compatibility with your Splunk tools, scripts, and collectors, you can use the HEC endpoint.
- If you have existing OpenTelemetry-compatible data feeds that you want to ingest into Falcon LogScale, you can use the OpenTelemetry endpoint.
- For compatibility with tools that use the Elastic Bulk API, such as the Elastic Beats range of open source log shippers.
This documentation explains how to ingest data into Falcon LogScale using the HTTP API endpoints. The use cases for these different endpoints are explained in later sections.
Note
The Ingest APIs can only be used with LogScale and can't be used with NG-SIEM. NG-SIEM uses Connectors for data ingestion, and the API endpoints described here are not exposed.
Ingest Tokens and Parsers
When using the Ingest APIs, all requests need to be authenticated. This is achieved using a Bearer Token, called an Ingest Token. The Ingest Token for your client is created in LogScale. There are two key points to be noted here:
- By default, there is no parser attached to an ingest token.
- When you create an ingest token, you can specify an associated parser to be used when requests are authenticated using that token. You can also change the assigned parser at a later stage by editing the token.
These points are especially important when dealing with unstructured data. Because no parser is attached by default, the data would be ingested but not parsed: it would be present in LogScale and searchable using the LogScale querying functionality, but it would not be automatically parsed into fields. To make querying more efficient, and to leverage its more powerful features, the data should ideally be parsed into fields using a parser.
When using the structured data endpoints, this is less important, as the data is organized in the request payload in a way that enables LogScale to parse the data into appropriate fields. The different endpoints are covered in more detail in later sections.
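For illustration, the following minimal Python sketch shows how an ingest token is supplied as a Bearer token in the request headers; the token value is a placeholder:

```python
# Placeholder value; substitute the ingest token created in LogScale.
INGEST_TOKEN = "your-ingest-token"

# Every Ingest API request is authenticated by passing the ingest token
# as a Bearer token in the Authorization header.
headers = {
    "Authorization": f"Bearer {INGEST_TOKEN}",
    "Content-Type": "application/json",
}
```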
Parsers can be of three main types:
- Built-in - provided by LogScale, these are available by default.
- Marketplace - provided by third parties to support specialized parsing requirements.
- Custom - create your own parser where one is not available for your data format.
Endpoints
There are three main classes of endpoint in the Ingest APIs. These are summarized in the following table:
| Type | Description |
|---|---|
| Structured | These endpoints are used when you have structured data such as JSON, and have control over building the data to be ingested. In this case you can format your data into a structure that can be parsed by the system, without needing to specify a parser. Examples include the humio-structured and HEC endpoints. |
| Unstructured | These endpoints are used when your client has some control over the data format, and you are parsing unstructured text logs such as syslogs, access logs, or logs from applications. When you create the ingest token, you associate a suitable parser with it, and this parser is then used when your HTTP client posts data to LogScale. Examples include the humio-unstructured and HEC endpoints. |
| Raw | The raw endpoints are typically used when you have no control over the structure of the message. This happens when the endpoint is invoked by a callback from another system, or when a webhook for a cloud service is invoked on the endpoint. Examples include the raw, HEC raw, and JSON raw endpoints. |
The available endpoints for ingesting data are shown in the following table:
| Name | Endpoint | Use Case |
|---|---|---|
| Unstructured | ${HOST}:${PORT}/api/v1/ingest/humio-unstructured | Use where the data is unstructured, and you have control over the formation of the message to be ingested. Typically used for unstructured text logs. |
| Structured | ${HOST}:${PORT}/api/v1/ingest/humio-structured | Use where the data is structured, and you have control over the formation of the message to be ingested. Typically used for structured data, such as JSON. |
| Raw | ${HOST}:${PORT}/api/v1/ingest/raw | Use where the data is structured or unstructured, but you have no control over the formation of the message to be ingested. You can specify a suitable parser and associate it with the ingest token. Typically used when the inbound message is generated by a callback or webhook. |
| JSON Raw | ${HOST}:${PORT}/api/v1/ingest/json | Use where the data is JSON, but you have no control over the formation of the message to be ingested. Typically used when the inbound message is generated by a callback or webhook, but is in JSON format. |
| Elastic Bulk Ingest | ${HOST}:${PORT}/api/v1/ingest/elastic-bulk | Primarily provided for compatibility with log shippers that use the Elastic Bulk API, of which Falcon LogScale supports a subset. Further information is provided in the documentation on Elastic Beats. |
| OpenTelemetry | ${HOST}:${PORT}/api/v1/ingest/otlp | Provided to support software and devices that use the OpenTelemetry Protocol (OTLP). For example, OpenTelemetry Collector can be used to ship log and other data to LogScale. |
| HEC - HTTP Event Collector | ${HOST}:${PORT}/api/v1/ingest/hec | Can handle both structured and unstructured data. Provided primarily for Splunk compatibility. |
| HEC Raw | ${HOST}:${PORT}/api/v1/ingest/hec/raw | Provides a simple line-delimited ingest endpoint for unstructured log data. Retains some Splunk compatibility; for example, the X-Splunk-Request-Channel header is supported. |
More information on endpoints in general can be found in LogScale URLs & Endpoints.
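To illustrate the structured case from the table above, the following sketch shows the general shape of a payload for the humio-structured endpoint: events are grouped under optional tags and carry explicit timestamps and attributes, so LogScale can map them to fields without a parser. The tag, timestamp, and attribute values here are placeholders:

```python
import json

# Placeholder payload for the humio-structured endpoint: a list of event
# batches, each with optional tags and a list of timestamped events.
structured_payload = [
    {
        "tags": {"host": "server1", "source": "application.log"},
        "events": [
            {
                "timestamp": "2025-01-01T12:00:00+00:00",
                "attributes": {"key1": "value1", "user": "user1"},
            }
        ],
    }
]

print(json.dumps(structured_payload, indent=2))
```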
Request Format
Requests to one of the endpoints listed previously have the basic structure of a POST to the endpoint, as shown in the following example:
POST /api/v1/ingest/humio-unstructured
A practical request contains additional data including the bearer token (an ingest token), additional headers, and the request payload. The request payload typically consists of the following:
- Metadata - optional unless otherwise stated. Typically translated to tags or user fields on ingestion.
- The raw event message - the actual message, such as a log line or structured data. Typically translated to @rawstring on ingestion.
A minimal sketch of such a request follows; see the later sections for further examples.
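This sketch assumes the third-party Python requests library and uses placeholder host and token values; the payload shape shown is for the humio-unstructured endpoint:

```python
import requests  # third-party library: pip install requests

# Placeholder values; substitute your LogScale host and ingest token.
url = "https://your-logscale-host/api/v1/ingest/humio-unstructured"
headers = {
    "Authorization": "Bearer your-ingest-token",
    "Content-Type": "application/json",
}

# Unstructured payload: optional metadata ("fields") plus the raw event
# messages, which are translated to @rawstring on ingestion and handled
# by the parser associated with the ingest token.
payload = [
    {
        "fields": {"host": "webhost1"},
        "messages": [
            '192.168.1.21 - user1 [02/Nov/2017:13:48:26 +0000] "GET /index.html HTTP/1.1" 200 0'
        ],
    }
]

response = requests.post(url, headers=headers, json=payload)
print(response.status_code, response.text)
```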
Response Codes
The Ingest API responds with standard HTTP response codes:
| Code | Description |
|---|---|
| 200 | Data has been received and committed. |
| 201-399 | An error has occurred. Check the error text to confirm the error, then retry if possible. |
| 401 or 403 | Indicates that the authorization token is incorrect. The operation can be retried if a new API token is used. |
| 4xx (excluding 401 and 403) | Cannot be retried. |
| 5xx | Errors in the 5xx range can be retried, as the error may be temporary. |
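As a sketch of how a client might act on these codes, the following hypothetical helper (assuming the Python requests library) retries only where the table allows:

```python
import time

import requests  # third-party library: pip install requests


def post_with_retry(url, headers, payload, max_attempts=3, backoff_seconds=5):
    for attempt in range(1, max_attempts + 1):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == 200:
            return response  # data received and committed
        if response.status_code in (401, 403):
            # Incorrect authorization token; retry only with a new token.
            raise RuntimeError("Invalid ingest token")
        if 400 <= response.status_code < 500:
            # Other 4xx errors cannot be retried.
            raise RuntimeError(f"Non-retryable error: {response.status_code}")
        # 5xx (and, per the table, 201-399) may be temporary: back off, retry.
        if attempt < max_attempts:
            time.sleep(backoff_seconds * attempt)
    return response
```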
Removing Fields for Ingest Cost Optimization
For most endpoints, the metadata that can be associated with an event is entirely optional, but it is available to help optimize later querying in LogScale. The exceptions are the raw endpoints, where the entire request payload is translated into @rawstring on ingestion.
Note that fields that are translated into @rawstring on ingestion cannot be removed with the remove-field functionality in a parser. For each endpoint, the optional metadata fields are clearly indicated, as is the element of the request payload that is translated into @rawstring.
Best Practice
When sending POST requests with logs for ingesting, CrowdStrike recommends that logs be batched together in single requests, as sending one request per log message won't scale well.
A good starting strategy is to batch log messages in five-second windows and send all log messages from that time frame in one request.
However, requests can also grow too large. CrowdStrike recommends that ingest requests contain no more than 5000 events and take up no more than 5 MB of space (uncompressed). If your requests grow larger than this during your batching time frame, it is better to break the logs into multiple requests, as sketched below. Refer to Limits & Standards when ingesting large batches of events.
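A minimal sketch of that batching strategy, where send is a hypothetical caller-supplied function that posts one batch to the ingest endpoint:

```python
import time

MAX_EVENTS = 5000            # recommended maximum events per request
MAX_BYTES = 5 * 1024 * 1024  # recommended maximum uncompressed size (5 MB)
WINDOW_SECONDS = 5           # batching window


def batch_and_send(message_iter, send):
    """Flush a batch when the window closes or a size limit is reached."""
    batch, batch_bytes = [], 0
    window_start = time.monotonic()
    for message in message_iter:
        batch.append(message)
        batch_bytes += len(message.encode("utf-8"))
        window_closed = time.monotonic() - window_start >= WINDOW_SECONDS
        if len(batch) >= MAX_EVENTS or batch_bytes >= MAX_BYTES or window_closed:
            send(batch)
            batch, batch_bytes = [], 0
            window_start = time.monotonic()
    if batch:
        send(batch)  # flush whatever remains at the end of the stream
```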
Important
When the configured maximum event size is reached, either in @rawstring and/or in other fields, the overall data will be truncated. Fields will be removed entirely, and @rawstring will be truncated down to the allowed maximum size, with ... appended at the end, such that the size of all other fields plus the size of @rawstring is less than the configured maximum event size. Only @rawstring, @timestamp, and @timezone are kept when truncation occurs. For more information, see Limits & Standards.