Measure Data Ingest

LogScale aims to count only the data you ingest toward license usage. Before LogScale ingests the raw event string, the client may add fields and tags to it. These are referred to as explicit fields and explicit tags.

Data is then passed to the optional field removal phase. Explicit fields can be removed during this phase. Explicit field removal is a parser setting described in the Removing Fields documentation. Field removal occurs before data parsing. Any fields removed at this stage do not count toward usage. After field removal, the ingested data is parsed. Any fields or tags the parser derives from the raw event string do not count toward usage.

Important

Only Falcon LogScale supports the field removal feature in parser settings. NG-SIEM does not currently support it. LogScale recommends that you drop unwanted data elsewhere, for example in the Log Collector.

A summary of the LogScale data ingest flow is as follows:

  1. Data is submitted from the client with the optional addition of explicit fields and tags.

  2. LogScale receives data and adds the necessary fields and tags for storage. For example, if the submitted data has no timestamp, LogScale adds @timestamp.

  3. The optional field removal filter processes data if configured in the parser settings. (Falcon LogScale only)

  4. LogScale calculates usage cost.

  5. A parser processes the events, and extracts or normalizes fields.

Note

For details of possible exceptions to the previously described flow, see the table “Usage Calculations and Exclusions”.

The usage cost is determined by the size of the incoming data (@rawstring). It also includes the fields needed for storage and classification, such as @timestamp and any tags. Standard fields not extracted or derived from @rawstring are also included.

LogScale uses the following formula when calculating the amount of data ingested:

ingestAfterFieldRemovalSize = @rawstring + explicit fields + explicit tags - removed explicit fields

Each item in this calculation is described in the following table, along with any exclusions that may apply:

Table: Usage Calculations and Exclusions

| Item | Included in usage calculation | Exclusions |
|---|---|---|
| @rawstring | The length of the raw event string. All content of @rawstring counts towards usage. | None. Note, data can't be removed from the @rawstring using field removal functionality. |
| fields | Any fields (keys and values) added by the client. For example, fields added when using the structured or HEC endpoints. | Fields added by Log Collector. Fields derived from the @rawstring. Fields added by a parser. |
| tags | Any tags (keys and values) added by the client. For example, tags added when using the structured or HEC endpoints. | Tags derived from @rawstring. Tags added by a parser. |
| Removed fields | Fields removed using the parser field removal functionality do not count towards usage costs. See Optimize Ingestion for more details. | Tags can't be removed using field removal. Fields derived from @rawstring can't be removed using this feature. |
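
The formula above can be sketched in Python as follows. This is an illustration only, not LogScale's implementation. The function names are hypothetical, and the byte-counting rule (UTF-8 length of each key plus each value) is an assumption, because the exact per-field accounting is not specified here. The sample event is made up.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes in text when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def pairs_size(pairs: dict[str, str]) -> int:
    """Assumed rule: each pair costs the bytes of its key plus its value."""
    return sum(utf8_len(key) + utf8_len(value) for key, value in pairs.items())


def ingest_after_field_removal_size(
    rawstring: str,
    explicit_fields: dict[str, str],
    explicit_tags: dict[str, str],
    removed_fields: set[str],
) -> int:
    """@rawstring + explicit fields + explicit tags - removed explicit fields."""
    # Only explicit fields can be removed. Tags and @rawstring are never reduced.
    removed = {k: v for k, v in explicit_fields.items() if k in removed_fields}
    return (
        utf8_len(rawstring)
        + pairs_size(explicit_fields)
        + pairs_size(explicit_tags)
        - pairs_size(removed)
    )


# Hypothetical event: names and values are invented for illustration.
size = ingest_after_field_removal_size(
    rawstring="user=alice action=login",
    explicit_fields={"host": "web01", "debug": "true"},
    explicit_tags={"#type": "auth"},
    removed_fields={"debug"},
)
print(size)
```

Listing a tag name in `removed_fields` has no effect in this sketch, which mirrors the rule that tags can't be removed using field removal.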

Note

Your data needs a @timestamp field to be searchable in LogScale. If the field does not exist, LogScale adds it. This field counts as an ingest cost even if extracted from @rawstring.

To measure the ingest amount, query the humio-usage repository for ingestAfterFieldRemovalSize. See The humio-usage Repository for more details.

To monitor current usage in Organization Settings, see Usage Page.

Data Not Measured

The following data does not count towards the usage cost calculation:

  • If you use Falcon LogScale Collector, metadata (fields) added by the Log Collector. These typically start with @collect.*.

  • Fields that the field removal feature in parser settings removes.

  • Fields and tags derived by the parser from the @rawstring.

  • Fields added by parsers, including CPS parsers. Examples include fields such as Cps.version and Parser.version. See CrowdStrike Parsing Standard (CPS) 1.0 for further details.

Examples

The following sections describe some example events and indicate what data would be included for usage costs, and any exceptions that might apply.

Simple Rawstring

In this scenario, LogScale receives syslog entries containing a timestamp. A CPS-compliant parser parses them. The client adds no explicit fields, and field removal is not used. The event splits over two lines, so LogScale treats it as two events.

Example event:

1,2022/09/08 10:30:27,7200002624,GTP,end,0,2022/09/08 10:30:26,165.225.32.114,172.20.0.10,0:0:303:303::,0:0:404:404::,test,test,test,test,vsys1,test,test,ethernet1/123,ae124.1,logfwd,1969/12/31
16:00:00,0,1,12345,54321,12345,54321,0xffffffff,tcp,allow,Abnormal GTP-U message with out of order IE

The following table shows the costs and any exclusions:

| Included in cost | Excluded |
|---|---|
| Two events. First line is 193 bytes. Second line is 101 bytes. One @timestamp is added for each event (line) at 33 bytes each. Total is 360 bytes. | Fields added by the parser, except @timestamp. |
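
The total for this example can be checked with a few lines of Python. The sizes are the ones stated for this example. The variable names are illustrative only.

```python
# Sizes as stated for this example.
line_bytes = [193, 101]   # the two lines, each ingested as one event
timestamp_bytes = 33      # @timestamp added to each event

# Each event costs its own length plus one @timestamp.
total = sum(size + timestamp_bytes for size in line_bytes)
print(total)  # 360
```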

Event With Explicit Fields Added

In this scenario, a Cribl instance sends raw log data to NG-SIEM. This includes _raw with a timestamp and additional explicit fields. A CPS-compliant parser processes the event. On NG-SIEM there is no field removal option for the parser.

Example event:

[
  {
    "_raw": "178.146.234.131 - Qiv41020 [31/Jan/2025:16:12:11 +0000] \"GET /efficient/wireless/relationships/deliver\" 204 2338",
    "host": "web03.cribl.io",
    "sourcetype": "access_common",
    "source": "/var/log/apache/access.log",
    "_time": 1738339931.877,
    "cribl_test": "cribl-zz_macgyver_dev_linux_vm_debug_logs_241231"
  }
]

The following table shows the costs and any exclusions:

| Included in cost | Excluded |
|---|---|
| Entire rawstring, _raw, in bytes + 33 bytes for added @timestamp + all of the additional (explicit) fields (key value pairs) in bytes. | Fields added by the CPS-compliant parser, except @timestamp. |
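
As a rough sketch, the cost components for this event could be estimated in Python as shown below. Several points are assumptions rather than documented behavior: every key other than _raw is treated as an explicit field, each pair is counted as the UTF-8 bytes of its key plus its value, and the numeric _time value is measured by its Python string form. The actual accounting in NG-SIEM may differ, so treat the result as an approximation.

```python
import json

TIMESTAMP_BYTES = 33  # size given in this example for the added @timestamp

# The example event, kept as a raw string so the JSON escapes stay intact.
payload = r'''[
  {
    "_raw": "178.146.234.131 - Qiv41020 [31/Jan/2025:16:12:11 +0000] \"GET /efficient/wireless/relationships/deliver\" 204 2338",
    "host": "web03.cribl.io",
    "sourcetype": "access_common",
    "source": "/var/log/apache/access.log",
    "_time": 1738339931.877,
    "cribl_test": "cribl-zz_macgyver_dev_linux_vm_debug_logs_241231"
  }
]'''

event = json.loads(payload)[0]

# The raw event string counts in full.
raw_bytes = len(event["_raw"].encode("utf-8"))

# Assumption: everything except _raw is an explicit field added by the client.
explicit = {key: value for key, value in event.items() if key != "_raw"}
explicit_bytes = sum(
    len(key.encode("utf-8")) + len(str(value).encode("utf-8"))
    for key, value in explicit.items()
)

total = raw_bytes + TIMESTAMP_BYTES + explicit_bytes
print(raw_bytes, explicit_bytes, total)
```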