Measure Data Ingest
A key consideration is that LogScale aims to count only the data you actually ingest towards license usage. The client may add fields and tags to the raw event string before it is ingested by LogScale; these are referred to as explicit fields and explicit tags.
Data is then passed to the optional field removal phase, where explicit fields can be removed. Explicit field removal is a setting in the parser, and is described in the Removing Fields documentation. Note that field removal occurs before parsing of the data, and any fields removed at this stage do not then count towards usage. The ingested data is then parsed. Any fields or tags derived by the parser from the raw event string are also not counted towards usage.
Important
NG-SIEM does not currently support the field removal feature in parser settings; only Falcon LogScale does. We recommend that you drop unwanted data elsewhere, for example in the Log Collector.
A summary of the LogScale data ingest flow is as follows:
Data is submitted from the client with the optional addition of explicit fields and tags.
LogScale receives data and adds the necessary fields and tags required for storage. For example, if there is no timestamp in the submitted data, then @timestamp is added.
Data is processed by the optional field removal filter if configured in the parser settings. (Falcon LogScale only)
Usage cost calculation is carried out.
A parser processes the events, and extracts or normalizes fields.
Note
See the table later in this section for details of possible exceptions to the previous flow.
The usage cost is determined by the size of the incoming data (@rawstring) plus the fields needed for storage and classification, such as @timestamp and any tags and standard fields not extracted or derived from @rawstring.
LogScale uses the following formula when calculating the amount of data ingested:
ingestAfterFieldRemovalSize = @rawstring + explicit fields + explicit tags - removed explicit fields
Each item in this calculation is described in the following table, along with any exclusions that may apply:
Item | Included in usage calculation | Exclusions |
---|---|---|
@rawstring | The length of the raw event string. All content of @rawstring counts towards usage. | None. Note, data can't be removed from the @rawstring using field removal functionality. |
fields | Any fields (keys and values) added by the client. For example, fields added when using the structured or HEC endpoints. | Fields added by Log Collector. Fields derived from the @rawstring. Fields added by a parser. |
tags | Any tags (keys and values) added by the client. For example, tags added when using the structured or HEC endpoints. | Tags derived from @rawstring. Tags added by a parser. |
Removed fields | Fields removed using the parser field removal functionality do not count towards usage costs. See Optimize Ingestion for more details. | Tags can't be removed using field removal. Fields derived from @rawstring can't be removed using this feature. |
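The formula and table above can be sketched in a few lines of Python. This is an illustrative approximation only, not LogScale's implementation: the function name `ingest_after_field_removal_size` and its parameters are hypothetical, and it assumes that each field or tag is measured as the UTF-8 byte length of its key plus its value. LogScale's exact byte accounting may differ.

```python
def utf8_len(text: str) -> int:
    """Length of a string in UTF-8 bytes."""
    return len(text.encode("utf-8"))


def pairs_size(pairs: dict) -> int:
    """Sum of key and value lengths (assumed accounting, see lead-in)."""
    return sum(utf8_len(key) + utf8_len(value) for key, value in pairs.items())


def ingest_after_field_removal_size(rawstring, explicit_fields, explicit_tags,
                                    removed_fields=frozenset()):
    """@rawstring + explicit fields + explicit tags - removed explicit fields.

    Only explicit fields can be removed; tags and @rawstring always count.
    """
    kept_fields = {
        key: value
        for key, value in explicit_fields.items()
        if key not in removed_fields
    }
    return utf8_len(rawstring) + pairs_size(kept_fields) + pairs_size(explicit_tags)


# Hypothetical event: one explicit field is dropped by field removal.
size = ingest_after_field_removal_size(
    rawstring="hello world",
    explicit_fields={"host": "web01", "debug": "true"},
    explicit_tags={"env": "prod"},
    removed_fields={"debug"},
)
print(size)
```

Fields derived from @rawstring or added by a parser are never passed to this function, which mirrors the exclusions listed in the table.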
Note
Your data needs to have a @timestamp field in order to be searchable in LogScale. If it is not present, it is added. This field counts as an ingest cost even if it is extracted from the @rawstring.
To measure the ingest amount, query the humio-usage repository for ingestAfterFieldRemovalSize. See The humio-usage Repository for more details.
To monitor current usage in Organization Settings, see Usage Page.
Data Not Measured
The following fields do not count towards the usage cost calculation:
If you use Falcon LogScale Collector, metadata (fields) added by the Log Collector. These typically start with @collect.*.
Fields that are removed using the field removal feature in parser settings.
Fields and tags derived by the parser from the @rawstring.
Fields added by parsers, including CPS parsers. Examples include fields such as Cps.version and Parser.version. See CrowdStrike Parsing Standard (CPS) 1.0 for further details.
Examples
The following sections describe some example events and indicate what data would be included for usage costs, and any exceptions that might apply.
Simple Rawstring
In this scenario, LogScale receives syslog entries containing a timestamp, which are parsed by a CPS-compliant parser. There are no explicit fields added by the client, and field removal is not used. The event is split over two lines, so it is treated as two events.
Example event:
1,2022/09/08 10:30:27,7200002624,GTP,end,0,2022/09/08 10:30:26,165.225.32.114,172.20.0.10,0:0:303:303::,0:0:404:404::,test,test,test,test,vsys1,test,test,ethernet1/123,ae124.1,logfwd,1969/12/31
16:00:00,0,1,12345,54321,12345,54321,0xffffffff,tcp,allow,Abnormal GTP-U message with out of order IE
The following table shows the costs and any exclusions:
Included in cost | Excluded |
---|---|
Two events. First line is 193 bytes. Second line is 101 bytes. One @timestamp is added for each event (line) at 33 bytes each. Total is 360 bytes. | Fields added by the parser, except @timestamp. |
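The total in the table above can be checked with a short calculation. The sketch below uses only the byte counts quoted in the table (193 and 101 bytes for the two lines, 33 bytes for each added @timestamp); the variable names are illustrative.

```python
# Per-event @timestamp cost, as quoted in the table above.
TIMESTAMP_BYTES = 33

# Size in bytes of each line, each of which is treated as a separate event.
line_sizes = [193, 101]

# Each event costs its @rawstring length plus one added @timestamp.
total = sum(size + TIMESTAMP_BYTES for size in line_sizes)
print(total)  # 360
```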
Event With Explicit Fields Added
In this scenario, a Cribl instance sends raw log data (_raw) that includes a timestamp, together with additional explicit fields, to NG-SIEM. The event is processed by a CPS-compliant parser. On NG-SIEM there is no field removal option for the parser.
Example event:
[
{
"_raw": "178.146.234.131 - Qiv41020 [31/Jan/2025:16:12:11 +0000] \"GET /efficient/wireless/relationships/deliver\" 204 2338",
"host": "web03.cribl.io",
"sourcetype": "access_common",
"source": "/var/log/apache/access.log",
"_time": 1738339931.877,
"cribl_test": "cribl-zz_macgyver_dev_linux_vm_debug_logs_241231"
}
]
The following table shows the costs and any exclusions:
Included in cost | Excluded |
---|---|
Entire rawstring, _raw, in bytes + 33 bytes for added @timestamp + all of the additional (explicit) fields (key value pairs) in bytes. | Fields added by the CPS-compliant parser, except @timestamp. |
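As a rough way to estimate the cost of an event like this one, the following Python sketch adds up the three components listed in the table. It is an approximation under stated assumptions, not LogScale's accounting: the helper `estimate_event_cost` is hypothetical, every key other than `_raw` (including `_time`) is assumed to count as an explicit field, and each field is assumed to be measured as the UTF-8 length of its key plus its value as text.

```python
import json

# Cost of the added @timestamp, as quoted in the table above.
TIMESTAMP_BYTES = 33


def estimate_event_cost(event: dict, raw_key: str = "_raw") -> dict:
    """Estimate usage bytes for one event (assumed accounting, see lead-in)."""
    raw_bytes = len(event[raw_key].encode("utf-8"))
    field_bytes = 0
    for key, value in event.items():
        if key == raw_key:
            continue  # the rawstring is counted separately
        # Non-string values (such as _time) are measured as their JSON text.
        text = value if isinstance(value, str) else json.dumps(value)
        field_bytes += len(key.encode("utf-8")) + len(text.encode("utf-8"))
    return {
        "rawstring": raw_bytes,
        "explicit_fields": field_bytes,
        "timestamp": TIMESTAMP_BYTES,
        "total": raw_bytes + field_bytes + TIMESTAMP_BYTES,
    }


# The example event from this section.
event = {
    "_raw": "178.146.234.131 - Qiv41020 [31/Jan/2025:16:12:11 +0000] "
            "\"GET /efficient/wireless/relationships/deliver\" 204 2338",
    "host": "web03.cribl.io",
    "sourcetype": "access_common",
    "source": "/var/log/apache/access.log",
    "_time": 1738339931.877,
    "cribl_test": "cribl-zz_macgyver_dev_linux_vm_debug_logs_241231",
}

cost = estimate_event_cost(event)
print(cost)
```

Printing the breakdown rather than a single number makes it easier to see how much of the estimate comes from explicit fields, which is the part you can reduce by dropping unwanted data before it reaches NG-SIEM.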