Digest Capacity

Digest processes incoming data from Kafka, assembles it into segments, and then distributes and stores the data on storage nodes. Digest capacity can have an impact on:

  • Live query performance, because live queries are processed while segments are being stored

  • Storage of incoming data, including replicas of the information across the cluster

  • Preparation of segment data (including compression) so that it can be offloaded to bucket storage

Digest capacity is impacted by:

  • Incoming data rate

  • Tags, datasources, and sharding

    The combination of tags, datasources, and the sharding of the data determines the number of CPU cores required to digest it. Typically, one core is able to process 256 GB/day for a given shard on a datasource. Increasing the number of datasources (for example, by using tags with a higher number of unique values) requires more cores to process the data.

  • CPU cores

    As noted above, each datasource and shard requires processing by a CPU core. If there are not enough CPU cores to process the incoming data, digest slows down, which in turn slows the ingest of information.
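
The relationship between data rate, shards, and cores described above can be sketched as a rough estimate. The following Python example is illustrative only and is not part of the product: the function names and the sample datasources are hypothetical, and it assumes that each shard occupies one whole core and that a datasource needs one shard per 256 GB/day of incoming data. Real deployments may behave differently, so treat the result as a starting point for sizing rather than a definitive figure.

```python
import math
from typing import Mapping

# From the text: one core typically digests about 256 GB/day
# for a given shard on a datasource.
GB_PER_CORE_PER_DAY = 256


def shards_needed(daily_gb: float) -> int:
    """Estimate the shards for one datasource (hypothetical helper).

    Assumption: a datasource always has at least one shard, and gains
    another shard for each additional 256 GB/day of incoming data.
    """
    return max(1, math.ceil(daily_gb / GB_PER_CORE_PER_DAY))


def estimate_digest_cores(daily_gb_by_datasource: Mapping[str, float]) -> int:
    """Estimate total digest cores, assuming one core per shard."""
    return sum(shards_needed(gb) for gb in daily_gb_by_datasource.values())


# Hypothetical daily ingest volumes (GB/day) per datasource.
example = {"web": 600, "auth": 100, "firewall": 256}
print(estimate_digest_cores(example))
```

Because every datasource counts for at least one core in this sketch, adding datasources (for example through tags with many unique values) raises the estimate even when the total data rate stays the same.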

When calculating and monitoring the digest capacity, you should monitor:

  • Overall CPU performance

    If the CPU usage is very high, it may indicate that there are not enough CPU cores to process the incoming data.

  • Check the Datasources Increasing or Decreasing Auto-Sharding and Datasources Hitting Max Auto-Shards widgets within the humio/insights Ingest dashboard.

  • Number of repositories

    All of the above metrics vary by repository. If you are ingesting into multiple repositories, also check the Number of Datasource per Repository to determine whether a single repository has a higher number of datasources than the others.

  • Digest and Storage nodes

    The number of digest nodes (and the CPU cores and datasources for each) determines the amount of data that can be digested. Increasing the number of digest nodes, or adding more nodes with the digest role, increases the digest capacity across each datasource. Increasing the number of storage nodes also affects digest overall, because it changes how segments are replicated and stored across the cluster, which in turn affects how long the data is retained and when it is offloaded to bucket storage.