Storage Capacity

The digestion capacity affects how data is stored on the individual nodes within the system and how that data is replicated.

To work out how much data needs to be stored, identify:

  • Expected ingest per day (ingest_per_day)

  • Replication factor (replication)

  • Retention duration (retention)

  • Compression factor (compression); typically a factor of 10 (i.e. compressed data is 10% of original)

To calculate the compressed ingest volume per day (before replication):

ingest_per_day/compression

To calculate storage requirements per day:

(ingest_per_day/compression) * replication

To calculate storage requirements for retention period:

((ingest_per_day * retention)/compression) * replication

For example, with ingest of 20TB/day, a compression factor of 10, the default replication of 3, and 30 days retention:

((20*30)/10)*3

This gives 180TB across the cluster, including any bucket storage.
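The retention calculation above can be sketched as a small helper. This is only an illustrative sketch: the function name retention_storage_tb and its parameter names are hypothetical, and the defaults assume the compression factor of 10 and replication of 3 used in the example.

```python
def retention_storage_tb(ingest_per_day_tb, retention_days,
                         compression=10, replication=3):
    """Estimate total storage (TB) needed for the retention period.

    Mirrors the formula:
    ((ingest_per_day * retention) / compression) * replication
    """
    # Compressed volume written per day, before replication
    compressed_per_day = ingest_per_day_tb / compression
    # Every replica stores its own copy of the compressed data
    stored_per_day = compressed_per_day * replication
    return stored_per_day * retention_days


# 20TB/day, 30 days retention, compression 10, replication 3
total_tb = retention_storage_tb(20, 30)
print(total_tb)
```

Changing any one input (for example, doubling retention) scales the result linearly, which makes it easy to compare sizing options.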

The storage capacity on local disks affects query performance and how much data is sent to bucket storage, if it's configured. Using some of the same values, it's possible to calculate how long incoming data remains on local disk. The LOCAL_STORAGE_PERCENTAGE configuration variable sets how much of the local disk is used to store data before it is offloaded to bucket storage.

For example, with the same figures as above, in a cluster with 20 nodes (nodes), each with 4TB local storage (local_disk) and LOCAL_STORAGE_PERCENTAGE=80:

To calculate the overall storage capacity for the cluster for active segment data:

nodes * local_disk * (LOCAL_STORAGE_PERCENTAGE/100)

To determine the storage duration:

(nodes * local_disk * (LOCAL_STORAGE_PERCENTAGE/100))/((ingest_per_day/compression)*replication)

Using the figures above:

(20 * 4 * (80/100))/((20/10)*3)

Or approximately 10.7 days (64TB of usable local storage divided by 6TB stored per day).
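The local storage duration can be sketched the same way. The function name local_storage_days and its parameters are hypothetical, chosen only to match the variables in the formulas above; the sketch assumes data is spread evenly across the nodes.

```python
def local_storage_days(nodes, local_disk_tb, local_storage_percentage,
                       ingest_per_day_tb, compression=10, replication=3):
    """Estimate how many days of data fit on local disk across the cluster."""
    # Local capacity available for segment data, limited by
    # LOCAL_STORAGE_PERCENTAGE
    usable_tb = nodes * local_disk_tb * (local_storage_percentage / 100)
    # Compressed, replicated volume written per day
    stored_per_day_tb = (ingest_per_day_tb / compression) * replication
    return usable_tb / stored_per_day_tb


# 20 nodes with 4TB each, LOCAL_STORAGE_PERCENTAGE=80, 20TB/day ingest
days = local_storage_days(20, 4, 80, 20)
print(round(days, 1))
```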