Data Retention

To meet security and compliance requirements, you can configure LogScale to delete old data. Retention settings are configured in the user interface and can be based on compressed file size, uncompressed file size, and age of data. Retention deletes events in large chunks, called segments; it does not delete individual events. The three types of retention are enforced independently: data is deleted as soon as any one of them marks it for deletion.

Retention is set per repository, so you can retain data in different repositories for different durations. Permission to change these settings is also controlled per repository through role-based access control.

To view the data storage and compression, see Cluster Statistics.

To learn how to change data retention settings, see Data Retention.

| Retention Type | Based On | Considers Replicas | Primary Use Case |
|---|---|---|---|
| Compressed Size | Amount of disk space consumed by compressed data | Yes | Prevent file system from growing too large |
| Uncompressed Size | Size of data before compression (internal format) | No | Keep at least a specified amount of input data |
| Age of Data | Value of @timestamp field in events | No | Delete data older than a specified time period |

Compressed Size

The compressed setting enables you to prevent the file system from growing too large. Configure the compressed settings for each repository so that the sum of all compressed sizes is less than the space available on the disk.

Compressed size retention deletes data based on the amount of disk space consumed, and keeps deleting until the amount on disk is below the setting. It takes replicas into account: copies in excess of the segment-replication setting are counted as extra.

For example, consider a cluster of three LogScale instances with a segment-replication of three and a CompressedSize of 50 GB. The total disk usage for this repository would be 150 GB across the three instances, of which you see 50 GB of compressed data. For more information about how multiple-byte numbers are represented, see LogScale Multiple-byte Units.

If the segment replication setting is then changed to two, the allowed disk usage drops to 100 GB in total across the three instances. The retention job then deletes the oldest segments. At first this leaves approximately 33 GB of searchable data, because the remaining segments still have three copies each on disk. As more data flows in through ingest, you get back to 50 GB of searchable compressed data within the 100 GB on disk. This data is likely to be distributed evenly, at about 33 GB on each LogScale instance in the cluster.
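The arithmetic in this example can be sketched in a few lines of Python. This is an illustration of the numbers above, not LogScale code: the function names are hypothetical, and the model assumes that segments written before the change still occupy the old number of copies on disk.

```python
def allowed_disk_usage_gb(compressed_limit_gb, replication_factor):
    """Total disk space the repository may occupy across the cluster."""
    return compressed_limit_gb * replication_factor


def searchable_after_change_gb(compressed_limit_gb, old_factor, new_factor):
    """Searchable data right after the replication setting changes.

    Assumption: every existing segment still has old_factor copies on disk,
    so each searchable gigabyte costs old_factor gigabytes of the new budget.
    """
    new_budget = allowed_disk_usage_gb(compressed_limit_gb, new_factor)
    return min(compressed_limit_gb, new_budget / old_factor)


print(allowed_disk_usage_gb(50, 3))                       # 150
print(allowed_disk_usage_gb(50, 2))                       # 100
print(round(searchable_after_change_gb(50, 3, 2), 1))     # 33.3
```

Once new segments are written with two copies each, the same 100 GB budget again covers the full 50 GB of searchable data.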

Uncompressed Size

The uncompressed setting deletes data while promising to keep at least the specified amount of input. The original size is measured as the size stored before compression. It is therefore the size of the internal format, not of the data as it was ingested, and it includes the size of any additional fields sent along with the raw events.

Uncompressed size retention deletes data only when it can still retain at least the amount specified as the uncompressed limit. It does not count multiple replicas as more than one copy, because it is based on the amount of data that you see.
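The "keep at least this much" rule can be sketched as follows. This is a simplified model rather than the actual retention job: the function name and the tuple layout are hypothetical, and each segment is counted once regardless of how many replicas exist.

```python
def segments_to_delete(segments, uncompressed_limit):
    """Pick the oldest segments that can be deleted while the remaining
    segments still add up to at least uncompressed_limit.

    segments: list of (segment_id, uncompressed_size), ordered oldest first.
    """
    remaining = sum(size for _, size in segments)
    deletable = []
    for segment_id, size in segments:
        # Stop as soon as deleting this segment would break the promise.
        if remaining - size < uncompressed_limit:
            break
        remaining -= size
        deletable.append(segment_id)
    return deletable


# Sizes in GB, oldest segment first; the limit is 60 GB.
segments = [("a", 40), ("b", 30), ("c", 30), ("d", 20)]
print(segments_to_delete(segments, 60))  # ['a']
```

Deleting segment "a" leaves 80 GB, which is still above the limit. Deleting "b" as well would leave only 50 GB, so it is kept.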

Age of Data

A chunk is deleted when its latest event, according to the value of the @timestamp field in the ingested data, is older than the configured retention. To ensure that you cannot see events older than the configured limit, LogScale also restricts the time interval of searches to the interval allowed by this retention setting. Retention by age therefore hides any event that is too old, even if its chunk contains other events that are still visible. The disk space is reclaimed once the latest event in the chunk is old enough.
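The two effects described here, deleting whole chunks and hiding individual old events at search time, can be illustrated with a small sketch. The function names are hypothetical and the logic is a simplified model of the behavior described above, not LogScale's implementation.

```python
from datetime import datetime, timedelta, timezone


def chunk_is_deletable(latest_event_ts, now, retention):
    """A chunk can be deleted once even its newest event is too old."""
    return latest_event_ts < now - retention


def clamp_search_start(requested_start, now, retention):
    """Searches never reach further back than the retention cutoff."""
    return max(requested_start, now - retention)


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
retention = timedelta(days=30)  # cutoff is 2024-05-31

# A chunk holding events from 25 May to 2 June is kept on disk...
latest = datetime(2024, 6, 2, tzinfo=timezone.utc)
print(chunk_is_deletable(latest, now, retention))  # False

# ...but a search asking for data from 25 May only starts at the cutoff.
requested = datetime(2024, 5, 25, tzinfo=timezone.utc)
print(clamp_search_start(requested, now, retention).date())  # 2024-05-31
```

In this example the events from 25 to 30 May stay on disk but are hidden from searches until the whole chunk ages out.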

Performance Tuning for Long Retention

The LogScale defaults target data retention times in the range of 1-6 months. If you plan to keep data for much longer, you can reduce the number of files stored on disk by telling LogScale to create larger files. Because retention removes old data in chunks of one file at a time, this also makes those chunks larger.

If you expect your LogScale repositories to have an average retention of more than 6 months, you can increase the amount of data in each file, which reduces the total number of files in the system. Changing these settings on a LogScale cluster only affects files created after the change, and it is safe to make such a change at any time.

```ini
# The default value in LogScale, suited to installs where the average data retention is 0-6 months.
# MAX_HOURS_SEGMENT_OPEN=24

# Suggested value for LogScale installs where the average data retention is 6-18 months.
# MAX_HOURS_SEGMENT_OPEN=48

# Suggested value for LogScale installs where the average data retention is 2+ years.
# MAX_HOURS_SEGMENT_OPEN=96
```
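To get a rough feel for the effect of this setting, the sketch below estimates how many segment files a single low-volume stream of data would accumulate over a retention period. It rests on a simplifying assumption that is not from the documentation above: a segment is closed only when it reaches MAX_HOURS_SEGMENT_OPEN. Busy streams may close segments sooner for other reasons, so treat the result as a lower bound. The function name is hypothetical.

```python
import math


def min_segment_files(retention_days, max_hours_segment_open):
    """Lower bound on segment files for one continuously written stream,
    assuming a segment is closed only when it reaches the time limit."""
    return math.ceil(retention_days * 24 / max_hours_segment_open)


# Two years of retention with each of the values shown above.
for hours in (24, 48, 96):
    print(hours, min_segment_files(730, hours))
# 24 730
# 48 365
# 96 183
```

Under this model, raising the value from 24 to 96 cuts the number of files to roughly a quarter, at the cost of each retention deletion removing a larger chunk of data.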