Troubleshooting: Disks Filling Up

Condition or Error

Disks used by LogScale fill up with data

LogScale runs out of disk space

Disk space usage increases and space is not recovered

Log shippers may see HTTP 404 errors if nodes have failed

LogScale may reject ingestion with HTTP 502 errors

Causes

  • Data in LogScale is stored in segments. Segments are written to the configured primary storage location in two situations:

    1. When the segments are created

    2. When they are downloaded from bucket storage to serve a query.

    In some cases, LogScale's local disks fill up before segments can be deleted fast enough, or the configuration has been set incorrectly.

    To confirm the disk usage situation:

    1. Check the Primary Disk Usage graph in the LogScale Insights package.

      Figure 1. Graph of Disks Filling Up


    2. Use df to check the disk space:

      shell
      $ df
      Filesystem     1K-blocks     Used Available Use% Mounted on
      udev             1967912        0   1967912   0% /dev
      tmpfs             399508     1640    397868   1% /run
      /dev/sda5       19992176  9021684   9931900  48% /
      tmpfs            1997540        0   1997540   0% /dev/shm
      tmpfs               5120        4      5116   1% /run/lock
      tmpfs            1997540        0   1997540   0% /sys/fs/cgroup
      /dev/sda1         523248        4    523244   1% /boot/efi
      /dev/sdb1       20510332   557992  18903816   3% /kafka
      /dev/sdc1       19992176 13588164   5365420  72% /humio

      In this example, LogScale data is mounted at /humio and is 72% used. Disk space usage above roughly 85% usually indicates that disk space is being exhausted.
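      A quick check along these lines can flag the condition before it becomes critical. This is a sketch, not part of LogScale: the 85% threshold is the rule of thumb above, and the mount point to check is whatever holds your LogScale data directory.

```shell
#!/bin/sh
# check_usage MOUNT: warn when the given filesystem exceeds a usage
# threshold. The 85% threshold matches the rule of thumb above.
check_usage() {
  mount="$1"
  threshold=85
  # df -P emits one stable, POSIX-format line per filesystem;
  # column 5 is the usage percentage (e.g. "72%").
  usage=$(df -P "$mount" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
  if [ "$usage" -ge "$threshold" ]; then
    echo "WARNING: $mount is ${usage}% full"
  else
    echo "OK: $mount is ${usage}% full"
  fi
}

# Example: check the root filesystem; in practice, point this at the
# LogScale data mount (e.g. /humio in the df output above).
check_usage /
```

      Run periodically (e.g. from cron) this gives an early warning well before LogScale itself starts rejecting ingestion.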

      Further diagnosis of the issue depends on the storage configuration:

      • If secondary storage is NOT enabled, check that LOCAL_STORAGE_MIN_AGE_DAYS and LOCAL_STORAGE_PERCENTAGE are set to sensible values. For example:

        ini
        LOCAL_STORAGE_PERCENTAGE=80
        LOCAL_STORAGE_MIN_AGE_DAYS=0

        Important

        These configurations are only valid if bucket storage has been configured.

      • If bucket storage and secondary storage are enabled, check the values set for PRIMARY_STORAGE_PERCENTAGE and LOCAL_STORAGE_PERCENTAGE. LogScale fills the primary storage up to the limit specified by PRIMARY_STORAGE_PERCENTAGE; beyond that, the oldest segments (by when they were ingested, not when they were last used) are moved to secondary storage. Once the secondary disk fills to LOCAL_STORAGE_PERCENTAGE, LogScale starts deleting files from it, least-recently used first.
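        As a point of reference, a combined primary/secondary configuration might look like the sketch below. The percentages and the secondary directory path are illustrative values, not recommendations; tune them for your cluster.

```ini
# Illustrative values only; tune for your cluster.
# Fill the primary data disk to at most 80% before moving
# segments to secondary storage.
PRIMARY_STORAGE_PERCENTAGE=80
# Start deleting least-recently-used local files once the
# secondary disk reaches 90%.
LOCAL_STORAGE_PERCENTAGE=90
# Mount point of the secondary storage disk (illustrative path).
SECONDARY_DATA_DIRECTORY=/secondary/humio-data
```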

Solutions

  • Resolving the issue if LogScale nodes are up:

    • Identify heavy repositories and add retention to trim data.

      In the short term, we need to remove data to stop LogScale disks from filling up.

      To do this, run this query in the humio repository to identify the heaviest repositories:

      syslog
      class = "*c.h.r.RetentionJob*" "Retention-stats for 'dataspace'="
      | timechart(dataspace, function={max(before_compressed)},unit=bytes,span=30min)

      This shows how much storage each of your repositories uses. Target the largest repositories and add retention to those where appropriate. To add retention, go to Repository > Settings > Data Retention and add either a time limit or a storage size limit lower than the current setting.
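      A grouped variant of the same query can rank repositories by size instead of charting over time. This is a sketch built from the same class filter and fields as above; it assumes max(before_compressed) emits its result in the default _max field:

```syslog
class = "*c.h.r.RetentionJob*" "Retention-stats for 'dataspace'="
| groupBy(dataspace, function=max(before_compressed))
| sort(_max, order=desc)
```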

    • Kill all queries, or kill the most resource-intensive queries, for a short period of time to allow disk utilization to come down.

    • Temporarily disable the node(s) with the highest disk utilization for a short period of time to allow disk usage to come down. This is where the chart in the Primary Disk Usage widget comes in handy. It will show values per node.

    • Check your LogScale version. Improvements to disk utilization management shipped in Humio Server 1.30.1 LTS (2021-10-01), Humio Server 1.31.0 GA (2021-09-27), and Humio Server 1.32.0 LTS (2021-10-26), with each subsequent version offering further improvements. For example, v1.31 improved handling of local disk space relative to LOCAL_STORAGE_MIN_AGE_DAYS: previously the local disk could overflow while respecting that setting, whereas LogScale can now delete the oldest local segments that are present in bucket storage, even when they fall within that time range.