Troubleshooting: Disks Filling Up

Last Updated: 2022-03-31

Condition or Error

Disks used by Humio fill up with data

Humio runs out of disk space

Disk space usage increases and space is not recovered

Log shippers may see HTTP 404 errors if nodes have failed

Humio may reject ingestion with HTTP 502 errors

Causes

  • Humio stores data in segments. Segments are written to the configured primary storage location in two situations:

    1. When the segments are created.

    2. When they are downloaded from bucket storage because a query needs them.

    The local disks can fill up when segments cannot be deleted fast enough, or when the storage configuration has been set incorrectly.

    To confirm the disk usage situation:

    1. Check the Primary Disk Usage graph in the Humio Insights package.

      Figure 272. Graph of Disks Filling Up


    2. Use df to check the disk space:

      Filesystem     1K-blocks     Used Available Use% Mounted on
      udev             1967912        0   1967912   0% /dev
      tmpfs             399508     1640    397868   1% /run
      /dev/sda5       19992176  9021684   9931900  48% /
      tmpfs            1997540        0   1997540   0% /dev/shm
      tmpfs               5120        4      5116   1% /run/lock
      tmpfs            1997540        0   1997540   0% /sys/fs/cgroup
      /dev/sda1         523248        4    523244   1% /boot/efi
      /dev/sdb1       20510332   557992  18903816   3% /kafka
      /dev/sdc1       19992176 13588164   5365420  72% /humio

      In this example, Humio data is mounted at /humio, which is 72% in use. Usage above 85% probably indicates that disk space is being exhausted.
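The same check can be scripted. The following is a minimal sketch, not part of Humio: the function names mounts_over_threshold and read_df are hypothetical, and the 85% default is simply the rule of thumb mentioned above. It parses df-style output and reports mount points whose usage is above the threshold.

```python
import subprocess


def mounts_over_threshold(df_output, threshold_percent=85):
    """Return (mount_point, use_percent) pairs whose usage exceeds the threshold.

    Expects the column layout shown above: the fifth column is the usage
    percentage and the remaining columns form the mount point.
    """
    flagged = []
    for line in df_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            use_percent = int(fields[4].rstrip("%"))
        except ValueError:
            # Header row, or a filesystem that reports no percentage.
            continue
        if use_percent > threshold_percent:
            flagged.append((" ".join(fields[5:]), use_percent))
    return flagged


def read_df():
    """Run df in POSIX output format (-P) and return its text output."""
    result = subprocess.run(["df", "-P"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling mounts_over_threshold(read_df()) on a node returns an empty list while every mount is at or below the threshold. With the sample output above, nothing is flagged at the default of 85.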

      Further diagnosis of the issue depends on the storage configuration:

Solutions

  • Resolving the issue if Humio nodes are up:

    • Identify heavy repositories and add retention to trim data.

      In the short term, data must be removed to stop the Humio disks from filling up.

      To see which repositories use the most storage, run this query in the humio repository:

      class = "*c.h.r.RetentionJob*" "Retention-stats for 'dataspace'="
       | timechart(dataspace, function={max(before_compressed)},unit=bytes,span=30min)

      This shows how much storage each of your repositories is using. Target the repositories that use the most storage and add retention to them where appropriate. To add retention, go to Repository -> Settings -> Data Retention and set either a time limit or a storage size limit lower than the current value.

    • Kill all queries, or only the most resource-intensive queries, for a short period of time to allow disk utilization to come down.

    • Temporarily disable the node(s) with the highest disk utilization for a short period of time to allow disk usage to come down. The chart in the Primary Disk Usage widget is useful here because it shows values per node.

    • Check your Humio version. Improvements to disk utilization management are included in Humio Server 1.30.1 Stable (2021-10-01), Humio Server 1.31.0 Preview (2021-09-27) and Humio Server 1.32.0 Stable (2021-10-26), with each subsequent version offering further improvements. v1.31 introduced improved handling of local disk space relative to LOCAL_STORAGE_MIN_AGE_DAYS. Previously, the local disk could overflow when that setting was respected; Humio can now delete the oldest local segments that are present in bucket storage, even when they are within that time range.
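
The version check in the last item can be sketched as a simple numeric comparison. This is an illustrative helper only: the names parse_version and has_improved_disk_handling are hypothetical, and the 1.31.0 cut-off is taken from the statement above that v1.31 introduced the improved handling. It does not account for fixes that may have been backported to stable releases such as 1.30.1.

```python
import re

# Assumption from the text above: v1.31 introduced the improved handling of
# local disk space relative to LOCAL_STORAGE_MIN_AGE_DAYS.
IMPROVED_DISK_HANDLING_SINCE = (1, 31, 0)


def parse_version(version):
    """Turn a string such as '1.30.1' or 'v1.31' into a (major, minor, patch) tuple."""
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def has_improved_disk_handling(version):
    """Return True if the version is numerically at or above 1.31.0."""
    return parse_version(version) >= IMPROVED_DISK_HANDLING_SINCE
```

For example, has_improved_disk_handling("1.30.1") is False and has_improved_disk_handling("1.32.0") is True under this comparison.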