Secondary (cold) storage is intended for setups where the primary (hot) storage is fast and low-latency, such as NVME, and the secondary storage is high-latency but very large, such as a SAN spanning many spinning disks. Only segment files are moved to secondary storage, and older files are moved before younger ones.
When enabled, Humio will move segment files to secondary storage once the primary disk reaches the usage threshold set using PRIMARY_STORAGE_PERCENTAGE. Humio does not check what is using the space; it bases the decision on what the OS reports as “disk space used” and “disk space total” for the mount point that the primary data directory is on.
When the threshold is exceeded, Humio will copy files totaling the excess number of bytes to the secondary storage, and then delete those segment files from the primary data directory. The files are selected based on the latest event timestamp in them, so that the most recent events stay on the primary disk. This gives the best possible query performance from the assumed faster primary drive, since Humio is normally used for querying the latest data.
The extra storage gained is thus almost all of the available space on the secondary data directory, as only a single segment file is ever present on both volumes at once.
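The threshold check described above can be sketched in a few lines of shell. This is only an illustration of the arithmetic, not Humio's actual implementation; the function name `excess_bytes` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: mimics the threshold arithmetic described above.
# excess_bytes is a hypothetical helper, not part of Humio.

# excess_bytes USED TOTAL PERCENT
# Prints how many bytes of usage exceed PERCENT of TOTAL (0 if under the threshold).
excess_bytes() {
  local used=$1 total=$2 percent=$3
  local limit=$(( total * percent / 100 ))
  if (( used > limit )); then
    echo $(( used - limit ))
  else
    echo 0
  fi
}

# Example: a 1000 GB volume with 850 GB used and an 80% threshold.
excess_bytes 850000000000 1000000000000 80   # prints 50000000000
```

In this example roughly 50 GB of the oldest segment files would be candidates for moving. On a system with GNU coreutils, the two inputs could be read with `df -B1 --output=used,size` on the primary data directory.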
Note that the secondary directory needs to be private to the Humio node, just like the primary directory does.
As an example, suppose you have a server with 1 TB NVME being used for system files, Kafka data, and Humio data. Adding a 2 TB SAN connection (or 2x2 TB local spinning disks in a mirror) and then designating that as secondary storage directory allows Humio to store up to 2.8 TB, while still querying the latest ~800 GB from the NVME, and also keeping all segment files still being constructed on the NVME. When searching beyond what the NVME holds, Humio will read from the slower disks.
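The 2.8 TB figure follows from the default threshold of 80: about 800 GB of the 1 TB primary plus the 2 TB secondary. A minimal sketch of that arithmetic, using a hypothetical helper name and sizes in GB:

```shell
#!/usr/bin/env bash
# Sketch only: total_capacity_gb is a hypothetical helper for illustration.

# total_capacity_gb PRIMARY_GB SECONDARY_GB PERCENT
# Prints the upper bound on stored data: the usable share of the primary
# (PERCENT of its size) plus the whole secondary.
total_capacity_gb() {
  local primary_gb=$1 secondary_gb=$2 percent=$3
  echo $(( primary_gb * percent / 100 + secondary_gb ))
}

total_capacity_gb 1000 2000 80   # prints 2800
```

This is an upper bound. In the example the primary disk also holds system files and Kafka data, and those count toward the 80% threshold.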
Humio needs to be told where to store the secondary copies; that is, the location of the filesystem on the slower drive. When using Docker, make sure to mount the secondary directory into the container as well.
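For Docker, the secondary directory has to be passed as an extra volume. The sketch below only prints a candidate command rather than running it. The host path for the primary disk, the `/data` container path, and the `humio.conf` env file name are assumptions for illustration; adjust them to your own deployment.

```shell
#!/usr/bin/env bash
# Sketch only: prints a docker command for review; it does not start a container.
# All paths below except the secondary directory are assumptions.

HOST_PRIMARY=/mnt/nvme/humio-data                     # hypothetical host path on the fast disk
HOST_SECONDARY=/secondaryMountPoint/humio-data2       # host path on the slow volume
CONTAINER_SECONDARY=/secondaryMountPoint/humio-data2  # should match SECONDARY_DATA_DIRECTORY

# Build the command as an array so that paths with spaces stay intact.
docker_cmd=(
  docker run
  -v "$HOST_PRIMARY:/data"
  -v "$HOST_SECONDARY:$CONTAINER_SECONDARY"
  --env-file humio.conf
  humio/humio
)

# Print the command so it can be checked before use.
printf '%s ' "${docker_cmd[@]}"
echo
```

The important part is the second `-v` flag: the path inside the container must be the same path that SECONDARY_DATA_DIRECTORY points to.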
# SECONDARY_DATA_DIRECTORY enables the feature
# and sets where to store the files.
SECONDARY_DATA_DIRECTORY=/secondaryMountPoint/humio-data2

# PRIMARY_STORAGE_PERCENTAGE options decide the amount of data (Humio
# and otherwise) that the drive holding the data directory must at least hold
# before Humio decides to move any segment files to the secondary location.
# If set to zero, Humio will move files to secondary as soon as possible,
# which is when they become immutable completed segment files.
# (Default 80)
PRIMARY_STORAGE_PERCENTAGE=80
Say you have a slow disk as your only (and thus primary) disk for Humio. You add a new faster disk to the server, and want to use that disk as the primary, while leaving the bulk of the data on the old slow disk.
While this is possible, there is a bit of work involved, as only completed segment files can reside in the secondary storage. All other support files, and segment files in progress, need to reside on the primary disk. Humio must be shut down while this operation takes place.
Basically only files matching
bloom5*) can stay on the secondary storage; everything else must be on the primary. The tricky bit is moving the soft links
humiodata.current along with the file they point to.
You will need to move some specific files from the “new secondary” onto the “new primary” while the system is shut down for that to work, as some files must be on the primary. Here are their names, as they are below
/humio-data. The directory structure must be preserved.
For all the above
humiodata.current soft links, move the file each one points to as well.
If the above files are moved from the secondary to the primary, you may leave the remaining segment files where they are and start out with almost all data on the secondary. Or, if you want, move selected completed segment files from secondary to primary as well, to get improved performance from the new disk on searches that hit them. For example, you could move all segments that are less than seven days old if that matches the typical search range for the system.
Humio will not move files from secondary back to primary. Once the primary is full, Humio will start migrating segment files from primary to secondary.
When expanding the capacity for primary or secondary storage, and online expansion is not an option, you can move the existing data stored on the partitions and minimize downtime using rsync. Rsync allows you to sync only the new data between directories.
Assuming you’re moving secondary data storage from
/var/lib/humio-new-secondary you can do an initial rsync while Humio is running. Make sure your new mount has its owner and user set appropriately, generally
rsync -acv /var/lib/humio-secondary/ /var/lib/humio-new-secondary/
This can be run multiple times.
When you are ready to complete the move, start by stopping Humio:
systemctl stop humio
This command may differ depending on your Humio deployment. To move the data written since your last rsync, run rsync once more with the --delete option, which removes files from the destination that no longer exist in the source. Double check your source and destination directories before running it.
rsync -acv --delete /var/lib/humio-secondary/ /var/lib/humio-new-secondary/
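Because `--delete` removes files from the destination, it can be worth guarding the final sync with a small check and a dry run. The following is a minimal sketch; `check_sync_dirs` and `final_sync` are hypothetical helper names, not part of Humio or rsync.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers that wrap the final rsync with basic checks.

# check_sync_dirs SRC DST
# Succeeds only if both directories exist and resolve to different paths.
check_sync_dirs() {
  local src=$1 dst=$2 src_real dst_real
  if [[ ! -d $src || ! -d $dst ]]; then
    echo "source or destination directory is missing" >&2
    return 1
  fi
  src_real=$(cd "$src" && pwd -P) || return 1
  dst_real=$(cd "$dst" && pwd -P) || return 1
  if [[ $src_real == "$dst_real" ]]; then
    echo "source and destination are the same directory" >&2
    return 1
  fi
}

# final_sync SRC DST [--apply]
# Without --apply, rsync runs with --dry-run and only lists what would change.
final_sync() {
  local src=$1 dst=$2 mode=${3:-preview}
  check_sync_dirs "$src" "$dst" || return 1
  if [[ $mode == --apply ]]; then
    rsync -acv --delete "$src/" "$dst/"
  else
    rsync -acv --delete --dry-run "$src/" "$dst/"
  fi
}
```

With Humio stopped, `final_sync /var/lib/humio-secondary /var/lib/humio-new-secondary` previews the changes, and adding `--apply` as a third argument performs them.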
There are two options for having Humio use the new partition: you can either update the Humio configuration to use the new mount, or you can mount the new partition in place of the old partition's mount point.
To update the Humio configuration appropriately, change the value of either the DIRECTORY or the SECONDARY_DATA_DIRECTORY option in the
/etc/humio/server_XX.conf file to point to the new mount. Then restart Humio:
systemctl start humio
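The configuration change can be made by hand in an editor. If you prefer to script it, the following sketch rewrites a single `KEY=VALUE` line and keeps a `.bak` copy of the file. `set_conf_value` is a hypothetical helper, and it assumes the value contains no `|` or `&` characters.

```shell
#!/usr/bin/env bash
# Sketch only: set_conf_value is a hypothetical helper for KEY=VALUE config files.

# set_conf_value FILE KEY VALUE
# Replaces an existing KEY=... line, or appends one if the key is not present.
# A backup of the original file is left next to it with a .bak suffix.
set_conf_value() {
  local file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
  else
    cp "$file" "$file.bak"
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example (not run here):
# set_conf_value /etc/humio/server_XX.conf SECONDARY_DATA_DIRECTORY /var/lib/humio-new-secondary
```

Run this while Humio is stopped, and check the resulting file before starting Humio again.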
To mount the new partition in place of the old one, first make the appropriate changes for your OS, such as updating
/etc/fstab, then unmount the old partition and mount the new one at the old mount point. Once that’s done, start Humio.