Preparation for Installing Humio
There are a few things to do to prepare to install Humio. The sections below cover each of them.
Hardware Requirements
Hardware requirements depend on how much data you will be ingesting, and how many concurrent searches you will be running.
Scaling your Environment
Humio was built to scale, and scales very well across the nodes in a cluster. Running a cluster of three or more Humio nodes provides higher capacity in terms of both ingest and search performance, and also allows high availability by replicating data to more than one node.
If you want to run Humio as a cluster, please review Cluster Setup.
Estimating Resources
Here are a few guidelines to help you determine what hardware you'll need.
Assume data compresses 9x on ingest. Test your installation; better compression means better performance.
You need to be able to hold 48 hours of compressed data in 80% of your RAM.
You want enough hyper-threads/vCPUs (each providing roughly 1 GB/s of search speed) to be able to search 24 hours of data in less than 10 seconds.
You need disk space to hold your compressed data. Never fill your disk more than 80%.
For information on how to choose hardware and how to size your Humio installation, see the Instance Sizing page.
Example Setup
Suppose your machine has 64 GB of RAM, 8 hyper-threads (4 cores), and 1 TB of storage. With 80% of RAM available for compressed data and 9x compression, the machine can hold roughly 460 GB of ingested data in RAM and search 8 GB/s, so 10 seconds of query time covers 80 GB of data. This machine therefore fits an 80 GB/day ingest rate, with more than 5 days of data available for fast querying. You can store 7.2 TB of ingested data before the disk is 80% full, corresponding to 90 days at an 80 GB/day ingest rate.
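The arithmetic behind this example can be sketched directly in the shell; the figures below simply restate the rule-of-thumb assumptions above (9x compression, 1 GB/s of search per hyper-thread, 80% usable RAM and disk):
# Back-of-the-envelope sizing using the rules of thumb above
RAM_GB=64; THREADS=8; DISK_GB=1000; COMPRESSION=9
# Ingest that fits compressed in 80% of RAM: 64 * 0.80 * 9 ≈ 460 GB
echo "RAM-resident ingest: $((RAM_GB * 80 / 100 * COMPRESSION)) GB"
# Search speed: 8 hyper-threads * 1 GB/s = 8 GB/s, so 10 s covers 80 GB
echo "Searchable in 10 s: $((THREADS * 10)) GB"
# Disk at 80% fill: 1000 * 0.80 * 9 = 7200 GB, i.e. ~90 days at 80 GB/day
echo "Total retained ingest: $((DISK_GB * 80 / 100 * COMPRESSION)) GB"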
This example assumes that all data has the same Data Retention. But you can configure Humio to automatically delete some events before others, allowing some data to be kept for several years while other data gets deleted after one week, for example.
For more details, refer to our Instance Sizing page.
Enable Authentication
For production deployments, you should set up authentication. If authentication is not configured, Humio runs in NO_AUTH mode, meaning that there are no access restrictions at all: anyone with access to the system can do anything. If you only want to experiment with Humio, you can skip configuring authentication for now. Refer to Authentication Configuration for the different login options.
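As a minimal illustration, single-user login can be enabled with two settings in Humio's environment configuration; the file path and password below are placeholders, and the full set of options is described on the Authentication Configuration page:
# Example settings (e.g. in /etc/humio/server.conf); the path and
# password here are placeholders, not recommendations.
AUTHENTICATION_METHOD=single-user
SINGLE_USER_PASSWORD=change-me-please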
Increase Open File Limit
For production usage, Humio needs to be able to keep a lot of files open for sockets and actual files from the file system. The default limits on Unix systems are typically too low for any significant amount of data and concurrent users.
You can verify the actual limits for the process using:
# Find the PID of the running Humio JVM, then inspect its limits
PID=$(ps -ef | grep java | grep humio-assembly | head -n 1 | awk '{print $2}')
grep 'Max open files' /proc/$PID/limits
The minimum required settings depend on the number of open network connections and datasources. There is no harm in setting these limits high for the Humio process. A value of at least 8192 is recommended.
You can do that using a simple text editor to create a file named 99-humio-limits.conf in the /etc/security/limits.d/ sub-directory. Copy these lines into that file:
# Raise limits for files:
humio soft nofile 250000
humio hard nofile 250000
Create another file with a text editor, this time named common-session in the /etc/pam.d/ sub-directory. Copy these lines into it:
# Apply limits:
session required pam_limits.so
These settings apply to the next Humio user login, not to any running processes.
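To confirm the change, you can open a login shell as the humio user (assuming such a system user exists) and print the soft limit; on distributions where su includes common-session, it should report the new value:
# Should print 250000 once the limits and PAM files above are in place
su - humio -c 'ulimit -n'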
If you run Humio using Docker, you can raise the limit using the --ulimit="nofile=8192:8192" option on the docker run command.
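For example (the image name, container name, and port mapping below are illustrative; adjust them to your deployment):
# Run Humio with a raised open-file limit (illustrative invocation)
docker run -d --ulimit="nofile=8192:8192" --name humio -p 8080:8080 humio/humio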
Separate Disk for Kafka Data
For production usage, you should ensure Kafka's data volume is on a separate disk or volume from the other Humio services, because it is quite easy for Kafka to fill its disk if Humio ingestion slows down for any reason. If Kafka does fill its disk, keeping it on a separate disk or volume prevents the other services from crashing along with Kafka and makes recovery easier. If Kafka runs on separate servers or containers, you are likely covered already; this advice is primarily for situations where you run the all-in-one Docker image we supply.
We also highly recommend setting up your own disk usage monitoring, so you are alerted when a disk is more than 80% full and can take corrective action before it fills completely.
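Dedicated monitoring tooling is preferable, but even a small scheduled script provides a safety net. A minimal sketch, assuming Kafka's data lives under /var/lib/kafka (adjust the path, and wire the warning into your real alerting):
#!/bin/sh
# Warn when the Kafka data volume is more than 80% full
THRESHOLD=80
USAGE=$(df --output=pcent /var/lib/kafka | tail -n 1 | tr -dc '0-9')
if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "WARNING: Kafka volume at ${USAGE}% (threshold ${THRESHOLD}%)"
fi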
Check noexec on /tmp
Check the filesystem options on /tmp. Humio makes use of the Facebook Zstandard real-time compression algorithm, which requires the ability to execute files directly from the configured temporary directory.
The options for the filesystem can be checked using mount:
$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,noexec,relatime,size=1967912k,nr_inodes=491978,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=399508k,mode=755,inode64)
/dev/sda5 on / type ext4 (rw,relatime,errors=remount-ro)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,seclabel)
You can temporarily remove noexec using mount to remount the directory:
mount -o remount,exec /tmp
To permanently remove the noexec flag, update /etc/fstab to remove the flag from the options:
tmpfs /tmp tmpfs mode=1777,nosuid,nodev 0 0
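After saving /etc/fstab, you can apply the new options without rebooting and confirm that the flag is gone:
mount -o remount /tmp
findmnt -no OPTIONS /tmp   # noexec should no longer appear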
Recommended Installation Architectures
Bare-Metal/VM/Kubernetes Workers
Assumptions:
30 Day Retention on NVME
20% Overhead left on NVME
10x Compression
Secondary Storage can extend retention at slower speeds (SAN/NAS/RAID)
Kafka 5x Compression - 24 Hour Storage
Humio does not provide a self-hosted Kubernetes solution for Kafka and Zookeeper
Zookeeper/Kafka clusters are separate from Humio clusters to avoid resource contention and allow independent management.
X-Small - 1 TB/Day Ingestion
Software | Instances | vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 3 | 16 | 64 GB | NVME 2 TB | 6 TB |
Kafka | 3 | 4 | 8 GB | SSD 500 GB | 1.5 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | SSD 50 GB | 150 GB |
Small - 3 TB/Day Ingestion
Software | Instances | vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 3 | 32 | 128 GB | NVME 6 TB | 18 TB |
Kafka | 3 | 4 | 8 GB | SSD 1 TB | 3 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | SSD 50 GB | 150 GB |
Medium - 5 TB/Day Ingestion
Software | Instances | vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 6 | 32 | 128 GB | NVME 6 TB | 36 TB |
Kafka | 3 | 8 | 16 GB | SSD 1 TB | 3 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | SSD 50 GB | 150 GB |
Large - 10 TB/Day Ingestion
Software | Instances | vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 12 | 32 | 128 GB | NVME 6 TB | 72 TB |
Kafka | 6 | 8 | 16 GB | SSD 1 TB | 6 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | SSD 50 GB | 150 GB |
X-Large - 30 TB/Day Ingestion
Software | Instances | vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 30 | 32 | 128 GB | NVME 7 TB | 210 TB |
Kafka | 9 | 8 | 16 GB | SSD 1.5 TB | 13.5 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | SSD 50 GB | 150 GB |
AWS: EC2/EKS Workers
Assumptions:
Retention on NVME varies due to fixed instance storage sizes, but exceeds 30 days
20% Overhead left on NVME
10x Compression
S3 Bucket storage used for longer retention
AWS Managed Streaming for Apache Kafka (MSK) for Zookeeper/Kafka
Humio does not provide a self-hosted Kubernetes solution for Kafka and Zookeeper
Zookeeper/Kafka clusters are separate from Humio clusters to avoid resource contention and allow independent management.
AWS EKS

Figure 3. Recommended AWS EKS
AWS Reference Architecture

Figure 4. AWS Reference Architecture
X-Small - 1 TB/Day Ingestion
Software | Instances | EC2 Instance Type/vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 3 | i3.2xlarge / 8 | 61 GB | NVME 1.9 TB | 5.7 TB |
Kafka | 3 | kafka.m5.xlarge / 4 | 16 GB | EBS 500 GB | 1.5 TB |
Zookeeper | MSK | MSK | MSK | MSK | MSK |
Small - 3 TB/Day Ingestion
Software | Instances | EC2 Instance Type/vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 3 | i3.4xlarge / 16 | 122 GB | NVME 3.8 TB | 11.4 TB |
Kafka | 3 | kafka.m5.xlarge / 4 | 16 GB | EBS 500 GB | 1.5 TB |
Zookeeper | MSK | MSK | MSK | MSK | MSK |
Medium - 5 TB/Day Ingestion
Software | Instances | EC2 Instance Type/vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 6 | i3.8xlarge / 32 | 244 GB | NVME 7.6 TB | 45.6 TB |
Kafka | 3 | kafka.m5.2xlarge / 8 | 16 GB | EBS 1.5 TB | 4.5 TB |
Zookeeper | MSK | MSK | MSK | MSK | MSK |
Large - 10 TB/Day Ingestion
Software | Instances | EC2 Instance Type/vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 12 | i3.8xlarge / 32 | 244 GB | NVME 7.6 TB | 91.2 TB |
Kafka | 6 | kafka.m5.2xlarge / 8 | 16 GB | EBS 1.5 TB | 9 TB |
Zookeeper | MSK | MSK | MSK | MSK | MSK |
X-Large - 30 TB/Day Ingestion
Software | Instances | EC2 Instance Type/vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 30 | i3.8xlarge / 32 | 244 GB | NVME 7.6 TB | 228 TB |
Kafka | 9 | kafka.m5.2xlarge / 8 | 16 GB | EBS 2 TB | 18 TB |
Zookeeper | MSK | MSK | MSK | MSK | MSK |
Google Cloud Platform (GCP)/Google Kubernetes Engine (GKE)
Assumptions:
30 Day Retention on NVME
20% Overhead left on NVME
10x Compression
GCS Bucket storage used for longer retention
Humio does not provide a self-hosted Kubernetes solution for Kafka and Zookeeper
Zookeeper/Kafka clusters are separate from Humio clusters to avoid resource contention and allow independent management.
X-Small - 1 TB/Day Ingestion
Software | Instances | Machine Type / vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 3 | n2-standard-16 / 16 | 122 GB | NVME 3 TB | 9 TB |
Kafka | 3 | n2-standard-8 / 8 | 32 GB | PD-SSD 500 GB | 1.5 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | PD-SSD 50 GB | 150 GB |
Small - 3 TB/Day Ingestion
Software | Instances | Machine Type / vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 3 | n2-highmem-16 / 16 | 128 GB | NVME 6 TB (16x375GB) | 18 TB |
Kafka | 3 | n2-standard-8 / 8 | 32 GB | PD-SSD 500 GB | 1.5 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | PD-SSD 50 GB | 150 GB |
Medium - 5 TB/Day Ingestion
Software | Instances | Machine Type / vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 6 | n2-standard-32 / 32 | 128 GB | NVME 6 TB (16x375GB) | 36 TB |
Kafka | 6 | n2-standard-8 / 8 | 32 GB | PD-SSD 1 TB | 6 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | PD-SSD 50 GB | 150 GB |
Large - 10 TB/Day Ingestion
Software | Instances | Machine Type / vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 12 | n2-standard-32 / 32 | 128 GB | NVME 6 TB (16x375GB) | 72 TB |
Kafka | 6 | n2-standard-8 / 8 | 32 GB | PD-SSD 1 TB | 6 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | PD-SSD 50 GB | 150 GB |
X-Large - 30 TB/Day Ingestion
Software | Instances | Machine Type / vCPU | Memory | Storage | Total Storage |
---|---|---|---|---|---|
Humio | 30 | n2-standard-64 / 64 | 256 GB | NVME 7.5 TB (20x375GB) | 225 TB |
Kafka | 9 | n2-standard-8 / 8 | 32 GB | PD-SSD 1.5 TB | 13.5 TB |
Zookeeper | 3 | Shared with Kafka | Shared with Kafka | PD-SSD 50 GB | 150 GB |