Hosts Dashboard

This dashboard is part of the Insights package. It shows information about each of your Humio nodes, which is useful when diagnosing a node that is having a problem.

CPU Usage in Percent

This widget shows how much CPU each Humio node is using. Within a cluster, if every Humio node has the same specifications and the digest and storage partitions are evenly distributed, you would expect each node to have about the same CPU usage.

If some nodes show much higher usage than the rest, this may indicate a problem with Humio or with your cluster setup, such as unevenly distributed partitions.
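As a rough illustration of the comparison described above, the following Python sketch flags nodes whose CPU usage sits well above the cluster median. The function name, the node names, the sample values, and the 20-point tolerance are all assumptions made for this example; they are not part of Humio or of the dashboard.

```python
from statistics import median

def find_cpu_outliers(cpu_by_node, tolerance=20.0):
    """Return the nodes whose CPU usage exceeds the cluster median
    by more than `tolerance` percentage points.

    `cpu_by_node` maps a node name to its CPU usage in percent.
    The tolerance is an arbitrary example value, not a Humio default.
    """
    if not cpu_by_node:
        return {}
    cluster_median = median(cpu_by_node.values())
    return {
        node: pct
        for node, pct in cpu_by_node.items()
        if pct - cluster_median > tolerance
    }

# Hypothetical readings copied by hand from the widget.
sample = {"node-1": 41.0, "node-2": 44.5, "node-3": 92.0}
outliers = find_cpu_outliers(sample)  # {"node-3": 92.0}
```

The median is used rather than the mean so that a single overloaded node does not shift the baseline it is compared against.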

CPU Usage: Thread Group Ticks

This widget shows the number of CPU ticks used by each Humio thread group. Within a Humio cluster, the number of threads is usually dictated by the number of cores on each node, and those threads are then divided among the thread groups. The name of each group should indicate the type of work Humio is doing in it, so this widget helps show how much CPU time is being spent on each kind of work.

One thread group that often appears to consume a lot of resources is humio-akka, the group responsible for handling network requests. It may account for more time than other groups because it spends much of that time idle, waiting for responses.

Another thread group you will commonly see is the digester, which handles digesting all new data coming into Humio.

JVM Garbage Collection Time

Humio is built to run on the JVM, so it is worth monitoring how much time the JVM spends on garbage collection. If Humio spends a lot of time on garbage collection, that can consume significant resources and keep Humio from doing useful work such as digesting new data or running queries.

If one particular node accounts for most of the garbage collection time, it may be worth restarting that node to see if it helps.

Memory: System Usage Percentage

This widget shows the percentage of memory being used for caching, relative to each Humio node’s total memory allocation. Humio uses memory to cache the segment files that queries read, which improves query speed.

Missing Nodes

This widget shows the Humio metric missing-cluster-nodes. Each Humio node reports this metric, which is the number of nodes that the reporting node considers dead. A healthy system reports zero.

Node Shutdowns

This is a timechart showing which vHost (i.e., Humio Node ID) has shut down in the given time range. If a node shutdown is unexpected, there may be ERROR logs explaining why.

Failed HTTP Checks

This widget shows the Humio metric failed-http-checks. Each Humio node reports the number of nodes that appear to be unreachable over HTTP.

Networking (Bytes per second)

For each Humio node, this timechart shows the number of bytes per second transmitted and received by the node's network devices.

This can be useful in diagnosing network throughput on each Humio node, especially if some nodes are slower than expected or if nodes are losing packets due to network issues.
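If you want to sanity-check a bytes-per-second figure against raw interface counters, a rate can be derived from two cumulative byte counts taken a known interval apart. The sketch below assumes that approach; how Humio computes the values in this widget is not stated here, and the function name and sample numbers are hypothetical.

```python
def bytes_per_second(prev_bytes, curr_bytes, interval_seconds):
    """Derive a throughput rate from two cumulative byte counters.

    Returns None when the counter went backwards, which usually means
    it was reset (for example after a reboot) and the sample is unusable.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        return None
    return delta / interval_seconds

# Hypothetical counters sampled 60 seconds apart.
rate = bytes_per_second(1_000_000, 4_000_000, 60)  # 50000.0
```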

Open File Descriptors

This is a Humio metric that shows the number of currently open file descriptors on each Humio node.

Humio needs to keep many file descriptors open, both for sockets and for actual files on the file system. The default limit on Linux systems is usually too low. See this documentation page for more information on increasing the file limit in Linux.
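To see how close a process is to its limit, you can compare the open descriptor count with the soft limit. The sketch below uses Python's standard resource module and the Linux /proc file system. Note the assumptions: it inspects the process running the script, not the Humio process itself, and the helper names are made up for this example. To check Humio, you would need to run it as the same user with the same limits, or read /proc/<pid>/limits for the Humio process.

```python
import os
import resource

def fd_headroom_percent(open_count, soft_limit):
    """Percentage of the soft limit that is still unused.

    Returns None when the limit is unlimited, since no percentage applies.
    """
    if soft_limit == resource.RLIM_INFINITY or soft_limit <= 0:
        return None
    return 100.0 * (soft_limit - open_count) / soft_limit

def current_process_fd_usage():
    """Return (open_count, soft_limit, hard_limit) for this process.

    open_count is None on systems without /proc/self/fd.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fd_dir = "/proc/self/fd"
    open_count = len(os.listdir(fd_dir)) if os.path.isdir(fd_dir) else None
    return open_count, soft, hard

# Example with made-up numbers: 250 open descriptors, soft limit of 1000.
headroom = fd_headroom_percent(250, 1000)  # 75.0
```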

CPU Architecture

This table shows each Humio node’s CPU architecture and can serve as a useful reference. It lists the processor in each node, along with the number of vCPUs, the threads per core, and the sizes of the L1 to L3 caches.

Cluster Time Skew

This is a timechart for each Humio node showing the largest time skew in milliseconds between this node and any other node in the cluster.

Keeping the time skew between Humio nodes as low as possible is important as Humio relies on system times being accurate for it to work as expected. To keep the time skew low between nodes, keep the nodes synced using something like NTP.
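To make the definition concrete, the sketch below computes, for each node, the largest absolute difference between its clock and any other node's clock. It assumes you already have one clock reading per node in milliseconds taken at the same instant; the node names and readings are hypothetical, and this is not how Humio itself gathers the measurements.

```python
def largest_skew_ms(clock_ms_by_node):
    """Map each node to the largest absolute clock difference (ms)
    between it and any other node in the cluster.

    A single-node cluster has no peers, so its skew is reported as 0.
    """
    result = {}
    for node, reading in clock_ms_by_node.items():
        diffs = [
            abs(reading - other_reading)
            for other, other_reading in clock_ms_by_node.items()
            if other != node
        ]
        result[node] = max(diffs, default=0)
    return result

# Hypothetical simultaneous clock readings in milliseconds.
readings = {"node-1": 1_000_000, "node-2": 1_000_012, "node-3": 999_950}
skews = largest_skew_ms(readings)
# {"node-1": 50, "node-2": 62, "node-3": 62}
```

In this sample node-3 runs behind the others, which raises the reported skew on every node, so a high value on all nodes can point to a single badly synced clock.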

Logged Events

This is a timechart showing the number of Humio logged events per Humio node.

Humio Versions

This timechart shows the Humio versions that have been applied to the cluster in the past 24 hours. It is useful for checking whether the time of an upgrade correlates with a change seen in another widget.

Primary Disk Usage

This timechart shows the disk usage of the primary local storage in percent. By default, Humio limits disk usage to 85% to keep disks from reaching their maximum capacity. It is very important not to let a disk fill up completely, as that could result in data loss.
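For a quick check outside the dashboard, disk usage in percent can be computed with Python's standard shutil.disk_usage and compared with the 85% default mentioned above. This is only a sketch: the path "." is a placeholder for your actual data directory, the helper names are invented for the example, and the way Humio itself measures usage may differ (for instance in how reserved blocks are counted).

```python
import shutil

DEFAULT_LIMIT_PERCENT = 85.0  # the default limit described above

def disk_usage_percent(path):
    """Return used space on the file system containing `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total

def is_over_limit(percent, limit=DEFAULT_LIMIT_PERCENT):
    """True when usage has reached or passed the given limit."""
    return percent >= limit

# "." is a placeholder; point this at the primary storage directory.
percent = disk_usage_percent(".")
print(f"{percent:.1f}% used, over limit: {is_over_limit(percent)}")
```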

Secondary Disk Usage

If you have Secondary Storage configured, this timechart will show you the disk usage in percent of your secondary disk.