Humio Insights Segments & Datasources Dashboard

Humio organizes stored data into Segments and Datasources.

A Segment is a file stored on a Humio node that contains the compressed ingested data. Segment files sit under the directories of Datasources.

Datasources belong to repositories. Each combination of a repository and its tag values forms its own datasource, and the stream of data for that repository and tag combination is stored under it.

This dashboard shows information about your cluster's segments and datasources.

Segment Merges Per Hour

As ingest comes in, Humio builds up segment files and stores them for use in later queries. While a segment file is being built, Humio first writes mini-segments, which are later merged into the segment file.

This widget shows the CPU time spent per Humio host per hour on merging these segment files.

Merged Segments Sizes (Bytes)

This widget shows the median, 75th percentile and 95th percentile of the file size of segments created by merging mini-segments.

Typically this value will be around 50-500MB. Keeping it below 1GB is a sign of a healthy cluster. Anything above that may be a concern, and Humio Support can help investigate.
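As a rough illustration of how these figures relate to the 1GB guideline, the sketch below computes the same three statistics from a list of merged segment sizes and flags whether the 95th percentile stays under the limit. The function names, the nearest-rank percentile method and the use of decimal gigabytes are assumptions made for the example; the widget may calculate percentiles differently.

```python
import math

GB = 10**9  # assumption: decimal gigabytes; the text does not say GB vs GiB


def percentile(sorted_sizes, p):
    """Nearest-rank percentile of an ascending list (p between 0 and 100)."""
    if not sorted_sizes:
        raise ValueError("no segment sizes given")
    rank = max(1, math.ceil(p / 100 * len(sorted_sizes)))
    return sorted_sizes[rank - 1]


def summarize_merged_segments(sizes_bytes, limit_bytes=GB):
    """Return median, p75 and p95, plus whether p95 is below the limit."""
    ordered = sorted(sizes_bytes)
    summary = {
        name: percentile(ordered, p)
        for name, p in (("median", 50), ("p75", 75), ("p95", 95))
    }
    summary["healthy"] = summary["p95"] < limit_bytes
    return summary
```

Calling `summarize_merged_segments` with sizes exported from your own monitoring gives a quick check of whether the larger merged segments are approaching the 1GB mark.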

Number Of Datasources In Repositories

A Datasource is the combination of a repository and the tags created within that repository. This widget shows the number of datasources per repository, which essentially means the total number of tag values in each repository.

Humio has a hard limit of 10,000 datasources to keep it efficient, and in almost all cases you should not need to go above this limit. If you exceed it, ingest is blocked for that repository to stop it from creating too many datasources.

The maximum number of datasources in a Humio cluster is controlled by the MAX_DATASOURCES configuration.

In a healthy system, the number of tags in a repository should not exceed 100, and should approach that number only when absolutely necessary.
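The limit and the guideline described above can be turned into a simple check over the per-repository counts shown in this widget. The sketch below is illustrative only: the function names and status labels are hypothetical, reading MAX_DATASOURCES from the process environment is an assumption, and treating a count equal to the limit as "at-limit" is a conservative choice rather than a statement of exactly when Humio blocks ingest.

```python
import os

# Assumption: MAX_DATASOURCES is visible in this process's environment.
# 10000 mirrors the hard limit described in the text.
DEFAULT_LIMIT = 10_000
RECOMMENDED_MAX = 100  # guideline from the text, not an enforced limit


def datasource_limit(env=os.environ):
    """Return the configured limit, falling back to the default."""
    return int(env.get("MAX_DATASOURCES", DEFAULT_LIMIT))


def classify_repositories(counts, limit=DEFAULT_LIMIT):
    """Map each repository name to 'ok', 'review' or 'at-limit'."""
    result = {}
    for repo, count in counts.items():
        if count >= limit:
            result[repo] = "at-limit"   # ingest may be blocked
        elif count > RECOMMENDED_MAX:
            result[repo] = "review"     # above the healthy-system guideline
        else:
            result[repo] = "ok"
    return result
```

For example, `classify_repositories({"web": 12, "audit": 450}, limit=datasource_limit())` would mark `web` as `ok` and `audit` as `review` under the default limit.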

Global Snapshot File Size

Humio keeps a file called global-data-snapshot.json, also known as Global, which holds the key information about the Humio cluster and is constantly updated across all nodes. It stores all metadata on repositories, users and the other objects you can create through the UI. It also holds the metadata on the “segment files” that contain the events shipped to Humio.

This “Global” file is managed by Humio and should be kept as small as possible to keep Humio performing well. In a healthy system the Global snapshot file should not exceed 1GB and is ideally below 500MB. If it grows beyond that, discuss it with Humio Support.
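The two thresholds above can be expressed as a small check on the snapshot file size. This is a minimal sketch: the function names and status labels are hypothetical, decimal megabytes are assumed, and the location of global-data-snapshot.json is not given here, so the path must be supplied by the caller.

```python
import os

MB = 10**6  # assumption: decimal megabytes


def classify_global_snapshot(size_bytes):
    """Classify a Global snapshot size against the guidance in the text."""
    if size_bytes < 500 * MB:
        return "ideal"
    if size_bytes <= 1000 * MB:
        return "acceptable"
    return "contact-support"


def snapshot_status(path):
    """Classify the snapshot file at a caller-supplied path."""
    return classify_global_snapshot(os.path.getsize(path))
```

Running `snapshot_status` periodically against the snapshot file on a node gives an early warning before the file reaches the 1GB mark.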