Humio Metrics

Humio generates a number of metrics that can be used to monitor and operate Humio itself.

JMX

Humio can expose all metrics over JMX. To enable this, add the standard JMX options for the JVM to the HUMIO_JVM_ARGS configuration.

Below is an example of this:

```ini
HUMIO_JVM_ARGS=-Dcom.sun.management.jmxremote \
               -Dcom.sun.management.jmxremote.authenticate=false \
               -Dcom.sun.management.jmxremote.ssl=false \
               -Dcom.sun.management.jmxremote.port=5000
```
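
If the port differs between environments, the same line can be generated rather than edited by hand. The following is a minimal sketch: the helper name `build_jmx_args` is hypothetical and not part of Humio, and it emits only the four JMX flags from the example above.

```python
def build_jmx_args(port: int, authenticate: bool = False, ssl: bool = False) -> str:
    """Return a HUMIO_JVM_ARGS line that enables remote JMX on the given port.

    Hypothetical helper for illustration; it only assembles the standard
    com.sun.management.jmxremote system properties shown in the example.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    flags = [
        "-Dcom.sun.management.jmxremote",
        # JVM system properties expect lowercase true/false.
        f"-Dcom.sun.management.jmxremote.authenticate={str(authenticate).lower()}",
        f"-Dcom.sun.management.jmxremote.ssl={str(ssl).lower()}",
        f"-Dcom.sun.management.jmxremote.port={port}",
    ]
    return "HUMIO_JVM_ARGS=" + " ".join(flags)


print(build_jmx_args(5000))
```

The generated line contains only the JMX flags, so any other JVM options already present in HUMIO_JVM_ARGS would need to be merged in. With authentication and SSL both disabled, anyone who can reach the port can connect, so this combination is best kept to trusted networks.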

Prometheus

Setting the PROMETHEUS_METRICS_PORT configuration will enable Prometheus to scrape metrics from Humio. More information on configuring this is available on our Prometheus integration page.
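
To check the endpoint by hand before pointing Prometheus at it, the exposed text can be fetched and parsed with a few lines of Python. This is a rough sketch under stated assumptions: the `/metrics` path is the Prometheus default and is assumed here, the host and port are placeholders, and the sample metric names are invented for illustration (they are not taken from Humio's actual output).

```python
import urllib.request
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse 'name{labels} value [timestamp]' lines into a dict.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. The key
    keeps the label set verbatim so labeled series stay distinct.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            key, rest = line[:end], line[end:].split()
        else:
            key, *rest = line.split()
        if rest:
            samples[key] = float(rest[0])
    return samples


def scrape(host: str, port: int, timeout: float = 5.0) -> Dict[str, float]:
    """Fetch and parse metrics; the /metrics path is an assumption."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Invented sample input, used only to show the parser's behavior.
SAMPLE = """\
# TYPE example_metric gauge
example_metric 12.5
example_labeled{repo="a b"} 3 1700000000
"""
print(parse_prometheus_text(SAMPLE))
```

Against a running node, `scrape("localhost", port)` would be called with the value configured in PROMETHEUS_METRICS_PORT.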

Humio Debug Logs

The Humio debug log also contains all the internal Humio metrics. In a standalone or on-premise installation, you can find them within the humio repository or the humio-metrics repository. On cloud, you can find them in the humio-organization-metrics view.

Metric Types

There are two types of metrics in Humio: node-level metrics and object-level metrics. Objects include repositories, ingest listeners, and storage partitions. For example, ingest metrics for a given repository can be obtained by viewing ingest-bytes/<repo>.
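
When processing metric names programmatically, the naming scheme above can be used to tell the two types apart. A minimal sketch, assuming that object names (such as repository names) do not themselves contain a `/`; the function name `split_metric_name` is hypothetical:

```python
from typing import List, Tuple


def split_metric_name(name: str) -> Tuple[str, List[str]]:
    """Split 'base/object[/object...]' into the base name and its objects.

    A name without '/' is treated as a node-level metric and yields an
    empty object list.
    """
    base, *objects = name.split("/")
    return base, objects


for metric in ("jvm-heap-usage", "ingest-bytes/myrepo"):
    base, objects = split_metric_name(metric)
    level = "object-level" if objects else "node-level"
    print(f"{metric}: {level}, base={base}, objects={objects}")
```

Here `myrepo` is a placeholder repository name.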

Node Level Metrics

Metric Name

Description

bucket-storage-fetch-for-query-queue

Count of segment files queued for fetching from Bucket Storage to the local data store because a query refers to them

bucket-storage-pending-upload

Total size of segment files pending upload to Bucket Storage

bucket-storage-pending-upload-underreplicated

Total size of segment files pending upload to Bucket Storage for files that are not known to have more than one replica in the local cluster

bucket-storage-total-segment-size

Total size of segment files stored in Bucket Storage

cluster-time-skew

Largest time skew (in milliseconds) between this node and any other node in the cluster

compact-timestamp-found

Total number of events for which the findTimestamp function found a timestamp in the compact format

day-month-year-timestamp-found

Total number of events for which the findTimestamp function found a timestamp in the day-month-year format

digest-active-datasources

Number of active datasources

digest-buffer-target-latency

Latency target of in-memory buffer after ingest queue in digest pipeline

digest-coordinator-changes

Number of changes to the set of active digest nodes triggered by digest coordination. For a healthy system this is close to zero, except when an administrator alters the desired digest partition scheme

digest-live-latency

Latency of live update part of digest pipeline for internal bulks in milliseconds

digest-segment-latency

Latency of segment building part of digest pipeline for internal bulks in milliseconds

direct-memory-allocated

Used for internal debugging. Amount of direct memory allocated by the Humio application. This does not account for every direct memory allocation in the JVM

elastic-search-ingestion-events-in-bulk

Number of events found in an elastic-search bulk

elastic-search-ingestion-request-errors

Number of ingest errors in the elastic-search endpoint since the node started

elastic-search-ingestion-requests

Time spent ingesting a bulk request using the elasticsearch ingest protocol

event-collector-request-errors

Number of ingest errors in the http-event-collector endpoint since the node started

event-latency

Overall latency of the ingest queue and digest pipeline, excluding parsers: measured from insertion into the ingest queue through updating live queries and adding events to blocks for segment files

failed-http-checks

Number of nodes that appear to be unreachable using http as seen from this node. A healthy system has zero of these

gcs-storage-read

Bytes fetched for raw segment files and aux files from gcs to local data store

gcs-storage-write

Bytes stored for raw segment files and aux files using gcs as data store

global-publish-wait-for-value

Time spent waiting to see the value being read back from Kafka when pushing an update to the global state

globalsnapshot-size

Size of global-snapshot.json file written

hashfilter-included-blocks

Number of blocks included using hashfilters in queries and thus read from compressed blocks in segment files

hashfilter-skipped-blocks

Number of blocks skipped using informed filters in queries and thus not read from compressed blocks in segment files

http-requests

Timing of all inbound http requests

http-requests-external-size

Size of external inbound http requests

http-requests-external-timing

Timing of external inbound http requests

http-requests-internal-size

Size of internal inbound http requests

http-requests-internal-timing

Timing of internal inbound http requests

humio-ingestion-request-errors

Number of ingest errors in the humio ingestion endpoint since the node started

ingest-bytes-total

Number of bytes uncompressed in flushed blocks for segments being constructed across all repos

ingest-listener-tcp-available

TCP ingest listener free slots for lines to be processed (high when idle, zero when over-loaded)

ingest-writer-bulksize

Histogram of size (bytes) of data for jobs that carry events. Some jobs are no-payload and are not included here

ingest-writer-compressed-bytes

Number of bytes written to kafka as compressed events into the ingest queue in total

ingest-writer-jobs

Number of jobs pushed to in-memory job queue for digest writers

ingest-writer-queue-add

Number of times an ingest queue consumer pushes to in-memory job queue for digest writers, including when the operation fails due to the queue being full

ingest-writer-queue-empty

Number of times an ingest queue consumer hit an empty queue while pushing to in-memory job queue for digest writers

ingest-writer-queue-full

Number of times an ingest queue consumer hit a full queue while pushing to in-memory job queue for digest writers

ingest-writer-uncompressed-bytes

Number of bytes written to kafka before compression for events into the ingest queue in total

jvm-heap-usage

Java virtual machine heap memory usage

jvm-heap-usage-percent

Java virtual machine heap memory usage in percent

jvm-hiccup-latency

Latency of timed events inside Humio jvm

jvm-NON-heap-max-usage

Maximum Java virtual machine non-heap memory usage

jvm-NON-heap-usage

Java virtual machine non-heap memory usage

kafka-chatter-bytes

Number of bytes written to kafka on the chatter topic

kafka-chatter-put

Time waiting for getting ack from Kafka when publishing to the chatter topic

kafka-ingestqueue-put

Time waiting for getting ack when adding ingest events to the ingest queue

kafka-request-bytes

Number of bytes written to kafka as compressed events for the ingest queue

kafka-request-events

Number of events written to kafka as compressed events for the ingest queue

live-dashboard-query-count

Number of live queries on dashboards

livequeries-canceled-due-to-digest-delay

Number of live queries that have been canceled due to excessive digest delay

livequeries-rate

The rate of the cost of live queries, in cost/s

livequeries-rate-canceled-due-to-digest-delay

The rate of the cost of live queries canceled due to excessive digest delay, in cost/s

livequery-count

Number of active live (real-time) queries

load-segment-total

Time spent reading (waiting for) blocks from segment files

local-query-jobs-queue

Count of queries currently queued or active on the node, including exports

local-query-jobs-queue-exports-part

Count of queries currently queued or active on the node for exports

local-query-jobs-wait

Histogram of the time, in milliseconds, that each query waited between getting any work done, including exports

local-query-segments-queue

Count of elements in queue as number of segments currently queued for query including exports

local-query-segments-queue-exports-part

Count of elements in queue as number of segments currently queued for query for exports

logplex-ingestion-request-errors

Number of ingest errors in the logplex endpoint since the node started

mapsegment

Time spent on ‘map’ phase while searching non-real time segment files

mini-segment-created

Number of new mini-segments created. The number is incremented when a mini-segment is closed and added to global

missing-cluster-nodes

Number of nodes that this node has decided are now dead. A healthy system has zero of these

month-day-year-last-timestamp-found

Total number of events for which the findTimestamp function found a timestamp in the month-day-year(last) format

month-day-year-timestamp-found

Total number of events for which the findTimestamp function found a timestamp in the month-day-year format

primary-disk-usage

Percent used on the primary disk

proxied-query-polls

Timing of internal requests due to polling of queries not hitting the server coordinating the query

queries

Total number of queries started since this node started

query

Measure how long it takes for queries to complete

query-coordinator-latency

Latency for responses on query state refreshes from nodes within the cluster

query-delta-total-cost

30s delta of total cost on queries for the entire cluster

query-delta-total-memory-allocation

30s delta of total memory allocation on queries for the entire cluster

query-live-delta-cpu-usage

30s delta of cpu usage on live queries for the entire cluster

query-segments-count

Segments being queried that hit local files. Includes those fetched from remote once they arrive

query-segments-count-from-remote

Segments being queried that missed local, triggering a fetch from remote

query-static-cost-cache-hit

Part of static cost of queries coordinated by this host that completed in this time interval that were based on results loaded from the query state cache

query-static-cost-cache-miss

Part of static cost of queries coordinated by this host that completed in this time interval that were accumulated across the cluster refreshing the result

query-static-cost-total

Total static cost of queries coordinated by this host that completed in this time interval

query-static-delta-cpu-usage

30s delta of cpu usage on static queries for the entire cluster

query-thread-limit

Number of threads allowed to be executing historical parts of queries. Gets turned down if digest is unable to keep up

querycache-disk-usage

Sum of sizes of files in local query cache

querycache-max-age

Age of the oldest cache entry that has not been reused or deleted yet. Because the cache drops the least recently used entry first, this is the age of the next item to be dropped from the cache

read-compressed-bytes

Number of bytes read from compressed blocks in segment files

read-prefilter-bytes

Number of bytes read from pre-filter files

recompress-millis

Number of milliseconds CPU time spent merging and re-compressing segment files

s3-archiving-bytes-per-second

Bytes archived in S3 per second

s3-archiving-errors-per-second

Errors per second archiving logs in S3

s3-archiving-writes-per-second

Successful S3 archival writes per second

s3-storage-read

Bytes fetched for raw segment files and aux files from s3 to local data store

s3-storage-write

Bytes stored for raw segment files and aux files using s3 as data store

schedulesegments

Time spent scheduling segment files for the ‘map’ phase while searching non-real time segment files

secondary-disk-usage

Percent used on the secondary disk. Only present if secondary disk is configured

segment-merge-cpu-time

CPU time spent merging segments

serialize-state-bytes

Number of bytes serialized for internal query states

serialize-state-time

Time spent serializing internal query states

target-segment-blocks

Number of blocks in segments created by merging mini-segments

target-segment-compressed-size

Size of the file for segments created by merging mini-segments

target-segment-created

Number of new segment targets being created. The number gets incremented when the target id is chosen, before any of the mini-segments exist

target-segment-uncompressed-size

Number of bytes uncompressed for segments created by merging mini-segments

time-digest

CPU time used on digest as a fraction of wall time

time-livequery

CPU time used on live queries as a fraction of wall time

time-only-timestamp-found

Total number of events for which the findTimestamp function found a timestamp in the time only format

timestamp-parsing-failed

Total number of timestamp strings that did not parse as a timestamp since the node started

unix-epoch-timestamp-found

Total number of events for which the findTimestamp function found a timestamp in the unix epoch format

uploaded-files-cache-entries

Number of uploaded files cached in memory

year-month-day-timestamp-found

Total number of events for which the findTimestamp function found a timestamp in the year-month-day format
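
Several descriptions in this table state what a healthy value looks like: failed-http-checks and missing-cluster-nodes should be zero, and ingest-listener-tcp-available drops to zero when the listener is overloaded. A simple check along those lines could look like the sketch below. It assumes the current values have already been collected into a dict keyed by metric name; how they are collected, and the function name `find_unhealthy`, are illustrative assumptions.

```python
from typing import Dict, List


def find_unhealthy(metrics: Dict[str, float]) -> List[str]:
    """Return human-readable findings for metrics outside their healthy range.

    Metrics missing from the input are skipped rather than reported.
    """
    problems: List[str] = []
    # Per the table, a healthy system reports zero for both of these.
    for name in ("failed-http-checks", "missing-cluster-nodes"):
        if metrics.get(name, 0) > 0:
            problems.append(f"{name} is {metrics[name]:g}, expected 0")
    # Free slots are high when idle and zero when the listener is overloaded.
    if metrics.get("ingest-listener-tcp-available", 1) == 0:
        problems.append("ingest-listener-tcp-available is 0 (listener overloaded)")
    return problems


# Invented sample values for illustration.
print(find_unhealthy({"failed-http-checks": 0, "missing-cluster-nodes": 2}))
```

The thresholds here come directly from the descriptions above; other metrics, such as jvm-heap-usage-percent or primary-disk-usage, would need limits chosen for the specific installation.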

Object Level Metrics

Metric Name

Description

actions/repo

Time spent invoking actions from an alert or scheduled search

data-ingester-errors/repo

Number of events that got an @error tag added to their fields during parsing

datasource-count/repo

Number of datasources

event-forwarding-errors/repo/forwarderId

Number of events that were not forwarded for each forwarder

event-forwarding-events/repo/forwarderId

Number of events that were forwarded for each forwarder

event-latency-partition/partitions

Per-partition latency of the humio-ingest topic of the ingest+digest pipeline including time spent in parsers, updating live queries and adding events to blocks for segment files

event-latency-repo/repo

For each repository, overall latency of ingest+digest pipeline including time spent in parsers, updating live queries and adding events to blocks for segment files

find-timestamp-failed/repo

Total number of events for which the findTimestamp function failed

garbage-collection-time/garbage-collection

Time spent doing garbage collection

global-operation-rate/function

Named operations being applied to global data

ingest-bytes/repo

Number of bytes uncompressed in flushed blocks for segments being constructed

ingest-eventsize/repo

Number of bytes uncompressed summed over individual events in blocks in progress

ingest-offset-lowest/partitions

The lowest offset on the Kafka ingest queue partition that Humio will ever need to read again (in failover scenarios). This metric is not updated very often; with the default configuration it is updated around every 40 minutes. It is important that this value keeps growing over time to show there is progress on all datasources.

ingest-parsing/repo/parser

Time spent parsing incoming events

ingest-queue-consumer/repo

Time spent constructing segment file blocks in memory and writing them to disk, including updating live queries if any

ingest-queue-latency/partitions

Latency of the ingest queue for each partition, from insertion into the queue (after the parsers have completed) until the data has been read, but not yet processed, by the digest node

ingest-reader-partition-bytes/partitions

Number of bytes read from kafka as compressed events from the ingest queue

ingest-reader-partition-events/partitions

Number of events added to segment file blocks being constructed

ingest-reader-polltime/partitions

Time blocked waiting for next message from Kafka from ingest queue

ingest-writer-partition-bytes/partitions

Number of bytes written to kafka as compressed events into the ingest queue in each partition

kafka-chatter-by-kind-bytes/kind

Number of bytes written to kafka on the chatter topic for each kind of chatter

kafka-chatter-by-kind-serialize/kind

Time spent serializing value being written to Kafka when publishing to the chatter topic for each kind of chatter

live-events/repo

Number of events processed by live queries

no-timestamp-found/repo

Total number of events for which the findTimestamp function did not find a timestamp

no-timezone-found/repo

Total number of events for which the findTimestamp function did not find a timezone and was not called with a default timezone

query-delta-cost/repo

30s delta cost on queries per repo, for the entire cluster

query-millis/repo

Number of milliseconds spent processing historical queries

repo-queries/repo

Number of queries started per repo

tcp-ingest-bytes/listener

Number of bytes read by tcp ingest listener

udp-ingest-bytes/listener

Number of bytes read by udp ingest listener

written-events-after-queue/repo

Number of events added to segment file blocks being constructed

written-events/repo

Number of events written to the ingest queue after being parsed
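
The description of ingest-offset-lowest notes that the value should keep growing over time. One way to watch for that is to compare two samples of the metric per partition, as in the sketch below. The input shape (a dict from partition id to offset) and the function name `stalled_partitions` are assumptions made for illustration.

```python
from typing import Dict, List


def stalled_partitions(earlier: Dict[str, int], later: Dict[str, int]) -> List[str]:
    """Return partitions whose ingest-offset-lowest did not grow between samples.

    Partitions present in only one of the two samples are ignored, since
    there is nothing to compare them against.
    """
    return sorted(
        partition
        for partition, offset in later.items()
        if partition in earlier and offset <= earlier[partition]
    )


# Invented sample offsets keyed by partition id.
before = {"0": 100, "1": 200}
after = {"0": 150, "1": 200, "2": 5}
print(stalled_partitions(before, after))
```

Because the metric is updated only around every 40 minutes with the default configuration, the two samples should be taken considerably further apart than that. A flat value is best treated as a prompt to investigate rather than as proof of a problem.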