LogScale Metrics

LogScale generates a number of metrics that can be used to monitor and operate LogScale itself.

JMX

LogScale can expose all metrics over JMX. This is configured automatically when using the Launcher Script.

Prometheus

Setting the PROMETHEUS_METRICS_PORT configuration will enable Prometheus to scrape metrics from LogScale. More information on configuring this is available on our Prometheus integration page.

LogScale Debug Logs

The LogScale debug log also contains all the internal LogScale metrics. In a standalone or on-premise installation, you can find them within the humio repository or the humio-metrics repository. On cloud, you can find them in the humio-organization-metrics view.

Metric Structure

An event within the humio-metrics repository will include the following core fields:

  • name

    The name of the metric being reported. A list of possible metrics is shown in Node Level Metrics.

  • type

    The type of the metric, which provides an indication of the other fields that will be included within the event. For more information, see Metric Types.

Additional fields within an event will depend on the underlying type.
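
As a minimal sketch of the structure described above, the following Python separates the two core fields from the type-specific fields of a metric event. It assumes the event has already been exported as a plain dictionary; the `MetricEvent` class, the `split_metric_event` helper and the sample values are hypothetical and are not part of LogScale.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Core fields that the text says every metric event carries.
CORE_FIELDS = ("name", "type")


@dataclass
class MetricEvent:
    name: str
    type: str
    # Everything that is not a core field; its contents depend on `type`.
    extra: Dict[str, Any] = field(default_factory=dict)


def split_metric_event(event: Dict[str, Any]) -> MetricEvent:
    """Split a metric event (assumed to be a dict) into core and extra fields."""
    missing = [k for k in CORE_FIELDS if k not in event]
    if missing:
        raise ValueError(f"metric event is missing core fields: {missing}")
    extra = {k: v for k, v in event.items() if k not in CORE_FIELDS}
    return MetricEvent(name=event["name"], type=event["type"], extra=extra)


# Hypothetical sample event; the values are made up for illustration.
sample = {"name": "ingest-bytes", "type": "METER", "count": 1200, "m1": 35.5}
parsed = split_metric_event(sample)
```

Here `parsed.extra` holds only the fields that depend on the metric type, which keeps later type-specific handling separate from the core lookup.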

Metric Types

There are two major types of metrics in LogScale: node level metrics and object level metrics. Objects include repositories, ingest listeners, and storage partitions. For example, ingest metrics for a given repository can be obtained by viewing ingest-bytes.

Node level metrics are recorded specific to each node within your LogScale cluster and will have a field, @host, that contains the hostname number within the cluster. To query across the nodes, a query will need to aggregate across all the records using a reference point, for example the time of the metric entry:

logscale
name="ingest-bytes"
| day := time:dayOfMonth(@timestamp)
| groupBy(day,function=sum(m1))
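
For readers who post-process exported events outside LogScale, the same aggregation can be sketched in Python. This is an illustration only: it assumes each event is a dictionary, that `@timestamp` holds epoch milliseconds interpreted as UTC, and that `m1` is numeric. The function name and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def sum_m1_by_day(events):
    """Sum the m1 field of ingest-bytes events per day of month, across all hosts."""
    totals = defaultdict(float)
    for e in events:
        if e.get("name") != "ingest-bytes":
            continue  # ignore other metrics
        # Assumption: @timestamp is epoch milliseconds, interpreted as UTC.
        ts = datetime.fromtimestamp(e["@timestamp"] / 1000, tz=timezone.utc)
        totals[ts.day] += float(e["m1"])
    return dict(totals)


# Hypothetical events from two hosts over two days.
events = [
    {"name": "ingest-bytes", "@host": "1", "@timestamp": 1704067200000, "m1": 10.0},
    {"name": "ingest-bytes", "@host": "2", "@timestamp": 1704070800000, "m1": 5.0},
    {"name": "ingest-bytes", "@host": "1", "@timestamp": 1704153600000, "m1": 7.5},
    {"name": "jvm-heap-usage", "@host": "1", "@timestamp": 1704153600000, "value": 42},
]
daily = sum_m1_by_day(events)
```

Because the grouping key is the day and not `@host`, the records from every node fall into the same bucket, which mirrors what the query does.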

Each metric type produces events with a different set of fields. The underlying type can be identified using the type field:

COUNTER

The COUNTER event is a basic count of the corresponding metric event as an integer. A metric event of this type may contain the following fields:

Field  Type     Description
count  Integer  An incremental count of the metric.

GAUGE

A GAUGE event stores either a value or ratio. A metric event of this type may contain the following fields:

Field  Type              Description
value  Integer or Float  Any value appropriate for the metric.

HISTOGRAM

The HISTOGRAM type tracks counter and statistical information for a given metric.

Field      Type     Description
count      Integer  Counter of the corresponding metric
max        Float    Maximum value of the metric within the time period
mean       Float    Mean value of the metric within the time period
median     Float    Median value of the metric within the time period
min        Float    Minimum value of the metric within the time period
p75        Float    75th percentile of the metric within the time period
p95        Float    95th percentile of the metric within the time period
p98        Float    98th percentile of the metric within the time period
p99        Float    99th percentile of the metric within the time period
p999       Float    99.9th percentile of the metric within the time period
rate_unit  String   A text description of the rate unit for the values.
stddev     Float    Standard deviation of the metric within the time period

METER

A METER measures the rate of a given metric, and includes moving averages and the rate description. A metric event of this type may contain the following fields:

Field      Type     Description
count      Integer  Count of the corresponding metric
m1         Float    1 minute moving average
m15        Float    15 minute moving average
m5         Float    5 minute moving average
mean_rate  Float    Mean rate of the metric within the time period

TIMER

A TIMER metric measures the duration of a given metric event, and includes moving averages, statistical values and when appropriate a measured rate. A metric event of this type may contain the following fields:

Field          Type     Description
count          Integer  Count of the corresponding metric
duration_unit  String   A description of the unit used to track the duration.
m1             Float    1 minute moving average
m15            Float    15 minute moving average
m5             Float    5 minute moving average
max            Float    Maximum value of the metric within the time period
mean           Float    Mean value of the metric within the time period
mean_rate      Float    Mean rate of the metric within the time period
median         Float    Median value of the metric within the time period
min            Float    Minimum value of the metric within the time period
p75            Float    75th percentile of the metric within the time period
p95            Float    95th percentile of the metric within the time period
p98            Float    98th percentile of the metric within the time period
p99            Float    99th percentile of the metric within the time period
p999           Float    99.9th percentile of the metric within the time period
rate_unit      String   A text description of the rate unit for the values.
stddev         Float    Standard deviation of the metric within the time period
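
The field lists above can be captured as a lookup keyed by the type field. The Python below is a sketch under the assumption that events are available as dictionaries. `FIELDS_BY_TYPE` and `unexpected_fields` are hypothetical names, and the set of ignored bookkeeping fields is a guess at what an exported event might also carry.

```python
# Field names taken from the tables above, grouped to avoid repetition.
PERCENTILES = {"p75", "p95", "p98", "p99", "p999"}
STATS = {"max", "mean", "median", "min", "stddev"} | PERCENTILES
MOVING_AVERAGES = {"m1", "m5", "m15"}

FIELDS_BY_TYPE = {
    "COUNTER": {"count"},
    "GAUGE": {"value"},
    "HISTOGRAM": {"count", "rate_unit"} | STATS,
    "METER": {"count", "mean_rate"} | MOVING_AVERAGES,
    "TIMER": {"count", "duration_unit", "mean_rate", "rate_unit"}
    | STATS
    | MOVING_AVERAGES,
}


def unexpected_fields(event, ignore=("name", "type", "@host", "@timestamp")):
    """Return the fields of an event that its type is not documented to carry.

    Documented fields are optional ("may contain"), so absent fields are
    not reported. An unknown type raises KeyError.
    """
    allowed = FIELDS_BY_TYPE[event["type"]]
    return sorted(k for k in event if k not in allowed and k not in ignore)


# Hypothetical event: a GAUGE that unexpectedly carries a moving average.
odd = unexpected_fields({"name": "example", "type": "GAUGE", "value": 1, "m1": 2})
```

A check like this is mostly useful when writing dashboards or alerts, since it shows quickly which fields can be charted for a given metric.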

Node Level Metrics

Metric Name Description
bucket-storage-fetch-for-query-queue Count of segment files queued awaiting fetch from Bucket Storage to local data store due to being referred by a query
bucket-storage-pending-upload Total size of segment files pending upload to Bucket Storage (Note that this is a cluster-level, not node-level, metric.)
bucket-storage-pending-upload-underreplicated Total size of segment files pending upload to Bucket Storage for files that are not known to have more than one replica in the local cluster
bucket-storage-total-segment-size Total size of segment files stored in Bucket Storage
cluster-time-skew Largest time skew (in milliseconds) between this node and any other node in the cluster
compact-timestamp-found Total number of events for which the findTimestamp function found a timestamp in the compact format
currently-running-streaming-queries The number of currently active streaming queries
day-month-year-timestamp-found Total number of events for which the findTimestamp function found a timestamp in the day-month-year format
digest-active-datasources Number of active data sources
digest-buffer-target-latency Latency target of in-memory buffer after ingest queue in digest pipeline
digest-coordinator-changes Number of changes to the set of active digest nodes triggered by digest coordination. For a healthy system this is close to zero, except when an administrator alters the desired digest partition scheme
digest-live-latency Latency of live update part of digest pipeline for internal bulks in milliseconds
digest-segment-latency Latency of segment building part of digest pipeline for internal bulks in milliseconds
direct-memory-allocated Used for internal debugging. Amount of direct memory allocated by the LogScale application. This does not account for every direct memory allocation in the JVM
elastic-search-ingestion-events-in-bulk Number of events found in an elastic-search bulk
elastic-search-ingestion-request-errors Number of ingest errors in the elastic-search endpoint since the node started
elastic-search-ingestion-requests Time spent ingesting a bulk request using the elasticsearch ingest protocol
event-collector-request-errors Number of ingest errors in the http-event-collector endpoint since the node started
event-latency Overall latency of the ingest queue and digest pipeline, measured from insertion into the ingest queue through updating live queries and adding events to blocks for segment files; time spent in parsers is not included. The value is expressed in milliseconds (ms).
failed-http-checks Number of nodes that appear to be unreachable using http as seen from this node. A healthy system has zero of these.
gcs-storage-read Bytes fetched for raw segment files and aux files from GCS to local data store
gcs-storage-write Bytes stored for raw segment files and aux files using GCS as datastore
global-publish-wait-for-value Time spent waiting to see the value being read back from Kafka when pushing an update to the global state.
globalsnapshot-size Size of global-snapshot.json file written.
hashfilter-included-blocks Number of blocks included using hashfilters in queries and thus read from compressed blocks in segment files.
hashfilter-skipped-blocks Number of blocks skipped using informed filters in queries and thus not read from compressed blocks in segment files.
http-requests Timing of all inbound HTTP requests
http-requests-external-size Size of external inbound HTTP requests
http-requests-external-timing Timing of external inbound HTTP requests.
http-requests-internal-size Size of internal inbound HTTP requests.
http-requests-internal-timing Timing of internal inbound HTTP requests.
humio-ingestion-request-errors Number of ingest errors in the humio ingestion endpoint since the node started.
ingest-bytes-total Number of bytes uncompressed in flushed blocks for segments being constructed across all repos.
ingest-listener-tcp-available TCP ingest listener free slots for lines to be processed (high when idle, zero when over-loaded).
ingest-writer-bulksize Histogram of size (bytes) of data for jobs that carry events. Some jobs are no-payload and are not included here.
ingest-writer-compressed-bytes Number of bytes written to kafka as compressed events into the ingest queue in total.
ingest-writer-jobs Number of jobs pushed to in-memory job queue for digest writers.
ingest-writer-queue-add Number of times an ingest queue consumer pushes to in-memory job queue for digest writers, including when the operation fails due to the queue being full.
ingest-writer-queue-empty Number of times an ingest queue consumer hit an empty queue while pushing to in-memory job queue for digest writers.
ingest-writer-queue-full Number of times an ingest queue consumer hit a full queue while pushing to in-memory job queue for digest writers.
ingest-writer-uncompressed-bytes Number of bytes written to kafka before compression for events into the ingest queue in total.
jvm-heap-usage Java virtual machine heap memory usage.
jvm-heap-usage-percent Java virtual machine heap memory usage in percent.
jvm-hiccup-latency Latency of timed events inside LogScale jvm.
jvm-non-heap-max-usage Maximum java virtual machine non-heap memory usage.
jvm-non-heap-usage Java virtual machine non-heap memory usage.
kafka-chatter-bytes Number of bytes written to kafka on the chatter topic.
kafka-chatter-put Time waiting for getting ack from Kafka when publishing to the chatter topic.
kafka-ingestqueue-put Time waiting for getting ack when adding ingest events to the ingest queue.
kafka-request-bytes Number of bytes written to kafka as compressed events for the ingest queue.
kafka-request-events Number of events written to kafka as compressed events for the ingest queue.
live-dashboard-query-count Number of live queries on dashboards.
livequeries-canceled-due-to-digest-delay Number of live queries that have been canceled due to excessive digest delay.
livequeries-rate The rate of the cost of live queries, in cost/s.
livequeries-rate-canceled-due-to-digest-delay The rate of the cost of live queries canceled due to excessive digest delay, in cost/s.
livequery-count Number of active live (real-time) queries.
load-segment-total Time spent reading (waiting for) blocks from segment files.
local-query-jobs-queue Count of queries currently queued or active on the node, including exports.
local-query-jobs-queue-exports-part Count of queries currently queued or active on the node for exports.
local-query-jobs-wait Histogram of time in milliseconds that each query waited between getting any work done including exports.
local-query-segments-queue Count of elements in queue as number of segments currently queued for query including exports.
local-query-segments-queue-exports-part Count of elements in queue as number of segments currently queued for query for exports.
logplex-ingestion-request-errors Number of ingest errors in the logplex endpoint since the node started.
mapsegment Time spent on 'map' phase while searching non-real time segment files.
mini-segment-created Number of new mini-segments being created. The number gets incremented when the mini-segment gets closed and added to global.
missing-cluster-nodes Number of nodes that this node has decided are now dead. These are the nodes that are missing heartbeat data in addition to the nodes that have outdated heartbeat data.
missing-cluster-nodes-stateful Registered nodes with outdated/missing heartbeat data that can write to global.
month-day-year-last-timestamp-found Total number of events for which the findTimestamp function found a timestamp in the month-day-year(last) format.
month-day-year-timestamp-found Total number of events for which the findTimestamp function found a timestamp in the month-day-year format.
primary-disk-usage Percent used on the primary disk.
proxied-query-polls Timing of internal requests due to polling of queries not hitting the server coordinating the query.
queries Total number of queries started since this node started.
query Measure how long it takes for queries to complete.
query-coordinator-latency Latency for responses on query state refreshes from nodes within the cluster.
query-delta-total-cost 30s delta of total cost on queries for the entire cluster.
query-delta-total-memory-allocation 30s delta of total memory allocation on queries for the entire cluster.
query-live-delta-cpu-usage 30s delta of cpu usage on live queries for the entire cluster.
query-segments-count Segments being queried that hit local files. Includes those fetched from remote once they arrive.
query-segments-count-from-remote Segments being queried that missed local, triggering a fetch from remote.
query-static-cost-cache-hit Part of static cost of queries coordinated by this host that completed in this time interval that were based on results loaded from the query state cache.
query-static-cost-cache-miss Part of static cost of queries coordinated by this host that completed in this time interval that were accumulated across the cluster refreshing the result.
query-static-cost-total Total static cost of queries coordinated by this host that completed in this time interval.
query-static-delta-cpu-usage 30s delta of cpu usage on static queries for the entire cluster.
query-thread-limit Number of threads allowed to be executing historical parts of queries. Gets turned down if digest is unable to keep up.
querycache-disk-usage Sum of sizes of files in local query cache.
querycache-max-age Age of the oldest cache entry that has not been reused or deleted yet. As the cache drops the least recently used this is the age of the next item to be dropped from the cache.
read-compressed-bytes Number of bytes read from compressed blocks in segment files.
read-prefilter-bytes Number of bytes read from pre-filter files.
recompress-millis Number of milliseconds CPU time spent merging and re-compressing segment files.
s3-archiving-bytes-per-second Bytes archived in S3 per second.
s3-archiving-errors-per-second Errors per second archiving logs in S3.
s3-archiving-writes-per-second Successful S3 archival writes per second.
s3-storage-read Bytes fetched for raw segment files and aux files from S3 to local data store.
s3-storage-write Bytes stored for raw segment files and aux files using S3 as data store.
schedulesegments Time spent scheduling segment files for the 'map' phase while searching non-real time segment files.
secondary-disk-usage Percent used on the secondary disk. Only present if secondary disk is configured.
segment-merge-cpu-time CPU time spent merging segments.
serialize-state-bytes Number of bytes serialized for internal query states.
serialize-state-time Time spent serializing internal query states.
target-segment-blocks Number of blocks in segments created by merging mini-segments.
target-segment-compressed-size Size of the file for segments created by merging mini-segments.
target-segment-created Number of new segment targets being created. The number gets incremented when the target id is chosen, before any of the mini-segments exist.
target-segment-uncompressed-size Number of bytes uncompressed for segments created by merging mini-segments.
temp-disk-usage-bytes Disk size in bytes used by temporary files in the .humiotmp.<unique id> directory inside the directory specified by the DIRECTORY environment variable.
time-digest CPU time used on digest as a fraction of wall time.
time-livequery CPU time used on live queries as a fraction of wall time.
time-only-timestamp-found Total number of events for which the findTimestamp function found a timestamp in the time only format.
timestamp-parsing-failed Total number of timestamp strings that did not parse as a timestamp since start of the node.
unix-epoch-timestamp-found Total number of events for which the findTimestamp function found a timestamp in the unix epoch format.
uploaded-files-cache-entries Number of uploaded files cached in memory.
year-month-day-timestamp-found Total number of events for which the findTimestamp function found a timestamp in the year-month-day format.
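
Some of the descriptions above state what a healthy value looks like, for example that failed-http-checks is zero on a healthy system. A minimal Python sketch of checking for that follows. It assumes events are dictionaries and that the metric reports its reading in a `value` field, which the table does not confirm. `EXPECT_ZERO` and `find_unhealthy` are hypothetical names.

```python
# Metrics whose description says a healthy system reports zero.
EXPECT_ZERO = {"failed-http-checks"}


def find_unhealthy(events):
    """Return (host, metric, value) for each zero-expected metric that is non-zero.

    Assumption: the reading is carried in a `value` field.
    """
    problems = []
    for e in events:
        if e.get("name") in EXPECT_ZERO and e.get("value", 0) != 0:
            problems.append((e.get("@host"), e["name"], e["value"]))
    return problems


# Hypothetical readings from three nodes; only node 2 sees unreachable peers.
readings = [
    {"name": "failed-http-checks", "@host": "1", "value": 0},
    {"name": "failed-http-checks", "@host": "2", "value": 3},
    {"name": "primary-disk-usage", "@host": "3", "value": 71},
]
alerts = find_unhealthy(readings)
```

Because these are node level metrics, the result is reported per host, so an operator can see which node observes the problem.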

Object Level Metrics

Metric Name Description
actions/repo Time spent invoking actions from an alert or scheduled search.
data-ingester-errors/repo Number of events that got an @error tag added to their fields during parsing.
datasource-count/repo Number of data sources.
event-forwarding-errors/repo/forwarderId Number of events that were not forwarded for each forwarder.
event-forwarding-events/repo/forwarderId Number of events that were forwarded for each forwarder.
event-latency-partition/partitions Per-partition latency of the humio-ingest topic of the ingest+digest pipeline including time spent in parsers, updating live queries and adding events to blocks for segment files.
event-latency-repo/repo For each repository, overall latency of ingest+digest pipeline including time spent in parsers, updating live queries and adding events to blocks for segment files.
fdr-ingest-events/repo/feedId Number of FDR events ingested for each FDR feed.
find-timestamp-failed/repo Total number of events for which the findTimestamp() function failed.
garbage-collection-time/garbage-collection Time spent doing garbage collection.
global-operation-rate/function Named operations being applied to global data.
ingest-bytes/repo Number of bytes uncompressed in flushed blocks for segments being constructed.
ingest-eventsize/repo Number of bytes uncompressed summed over individual events in blocks in progress.
ingest-offset-lowest/partitions The lowest offset on the Kafka ingest queue partition that LogScale will ever need to read again (in failover scenarios). This metric is not updated very often; with the default configuration it is updated around every 40 minutes. It is important that this value keeps growing over time, as that shows progress across all data sources.
ingest-parsing/repo/parser Time spent parsing incoming events.
ingest-queue-consumer/repo Time spent constructing segment file blocks in memory and writing them to disk, including updating live queries if any.
ingest-queue-latency/partitions Latency of the ingest queue for each partition, from insert into the queue (after the parsers have completed) until the data has been read, but not yet processed, in the digest node.
ingest-reader-partition-bytes/partitions Number of bytes read from kafka as compressed events from the ingest queue.
ingest-reader-partition-events/partitions Number of events added to segment file blocks being constructed.
ingest-reader-polltime/partitions Time blocked waiting for next message from Kafka from ingest queue.
ingest-writer-partition-bytes/partitions Number of bytes written to kafka as compressed events into the ingest queue in each partition.
kafka-chatter-by-kind-bytes/kind Number of bytes written to kafka on the chatter topic for each kind of chatter.
kafka-chatter-by-kind-serialize/kind Time spent serializing value being written to Kafka when publishing to the chatter topic for each kind of chatter.
live-events/repo Number of events processed by live queries.
no-timestamp-found/repo Total number of events for which the findTimestamp function did not find a timestamp.
no-timezone-found/repo Total number of events for which the findTimestamp function did not find a timezone and was not called with a default timezone.
query-delta-cost/repo 30s delta cost on queries per repo, for the entire cluster.
query-millis/repo Number of milliseconds spent processing historical queries.
repo-queries/repo Number of queries started per repo.
tcp-ingest-bytes/listener Number of bytes read by tcp ingest listener.
udp-ingest-bytes/listener Number of bytes read by udp ingest listener.
written-events-after-queue/repo Number of events added to segment file blocks being constructed.
written-events/repo Number of events written to the ingest queue after being parsed.
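
The description of ingest-offset-lowest says the value should keep growing over time. One way to check this, sketched in Python, is to compare the first and last sample seen for each partition. The tuple layout, the function name and the sample numbers are assumptions for illustration. Since the metric is updated roughly every 40 minutes with the default configuration, the samples should span a window comfortably longer than that.

```python
def stalled_partitions(samples):
    """Return the partitions whose lowest offset did not grow over the window.

    samples: iterable of (timestamp, partition, offset) tuples.
    A partition with only one sample is reported too, since no growth
    can be shown for it.
    """
    first, last = {}, {}
    for _ts, partition, offset in sorted(samples):  # oldest sample first
        first.setdefault(partition, offset)
        last[partition] = offset
    return sorted(p for p in last if last[p] <= first[p])


# Hypothetical samples: partition 0 advances, partition 1 does not.
samples = [
    (0, 0, 100),
    (3000, 0, 250),
    (0, 1, 40),
    (3000, 1, 40),
]
stalled = stalled_partitions(samples)
```

Comparing only the first and last sample keeps the check tolerant of long stretches where the metric is simply not refreshed.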