Query data in the humio Repository

Many of the queries you might want to run on the data in the humio repository are already included in the humio/insights package. This section contains additional examples of queries that can be useful to run in the humio repository.

View segment block sizes on a dataspace

logscale
class=*SegmentFileImporter* dataspace=$DATASPACE_NAME
| "segment ready for global"
| groupby(dataspace, function=[avg(blocks, as=avg_blocks), percentile(blocks, percentiles=[50]), max(blocks, as=max_blocks), count(as=count)])
| round(avg_blocks)
| sort(dataspace, order=asc)
| rename(_50, as=med_blocks)
| select([dataspace, avg_blocks, med_blocks, max_blocks, count])

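If you want results for every dataspace at once, the dataspace filter on the first line can use a wildcard instead of the parameter, for example:

logscale
class=*SegmentFileImporter* dataspace=*
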
The query above, when run with a wildcard (*) for dataspace, gives a result similar to the following:

dataspace      | avg_blocks | med_blocks         | max_blocks | count
humio          | 808        | 52.65177539243129  | 3076       | 752
humio-activity | 6          | 3.9609462244260185 | 12         | 5
humio-audit    | 37         | 12.040423712553398 | 144        | 8
humio-fleet    | 16         | 12.040423712553398 | 44         | 11
humio-metrics  | 62         | 35.98315439448187  | 135        | 12
humio-usage    | 45         | 12.040423712553398 | 144        | 4

Find out-of-sync partitions in Kafka

logscale
#kind=logs #repo=$REPONAME class=c.h.j.KafkaStatusLoggerJob
| /kafka status topic_name=\'(?<topicName>(.+))\' topic_partition=(?<topicPart>(.+)) topic_leader=(.+) topic_num_out_of_sync_replicas=(?<numUnsync>(.+)) topic_out_of_sync_replicas=/i
// | topicName="global-events"
// | topicName="humio-ingest"
| topicName="transientChatter-events"
| numUnsync > 0
| format("%s,%d", field=[topicName, topicPart], as="name")
| timechart(name)

Monitor S3 archiving job backlog

This query monitors the backlog for the S3 archiving job and shows whether it is continuously increasing. Because that job can postpone merges, a growing backlog can lead to the disk filling up.

logscale
#kind=logs #vhost=* /S3Archiving/i "Backlog for dataspace"
| timechart(#vhost, function=max(count))

Show offline nodes

logscale
#type=humio #kind=logs class=/ClusterHostAliveStats/ "AliveStats on me"
| age > 7200000 /* =2hours */
| timechart(hostId, function=count(hostId, distinct=true), limit=50, minSpan=4h)

Show repositories with ingest requests larger than 32 MB

logscale
#kind=req* method=POST //| timechart(status, function=max(contentLength))
| contentLength > 32000000 | len:=contentLength/(1024*1024)
| groupBy("repo", function=[count(), max(len), avg(len)])

Calculate the average ingest queue compression ratio

logscale
#type=humio #kind=metrics | name=/^ingest-writer-(?<un>un)?compressed-bytes$/
| case { un=* | un:=m1; comp:=m1 }
| timechart(function=[sum(un,as=un),sum(comp,as=comp)], minSpan=1m)
| ratio:=un/comp | drop([un,comp])

This query gives a result similar to the following:

_bucket       | ratio
1732450500000 | 1.608886671332077
1732451400000 | 1.6095566354846649
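
If you want a single overall ratio for the search interval rather than a timechart, a minimal sketch (reusing the same metric names and the un/comp convention from the query above) could aggregate without bucketing:

logscale
#type=humio #kind=metrics | name=/^ingest-writer-(?<un>un)?compressed-bytes$/
| case { un=* | un:=m1; comp:=m1 }
| [sum(un, as=un), sum(comp, as=comp)]
| ratio:=un/comp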

Measure ingest latency per host

logscale
#type=humio @host=* | name="event-latency"
| timechart(@host, function=[max(max, as=max)], limit=20)
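// divide the reported maximum by 1000 to rescale it (assuming the event-latency metric reports milliseconds, the chart then shows seconds)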
|  max:=max/1000

This query gives a result similar to the following:

_bucket       | @host                | max
1732450500000 | example.testing:8080 | 1.346
1732451400000 | example.testing:8080 | 0.838

Measure ingest in bytes

logscale
#type=humio | metric_type=METER | @host=?{host=new*} @host=?{host=*} | name=/^ingest-bytes\/(?<dataspace>.*)$/
| timechart( span=60s,
       function={ groupby([dataspace, #host], function=avg(m1)) | sum(_avg)},
       unit="bytes/sec to bytes/day")

This query gives a result similar to the following:

_bucket       | sum
1732450500000 | 4995149903.207501
1732451400000 | 5769455633.238956

Measure parser throttling

This query shows how close (in percent) the system has been to throttling any parser. As long as the value stays below 100, nothing is throttled by default.

logscale
#kind=logs class=/ParserLimitingJob/ /Top element for parser id=(?<parserID>[^/]+)\/(?<repo>[^/]+)\/(?<parserName>\S+)/
| pct:=100*costSum/threshold
| timechart(function=max(pct), minSpan=10s, limit=1)
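
To see which parsers are closest to the limit, you could chart the percentage per parser instead, for example (parserName is extracted by the regex above):

logscale
#kind=logs class=/ParserLimitingJob/ /Top element for parser id=(?<parserID>[^/]+)\/(?<repo>[^/]+)\/(?<parserName>\S+)/
| pct:=100*costSum/threshold
| timechart(parserName, function=max(pct), minSpan=10s)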

Show failed ingest requests that have been throttled

logscale
#type=humio #kind=logs statuscode=503 /msg=(?<msg>Ingest parsing exceeded the acceptable amount of time[^\.]+)\. exception/
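
To see which of these messages occur most often, one way is to group on the extracted msg field, for example:

logscale
#type=humio #kind=logs statuscode=503 /msg=(?<msg>Ingest parsing exceeded the acceptable amount of time[^\.]+)\. exception/
| groupBy(msg, function=count())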