Humio Insights Search Dashboard

This dashboard provides further insight into the queries and searches run in a Humio cluster.

Search Queue

This is the number of segments queued for search per vHost (i.e., Humio Node ID). When a query is run by a user, a dashboard, or an alert, Humio needs resources to pull the segment files in question, scan them, and then return the results to the query. If those resources aren’t available, the queries are put into a queue.

Ideally, this value is kept at 0 for every Humio node, meaning no node has to wait to scan segments once it receives a query. Spikes can be expected, especially during times when more queries are received than usual. A constant queue, however, could indicate built-up load on the nodes, which will mean slow queries.

CPU Usage in Percent

This shows the CPU usage of each Humio node. Within a cluster, if each Humio node has the same specifications, and digest and storage partitions are evenly distributed, you would expect each Humio node to have about the same CPU usage.

If some nodes are experiencing particularly high usage, this could indicate that something is wrong with Humio or the cluster setup.

Query Restarts By Reason

There are occasions in Humio when a query will need to be restarted. It could be an alert or a dashboard query. This timechart shows the reasons why a query might have been restarted over time.

There are a few common reasons:

  • Ingest Partition changes in the Humio cluster;

  • Lookup File changes that are used in the query;

  • Permission changes on the query;

  • View Connection changes.

A view connection change will update the repository connections in a view where the query might be running.

The common reasons listed above are to be expected and can be ignored. The reasons listed below, though, are worth investigating:

  • Poll Error because of a dead host. This means a Humio node is down, and the cause should be investigated.

  • Statuscode=404. It is worth checking the query, particularly if it is an alert, to find out why it’s causing a 404 error.

Starved Searches

A starved search in Humio is one where a query cannot finish because it is restricted by the resources available for scanning segment files, or because segment files are still pending to be fetched. This timechart shows the number of times each Humio node has logged the starved searches message.

Each log with the starved searches text will include a queryID. You can then search for that queryID in the Humio repository to find out which queries are having this issue.
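
As a rough sketch of that lookup, assuming the starved-search events are logged in the humio repository under #type=humio and #kind=logs, contain the word "starved" in their message, and expose the node and query ID as vhost and queryID fields (all of these names are assumptions, not confirmed field names), something like this would surface the affected queries per node:

#type=humio #kind=logs "starved"
| groupBy([vhost, queryID])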

Query Total Cost

This utilises a Humio metric called query-delta-total-cost. This metric is logged per Humio host every 30 seconds and records the delta of the total cost of queries for the entire cluster.

The cost of a query is the unit Humio uses to schedule, limit, and monitor queries. A cost point is a combination of the memory and CPU consumption of a query, and can be used as a measurement of how expensive a query is overall.
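
As a minimal sketch of charting this metric yourself, assuming it is logged in the humio repository as a metrics event whose name field is query-delta-total-cost and whose reported delta sits in a value field (both field names are assumptions), a per-host timechart could look like the query below. The same pattern would apply to query-delta-total-memory-allocation in the next widget.

#type=humio #kind=metrics name="query-delta-total-cost"
| timeChart(vhost, function=sum(value))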

Query Memory Allocation Cost

This utilises a Humio metric called query-delta-total-memory-allocation. This metric is logged per Humio host every 30 seconds and records the delta of the total memory allocated to queries for the entire cluster.

This is important because, when you run a query, all of the segment files within the timeframe of that query are pulled into the memory of a Humio node, then decompressed and scanned there. This means that if your cluster is maxed out on memory for queries, it could cause slow performance on the cluster or queries not finishing.

Top Cost Queries

This shows the heaviest queries run on a Humio cluster within the last hour, along with their cost.

Cost here is the same unit described under Query Total Cost above: a combination of the memory and CPU consumption of a query, used as a measurement of how expensive the query is overall.

This can help you gauge which queries are heavy and require a lot of work from the cluster. You may have a query that is causing too much work. If so, you need to kill it to release resources back to your Humio cluster. This is also where you can apply Query Quotas.

Top Cost Queries by User

This shows the heaviest query users in a Humio cluster within the last hour, along with their cost.

This is where you may want to implement Query Quotas, if some users are using too many resources in the cluster with inefficient queries.

Query Historical Cost

Historical queries are essentially any static (non-live) query. This utilises a Humio metric called query-static-delta-cpu-usage, which is logged per Humio host every 30 seconds and records the delta of the total cost of these historical (static) queries for the entire cluster.

Query Threaddumps with Query IDs

Within a Humio cluster, Humio constantly logs what each thread is doing at a particular time to humio-threaddumps.log. Each threaddump contains the name of the group to which it belongs, which should logically indicate the type of work being done by Humio.

In this case, we’re looking at the query-mapper thread group, which also logs the queryID. This indicates which queries are taking up the most threads over the last 24 hours, and can tell you whether particular queries are using too many resources on the Humio cluster.
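
As a hedged sketch of pulling those numbers directly, assuming the threaddump events are available in the humio repository with a kind of threaddump and expose the thread group and query ID as threadGroup and queryID fields (every one of these names is an assumption), a query along these lines would rank queries by how often they appear in query-mapper threads:

#type=humio #kind=threaddump threadGroup="query-mapper"
| top(queryID)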

To investigate any given queryID to find out more information, you can search for the queryID in the humio repository. Try a search like this:

queryID="$QUERY_ID" "createQuery"
| groupBy([queryID,dataspace,live,query])

Top Queries In Mapper Threads

This widget is very similar to Query Threaddumps with Query IDs, except that it presents the information in a table, along with the query being run.

The queries with the most threaddump logs are using more resources on the cluster than other queries.

Query Live Cost

This timechart looks specifically at the cost of live queries across the cluster. It utilises the live counterpart of the historical cost metric above, which is likewise logged per Humio host every 30 seconds and records the delta of the total cost of these live queries for the entire cluster.

Time Spent Reading Segments

This timechart shows the average time, in milliseconds, spent per Humio host reading (i.e., waiting for) blocks from segment files. This is indicative of query performance in Humio, since reading blocks from segment files is part of executing a query and producing its results.

Keeping this value below 5 milliseconds is a sign of a healthy cluster and performance speed.

Read Segment Files Performance

This timechart shows the average number of bytes per second read from compressed blocks in segment files per Humio host. Queries that look over large timeframes will need to scan more compressed blocks. Heavier queries like that can cause spikes in this graph.

CPU Usage: Thread Group Ticks

This widget shows the number of CPU ticks used by each Humio thread group. Within a Humio cluster, the number of threads is usually dictated by the number of cores running on each node. Threads are then assigned to particular thread groups. The names of the groups should logically indicate the type of work being done by Humio. This widget can indicate the amount of CPU time being spent on each of these thread groups.

A common thread group which will consume resources is the humio-akka thread group. This is the group responsible for handling network requests. As a result, it may account for more time, since it spends a lot of time waiting for responses.

Another common thread group is the digester. This is the thread that handles digesting all new data coming into Humio.

For this dashboard, the runningqueries thread group in particular will be interesting to compare to the other thread groups.

Slow Warnings to Users

When running a query in Humio, if one or more of the Humio nodes is slow or not responding, you’ll receive a warning in the User Interface letting you know that the query is slow. This widget lists how many times that warning was shown to users for each Humio host.

To fix this, you will need to investigate which nodes are slow or not responding. You can do this by running this query:

#type=humio #kind=logs loglevel=WARN class="c.h.q.QuerySessions$" "user got a queryresult containing a warning" (warning="*slow*" or warning="*respon*") /server node '(?<node>\S+)'/
| groupBy([node])

This will return the nodes currently not responding or that are slow. It may be that there is Ingest Latency or a heavy query has consumed Humio’s resources.

A healthy system should show no slow warnings to users.

Live Queries per Host

The way live queries work in Humio is that incoming data is analysed at ingest, as it arrives, before being processed and stored as segment files. This widget shows how many live queries each Humio node is running as ingest comes in. Although live queries aren’t very heavy work for Humio nodes, this can be useful for seeing whether one Humio node is doing more live query work than others.

HTTP Internal Query Requests

Internal HTTP requests are initiated by Humio nodes. This widget shows internal HTTP requests hitting the query endpoint directly. An example of an internal request to the query jobs endpoint would be proxying a query to the Humio host that is supposed to be in charge of that particular query.

This widget shows the number of these internal requests per second for each Humio node.

HTTP External Query Requests

External HTTP requests are typically initiated by users of Humio. This widget shows the external HTTP requests to the query endpoint, which usually come from a dashboard widget, an alert query, or a user running an ad-hoc query.

This widget then shows the number of these external HTTP requests per second for each Humio node.

HTTP Query Submits Per Repository

The widget shows the number of queries submitted per minute per repository on a Humio cluster.