Digest Node are responsible for executing real-time queries and compiling incoming events into segment files.
Whenever parsing is completed, new events are placed in a Kafka queue
A cluster will divide data into partitions (or buckets); we cannot know
exactly which partition a given event will be put in. Partitions are
chosen randomly to spread the workload evenly across digest nodes. These
partitions should correlate with the number of Kafka
humio-ingest topic partitions, which uses a default of
Each partition of that queue must have a node associated to handle the work that is placed in the partition, or they will not be processed or saved.
LogScale clusters have a set of rules that associate partitions in the Digest Queue with the nodes that are responsible for processing events on that queue. These are called Digest Rules.
Digest Partition Rules are managed automatically.
Configuring Digest Rules
As of v1.80.0, these digest rules are automatically configured.
To see the current digest rules for your own cluster:
Go to your profile account menu on the top right of the user interface and select
System Administrationpage → Cluster nodes tab, click : the rules are shown in the panel on the right-hand side.
Figure 273. Configuring Digest Rules
The image above shows digest rules for a cluster of three nodes where each node is assigned to 16 out of 24 digest partitions.
When a node is assigned to at least one digest partition, it is considered to be a Digest Node.
Example Digest Rules
The table shows three digest rules. Node
1 will receive 2x more work than
3. This is because
1 is assigned to two partitions
3 is only handling
events on partition
If a node is not assigned to a partition, it will not take part in the digest phase.
Increasing Digest Partitions
This operation is not reversible. Once you've increased partitions, you cannot later decrease the number you're using.
If additional digest nodes are added to a cluster, you will likely want
to increase the number of digest partitions to ensure all digest nodes
are able to consume from at least one partition. To do this, you must
first increase the number of partitions for the Kafka
humio-ingest topic. Increasing the number of Kafka
partitions for a topic involves logging into one of your Kafka nodes and
running this command:
/usr/lib/kafka/bin/kafka-topics.sh --alter --zookeeper \ localhost:2181 --topic humio-ingest --partitions 28
The path to kafka-topics.sh may vary depending on
installation type. Docker containers may have this at
/products/kafka/bin. Ansible-based installs will
have it at the above path. If you installed this by hand, use the path
where Kafka is installed.
If Zookeeper is running on the Kafka machine, use localhost for the
Zookeeper address; otherwise specify its actual hostname or IP. Replace
28 with whatever total partition count
you're after (note that you're specifying the total and not the number
of partitions being added). After running the command, you should see
output that says "Adding partitions succeeded!".
LogScale will configure itself based on the number of partitions in Kafka.
You can check the digest partitions in LogScale from the
System Administration page →
Cluster nodes →
LogScale has High Availability support for digest nodes.
The HA option works as follows.
Run a cluster of at least 3 nodes.
Set the digest and storage replication factor to
2, you can do so by clicking the pencil icons found under Replication:
Figure 274. Setting Replication Factors
If any data is present on only one node, then queries will get warnings saying Segment files missing, partial results.
Make your http-load-balancer know about more than one node in the cluster; preferably all of them.
Make sure your cluster has the CPU power required to handle the failover: It should be running at well below 50% utilization normally.
This feature presumes that your Kafka cluster is properly configured for high-availability — all topics in use by LogScale need to have at least two replicas on independent server nodes.
High Availability in LogScale
The design aims for high performance during normal operation at the cost of a penalty to be paid when a failover situation happens.
Turn on high availability on ingest by configuring all digest partitions to have two or more nodes listed. The first live node in the list will handle that partition during normal operation. If the primary node stops, then the next one in the list will take on the responsibility of handling the digest on that partition. If the primary returns at some point, then the primary will resume responsibility for that partition. This happens independently for all digest partitions.
To get proper results in all live queries, all live queries get restarted when a node stops.
The node that then takes on the digest processing has to start quite a way back in time on the digest queue, processing all events (again probably) that have not yet been written to segments replicated sufficiently to other nodes. LogScale takes care to ensure that events only get included once in a query, regardless of the re-processing in digest, including not duplicating them once the failed node returns and has those events as well.
The failover process can take quite while to complete, depending on how much data has been ingested during the latest 30 minutes or so. The cluster typically needs to start 30 minutes back in time on a failover. While it does so, the event latency of events coming in will be higher than normal on the partitions that have been temporarily reassigned, starting at up to 1800 - 2000 seconds. The cluster thus needs to have sufficiently ample CPU resources to catch up from such a latency fairly quickly. To aid in that process, queries get limited to 50% of their normal CPU time while the digest latency is high.
Fail-over makes the cluster start much further back in time compared to a controlled handover between two live nodes. When reassigning partitions with the nodes active, the nodes cooperate to reduce the number of events being reprocessed to a minimum, making the latency in ingested events typically stay below 10 seconds.
The decision to go with the 30 minutes as target failover cost is a compromise between two conflicting concerns. If the value is raised further, the failover will take even longer, which is usually not desirable. But raising it reduces the number of small segments generated, since segments get flushed after at most that amount of time. Reducing the 30-minute interval to say, five minutes would make the failover happen much faster, but at the cost of normal operation, as there would be 12 * 24 = 288 segments in each datasource per day, compared to the 2 * 24 = 48 with the current value. The cost of having all these extra segments would slow down normal operation somewhat.
There are a number of configuration parameters to control the tradeoffs. Here they are listed with their defaults.
# How long can a mini-segment stay open. How long back is a fail-over likely to go? FLUSH_BLOCK_SECONDS=1800 # Desired number of blocks (each ~1MB before compression) in a final segment after merge BLOCKS_PER_SEGMENT=2000 # How long cn the same semgent be the final target of all the minisegments being produced? MAX_HOURS_SEGMENT_OPEN=24