Switching Kafka

LogScale uses Kafka for queuing incoming messages and for storing shared state when running LogScale in a cluster setup. It's possible for LogScale to snapshot its state and continue running using a new Kafka cluster. This can be useful in situations where you want to change infrastructure or if there are problems with the current Kafka/ZooKeeper cluster. One example could be if all ZooKeeper machines have written the disk full and afterwards ZooKeeper will not start because of file inconsistencies. This section describes the procedure for doing a Kafka switch.

Danger

All LogScale processes must be completely stopped before performing this action.

Stop Sending Data to LogScale

If possible, stop sending data to LogScale, then wait for LogScale to process all data on the ingest queue. The LogScale Stats dashboard in the LogScale repository has an Events Processed After Ingest Queue graph (per host, per second) and an Ingest Latency graph. Any data still on the ingest queue when LogScale is stopped will be lost, since the queue is either reset or replaced by a new one.

Stop all LogScale processes on all machines. Then stop all Kafka processes and all ZooKeeper processes on all machines.

Switch Kafka and ZooKeeper

Available: LogScale & ZooKeeper v1.108.0

The requirement for LogScale to use ZooKeeper was removed in LogScale 1.108.0. ZooKeeper may still be required by Kafka. Please refer to your chosen Kafka deployment documentation for details.

There are three options for switching Kafka and ZooKeeper. The first, and simplest, is to set up a new Kafka/ZooKeeper cluster and configure LogScale to use it. The other two are to delete the Kafka and ZooKeeper data and re-use the existing cluster, or to create new Kafka queues and topics with new names in the existing Kafka cluster.

Delete Kafka & ZooKeeper Data and Re-Use Cluster

You can reset the current Kafka and ZooKeeper cluster by deleting their data directories on the filesystem of each node. This is equivalent to starting up a new, empty Kafka/ZooKeeper cluster. To do this, delete everything inside Kafka's data directory, then delete the folder version-2 inside ZooKeeper's data directory.

Do not delete the file myid in ZooKeeper's data directory. The myid configuration file contains the id of the ZooKeeper node within the ZooKeeper cluster and must be present at startup.

After doing this, you will have created completely new ZooKeeper and Kafka clusters.

Create New Kafka Queues/Topics with New Names in Kafka Cluster

Instead of resetting Kafka and ZooKeeper as described above, you can let LogScale use a new set of queues in the existing Kafka cluster. When re-using the same Kafka cluster, LogScale must be configured with a new HUMIO_KAFKA_TOPIC_PREFIX so that it can detect the switch.

Note that deleting and recreating topics with the same names will not work; in that case LogScale cannot detect the Kafka switch. If Kafka is managed by LogScale (KAFKA_MANAGED_BY_HUMIO), the new topics are created automatically when LogScale starts up. Otherwise, you must create the topics externally before starting LogScale.
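As an illustrative sketch, the configuration change might look like the following. The prefix value is an example only; choose any value that differs from the one previously in use.

```ini
# LogScale configuration (values are illustrative)
HUMIO_KAFKA_TOPIC_PREFIX=switch-2024-
# With Kafka managed by LogScale, the prefixed topics are created on startup:
KAFKA_MANAGED_BY_HUMIO=true
```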

Start Kafka, ZooKeeper and LogScale

Now you're ready to start the Kafka/ZooKeeper cluster. Typically this is done by starting the ZooKeeper nodes first: wait for all nodes to be running and verify the ZooKeeper cluster. Then start all Kafka nodes, wait for them to be running, and verify the Kafka cluster.

Once Kafka and ZooKeeper have started, start the LogScale nodes. It's important to start only one LogScale node first: this node will detect the Kafka switch and create a new epoch in LogScale. If you're running multiple LogScale processes on one machine (with multiple CPUs), make sure to start only a single LogScale process.

To verify that the Kafka switch was detected and handled, look for this logline:

syslog
Switching epoch to=${epochKey} from=${latestEpoch.kafkaClusterId}
- I'm the first cluster member to get here
  for this kafka. newEpoch=${newEpoch}

When the first node is up and running and the above logline confirms a new epoch has been created, the rest of the LogScale nodes can be started.

At that point, the LogScale cluster should be running again. Check the cluster nodes in the administrative section of the LogScale user interface: http://$HUMIOHOST/system/administration/partitions/ingest

Recap

It's worth reviewing the steps above. In short, a Kafka switch consists of the following steps:

  • Stop all LogScale processes on all nodes.

  • Stop all Kafka processes on all nodes.

  • Stop all ZooKeeper processes on all nodes (up to LogScale 1.107).

  • Delete ZooKeeper and Kafka data (or use new Kafka queues).

  • Start all ZooKeeper processes on all nodes (up to LogScale 1.107).

  • Verify the ZooKeeper cluster (up to LogScale 1.107).

  • Start all Kafka processes on all nodes.

  • Verify the Kafka cluster.

  • Start one LogScale node and let it change epoch.

  • Verify the epoch has changed.

  • Start the other LogScale processes on all nodes.