Troubleshooting: Error: The Cluster ID ### doesn't match stored clusterId (###)
Last Updated: 2021-08-13
Condition or Error
Kafka fails to start or continually restarts
LogScale fails to connect to Kafka
Kafka reports The Cluster ID ### doesn't match stored clusterId (###)
The error appears after rebooting your system
You may identify one or more of the following errors:
LogScale fails to start, or regularly restarts with the following error:
2021-07-25T07:37:15.870+0000 [main] ERROR c.h.m.ServerRunner$ -1 - Got exception starting pid=21
java.util.concurrent.TimeoutException: null
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108) ~[humio-assembly.jar:0.1]
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272) ~[humio-assembly.jar:0.1]
	at com.humio.kafka.KafkaAdminUtils.numLiveKafkaBrokers(KafkaAdminUtils.scala:71) ~[humio-assembly.jar:0.1]
	at com.humio.kafka.KafkaAdminUtils.createTopic(KafkaAdminUtils.scala:97) ~[humio-assembly.jar:0.1]
	at com.humio.kafka.GlobalKafka.$anonfun$setupKafkaQueues$1(Global.scala:129) ~[humio-assembly.jar:0.1]
	at com.humio.kafka.GlobalKafka.$anonfun$setupKafkaQueues$1$adapted(Global.scala:128) ~[humio-assembly.jar:0.1]
	at scala.util.Using$.resource(Using.scala:261) ~[humio-assembly.jar:0.1]
	at com.humio.kafka.GlobalKafka.setupKafkaQueues(Global.scala:128) ~[humio-assembly.jar:0.1]
	at com.humio.kafka.GlobalKafka.<init>(Global.scala:114) ~[humio-assembly.jar:0.1]
	at com.humio.kafka.GlobalKafka.<init>(Global.scala:99) ~[humio-assembly.jar:0.1]
	at com.humio.core.Config$.fromEnv(Config.scala:1467) ~[humio-assembly.jar:0.1]
	at com.humio.main.ServerRunner$.main(ServerRunner.scala:92) ~[humio-assembly.jar:0.1]
	at com.humio.main.ServerRunner.main(ServerRunner.scala) ~[humio-assembly.jar:0.1]
Kafka fails to start or continually restarts with the following error:
humio | 2021-07-25 05:58:23,439 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
humio | 2021-07-25 05:58:24,862 INFO exited: kafka (exit status 1; not expected)
humio | 2021-07-25 05:58:25,870 INFO spawned: 'kafka' with pid 48123
humio | 2021-07-25 05:58:26,872 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
humio | 2021-07-25 05:58:28,334 INFO exited: kafka (exit status 1; not expected)
Kafka fails to start and the Kafka logs show the following errors:
[2021-07-24 05:19:51,428] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x10001738b400026, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2021-07-24 05:19:51,433] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2021-07-24 05:19:51,717] INFO Cluster ID = LRXmWrj-RASwLZbqs2gr1g (kafka.server.KafkaServer)
[2021-07-24 05:19:51,748] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentClusterIdException: The Cluster ID LRXmWrj-RASwLZbqs2gr1g doesn't match stored clusterId Some(jnMgzYVhQb6OqKuLtHc0tw) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
	at kafka.server.KafkaServer.startup(KafkaServer.scala:220)
	at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
	at kafka.Kafka$.main(Kafka.scala:84)
	at kafka.Kafka.main(Kafka.scala)
[2021-07-24 05:19:51,755] INFO shutting down (kafka.server.KafkaServer)
[2021-07-24 05:19:51,784] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-07-24 05:19:51,894] INFO Session: 0x10001738b400026 closed (org.apache.zookeeper.ZooKeeper)
[2021-07-24 05:19:51,894] INFO EventThread shut down for session: 0x10001738b400026 (org.apache.zookeeper.ClientCnxn)
[2021-07-24 05:19:51,899] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-07-24 05:19:51,909] INFO shut down completed (kafka.server.KafkaServer)
[2021-07-24 05:19:51,911] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2021-07-24 05:19:51,936] INFO shutting down (kafka.server.KafkaServer)
A Kafka deployment includes a number of configuration values that are stored within the corresponding Zookeeper deployment so that the information can be shared and accessed across the cluster. The cluster ID for Kafka is used to ensure that all the Kafka nodes within the cluster are members of the same cluster and sharing the same information and structures.
An invalid cluster error occurs when the cluster ID that Kafka expects and the cluster ID stored within Zookeeper do not match. A cluster ID is generated automatically as a random unique ID when the cluster first starts. The core problem is that Kafka and Zookeeper no longer agree on what the cluster ID is. The cluster ID can be considered part of the security of your installation: it verifies that the correct cluster information and configuration are being used at all times.
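The mismatch can be confirmed directly by comparing the two stored values. A minimal sketch, assuming Kafka's data directory is /home/kafka/logs and Zookeeper is reachable on localhost:2181 (adjust both to your installation; zookeeper-shell.sh ships in the Kafka bin/ directory):

```shell
# Cluster ID as recorded by the broker on local disk; meta.properties
# lives in the directory configured by log.dirs (path is an assumption)
grep '^cluster.id=' /home/kafka/logs/meta.properties | cut -d= -f2

# Cluster ID as recorded in Zookeeper, stored in the /cluster/id znode
./zookeeper-shell.sh localhost:2181 get /cluster/id
```

If the two IDs differ, you are seeing exactly the mismatch the error describes.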
When there is an invalid cluster ID for Kafka, the potential causes include:
Your Zookeeper data has been lost or the location has changed
Your Kafka data has been lost or the location has changed
One potential cause for this error is that the data directory used by your Zookeeper installation has been changed or lost. You should first check the configured location of your Zookeeper data. Find your zoo.cfg file and check the dataDir configuration line:
tickTime=2000
dataDir=/tmp/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
If the error occurs, particularly after a reboot, check whether the dataDir configuration points to the /tmp directory. On some Linux systems, /tmp is cleared automatically during startup, so a system reboot empties the directory and deletes the Zookeeper data along with it.
If the configuration is set to /tmp or another temporary location, modify the location to point to a dedicated directory, for example /var/zookeeper.
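For example, assuming zoo.cfg lives at /etc/zookeeper/zoo.cfg and Zookeeper runs as the zookeeper user (both paths and names vary by installation), the change might look like:

```shell
# Create a persistent data directory for Zookeeper
sudo mkdir -p /var/zookeeper
sudo chown zookeeper:zookeeper /var/zookeeper   # adjust user/group to your install

# Point dataDir at the new location instead of /tmp
sudo sed -i 's|^dataDir=.*|dataDir=/var/zookeeper|' /etc/zookeeper/zoo.cfg
```

Remember to restart Zookeeper after changing the configuration.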
Another possibility is that your Kafka installation has been modified and may point to the wrong Zookeeper cluster. In some configurations, the default location for Kafka to store its data is also /tmp. On some Linux systems, /tmp is cleared automatically during startup, so a system reboot empties the directory and deletes the Kafka data along with it.
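You can check where the broker stores its data, and therefore where its meta.properties (and the stored cluster ID) lives, by inspecting the log.dirs setting. The server.properties path below is an assumption; adjust it to your installation:

```shell
# Show the broker's data directory; if this points under /tmp, the
# stored cluster ID will be lost on systems that clear /tmp at boot
grep '^log.dirs=' /etc/kafka/server.properties
```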
Check your Kafka configuration (server.properties) to ensure that the broker is configured to connect to the correct Zookeeper installation:
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181
Ensure that the configuration for the Zookeeper hosts points to your active Zookeeper installation.
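One way to verify this is to send Zookeeper's built-in ruok four-letter command to each host in zookeeper.connect; a healthy server answers imok. A sketch, assuming nc is available and ZK_CONNECT is copied from your configuration (on Zookeeper 3.5 and later, ruok must be allow-listed via 4lw.commands.whitelist in zoo.cfg):

```shell
ZK_CONNECT="localhost:2181"          # copy the value of zookeeper.connect here

# Split the comma-separated host:port list and probe each server
for hp in ${ZK_CONNECT//,/ }; do
  host=${hp%%:*}
  port=${hp##*:}
  printf '%s: ' "$hp"
  echo ruok | nc "$host" "$port"     # a healthy server replies "imok"
  echo
done
```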
If the configuration of your Kafka and Zookeeper installations has been checked and is correct, the simplest fix is to let Kafka recreate the cluster ID and store the new value within the Zookeeper instance. To do this, you will need to:
Shut down Zookeeper and Kafka
Delete the data directories that contain the cluster ID configuration
Restart Zookeeper and Kafka
Note that this step will remove data from your Zookeeper and Kafka installations. If you are using these for storing data beyond your LogScale cluster, you will need to follow the longer steps to manually delete the cluster ID configuration information.
To do this:
Stop Zookeeper on each node:
$ ./zkServer.sh stop
or, on systemd-managed installations:

$ systemctl stop zookeeper
Stop Kafka on each node:
$ systemctl stop kafka
Delete your Zookeeper data directory content; this will need to be repeated on each node in your Zookeeper cluster:
$ rm -rf /var/zookeeper/*
Delete your Kafka data directory content; this will need to be repeated on each Kafka node in your Kafka cluster:
$ rm -rf /home/kafka/logs/*
Restart Zookeeper on each node:

$ ./zkServer.sh start

or, on systemd-managed installations:

$ systemctl start zookeeper

Restart Kafka on each node:

$ systemctl start kafka
Check the logs and ensure that Zookeeper and Kafka have started correctly and are communicating.
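To confirm the cluster came back with a single, consistent cluster ID, the value the broker logs at startup can be compared against the znode in Zookeeper. A sketch, where the server.log path is an assumption to adjust for your installation:

```shell
# The broker logs its cluster ID once at startup
grep 'Cluster ID' /var/log/kafka/server.log | tail -1

# The freshly generated ID stored in Zookeeper should match it
./zookeeper-shell.sh localhost:2181 get /cluster/id
```

If the two values agree and no InconsistentClusterIdException appears in the Kafka logs, the reset was successful.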