Troubleshooting: Error: The Cluster ID ### doesn't match stored clusterId (###)

Affects:

  • Kafka

  • ZooKeeper

Condition or Error

  • Kafka fails to start or restarts

  • LogScale fails to connect to Kafka

  • The topic prefix, set via HUMIO_KAFKA_TOPIC_PREFIX, is configured differently on different nodes in the cluster

  • Kafka reports The Cluster ID ### doesn't match stored clusterId (###)

After:

  • Restarting ZooKeeper (up to LogScale 1.107)

  • Restarting Kafka

  • Rebooting your system

You may identify one or more of the following errors:

  • LogScale fails to start, or regularly restarts with the following error:

syslog
2021-07-25T07:37:15.870+0000 [main] ERROR c.h.m.ServerRunner$ -1 - Got exception starting pid=21 java.util.concurrent.TimeoutException: null
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:108) ~[humio-assembly.jar:0.1]
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272) ~[humio-assembly.jar:0.1]
at com.humio.kafka.KafkaAdminUtils.numLiveKafkaBrokers(KafkaAdminUtils.scala:71) ~[humio-assembly.jar:0.1]
at com.humio.kafka.KafkaAdminUtils.createTopic(KafkaAdminUtils.scala:97) ~[humio-assembly.jar:0.1]
at com.humio.kafka.GlobalKafka.$anonfun$setupKafkaQueues$1(Global.scala:129) ~[humio-assembly.jar:0.1]
at com.humio.kafka.GlobalKafka.$anonfun$setupKafkaQueues$1$adapted(Global.scala:128) ~[humio-assembly.jar:0.1]
at scala.util.Using$.resource(Using.scala:261) ~[humio-assembly.jar:0.1]
at com.humio.kafka.GlobalKafka.setupKafkaQueues(Global.scala:128) ~[humio-assembly.jar:0.1]
at com.humio.kafka.GlobalKafka.<init>(Global.scala:114) ~[humio-assembly.jar:0.1]
at com.humio.kafka.GlobalKafka.<init>(Global.scala:99) ~[humio-assembly.jar:0.1]
at com.humio.core.Config$.fromEnv(Config.scala:1467) ~[humio-assembly.jar:0.1]
at com.humio.main.ServerRunner$.main(ServerRunner.scala:92) ~[humio-assembly.jar:0.1]
at com.humio.main.ServerRunner.main(ServerRunner.scala) ~[humio-assembly.jar:0.1]
  • Kafka fails to start or continually restarts with the following error:

syslog
humio    | 2021-07-25 05:58:23,439 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
humio    | 2021-07-25 05:58:24,862 INFO exited: kafka (exit status 1; not expected)
humio    | 2021-07-25 05:58:25,870 INFO spawned: 'kafka' with pid 48123
humio    | 2021-07-25 05:58:26,872 INFO success: kafka entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
humio    | 2021-07-25 05:58:28,334 INFO exited: kafka (exit status 1; not expected)
  • Kafka fails to start and the Kafka logs show the following errors:

syslog
[2021-07-24 05:19:51,428] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x10001738b400026, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2021-07-24 05:19:51,433] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2021-07-24 05:19:51,717] INFO Cluster ID = LRXmWrj-RASwLZbqs2gr1g (kafka.server.KafkaServer)
[2021-07-24 05:19:51,748] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentClusterIdException: The Cluster ID LRXmWrj-RASwLZbqs2gr1g doesn't match stored clusterId Some(jnMgzYVhQb6OqKuLtHc0tw) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
at kafka.server.KafkaServer.startup(KafkaServer.scala:220)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
at kafka.Kafka$.main(Kafka.scala:84)
at kafka.Kafka.main(Kafka.scala)
[2021-07-24 05:19:51,755] INFO shutting down (kafka.server.KafkaServer)
[2021-07-24 05:19:51,784] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-07-24 05:19:51,894] INFO Session: 0x10001738b400026 closed (org.apache.zookeeper.ZooKeeper)
[2021-07-24 05:19:51,894] INFO EventThread shut down for session: 0x10001738b400026 (org.apache.zookeeper.ClientCnxn)
[2021-07-24 05:19:51,899] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-07-24 05:19:51,909] INFO shut down completed (kafka.server.KafkaServer)
[2021-07-24 05:19:51,911] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2021-07-24 05:19:51,936] INFO shutting down (kafka.server.KafkaServer)

Causes

  • A Kafka deployment includes a number of configuration values that are stored within the corresponding ZooKeeper deployment so that the information can be shared and accessed across the cluster. The Kafka cluster ID is used to ensure that all the Kafka nodes within the cluster are members of the same cluster and share the same information and structures.

    An invalid cluster error occurs when the expected cluster ID for Kafka and the cluster ID stored within ZooKeeper do not match. Cluster IDs are automatically created as a random unique ID when the cluster is first started. The core error is that Kafka and ZooKeeper no longer agree on what the cluster ID is. The cluster ID can be considered part of the security of your installation: it verifies that the correct cluster information and configuration is being used at all times.

    When there is an invalid cluster ID for Kafka, the potential causes include:

    • Your ZooKeeper data has been lost or the location has changed

    • Your Kafka data has been lost or the location has changed

  • The cluster ID used by Kafka is a concatenation of the Kafka ID and the topic prefix, set via HUMIO_KAFKA_TOPIC_PREFIX. If the prefix has been changed but Kafka continues to use the old prefix, or the prefix has been set inconsistently across different nodes in the cluster, the cluster ID reported by Kafka can be stored and reported incorrectly.
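
    To confirm a mismatch, compare the cluster ID stored by the Kafka broker with the one registered in ZooKeeper. This is a minimal sketch: the meta.properties path is an example and must match your broker's data directory, and zookeeper-shell.sh ships in the bin directory of the Kafka distribution:

    shell
    # Cluster ID recorded by the broker in its data directory
    $ grep cluster.id /home/kafka/logs/meta.properties
    # Cluster ID registered in ZooKeeper (returned as a small JSON document)
    $ ./bin/zookeeper-shell.sh localhost:2181 get /cluster/id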

  • One potential cause for this error is that the data directory used by your ZooKeeper installation has been changed or lost. You should first check the configured location of your ZooKeeper data. Find your zoo.cfg file and check the dataDir configuration line:

    ini
    tickTime=2000
    dataDir=/tmp/zookeeper/
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=zoo1:2888:3888
    server.2=zoo2:2888:3888
    server.3=zoo3:2888:3888

    If the error occurs, particularly after a reboot, check whether the dataDir configuration points to the /tmp directory. On some Linux systems, the /tmp directory is cleared during startup by default, so a system reboot empties the directory and deletes the ZooKeeper data along with it.

    If the configuration is set to /tmp or another temporary location, modify the location to point to a dedicated directory, for example /var/zookeeper.
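
    If the ZooKeeper data still exists, a minimal sketch for moving it to a dedicated directory might look like the following; it assumes systemd-managed services and uses /var/zookeeper as the new location, so adjust both to your installation:

    shell
    $ systemctl stop zookeeper
    $ mkdir -p /var/zookeeper
    $ cp -a /tmp/zookeeper/. /var/zookeeper/
    # Update dataDir in zoo.cfg to point to /var/zookeeper, then restart
    $ systemctl start zookeeper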

  • Another possibility is that your Kafka installation has been modified and may point to the wrong ZooKeeper cluster. In some configurations, the default location for Kafka to store its data is /tmp. On some Linux systems, the /tmp directory is cleared during startup by default, so a system reboot empties the directory and deletes the Kafka data along with it.

    Check the zookeeper.connect setting in your Kafka configuration (server.properties) to ensure that the broker is configured to connect to the correct ZooKeeper installation:

    ini
    # ZooKeeper connection string (see zookeeper docs for details).
    # This is a comma separated host:port pairs, each corresponding to a zk
    # server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
    # You can also append an optional chroot string to the urls to specify the
    # root directory for all kafka znodes.
    
    zookeeper.connect=localhost:2181

    Ensure that the configuration for the ZooKeeper hosts points to your active ZooKeeper installation.
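
    As a quick check, confirm that the hosts listed in zookeeper.connect are reachable and that your brokers are registered there. The sketch below assumes the zookeeper-shell.sh tool from the Kafka distribution and a ZooKeeper host of localhost:2181:

    shell
    # Inspect the znode tree; useful if a chroot suffix is configured on zookeeper.connect
    $ ./bin/zookeeper-shell.sh localhost:2181 ls /
    # List the broker IDs currently registered with this ensemble
    $ ./bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids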

Solutions

  • If the configuration of your Kafka and ZooKeeper deployments has been checked and is correct, the simplest solution is to have Kafka recreate the cluster ID and store the new value in ZooKeeper. To do this, you will need to:

    1. Shutdown ZooKeeper and Kafka

    2. Delete the data directories that contain the cluster ID configuration

    3. Restart ZooKeeper and Kafka

    Warning

    Note that these steps will remove data from your ZooKeeper and Kafka installations. If you are using ZooKeeper or Kafka to store data beyond your LogScale cluster, you will need to follow the longer steps to manually delete the cluster ID configuration information.
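
    For example, if ZooKeeper holds the cluster ID you want to keep, one less destructive approach is to remove only the broker's stored ID so that it adopts the ID from ZooKeeper on the next start. This is a sketch only; the meta.properties path is an example and must match your broker's data directory, and it assumes a systemd-managed Kafka service:

    shell
    $ systemctl stop kafka
    # Remove only the file holding the stored cluster ID; the broker recreates it on startup
    $ rm /home/kafka/logs/meta.properties
    $ systemctl start kafka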

    To perform the full reset:

    1. Stop ZooKeeper on each node:

      shell
      $ ./zkServer.sh stop

      or

      shell
      $ systemctl stop zookeeper
    2. Stop Kafka on each node:

      shell
      $ /bin/kafka-server-stop.sh

      or

      shell
      $ systemctl stop kafka
    3. Delete the contents of your ZooKeeper data directory (the dataDir location in zoo.cfg); this will need to be repeated on each node in your ZooKeeper cluster:

      shell
      $ rm -rf /var/zookeeper/*
    4. Delete the contents of your Kafka data directory (the log.dirs location in server.properties); this will need to be repeated on each node in your Kafka cluster:

      shell
      $ rm -rf /home/kafka/logs/*
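
      If you are unsure where your Kafka data lives, the log.dirs setting in the broker configuration shows the configured location; the server.properties path below is only an example:

      shell
      $ grep log.dirs /opt/kafka/config/server.properties
      log.dirs=/home/kafka/logs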
    5. Start ZooKeeper

      shell
      $ ./zkServer.sh start

      or

      shell
      $ systemctl start zookeeper
    6. Start Kafka

      shell
      $ /bin/kafka-server-start.sh config/server.properties

      or

      shell
      $ systemctl start kafka

    Check the logs and ensure that ZooKeeper and Kafka have started correctly and are communicating.
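
    For example, a quick check might look like the following; the service names and log path are assumptions, so adjust them to your installation:

    shell
    # Confirm both services are running
    $ systemctl status zookeeper kafka
    # The broker should log a newly generated cluster ID and no InconsistentClusterIdException
    $ grep "Cluster ID" /opt/kafka/logs/server.log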

  • Check the setting of HUMIO_KAFKA_TOPIC_PREFIX on each node in the cluster.

    If the setting differs between nodes, set it to the same value everywhere. If the setting is already consistent, try updating it on each node to a new value, which will reconfigure the generated cluster ID used by Kafka.
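
    For example, you might compare the value on every node with a command such as the following; the configuration file path is an assumption and depends on how LogScale is deployed:

    shell
    $ grep HUMIO_KAFKA_TOPIC_PREFIX /etc/humio/server.conf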