Adding & Removing Nodes

As the administrator of a cluster, you will sometimes need to add new nodes to it, and occasionally to remove one.

Adding a Node

There are several reasons why you might want to add more nodes to your LogScale cluster. You may want to improve high availability or to increase query performance. You may also want to increase storage capacity, using some nodes for storage and others for query processing or API access, which may also improve performance.

When a new node joins a LogScale cluster it initially won't be responsible for processing any incoming data. There are three core tasks a node performs: parsing, digestion, and storing data. You can read the Ingest page if you want to know more about the different node tasks, but for now we will assume that the node we are adding should take its fair share of the entire workload.

We are going to use the Cluster Node Administration UI, but every step can also be performed and automated through the GraphQL API (see Accessing GraphQL using API Explorer). Only the first step, starting a new node, is required. The remaining steps are optional and cover more advanced scenarios, so you will probably only perform some of them.
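As a rough illustration of what automating a step through GraphQL might look like, the sketch below builds (but does not send) an authenticated GraphQL request using only the Python standard library. The cluster URL, the token, and the fields in the query are assumptions made for illustration; confirm the actual schema in API Explorer before relying on them.

```python
import json
import urllib.request

# Hypothetical values: replace with your own cluster URL and API token.
LOGSCALE_URL = "https://logscale.example.com"
API_TOKEN = "REPLACE_ME"

# Assumed query shape; confirm the field names in API Explorer.
LIST_NODES_QUERY = "query { cluster { nodes { id name } } }"


def build_graphql_request(base_url, token, query, variables=None):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/graphql",
        data=body.encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_graphql_request(LOGSCALE_URL, API_TOKEN, LIST_NODES_QUERY)
```

Passing `request` to `urllib.request.urlopen` and decoding the JSON response body would execute the query. Keeping request construction separate from sending makes the payload easy to inspect before it reaches a live cluster.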

Starting a New LogScale Node

The first step is to start the LogScale node and point it at the cluster. You can read about how to configure a node in the Docker Deployment documentation. The important part is that the KAFKA_SERVERS configuration option points at the Kafka servers used by the existing cluster.
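Before starting the node, it can help to sanity-check the KAFKA_SERVERS value. The sketch below assumes the option holds a comma-separated list of host:port entries and checks that the new node's list shares at least one broker with the list used by an existing node. The host names are hypothetical, and the overlap test is only a heuristic, not a guarantee that both lists reach the same Kafka cluster.

```python
def parse_kafka_servers(value):
    """Split a comma-separated host:port list into (host, port) tuples."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas and whitespace
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    if not servers:
        raise ValueError("KAFKA_SERVERS is empty")
    return servers


def shares_a_broker(new_node_value, existing_node_value):
    """Heuristic: True if the two settings name at least one common broker."""
    new = set(parse_kafka_servers(new_node_value))
    existing = set(parse_kafka_servers(existing_node_value))
    return bool(new & existing)


# Hypothetical host names, for illustration only.
existing = "kafka01:9092,kafka02:9092,kafka03:9092"
new_node = "kafka02:9092, kafka03:9092"
overlap = shares_a_broker(new_node, existing)
```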

Once the node has successfully joined the cluster it will appear in the Cluster UI list of nodes.

Notice that the Storage and Digest columns both say 0 / X. At this point the new node's storage is not used, as indicated by the 0 in the Storage column, and the node is not used for digest (the processing of events and the running of real-time queries), as indicated by the X in the Digest column.

A node configured like this is called an Arrival Node since its only task is to parse messages arriving at this node, or coordinate queries sent to this node.
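The definition above can be expressed as a small check. This is a minimal sketch: the `NodeRow` type and its field names are hypothetical stand-ins for whatever your tooling reads from the node list, not part of LogScale.

```python
from dataclasses import dataclass


@dataclass
class NodeRow:
    """Hypothetical summary of one row in the node list."""
    name: str
    storage_assigned: int  # storage work currently assigned to the node
    digest_assigned: int   # digest work currently assigned to the node


def is_arrival_node(row):
    """A node with no storage and no digest work only parses and coordinates."""
    return row.storage_assigned == 0 and row.digest_assigned == 0


rows = [
    NodeRow("node-1", storage_assigned=8, digest_assigned=8),
    NodeRow("node-new", storage_assigned=0, digest_assigned=0),
]
arrival_nodes = [row.name for row in rows if is_arrival_node(row)]
```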

Taking Part of Existing Data in Cluster

We would like the new node to take over part of the data that was already in the cluster before it joined. This does not happen automatically, because moving a potentially huge amount of data between cluster nodes can adversely affect performance, so you may prefer to do it during a quiet period or scheduled downtime.

To move a fraction of the total data stored in the cluster to the node, the fraction shown in the Storage column, follow these steps:

  1. Select the node in the Cluster UI.

  2. Select Actions → Move a share of existing data onto this node → Move data to node.

You will see that the Traffic column of the node list will indicate that data is being moved to the node.
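When choosing a quiet period for the move, a back-of-the-envelope estimate of its duration can help. The sketch below assumes the new node receives an even share of the stored data and that the transfer runs at a steady rate. Both are simplifying assumptions; the fraction actually moved is the one shown in the Storage column, and the figures used are made up for illustration.

```python
def estimate_move(total_bytes, nodes_after_join, bytes_per_second):
    """Return (bytes moved to the new node, transfer time in seconds).

    Assumes an even split across all nodes and a constant transfer rate.
    """
    if nodes_after_join < 1:
        raise ValueError("nodes_after_join must be at least 1")
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    share = total_bytes / nodes_after_join
    return share, share / bytes_per_second


# Illustrative figures: 12 TB stored, 4 nodes after the join, 100 MB/s.
share_bytes, seconds = estimate_move(12_000_000_000_000, 4, 100_000_000)
hours = seconds / 3600
```

With these figures the new node would receive 3 TB, which takes 30,000 seconds (a little over 8 hours) at the assumed rate.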

Removing a Node

When you want to remove a node from a LogScale cluster, you need to make sure that any digest and archiving responsibilities are transferred to another node. This means removing the node from any digest and archive rules, which stops the node from accepting new work. The data stored on the node must then be copied to another node to keep the cluster's replication factor stable.

To safely remove a node from a LogScale cluster you need to ensure that the data stored on the node is completely copied to another node before it leaves.

We will be using the Cluster Node UI in this guide, but everything can be automated through the GraphQL API (see Accessing GraphQL using API Explorer). We will also be listing the associated HTTP calls performed in each step. The Cluster UI indicates that a node is safe to remove from the cluster with a Yes in the Removable column.

Moving All Data to Other Nodes

Once the node no longer has any digest or archiving responsibilities, you need to move all data archived on the node to other nodes, so that the cluster's replication factor is upheld before the node is removed.

In the Cluster UI, select the node you want to remove in the list of nodes. In the Actions panel, click Move all existing data away from this node → Move data out of node.

Shut Down the LogScale Process on Node

Wait until the Size column of the Node List shows 0 B, indicating that no more data resides on the node, and only then shut down the LogScale process on the node.
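If you script this step, the wait can be written as a polling loop. The sketch below is deliberately generic: `get_size_bytes` is a hypothetical callable that you would implement yourself (for example, with a GraphQL query) to return the node's current size in bytes. The polling interval and timeout are arbitrary defaults.

```python
import time


def wait_until_empty(get_size_bytes, poll_seconds=30.0, timeout_seconds=3600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until the node reports 0 bytes; return False if the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        if get_size_bytes() == 0:
            return True   # safe to shut down the LogScale process
        if clock() >= deadline:
            return False  # still holding data; do not shut down yet
        sleep(poll_seconds)


# Usage with a stand-in size source that drains over three polls.
sizes = iter([2048, 512, 0])
drained = wait_until_empty(lambda: next(sizes), poll_seconds=0,
                           sleep=lambda seconds: None)
```

The `sleep` and `clock` parameters exist only so that the loop can be exercised without real waiting.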

Unregister from Cluster

Finally, you should see that the Removable column says Yes, and you can unregister the node from the cluster, telling other nodes that the node will not be coming back again.

In the Cluster UI, select the node you want to remove in the list of nodes. In the Actions panel, click Remove Node → Remove Node.

Forcing Removal

If a node has died and there is no backup and no way to retrieve its data, you can forcibly unregister the node. You will have to accept potential data loss if no replicas of the data exist on other nodes. To forcibly remove a node, check Force Remove.
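The order of the removal steps, including the forced case, can be summarized as a small decision function. This is a sketch only: the `NodeState` type and its field names are hypothetical, and the returned strings simply name the step from this guide that applies next.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of a node that is being removed."""
    alive: bool             # False if the node has died
    size_bytes: int         # value behind the Size column
    process_running: bool   # is the LogScale process still up?
    removable: bool         # True when the Removable column says Yes
    data_recoverable: bool = True  # a backup or other way to get the data back


def next_removal_step(node):
    """Return the next step of the removal procedure for this node."""
    if not node.alive:
        if node.data_recoverable:
            return "recover the data first"
        return "force remove (accept potential data loss)"
    if node.size_bytes > 0:
        return "move data out of node and wait for Size to reach 0 B"
    if node.process_running:
        return "shut down the LogScale process"
    if node.removable:
        return "unregister the node"
    return "wait for Removable to show Yes"


step = next_removal_step(
    NodeState(alive=True, size_bytes=0, process_running=False, removable=True)
)
```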