Storage Rules

In LogScale, data is distributed across the cluster nodes. Which nodes store what is chosen randomly. The only thing you as an operator can control is how big a portion is assigned to each node, and that multiple replicas are not stored on the same rack/machine/location (to ensure fault-tolerance).

Data is stored in units of segments, which are compressed files between 0.5GB and 1GB. For more information on segments and how data is stored and ingested, see Ingestion: Digest Phase.

See LogScale Multiple-byte Units for more information on how storage numbers are calculated.

Replication Factor

If you want fault-tolerance, you should ensure your data is replicated across multiple nodes, physical servers, and geographical locations.

You can achieve this by setting the storage replication factor higher than 1, and configuring ZONE on your nodes. LogScale uses your ZONE node configuration to determine where to place data, we will always place data in as many ZONEs as possible.

Storage Divergence

LogScale is capable of storing and searching across huge amounts of data. When LogScale Operational Architecture join or leave the cluster, data will usually need to be moved between nodes to ensure the replication factor is upheld and that no data is lost.

LogScale automatically redistributes data when nodes go offline, ensuring that your configured replication factor is met. This movement of data is throttled, to avoid excessively loading the cluster when a node goes offline. The "Low" counter will show a non-zero number while data is not replicated properly, letting you tell whether this movement of data is complete.

Evict a Node

If you know ahead of time that you want to Adding & Removing Nodes from the cluster, you can reduce the impact on the cluster by first evicting the node. Eviction will migrate work off of the node, and move data from the evicted node to other nodes.