Storage Rules

In LogScale, data is distributed across the cluster nodes, and the nodes that store any given piece of data are chosen randomly. As an operator, you control only two things: how large a portion of the data is assigned to each node, and that multiple replicas of the same data are not stored on the same rack, machine, or location (to ensure fault tolerance).
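The placement rule above can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical model in which each node has a relative storage share and a segment's replicas are drawn at random, weighted by that share, without reusing a node. The names `pick_nodes` and `shares` are illustrative only; this is not LogScale's actual placement code.

```python
import random

def pick_nodes(shares, replicas, rng):
    """Choose `replicas` distinct nodes for one segment.

    shares maps a node name to its relative storage share (hypothetical model).
    Each pick is random, weighted by share, and never reuses a node.
    """
    if replicas > len(shares):
        raise ValueError("not enough nodes for the requested replicas")
    remaining = dict(shares)
    chosen = []
    for _ in range(replicas):
        names = list(remaining)
        weights = [remaining[n] for n in names]
        node = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(node)
        del remaining[node]  # never put two replicas on the same node
    return chosen

rng = random.Random(42)
shares = {"node1": 1.0, "node2": 1.0, "node3": 2.0}
counts = {n: 0 for n in shares}
for _ in range(10_000):
    for n in pick_nodes(shares, 1, rng):
        counts[n] += 1
print(counts)
```

With these weights, node3 should receive roughly half of the single-replica placements, because its share is twice that of each of the other two nodes.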

Data is stored in units of segments, which are compressed files between 0.5GB and 1GB. For more information on segments and how data is stored and ingested, see Ingestion: Digest Phase.

See LogScale Multiple-byte Units for more information on how storage numbers are calculated.

Replication Factor

If you want fault tolerance, ensure that your data is replicated across multiple nodes, physical servers, and geographical locations.

You can achieve this by setting the storage replication factor higher than 1 and configuring ZONE on your nodes. LogScale uses the ZONE setting of each node to decide where to place data, and always places replicas in as many different ZONEs as possible.
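To illustrate what "as many ZONEs as possible" means, here is a minimal Python sketch under a hypothetical model where each node is labeled with its ZONE value and replicas are assigned round-robin across zones, so no zone receives a second copy until every zone has one. The function name and data layout are assumptions, and unlike LogScale the sketch picks nodes deterministically rather than at random.

```python
from collections import defaultdict

def spread_across_zones(node_zones, replicas):
    """Pick `replicas` nodes so that they cover as many zones as possible.

    node_zones maps node name -> ZONE value (hypothetical data model).
    Nodes are taken round-robin over zones, so no zone gets a second
    replica until every zone has one.
    """
    by_zone = defaultdict(list)
    for node, zone in sorted(node_zones.items()):
        by_zone[zone].append(node)
    chosen = []
    while len(chosen) < replicas:
        progressed = False
        for zone in sorted(by_zone):
            if by_zone[zone] and len(chosen) < replicas:
                chosen.append(by_zone[zone].pop(0))
                progressed = True
        if not progressed:
            raise ValueError("not enough nodes for the requested replicas")
    return chosen

nodes = {"a1": "zone-a", "a2": "zone-a", "b1": "zone-b", "c1": "zone-c"}
placement = spread_across_zones(nodes, 3)
print(placement)
```

With three zones and a replication factor of 3, each replica lands in a different zone. With only two zones, one zone necessarily holds two of the three copies.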

Storage Divergence

LogScale is capable of storing and searching across huge amounts of data. When nodes join or leave the cluster, data usually needs to be moved between nodes to uphold the replication factor and ensure that no data is lost.

LogScale automatically redistributes data when nodes go offline, ensuring that your configured replication factor is met. This data movement is throttled to avoid overloading the cluster when a node goes offline. The "Low" counter shows a non-zero number while data is not properly replicated, so you can use it to tell whether the data movement is complete.
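The idea behind the "Low" counter can be pictured with a minimal Python sketch. It assumes, hypothetically, that the counter is the number of segments whose copies on online nodes fall short of the replication factor; the function and data names are illustrative and not part of LogScale.

```python
def count_under_replicated(segment_hosts, online, replication_factor):
    """Count segments whose online copies are below the replication factor.

    segment_hosts maps segment id -> set of nodes holding a copy;
    online is the set of nodes currently up (hypothetical model).
    """
    low = 0
    for hosts in segment_hosts.values():
        if len(hosts & online) < replication_factor:
            low += 1
    return low

segments = {
    "seg-1": {"n1", "n2", "n3"},
    "seg-2": {"n1", "n2", "n4"},
    "seg-3": {"n2", "n3", "n4"},
}
print(count_under_replicated(segments, {"n1", "n2", "n3", "n4"}, 3))  # 0
print(count_under_replicated(segments, {"n2", "n3", "n4"}, 3))  # 2
```

When n1 goes offline, the two segments it held drop to two online copies each, so the count is non-zero until redistribution restores a third copy.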

Evict a Node

If you know ahead of time that you want to remove a node from the cluster (see Adding & Removing Nodes), you can reduce the impact on the cluster by first evicting the node. Eviction migrates work off the node and moves its data to other nodes.

When a node is evicted, the cluster attempts to move digest work away from it. It also tries to find other nodes to take over ownership of the node's segments, if that is needed to achieve full replication. Queries that are already running remain on the node. A running non-live query cannot be migrated, so it fails and is run again once you terminate the evicted node. A running live query is migrated once the node terminates; eviction has no effect on this.
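The segment part of eviction can be sketched in Python under a simplified, hypothetical model: the evicted node's copies are dropped from the placement, and a copy is added on another node only while a segment would otherwise fall below the replication factor. The function `evict` and its parameters are illustrative; the sketch picks replacement nodes in name order and does no load balancing, unlike a real cluster.

```python
def evict(segment_hosts, evicted, nodes, replication_factor):
    """Return a new placement in which `evicted` holds no segments.

    segment_hosts maps segment id -> set of nodes holding a copy.
    A copy is added on another node only while the segment would
    otherwise have fewer than `replication_factor` copies
    (hypothetical model; candidates are tried in name order).
    """
    candidates = [n for n in sorted(nodes) if n != evicted]
    result = {}
    for seg, hosts in segment_hosts.items():
        remaining = set(hosts) - {evicted}
        for node in candidates:
            if len(remaining) >= replication_factor:
                break
            remaining.add(node)  # no-op if the node already has a copy
        result[seg] = remaining
    return result

segments = {"seg-1": {"n1", "n2", "n3"}, "seg-2": {"n2", "n3", "n4"}}
after = evict(segments, "n1", {"n1", "n2", "n3", "n4"}, 3)
print(after)
```

In this model, seg-1 gains a copy on n4 to replace the one on n1, while seg-2 needs no change, so every segment still has three copies before n1 is terminated.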

The effect of eviction on new queries depends on whether the segments have been moved. If they have been moved away from the evicted node, new queries do not target it, since queries follow the segments. Otherwise, new queries continue to be submitted to the node.
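A minimal Python sketch of why new queries stop reaching an evicted node only after its segments have moved. It assumes a hypothetical model in which a query reads each segment from exactly one of the nodes that hold it; how LogScale actually chooses among replicas is not described here, so the sorted-name tie-break in `query_targets` is purely illustrative.

```python
def query_targets(segment_hosts, searched):
    """Return the set of nodes a new query would be sent to.

    Queries follow the segments: each searched segment is read from one
    of the nodes holding it (here, the first name in sorted order,
    a hypothetical tie-break).
    """
    return {min(segment_hosts[seg]) for seg in searched}

before_move = {"seg-1": {"n1", "n2", "n3"}, "seg-2": {"n2", "n3", "n4"}}
after_move = {"seg-1": {"n2", "n3", "n4"}, "seg-2": {"n2", "n3", "n4"}}
print(sorted(query_targets(before_move, ["seg-1", "seg-2"])))  # ['n1', 'n2']
print(sorted(query_targets(after_move, ["seg-1", "seg-2"])))  # ['n2']
```

While n1 still holds seg-1 it can be targeted; once the segment has moved, n1 no longer appears among the targets.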

Query coordination work is not moved away from evicted nodes either, and an evicted node can still be selected to start new coordination work.

Eviction mainly affects digest work and segments. Its current benefit is that it tells the cluster to replicate data elsewhere, so you do not take on an increased risk of data loss when you terminate the node. For example, with three replicas, terminating a node without coordination temporarily leaves you with only two replicas of its data. If you evict the node first, you should never fall below three.
