Cluster Nodes
Part of our Foundational Concepts series:
Back to Architecture of LogScale
This is the last in the series!
LogScale can run as a distributed system in a cluster. Root users can access the Cluster Node Administration page in the UI from the account menu.
Node Roles
All nodes in the cluster run the same software that can be configured to assume a combination of different logical roles:
A node can have any combination of the four roles, and all play a part in LogScale's data ingest flow.
Below is a short overview of the node types. For a more detailed explanation, refer to the Ingest Flow page.
It may be beneficial to specialize nodes so that each assumes only one role, since that allows you to tune cluster performance and cost more precisely; see the Node Role Examples.
Nodes in the Ingest or HTTP API roles are usually the only nodes exposed to external systems and are placed behind a load balancer.
Ingest Node
An Ingest node is a node that is responsible for servicing:
Ingest-only parts of the HTTP API
TCP and UDP ingest listeners
Parsing incoming data
The ingest node receives data from external systems, parses it into Events, and passes the data on to the Digest Node.
A node can be configured as a stateless ingest-only node by adding NODE_ROLES=ingestonly to the configuration of the node.
In order to remain stateless, a node in this role does not join the cluster as a member visible to the rest of the cluster: it does not show up in the cluster management UI and it does not get a node ID. This means that TCP/UDP ingest listeners that need to run on these nodes must be configured to run on all nodes, not tied to a specific node.
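As a sketch, assuming configuration is supplied through an environment file read at node startup (the file layout and the Kafka addresses below are illustrative assumptions, not defaults), an ingest-only node might be configured like this:

```shell
# Illustrative environment file for a stateless ingest-only node.
# Only NODE_ROLES comes from this page; the Kafka setting is an assumption
# standing in for whatever your deployment already uses.
NODE_ROLES=ingestonly

# Ingest nodes parse incoming data and pass events on toward digest nodes,
# so they still need connectivity to the cluster's Kafka brokers:
KAFKA_SERVERS=kafka-1.example.com:9092,kafka-2.example.com:9092
```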
HTTP API Node
An HTTP API node is a node that is responsible for servicing:
The Web UI
The full HTTP API
TCP and UDP ingest listeners
Parsing incoming data
The HTTP API node handles all types of HTTP requests, including those of the ingest node. An HTTP API node is visible in the cluster management user interface. It uses the local data directory for cache storage of files.
A node can be configured as an HTTP API node by adding NODE_ROLES=httponly to the configuration of the node.
Digest Node
A digest node is responsible for:
Constructing segment files (the internal storage format in LogScale) for incoming events
Executing the real-time part of searches
Executing the historical part of searches on recent events (older events are handled by the storage nodes)
Once a segment file is completed it is passed on to storage nodes.
Digest nodes are designated by adding them to the cluster's Digest Rules. Any node that appears in the digest rules is a Digest Node.
A digest node must have NODE_ROLES=all in the configuration of the node, but as that is the default value, leaving it out works too.
Storage Node
A storage node is a node that saves data to disk. Storage nodes are responsible for:
Storing events (segment files constructed by digest nodes)
Executing the historical part of searches (the most recent results are handled by digest nodes)
The data directory of a storage node is used to store the segment files, which make up the bulk of all data in LogScale.
Storage nodes are configured using the cluster's Storage Rules. Any node that appears in the storage rules is considered a Storage Node. A node that was previously in a storage rule can still contain segment files that are used for querying.
The Storage Rules are used to configure data Replication Factor.
A storage node must have NODE_ROLES=all in the configuration of the node, but as that is the default value, leaving it out works too.
Node Role Examples
Here are a few examples of how the roles may be applied when setting up a cluster.
Single Node
A single LogScale node is a cluster of just one node which needs to assume all roles.
Symmetric Cluster
In this configuration all nodes are equal. All nodes run on similar hardware and all have the default role configuration of all. The load balancer has all the cluster nodes in its set of backend nodes and dispatches HTTP requests to all of them. The digest and storage partitions should be assigned so that all nodes have a fair share of the partitions where they act as the primary node.
Cluster with Frontend or Backend Nodes
This configuration uses potentially cheaper nodes with limited, slow storage as frontend nodes, relieving the more expensive nodes with fast local storage of the tasks that do not require it.
The backend nodes with fast local storage are configured with the node role all and are the ones configured as digest and storage nodes in the cluster. The cheaper frontend nodes are configured with the node role httponly, and only these are added to the set of nodes known by the load balancer. The backend nodes then never see HTTP requests from outside the cluster.
Dedicated Ingest Nodes
As the number of cluster nodes required to handle the ingest traffic grows, it may be convenient to add stateless ingest nodes to the cluster. These nodes need a persistent data directory, but cause very little disruption to the cluster when added or removed; they are removed automatically by the cluster if offline for a while. This makes it easier to add and remove this kind of node as demand changes. The nodes are configured in this way by setting NODE_ROLES to ingestonly.
The load balancing configuration should direct ingest traffic primarily to the current set of stateless ingest nodes, and direct all other HTTP traffic to the HTTP API nodes. Using a separate DNS name or port for this split is recommended, but splitting the traffic based on matching substrings in the URL is also possible.
The extra complexity of also managing this split of HTTP API requests means that adding dedicated ingest nodes is not worth the effort for smaller clusters.
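One way to realize the URL-based split is sketched below using nginx. All hostnames, ports, and the upstream layout are assumptions for illustration, and the /api/v1/ingest/ path prefix should be verified against the ingest endpoints of your LogScale version:

```nginx
# Hypothetical nginx frontend for a cluster with dedicated ingest nodes.
upstream logscale_ingest {
    server ingest-1.internal:8080;
    server ingest-2.internal:8080;
}
upstream logscale_http {
    server http-1.internal:8080;
    server http-2.internal:8080;
}
server {
    listen 443 ssl;

    # Requests matching the ingest path prefix go to the stateless
    # ingest nodes...
    location /api/v1/ingest/ {
        proxy_pass http://logscale_ingest;
    }

    # ...everything else (UI, full HTTP API) goes to the HTTP API nodes.
    location / {
        proxy_pass http://logscale_http;
    }
}
```

Using a separate DNS name or port instead, as the text recommends, avoids maintaining this path-matching logic in the load balancer.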
Node Identity
A cluster node is identified in the cluster by its UUID (universally unique identifier). The UUID is automatically generated the first time a node is started and is stored in $HUMIO_DATA_DIR/cluster_membership.uuid.
When moving or replacing a node, you can use this file to ensure the node rejoins the cluster with the same identity.
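As a minimal sketch, carrying the identity file over when moving a node's data directory might look like this. The directory names are placeholders standing in for $HUMIO_DATA_DIR; only the cluster_membership.uuid filename comes from this page:

```shell
# Sketch: preserve a node's cluster identity across a move/replacement.
# The directories below are illustrative placeholders.
OLD_DATA_DIR=./humio-data-old
NEW_DATA_DIR=./humio-data-new
mkdir -p "$OLD_DATA_DIR" "$NEW_DATA_DIR"

# Simulate the UUID file LogScale generated on the node's first start.
echo "2f5a6b7c-0d1e-4f23-9a45-67b89cdef012" > "$OLD_DATA_DIR/cluster_membership.uuid"

# Copy the file into the new data directory before starting the node,
# so it rejoins the cluster with the same identity instead of a new one.
cp "$OLD_DATA_DIR/cluster_membership.uuid" "$NEW_DATA_DIR/cluster_membership.uuid"
```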