LogScale Physical Architecture

graph LR
    C[Client]
    subgraph LB [Load Balancer]
        direction LR
        P1[Proxy Host]
        P2[Proxy Host]
        P3[Proxy Host]
    end
    subgraph LC [LogScale Cluster]
        direction LR
        H1[LogScale Node]
        S1[Local Storage]
        H2[LogScale Node]
        S2[Local Storage]
        H3[LogScale Node]
        S3[Local Storage]
    end
    H1<-->S1
    H2<-->S2
    H3<-->S3
    subgraph KC [Kafka Cluster]
        direction LR
        K1[Kafka Node]
        K2[Kafka Node]
        K3[Kafka Node]
    end
    BS([Block Storage])
    C-->LB
    LB-->LC
    LC-->KC
    LC-->BS

The LogScale physical architecture consists of three main components, only one of which is the LogScale cluster itself:

  • Load Balancer

    LogScale operates as a cluster of multiple physical nodes. From a client perspective, you want a single URL or IP address for communicating with the cluster. A load balancer is recommended for all LogScale deployments so that clients can use a single URL while each request is directed to one of the nodes within the LogScale cluster.

  • Kafka Cluster

    LogScale relies on Kafka for a number of key processes in its operation. Kafka is used because it is a reliable, durable, and scalable message queuing system: it acts as a queue when ingesting data and as a message bus for communication between nodes.

  • LogScale Cluster

    The LogScale cluster runs the LogScale application. Typically a cluster is made up of three or more nodes, all working together to process and store data. Each node within LogScale will typically have local storage for the data ingested by LogScale and may optionally make use of external bucket storage such as Amazon S3.

Node Role Deployment Scenarios

The following examples show how node roles may be applied when setting up a cluster so that you can balance it for performance and scalability.

Single Node

A single LogScale node is a cluster of just one node, which must assume all roles.

Symmetric Cluster

In this configuration all nodes are equal, so every node is able to ingest, digest, store, and process queries.

In this mode, all nodes should run on similar hardware and all use the default node role of all. The load balancer has every cluster node in its set of backend nodes and dispatches HTTP requests across all of them.
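For illustration, a symmetric cluster can sit behind a reverse proxy configured along the following lines. This is only a sketch: nginx is used as an example load balancer, and the hostnames, the port 8080, and the TLS setup (omitted here) are placeholders to replace with your own deployment details.

    # Sketch of a load balancer for a symmetric cluster (nginx as an example,
    # placed inside the http context). Every LogScale node is a backend, so
    # any node can receive any HTTP request.
    upstream logscale {
        server logscale-01.example.com:8080;
        server logscale-02.example.com:8080;
        server logscale-03.example.com:8080;
    }

    server {
        listen 80;   # TLS termination omitted for brevity

        location / {
            proxy_pass http://logscale;
            proxy_http_version 1.1;
        }
    }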

Cluster with Frontend or Backend Nodes

This configuration uses potentially cheaper nodes with limited, slower storage as frontend nodes, relieving the more expensive nodes with fast local storage of the tasks that do not require it.

The backend nodes with fast local storage are configured with the node role all and act as the digest and storage nodes in the cluster.

The cheaper frontend nodes are configured with the node role httponly and only these are added to the set of nodes known by the load balancer. The backend nodes will then never see HTTP requests from outside the cluster.
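As a sketch, this split is expressed through each node's NODE_ROLES setting, shown here as environment configuration; everything else about the nodes' setup is deployment specific.

    # Backend nodes: fast local storage, act as digest and storage nodes
    NODE_ROLES=all

    # Frontend nodes: limited storage, serve HTTP only; only these
    # nodes are registered as backends in the load balancer
    NODE_ROLES=httponly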

Dedicated Ingest Nodes

As the number of cluster nodes required to handle the ingest traffic grows, it may be convenient to add stateless ingest nodes to the cluster. These nodes do not need a persistent data directory and cause very little disruption to the cluster when added or removed; they are removed automatically by the cluster if they are offline for a while. This makes it easy to add and remove this kind of node as demand changes. Nodes are configured in this way by setting the parameter NODE_ROLES to ingestonly.
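For example, a dedicated ingest node's environment configuration would contain:

    # Stateless, dedicated ingest node
    NODE_ROLES=ingestonly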

The load balancing configuration should direct ingest traffic primarily to the current set of stateless ingest nodes and all other HTTP traffic to the HTTP API nodes. Using a separate DNS name or port for this split is recommended, but splitting the traffic based on matching substrings in the URL is also possible.
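A minimal sketch of the URL-based variant of this split, again using nginx as an example with hypothetical hostnames, might look like the following. The /api/v1/ingest prefix is an assumption about which ingest endpoints your senders use, and the recommended separate DNS name or port would instead be handled with a second server block.

    # Sketch: split ingest traffic from other HTTP API traffic (nginx example).
    upstream logscale_ingest {
        server ingest-01.example.com:8080;
        server ingest-02.example.com:8080;
    }

    upstream logscale_api {
        server logscale-01.example.com:8080;
        server logscale-02.example.com:8080;
        server logscale-03.example.com:8080;
    }

    server {
        listen 80;   # TLS termination omitted for brevity

        # Ingest requests go to the stateless ingest nodes
        location /api/v1/ingest {
            proxy_pass http://logscale_ingest;
            proxy_http_version 1.1;
        }

        # Everything else goes to the HTTP API nodes
        location / {
            proxy_pass http://logscale_api;
            proxy_http_version 1.1;
        }
    }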

The extra complexity of also managing this split of HTTP API requests means that adding dedicated ingest nodes is not worth the effort for smaller clusters.

Setting the Node Identity

A cluster node is identified in the cluster by its UUID (universally unique identifier). The UUID is automatically generated the first time a node is started and is stored in $HUMIO_DATA_DIR/cluster_membership.uuid. When moving or replacing a node, you can use this file to ensure the node rejoins the cluster with the same identity.
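For example, when moving a node to new hardware you might carry the identity file over along these lines (a sketch; the /data/logscale path stands in for whatever HUMIO_DATA_DIR points to in your deployment):

    # Sketch: preserve a node's identity when moving it to a new host.
    # Assumes HUMIO_DATA_DIR=/data/logscale on both hosts.
    # 1. Stop LogScale on the old node, then copy the identity file:
    scp /data/logscale/cluster_membership.uuid new-host:/data/logscale/
    # 2. Start LogScale on the new host; it rejoins the cluster under the same UUID.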