Cluster Management

A LogScale cluster has some additional requirements for configuration, monitoring, and management. This section explains common monitoring tasks when running a cluster.

Understanding cluster health

Cluster health depends on multiple factors working together. Monitor these key indicators to ensure your cluster operates correctly.

  • Data replication status: Verify that data is replicated according to your configured replication factor. The Cluster Nodes page shows three replication states:

    • Perfect: Data is fully replicated to the target replication factor.

    • Low: Data has not yet reached full replication. This is normal during transfers but should resolve quickly.

    • Absent: Data cannot be found on any node. This indicates node failures or availability issues that require immediate attention.

  • Kafka synchronization: Kafka manages data distribution across the cluster. The Kafka Cluster page shows in-sync partition counts. All partitions should remain synchronized for proper cluster operation.

  • Node availability: All nodes should be reachable and running the same LogScale version. The Health Checks API reports health status programmatically, using three states:

    • OK: All health checks are within normal parameters.

    • WARN: At least one check needs investigation.

    • DOWN: The node is not functioning and should be removed from load balancers.

  • Resource utilization: Monitor disk usage, ingest latency, and query performance. By default, health checks trigger warnings when disk usage exceeds 90% or when ingest latency rises above 30 seconds.
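The default thresholds and the three health states above can be turned into simple monitoring logic. The following Python sketch is illustrative only and is not LogScale's implementation: the NodeSample record, its field names, and both helper functions are hypothetical, and the thresholds are the defaults quoted above, which your deployment may override.

```python
from dataclasses import dataclass
from typing import List

# Default thresholds quoted in the text; a deployment may configure other values.
DISK_USAGE_WARN_PERCENT = 90.0
INGEST_LATENCY_WARN_SECONDS = 30.0


@dataclass
class NodeSample:
    """Hypothetical snapshot of one node's resource readings."""
    name: str
    disk_used_percent: float
    ingest_latency_seconds: float


def resource_warnings(sample: NodeSample) -> List[str]:
    """Return one message per reading that exceeds its default threshold."""
    warnings = []
    if sample.disk_used_percent > DISK_USAGE_WARN_PERCENT:
        warnings.append(
            f"{sample.name}: disk usage {sample.disk_used_percent:.1f}% "
            f"exceeds {DISK_USAGE_WARN_PERCENT:.0f}%"
        )
    if sample.ingest_latency_seconds > INGEST_LATENCY_WARN_SECONDS:
        warnings.append(
            f"{sample.name}: ingest latency {sample.ingest_latency_seconds:.1f}s "
            f"exceeds {INGEST_LATENCY_WARN_SECONDS:.0f}s"
        )
    return warnings


def keep_in_load_balancer(status: str) -> bool:
    """OK and WARN nodes keep receiving traffic; DOWN nodes should be removed."""
    normalized = status.strip().upper()
    if normalized not in {"OK", "WARN", "DOWN"}:
        raise ValueError(f"unknown health status: {status!r}")
    return normalized != "DOWN"
```

A monitoring job could call resource_warnings() on each collected sample and use keep_in_load_balancer() to decide whether a node stays in rotation. A reading exactly at a threshold does not produce a warning, matching the "exceeds" wording above.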

Monitoring priorities

Different monitoring tools serve different purposes. Use this guidance to decide which tool to check in each situation.

  • Daily health checks: Review the Cluster Nodes page to verify replication status and node availability. Check that the percentage of data in the Perfect state remains high and that no data shows as Absent.

  • During incidents: Use the Health Checks API for quick programmatic status checks. Integrate health checks with your monitoring and alerting systems.

  • Performance investigation: Check the Query Monitor to identify heavy queries. Review LogScale Metrics for detailed performance data.

  • Before upgrades: Verify that all nodes show the same version on the Cluster Nodes page. Ensure replication status shows Perfect before starting rolling upgrades.

  • Capacity planning: Monitor disk usage trends and transfer rates on the Cluster Nodes page. Watch for increasing Transfers values that might indicate the need for additional nodes.
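The pre-upgrade checks listed above can be expressed as a small gate in an upgrade script. This is a minimal sketch under stated assumptions: the NodeInfo record and the upgrade_blockers() function are hypothetical, and the sketch assumes you have already collected each node's version and the overall replication state (for example, by reading them from the Cluster Nodes page).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeInfo:
    """Hypothetical record of one node as shown on the Cluster Nodes page."""
    name: str
    version: str


def upgrade_blockers(nodes: List[NodeInfo], replication_state: str) -> List[str]:
    """Return the reasons a rolling upgrade should not start yet (empty if none)."""
    blockers = []

    # All nodes should report the same version before the upgrade begins.
    versions = sorted({node.version for node in nodes})
    if len(versions) > 1:
        blockers.append("nodes report mixed versions: " + ", ".join(versions))

    # Replication should be Perfect; Low or Absent means data is at risk.
    if replication_state != "Perfect":
        blockers.append(f"replication state is {replication_state}, expected Perfect")

    return blockers
```

An empty list means both conditions from the list above are met. Returning the reasons rather than a boolean lets the calling script log exactly what needs attention.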

Cluster management using GraphQL

To retrieve information about a cluster through the GraphQL API, see the documentation pages on the cluster() and clusterManagementSettings() query fields. To check connections, use the checkLocalClusterConnection() and checkRemoteClusterConnection() query fields. To manage connections, see the pages on createRemoteClusterConnection(), deleteClusterConnection(), and similar mutation fields.
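As a rough illustration of calling the cluster() query field, the sketch below builds an HTTP request using only the Python standard library. Several details are assumptions to verify against your own deployment and its GraphQL schema: the /graphql endpoint path, bearer-token authentication, and the nodes, id, and name fields in the selection set. The build_graphql_request() helper and the example host name are hypothetical.

```python
import json
import urllib.request

# Assumed selection set; confirm the field names in your LogScale GraphQL schema.
CLUSTER_QUERY = "query { cluster { nodes { id name } } }"


def build_graphql_request(
    base_url: str, token: str, query: str = CLUSTER_QUERY
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/graphql",  # assumed endpoint path
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",  # assumed authentication scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen() and decode the JSON response body. Keeping request construction separate from sending makes it easy to inspect the payload before running it against a live cluster.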