Monitoring Capacity Metrics

LogScale monitors its own performance through the humio and humio-metrics repositories. Use these metrics as an indicator of overall system performance and as a way to examine individual LogScale metrics in detail.

For an easy way of monitoring metrics, the humio/insights package provides a number of dashboards and specific widgets for reporting and displaying this information. It is installed and available against the humio repository by default. You can read more about this package in Insights Package.

The following metrics, and the humio/insights dashboards where the metrics can be found, should be monitored regularly:

  • The Hosts dashboard shows a variety of metrics related to the individual nodes within LogScale. Some key metrics to monitor include:

    • CPU performance - consistently high CPU usage may indicate that a particular part of the overall system is starved for resources, which may have an impact on ingestion, digestion or querying. Check the corresponding metrics to determine whether

    • Memory usage - shows how much memory is being used on the system. Increasing or consistently high memory usage could indicate that there is insufficient physical memory. In particular, the system usage percentage widget shows the percentage of memory being used for caching in comparison to each LogScale node's total memory allocation. LogScale uses memory to cache the segment files used in queries, which improves query speed and performance.

    • Disk usage - widgets display primary and secondary disk usage. Disk usage should not go above 85%, as this could result in data loss.

    You can read more in the documentation for the Hosts dashboard.

  • The Ingest dashboard provides many widgets for tracking metrics related to ingest. Some key metrics to monitor include:

    • Ingest Per Rep - this displays the ingested bytes per repository. Hovering over the widget shows all repositories and the corresponding total bytes ingested at that time. This can be useful if you need to balance load across repositories, reduce the bytes ingested for a repository to comply with license agreements, or reduce overall load on the cluster by blocking ingest on specific repositories.
    • Parsers Using The Most Time (Millis) - this widget lists the parsers in operation and key performance metrics, including timePerEvent. Any case of 1 millisecond per event should be investigated as a potential bottleneck, because it could mean that the parser is inefficient.

    You can read more in the documentation for the Ingest dashboard.

  • The Bucket Storage dashboard provides many widgets for tracking metrics related to bucket storage. Some key metrics to monitor include:

    • Download queue cap size hits - the number of times the request download queue size cap is hit.
    • Bytes read/written from/to bucket storage - displays bucket storage reads and writes and the total.

    You can read more in the documentation for the Bucket Storage dashboard.

  • The Segments and DataSources dashboard provides many widgets for tracking metrics related to segments and data sources. Some key metrics to monitor include:

    • Segment Merges Per Hour - as ingest takes place, mini-segments are created that are later merged into segments. This metric shows the CPU time spent per LogScale host per hour on merging these mini-segment files.
    • Merged Segments Sizes (Bytes) - this shows the median, 75th and 95th percentile of the file size for segments created by merging mini-segments. A value below 1GB is a sign of a healthy cluster.

    You can read more in the documentation for the Segments and DataSources dashboard.

See the Insights package documentation for more details on the dashboards and widgets available.
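
As an illustration of how two of the thresholds above could be checked outside the dashboards, the following Python sketch flags disks above 85% usage and parsers at or above 1 millisecond per event. It is a minimal sketch under stated assumptions, not part of LogScale: the HostSample record, its field names, and the function names are hypothetical, and the input values are assumed to have been read from the Hosts and Ingest dashboards (or exported from them) by some other means.

```python
from dataclasses import dataclass

# Thresholds taken from the guidance in this section.
DISK_USAGE_LIMIT_PERCENT = 85.0
PARSER_TIME_PER_EVENT_LIMIT_MS = 1.0


@dataclass
class HostSample:
    """Hypothetical record of one node's disk usage (not a LogScale type)."""
    host: str
    primary_disk_percent: float
    secondary_disk_percent: float


def disk_warnings(samples):
    """Return one message per disk whose usage is above the 85% limit."""
    warnings = []
    for sample in samples:
        disks = (
            ("primary", sample.primary_disk_percent),
            ("secondary", sample.secondary_disk_percent),
        )
        for label, percent in disks:
            if percent > DISK_USAGE_LIMIT_PERCENT:
                warnings.append(f"{sample.host}: {label} disk at {percent:.1f}%")
    return warnings


def slow_parsers(time_per_event_ms):
    """Return the sorted names of parsers at or above 1 ms per event.

    time_per_event_ms maps a parser name to its timePerEvent value in
    milliseconds, as shown in the Parsers Using The Most Time widget.
    """
    return sorted(
        name
        for name, millis in time_per_event_ms.items()
        if millis >= PARSER_TIME_PER_EVENT_LIMIT_MS
    )


if __name__ == "__main__":
    # Example values are made up for illustration only.
    hosts = [HostSample("node-1", 70.0, 90.5), HostSample("node-2", 85.0, 10.0)]
    for message in disk_warnings(hosts):
        print(message)
    print(slow_parsers({"json": 0.2, "custom": 1.0, "kv": 3.5}))
```

A check like this could be run on a schedule against exported values, but the dashboards remain the reference view of the metrics.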