Data Storage, Buckets, and Archiving

Data that is ingested into a repository is stored locally. To let Falcon LogScale store more data than fits on the primary disk, secondary storage and bucket storage can be used to extend the overall capacity. Falcon LogScale moves data between the storage tiers so that the most recently used data stays on primary storage, while older, less recently used data is stored on secondary storage and then on bucket storage.

There are several methods and factors related to storing LogScale data that you might consider, including:

Data Retention
To avoid servers reaching their maximum storage capacity, Falcon LogScale can be configured to expire (delete) data when it approaches a given threshold, such as the compressed file size, the uncompressed file size, or the age of the data.
Secondary storage
Active data is stored on local disks within each node of the Falcon LogScale cluster. Primary disks should be high-performance SSDs. For additional local capacity, secondary storage (for example, a lower-performance SSD) can be used. Falcon LogScale automatically moves segment files to secondary storage once the primary disk reaches a configured limit.
Bucket Storage
To store larger volumes of data, bucket storage can be used. As with secondary storage, Falcon LogScale moves segments automatically, in this case to solutions such as Amazon S3 or Google Cloud Storage. Bucket storage also makes it easier to deploy new nodes, expand an existing cluster, and maintain backups in case a node or a cluster crashes.
S3 Archiving
Ingested log data can be archived to Amazon S3. Archiving stores a copy of the ingested log data, but unlike data in bucket storage, the archived data is not searchable by Falcon LogScale. Archived data can optionally be re-ingested or read by other software.
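
The retention thresholds described under Data Retention can be illustrated with a small sketch. This is not LogScale's implementation: the `Segment` and `RetentionPolicy` types, their field names, and the oldest-first expiry order are all assumptions made for the example. It only shows how limits on age, compressed size, and uncompressed size could be combined to decide which data to expire.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    # Hypothetical record describing one stored segment.
    name: str
    age_days: float
    compressed_gb: float
    uncompressed_gb: float


@dataclass
class RetentionPolicy:
    # Hypothetical limits; None means "no limit of this kind".
    max_age_days: Optional[float] = None
    max_compressed_gb: Optional[float] = None
    max_uncompressed_gb: Optional[float] = None


def segments_to_expire(segments: List[Segment], policy: RetentionPolicy) -> List[Segment]:
    """Return the segments that would be deleted under the policy, oldest first."""
    ordered = sorted(segments, key=lambda s: s.age_days, reverse=True)
    expired: List[Segment] = []
    kept: List[Segment] = []

    # Step 1: anything older than the age limit is expired.
    for seg in ordered:
        too_old = policy.max_age_days is not None and seg.age_days > policy.max_age_days
        (expired if too_old else kept).append(seg)

    def over_limit() -> bool:
        compressed = sum(s.compressed_gb for s in kept)
        uncompressed = sum(s.uncompressed_gb for s in kept)
        return (
            (policy.max_compressed_gb is not None and compressed > policy.max_compressed_gb)
            or (policy.max_uncompressed_gb is not None and uncompressed > policy.max_uncompressed_gb)
        )

    # Step 2: while a size limit is exceeded, expire the oldest remaining segment.
    while kept and over_limit():
        expired.append(kept.pop(0))
    return expired


example_segments = [
    Segment("seg-a", age_days=40, compressed_gb=1.0, uncompressed_gb=10.0),
    Segment("seg-b", age_days=20, compressed_gb=1.0, uncompressed_gb=10.0),
    Segment("seg-c", age_days=10, compressed_gb=1.0, uncompressed_gb=10.0),
    Segment("seg-d", age_days=1, compressed_gb=1.0, uncompressed_gb=10.0),
]
example_policy = RetentionPolicy(max_age_days=30, max_compressed_gb=2.0)
expired_names = [s.name for s in segments_to_expire(example_segments, example_policy)]
```

In this example, `seg-a` is expired because of its age, and `seg-b` is expired because the remaining segments would otherwise exceed the compressed size limit.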

To monitor the data storage:

Data storage across individual nodes can be monitored using the Cluster nodes page.
To monitor the amount of data stored across the cluster and the effects of compression, see Cluster statistics.
For more detailed and historic information, use the humio/insights dashboard.

Before proceeding, familiarize yourself with LogScale's storage rules, covered in the next section.

Storage Rules

In LogScale, data is distributed across the cluster nodes. Which node stores which data is chosen randomly. As an operator, you can control only how large a portion is assigned to each node, and that multiple replicas are not stored on the same rack, machine, or location (to ensure fault tolerance).

Data is stored in units of segments, which are compressed files between 0.5GB and 1GB. For more information on segments and how data is stored and ingested, see Ingestion: Digest Phase.
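
As a rough illustration of the segment size range, the following sketch estimates how many segment files a given volume of compressed data might occupy. The function name and parameters are hypothetical, and the calculation assumes every segment falls within the 0.5GB to 1GB range stated above, which will not hold exactly in practice.

```python
import math
from typing import Tuple


def segment_count_range(compressed_gb: float,
                        min_segment_gb: float = 0.5,
                        max_segment_gb: float = 1.0) -> Tuple[int, int]:
    """Estimate the (fewest, most) segments needed for a compressed data volume.

    Fewest: every segment is at the maximum size.
    Most: every segment is at the minimum size.
    """
    if compressed_gb < 0:
        raise ValueError("compressed_gb must not be negative")
    fewest = math.ceil(compressed_gb / max_segment_gb)
    most = math.ceil(compressed_gb / min_segment_gb)
    return fewest, most


# 100GB of compressed data would occupy between 100 and 200 segments.
estimate = segment_count_range(100)
```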

See LogScale Multiple-byte Units for more information on how storage numbers are calculated.

Replication Factor

If you want fault-tolerance, you should ensure your data is replicated across multiple nodes, physical servers, and geographical locations.

You can achieve this by setting the storage replication factor higher than 1 and configuring ZONE on your nodes. LogScale uses the ZONE configuration of each node to determine where to place data, and always places data in as many zones as possible.
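
The idea of spreading replicas over as many zones as possible can be sketched as follows. This is an illustration only, not LogScale's placement algorithm: the `place_replicas` function, the node names, and the round-robin strategy are assumptions chosen to show the principle.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def place_replicas(node_zones: Dict[str, str],
                   replication_factor: int,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick nodes for one segment so that replicas cover as many zones as possible.

    node_zones maps a (hypothetical) node name to its ZONE value.
    """
    if replication_factor > len(node_zones):
        raise ValueError("replication factor exceeds the number of nodes")
    rng = rng or random.Random()

    # Group nodes by zone and shuffle, so the choice within a zone is random.
    by_zone: Dict[str, List[str]] = defaultdict(list)
    for node, zone in node_zones.items():
        by_zone[zone].append(node)
    for members in by_zone.values():
        rng.shuffle(members)
    zones = list(by_zone)
    rng.shuffle(zones)

    # Round-robin over zones: take one node per zone before reusing any zone.
    chosen: List[str] = []
    while len(chosen) < replication_factor:
        for zone in zones:
            if by_zone[zone] and len(chosen) < replication_factor:
                chosen.append(by_zone[zone].pop())
    return chosen


example_nodes = {"node-1": "zone-a", "node-2": "zone-a", "node-3": "zone-b", "node-4": "zone-c"}
example_placement = place_replicas(example_nodes, replication_factor=3, rng=random.Random(0))
```

With three zones and a replication factor of 3, each replica in this sketch lands in a different zone. A zone is reused only when the replication factor exceeds the number of zones.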

Storage Divergence

LogScale is capable of storing and searching across huge amounts of data. When nodes join or leave the cluster, data will usually need to be moved between nodes to ensure the replication factor is upheld and that no data is lost.

LogScale automatically redistributes data when nodes go offline, ensuring that your configured replication factor is met. This movement of data is throttled, to avoid excessively loading the cluster when a node goes offline. The "Low" counter will show a non-zero number while data is not replicated properly, letting you tell whether this movement of data is complete.
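
To make the meaning of such a counter concrete, the sketch below counts segments that currently have fewer live replicas than the replication factor. The data structures and the function name are hypothetical, and this is not how LogScale computes its own counter; it only illustrates why the number returns to zero once redistribution has finished.

```python
from typing import Dict, List, Set


def count_under_replicated(segment_hosts: Dict[str, List[str]],
                           live_nodes: Set[str],
                           replication_factor: int) -> int:
    """Count segments with fewer live replicas than the replication factor."""
    return sum(
        1
        for hosts in segment_hosts.values()
        if len(set(hosts) & live_nodes) < replication_factor
    )


# Hypothetical cluster state: each segment lists the nodes holding a copy.
example_hosts = {
    "seg-1": ["node-1", "node-2"],
    "seg-2": ["node-2", "node-3"],
    "seg-3": ["node-1", "node-3"],
}

# With all nodes online, every segment has two live replicas.
all_online = count_under_replicated(example_hosts, {"node-1", "node-2", "node-3"}, 2)

# After node-3 goes offline, seg-2 and seg-3 each have only one live replica
# until their data has been copied to another node.
after_failure = count_under_replicated(example_hosts, {"node-1", "node-2"}, 2)
```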

Evict a Node

If you know ahead of time that you want to remove a node from the cluster, you can reduce the impact on the cluster by first evicting the node. Eviction migrates work off the node and moves its data to other nodes.
