Azure Archiving

Security Requirements and Controls

Change archiving settings permission

Available: Azure Archiving v1.211.0

Azure Archiving is available from version 1.211.0.

LogScale supports archiving ingested logs to Azure. The archived logs are then available for further processing in any external system that integrates with Azure. The files written by LogScale in this format are not searchable by LogScale — this is an export meant for other systems to consume.

For information about using Azure as storage for segments in a format that LogScale can read, see Bucket Storage.

When Azure archiving is enabled all the events in repository are backfilled into Azure and then it archives new events by running a periodic job inside all LogScale nodes, which looks for new, unarchived segment files. The segment files are read from disk, streamed to an Azure bucket, and marked as archived in LogScale.

Azure archive storage uses the native file format of LogScale.

An administrator must set up archiving per repository. After selecting a repository on LogScale, the configuration page is available under Settings.

Azure archiving storage format and layout

When uploading a segment file, LogScale creates the Azure object key based on the tags, start date, and repository name of the segment file. The resulting object key makes the archived data browsable through the Azure administrator console.

LogScale uses the following pattern:

shell

REPOSITORY/TYPE/TAG_KEY_1/TAG_VALUE_1/../TAG_KEY_N/TAG_VALUE_N/YEAR/MONTH/DAY/START_TIME-SEGMENT_ID.gz

Where:

REPOSITORY
Name of the repository
type
Keyword (static) to identify the format of the enclosed data.
TAG_KEY_1
Name of the tag key (typically the name of parser used to ingest the data, from the #type field)
TAG_VALUE
Value of the corresponding tag key.
YEAR
Year of the timestamp of the events
MONTH
Month of the timestamp of the events
DAY
Day of the timestamp of the events
START_TIME
The start time of the segment, in the format HH-MM-SS
SEGMENT_ID
The unique segment ID of the event data

An example of this layout can be seen in the file list below:

shell

$ s3cmd ls -r s3://logscale2/accesslog/
2023-06-07 08:03         1453  s3://logscale2/accesslog/type/kv/2023/05/02/14-35-52-gy60POKpoe0yYa0zKTAP0o6x.gz
2023-06-07 08:03       373268  s3://logscale2/accesslog/type/kv/humioBackfill/0/2023/03/07/15-09-41-gJ0VFhx2CGlXSYYqSEuBmAx1.gz

For more information about this layout, see Parsing Event Tags.

File format

LogScale supports two formats for storage: native format and NDJSON.

Native Format

The native format is the raw data, i.e. the equivalent of the @rawstring of the ingested data:

accesslog

127.0.0.1 - - [07/Mar/2023:15:09:42 +0000] "GET /falcon-logscale/css-images/176f8f5bd5f02b3abfcf894955d7e919.woff2 HTTP/1.1" 200 15736 "http://localhost:81/falcon-logscale/theme.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
127.0.0.1 - - [07/Mar/2023:15:09:43 +0000] "GET /falcon-logscale/css-images/alert-octagon.svg HTTP/1.1" 200 416 "http://localhost:81/falcon-logscale/theme.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
127.0.0.1 - - [09/Mar/2023:14:16:56 +0000] "GET /theme-home.css HTTP/1.1" 200 70699 "http://localhost:81/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
127.0.0.1 - - [09/Mar/2023:14:16:59 +0000] "GET /css-images/help-circle-white.svg HTTP/1.1" 200 358 "http://localhost:81/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
127.0.0.1 - - [09/Mar/2023:14:16:59 +0000] "GET /css-images/logo-white.svg HTTP/1.1" 200 2275 "http://localhost:81/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"

NDJSON Format

The default archiving format is NDJSON When using NDJSON, the parsed fields will be available along with the raw log line. This incurs some extra storage cost compared to using raw log lines but gives the benefit of ease of use when processing the logs in an external system.

json

{"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [07/Mar/2023:15:09:42 +0000] \"GET /falcon-logscale/css-images/176f8f5bd5f02b3abfcf894955d7e919.woff2 HTTP/1.1\" 200 15736 \"http://localhost:81/falcon-logscale/theme.css\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_1_1678201782","@timestamp":1678201782000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}
{"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [07/Mar/2023:15:09:43 +0000] \"GET /falcon-logscale/css-images/alert-octagon.svg HTTP/1.1\" 200 416 \"http://localhost:81/falcon-logscale/theme.css\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_3_1678201783","@timestamp":1678201783000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}
{"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [09/Mar/2023:14:16:56 +0000] \"GET /theme-home.css HTTP/1.1\" 200 70699 \"http://localhost:81/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_15_1678371416","@timestamp":1678371416000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}
{"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [09/Mar/2023:14:16:59 +0000] \"GET /css-images/help-circle-white.svg HTTP/1.1\" 200 358 \"http://localhost:81/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_22_1678371419","@timestamp":1678371419000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}
{"#type":"kv","#repo":"weblog","#humioBackfill":"0","@source":"/var/log/apache2/access_log","@timestamp.nanos":"0","@rawstring":"127.0.0.1 - - [09/Mar/2023:14:16:59 +0000] \"GET /css-images/logo-white.svg HTTP/1.1\" 200 2275 \"http://localhost:81/\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36\"","@id":"XPcjXSqXywOthZV25sOB1hqZ_0_23_1678371419","@timestamp":1678371419000,"@ingesttimestamp":"1691483483696","@host":"ML-C02FL14GMD6V","@timezone":"Z"}

A single NDJSON line is just a JSON object, which formatted looks like this:

json

{
"#humioBackfill" : "0",
"#repo" : "weblog",
"#type" : "kv",
"@host" : "ML-C02FL14GMD6V",
"@id" : "XPcjXSqXywOthZV25sOB1hqZ_0_1_1678201782",
"@ingesttimestamp" : "1691483483696",
"@rawstring" : "127.0.0.1 - - [07/Mar/2023:15:09:42 +0000] \"GET /falcon-logscale/css-images/176f8f5bd5f02b3abfcf894955d7e919.woff2 HTTP/1.1\" 200 15736 \"http://localhost:81/falcon-logscale/theme.css\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36\"",
"@source" : "/var/log/apache2/access_log",
"@timestamp" : 1678201782000,
"@timestamp.nanos" : "0",
"@timezone" : "Z"
}

Each record includes the full detail for each event, including parsed fields, the original raw event string, and tagged field entries.

How data is uploaded to Azure

Data is uploaded to Azure as soon as a segment file has been created during ingest. For more information, see Ingestion: Digest Phase.

Each segment file is sent as as multipart upload, so the upload of a single file may require multiple Azure requests. The exact number of requests will depend on rate of ingest, but expect a rate of one request for each 8MB of ingested data.

Azure storage configuration

To archive data to Azure, you must do the following steps in order.

Configure Azure storage in Azure administrator
Enable Azure archiving in LogScale

Configure Azure Blob Storage

Note

Prior to configuring Azure blob storage, make sure you or the person configuring blob storage has the Storage account contributor role to be able to configure blob storage correctly and obtain all of the information needed to configure Azure blob storage in LogScale.

LogScale only supports authorization using the Account key-based authentication with Azure Role Based Access Control (RBAC). Each storage account to which LogScale needs to read, write, and delete to the account's blob storage must have the following three standard, built-in Azure permissions:

Screenshot of Azure portal showing the Access Control (IAM) section for a storage account, displaying the required RBAC permissions for LogScale: Storage blob data reader, Storage blob data contributor, and Storage blob data owner roles configured for Azure archiving functionality.

Figure 5. Azure storage account permissions

For detailed information about how to set up an Azure storage account, see Azure storage account setup.

For detailed information about configuring blob storage in your Azure storage account, see Create storage containers.

Once you have set up blob storage, you can retrieve the storage account name and account key to use when configuring storage in LogScale:

Figure 6. Azure storage account name and account key

Enable Azure Storage in LogScale

You must set some options in the LogScale configuration file related to using Azure archiving.

Read, write, and delete access is set in Azure on the storage container.

Below is an excerpt from the LogScale configuration file, showing the options to set; your actual values will differ.

ini

AZURE_ARCHIVING_ACCOUNTNAME=$ACCOUNT_NAME
AZURE_ARCHIVING_ACCOUNTKEY=$ACCOUNT_KEY
AZURE_ARCHIVING_ENDPOINT_BASE=http://my-own-azure:8080

These variables set the following values:

Azure storage only supports Access Key based authentication for now. This means the environment variables AZURE_ARCHIVING_ACCOUNTNAME and AZURE_ARCHIVING_ACCOUNTKEY are required for authentication. These environment variables map to the Azure Storage account name and access key.
AZURE_ARCHIVING_ENDPOINT_BASE is required to set the hosting endpoint for Azure to use

Additionally, in LogScale you need to:

Go to the repository you want to archive and select Settings → Azure Archiving.
Configure the bucket name and then click Save.

You can change the settings to point to a fresh bucket at any point in time in Settings → Azure Archiving. From that point, LogScale will write new files to that bucket while still reading from any previously-configured buckets. Existing files already written to any previous bucket will not get written to the new bucket. LogScale will continue to delete files from the old buckets that match the file names that LogScale would put there.

Important

Until archiving has been configured at the cluster level, no archiving can be configured at the org level within the LogScale user interface.

Monitor Azure archiving

To monitor the Azure archiving process, the following query can be executed in the humio repository:

logscale

#kind=logs thread=/archiving-upload-latency/ class!=/TimerExecutor
| JobAssignmentsImpl/

A monitoring task within LogScale checks this and reports if the latency is greater than 15 minutes. This adds an event entry to the humio repo using the phrase Archiving is lagging by ingest time for more than (ms).

Troubleshoot Azure archiving configuration

If you encounter an access denied error message when configuring Azure archiving, check your configuration settings for missing information or typos.

Tag grouping

If tag grouping is applied for a repository, the archiving logic will upload one segment into one Azure file, even though the tag grouping makes each segment possibly contain multiple unique combinations of tags. The TAG_VALUE part of the Azure file name that corresponds to a tag with tag grouping will not contain any of the specific values for the tag in that segment, but will instead contain an internal value that denotes which tag group the segment belongs to. This is less human readable than splitting out a segment into a number of Azure files corresponding to each unique tag combination in the segment, but avoids the risk of a single segment being split into an unmanageable amount of Azure files.

Other options

The following sections describe other options for configuring Azure archiving, and for fine tuning performance.

HTTP proxy

If LogScale is set up to use an HTTP_PROXY_HOST, it will be used for communicating with Azure by default. To disable it, set the following:

ini

# Use the globally configured HTTP proxy for communicating with Azure.
# Default is true.
AZURE_ARCHIVING_USE_HTTP_PROXY=false

Azure archived log re-ingestion

You can re-ingest log data that has been written to an Azure bucket through Azure archiving by using Log Collector and the native JSON parsing within LogScale.

This process has the following requirements:

The files need to be downloaded from the Azure bucket to the machine running the Log Collector. The Azure files cannot be accessed natively by the Log Collector.
The ingested events will be ingested into the repository that is created for the purpose of receiving the data.

To re-ingest logs:

Create a repo in LogScale where the ingested data will be stored. See Creating a Repository or View.
Create an ingest token, and choose the JSON parser. See Assigning Parsers to Ingest Tokens.

Install the Falcon LogScale Collector to read from a file using the .gz extension as the file match. For example, using a configuration similar to this:

yaml

#dataDirectory is only required in the case of local configurations and must not be used for remote configurations files.
dataDirectory: data
sources:
bucketdata:
  type: file
  # Glob patterns
  include:
  - /bucketdata/*.gz
  sink: my_humio_instance
  parser: json
  ...

For more information, see Configuration Examples.

Copy the log file from the Azure bucket into the configured directory (/bucketdata) in the above example.

The Log Collector reads the file that has been copied, sends it to LogScale, where the JSON event data will be parsed and recreated.

Self-Hosted Overview

Instance Administration

Organization Essentials

Configure Security

Users and Permissions

Cluster Management

Health Checks

Ingesting Data

Configuration Variables

LogScale URLs and Endpoints

Limits and Standards

Deployment Overview

Planning Your Deployment

Instance Sizing

Authentication and identity providers

Other authentication methods

Storage Architecture

Installing Using Containers

Installing On Bare Metal or Cloud Instance

Reference Architectures

Installing Load Balancers

Deploying Auxiliary Services

Configuration Settings

Managing Your Deployment

Testing Your Deployment

Humio Operator

Data Analysis Overview

LogScale Web Interface

Manage Repositories and Views

Manage Account

Parse Data

Search Data

Write Queries

Query Language Syntax

Query Joins and Lookups

Query Functions

Data Visualization

Automation

Template Language

Keyboard Shortcuts