Google Cloud Bucket Storage

LogScale supports writing a copy of the ingested logs to Google Cloud Storage using the native file format of LogScale, allowing LogScale to fetch those files and search them efficiently if the local copies are lost or deleted. This page explains how to set up bucket storage with Google Cloud Storage. For more details on this topic in general, see the Bucket Storage page.

Keys & Configuration

You need to create a Google service account that is authorized to manage the contents of the bucket that will hold the data. See the Google Authentication Documentation for an explanation of how to obtain and provide service account credentials manually. Go to the Google Service Account Key page to create a service account key.

Once you have the JSON file from Google with a set of credentials, place it in the /etc directory on each LogScale node. Be sure to provide the full path to the file in the configuration file, like this:

ini

GCP_STORAGE_ACCOUNT_JSON_FILE=/path/GCS-project-example.json

The JSON file must include the fields project_id, client_email and private_key. Any other field in the file is currently ignored. You also need to set some options in the LogScale configuration file that relate to Google Cloud Bucket Storage. Below is an excerpt from that file showing the options to set; your actual values will differ:

ini

GCP_STORAGE_BUCKET=$BUCKET_NAME
GCP_STORAGE_ENCRYPTION_KEY=$ENCRYPTION_SECRET
GCP_STORAGE_OBJECT_KEY_PREFIX=/basefolder
USING_EPHEMERAL_DISKS=true

These variables set the following values:

GCP_STORAGE_BUCKET sets the name of the bucket to use.
The encryption key given with GCP_STORAGE_ENCRYPTION_KEY can be any UTF-8 string and will be used to encrypt the data stored within the bucket. The suggested value is 64 or more random ASCII characters.
GCP_STORAGE_OBJECT_KEY_PREFIX sets an optional prefix for all object keys. It is empty by default. The prefix allows nodes to share a single bucket, but each node must use a unique prefix. There is a performance penalty when using a non-empty prefix, so it is recommended not to use one.
If there are any ephemeral disks in the cluster, you must set USING_EPHEMERAL_DISKS to true.
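The suggested key of 64 or more random ASCII characters can be produced in many ways. The following is a minimal sketch, not an official LogScale tool: the function name `generate_encryption_key` and the choice to restrict the alphabet to letters and digits (to avoid characters that might need quoting in a configuration file) are assumptions made for illustration.

```python
import secrets
import string

# Assumption: letters and digits only, to avoid characters that may need
# quoting or escaping in a configuration file.
ALPHABET = string.ascii_letters + string.digits


def generate_encryption_key(length: int = 64) -> str:
    """Return `length` random ASCII characters drawn from a cryptographic RNG."""
    if length < 64:
        raise ValueError("the suggested key length is 64 or more characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Prints a line that could be pasted into the configuration file.
    print(f"GCP_STORAGE_ENCRYPTION_KEY={generate_encryption_key()}")
```

Store the generated value somewhere safe before use; the key is what encrypts the data written to the bucket.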

You can change GCP_STORAGE_BUCKET to point to a fresh bucket at any time. From that point on, LogScale writes new files to the new bucket while still reading from any previously configured buckets. Files already written to a previous bucket are not copied to the new bucket. LogScale continues to delete files from the old buckets when they match the file names that LogScale would put there.
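Because the credentials file described earlier in this section must contain project_id, client_email and private_key, it can be useful to check the file before deploying it to each node. The sketch below is one possible pre-flight check, not part of LogScale; the function names are hypothetical, and treating an empty value as missing is an assumption.

```python
import json

# Fields the page says must be present in the service account JSON file.
REQUIRED_FIELDS = ("project_id", "client_email", "private_key")


def missing_credential_fields(json_text: str) -> list[str]:
    """Return the required fields that are absent (or empty) in the JSON text."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        # A key file must be a JSON object; anything else has no fields at all.
        return list(REQUIRED_FIELDS)
    # Assumption: an empty value is treated the same as a missing field.
    return [name for name in REQUIRED_FIELDS if not data.get(name)]


def check_credentials_file(path: str) -> list[str]:
    """Read the file at `path` and report which required fields are missing."""
    with open(path, encoding="utf-8") as handle:
        return missing_credential_fields(handle.read())
```

An empty list means all three fields are present. Extra fields are not reported, since the file's other fields are currently ignored.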

Use with Non-Default Endpoints

If you host a GCP-compatible service, you can point LogScale at your own endpoint for bucket storage.

ini

GCP_STORAGE_ENDPOINT_BASE=http://my-own-gcs:8080

Google Bucket Parameters

There are a few options that can help in tuning LogScale performance related to using Google Cloud for bucket storage.

Important

Increasing these values may incur additional costs, because storage is billed using a combination of the number of operations and the amount of storage used.

You can set the maximum number of files that LogScale will concurrently download or upload. If not set in the configuration file, LogScale will take the number of hyperthreads supported by the CPU(s) and divide it by 2 to determine the value for this option. You might want to set it yourself with a different value:

ini

GCP_STORAGE_CONCURRENCY=8
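To estimate what the default would be on a given machine, the rule described above (hyperthread count divided by 2) can be approximated as follows. This is a sketch under stated assumptions: it uses Python's `os.cpu_count()` as a stand-in for the hyperthread count, and it assumes integer division with a floor of 1, neither of which is confirmed by the LogScale documentation. The function name is hypothetical.

```python
import os
from typing import Optional


def default_storage_concurrency(hyperthreads: Optional[int] = None) -> int:
    """Approximate the default: number of hyperthreads divided by 2."""
    if hyperthreads is None:
        # os.cpu_count() reports logical CPUs and may return None.
        hyperthreads = os.cpu_count() or 1
    # Assumption: round down, and never go below 1.
    return max(1, hyperthreads // 2)


if __name__ == "__main__":
    print(f"GCP_STORAGE_CONCURRENCY={default_storage_concurrency()}")
```

Comparing this estimate with the value you plan to set gives a sense of how far you are moving from the default.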

The following option sets the chunk size for upload and download ranges. The default is 8 MB, which is also the maximum. The minimum is 5 MB.

ini

GCP_STORAGE_CHUNK_SIZE=8388608
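The value is given in bytes. The example value 8388608 equals 8 × 1024 × 1024, so the sketch below assumes that "MB" here means 1024 × 1024 bytes, which would put the minimum at 5242880. The helper name `validate_chunk_size` is hypothetical, and the check only illustrates the stated range; it is not LogScale's own validation.

```python
MIB = 1024 * 1024

# Assumption: "MB" in the text means 1024 * 1024 bytes, matching 8388608.
MIN_CHUNK_SIZE = 5 * MIB  # 5242880
MAX_CHUNK_SIZE = 8 * MIB  # 8388608, also the default


def validate_chunk_size(value: int) -> int:
    """Return `value` if it lies within the documented range, else raise."""
    if not MIN_CHUNK_SIZE <= value <= MAX_CHUNK_SIZE:
        raise ValueError(
            f"chunk size {value} is outside {MIN_CHUNK_SIZE}..{MAX_CHUNK_SIZE} bytes"
        )
    return value
```

For example, `validate_chunk_size(6 * MIB)` accepts a 6 MB chunk, while a 4 MB value raises `ValueError`.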

The next option controls whether LogScale prefers to fetch data files from the bucket when possible, even if another node in the LogScale cluster has a copy. It is set to false by default.

In some environments, it may be less expensive to transfer files this way. A transfer from the bucket may be billed at a lower cost than a transfer from a node in another region or in another data center.

ini

GCP_STORAGE_PREFERRED_COPY_SOURCE=false

Setting the preference doesn't guarantee that the bucket copy will be used. The cluster can still make internal replications directly when the file is not yet in a bucket.

Export to Bucket with Google Cloud Storage

By default, LogScale allows downloading the results of a query to a file. This file is generated as an HTTP stream directly from LogScale. The stream can be long-lived, with long periods in which no data is transmitted while LogScale searches for rare hits in large data sets. This can cause issues for some networks and load balancers.

As an alternative, LogScale allows exporting to Google Cloud Storage. The result of the query will be uploaded to the bucket storage provider and the user will be given a URL to download the file once the upload is complete.

As LogScale uses signed URLs for downloads, the user does not need read access to the bucket. The following configuration must be set for exporting to Google Cloud Storage:

ini

GCP_EXPORT_ACCOUNT_JSON_FILE=/path/to/GCS-project-example.json
GCP_EXPORT_BUCKET=$BUCKET_NAME

The first line specifies the GCP credentials file to use when authenticating. The second line specifies the bucket to which exports are sent.

Google Cloud Bucket Storage with Workload Identity

LogScale supports using Workload Identity for bucket storage and export to bucket of query results, rather than an explicit service account for Google Cloud Storage access.

To enable it, use the following configurations for bucket storage and export, respectively.

ini

GCP_STORAGE_WORKLOAD_IDENTITY=true

ini

GCP_EXPORT_WORKLOAD_IDENTITY=true

With these options enabled, the container service account is used for authentication rather than static keys. This configuration is recommended as the most secure practice, so it takes precedence over the GCP_STORAGE_ACCOUNT_JSON_FILE and GCP_EXPORT_ACCOUNT_JSON_FILE settings.

Note

The account used for export requires the SignBlob project-level permission.
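The precedence rule described above (Workload Identity wins over a configured JSON key file) can be sketched as a small lookup. This is an illustration only: the function `auth_mode`, its return strings, and the case-insensitive handling of `true` are assumptions, not LogScale's actual implementation.

```python
from typing import Mapping


def auth_mode(config: Mapping[str, str], purpose: str = "STORAGE") -> str:
    """Decide which credentials apply for `purpose` ("STORAGE" or "EXPORT")."""
    if purpose not in ("STORAGE", "EXPORT"):
        raise ValueError("purpose must be 'STORAGE' or 'EXPORT'")
    workload_identity = config.get(f"GCP_{purpose}_WORKLOAD_IDENTITY", "")
    # Assumption: the flag is enabled by the literal value "true" (any case).
    if workload_identity.strip().lower() == "true":
        return "workload-identity"  # takes precedence over the JSON file
    if config.get(f"GCP_{purpose}_ACCOUNT_JSON_FILE"):
        return "service-account-json"
    return "unconfigured"


if __name__ == "__main__":
    settings = {
        "GCP_STORAGE_WORKLOAD_IDENTITY": "true",
        "GCP_STORAGE_ACCOUNT_JSON_FILE": "/path/GCS-project-example.json",
    }
    print(auth_mode(settings))  # both are set, so Workload Identity is chosen
```

Bucket storage and export are resolved independently, so one can use Workload Identity while the other still uses a JSON key file.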
