Google Cloud Bucket Storage

Humio supports writing a copy of the ingested logs to Google Cloud Storage using the native file format of Humio, allowing Humio to fetch those files and search them efficiently if the local copies are lost or deleted. This page will explain how to set up bucket storage with Google Cloud Storage. For more details on this topic in general, see the Bucket Storage page.
Keys & Configuration
You need to create a Google service account that is authorized to manage the contents of the bucket that will hold the data. See the Google Authentication documentation for an explanation of how to obtain and provide service account credentials manually. Go to the Google Service Account Key page to create a service account key.
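If you use the gcloud CLI, the same steps can be scripted. The following is a sketch only: the project, account name, and role binding are placeholder assumptions to adapt to your environment, and you may prefer a narrower bucket-level IAM binding instead of the project-wide one shown here.
# Create the service account (names are placeholders)
gcloud iam service-accounts create humio-bucket-storage --project=example-project
# Grant it permission to manage objects; a bucket-level binding is also possible
gcloud projects add-iam-policy-binding example-project \
    --member="serviceAccount:humio-bucket-storage@example-project.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
# Download a JSON key for the account
gcloud iam service-accounts keys create GCS-project-example.json \
    --iam-account=humio-bucket-storage@example-project.iam.gserviceaccount.com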
Once you have the JSON file from Google with a set of credentials, place it in the /etc directory on each Humio node. Be sure to provide the full path to the file in the configuration file, like this:
GCP_STORAGE_ACCOUNT_JSON_FILE=/path/GCS-project-example.json
The JSON file must include the fields project_id, client_email, and private_key. Any other field in the file is currently ignored.
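For reference, a trimmed-down key file might look like the sketch below; every value is a placeholder:
{
  "type": "service_account",
  "project_id": "example-project",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "humio-bucket-storage@example-project.iam.gserviceaccount.com"
}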
Additionally, you will need to set some options in the Humio configuration file related to using Google Cloud Bucket Storage. Below is an excerpt from that file showing the options to set; your actual values will be different:
GCP_STORAGE_BUCKET=$BUCKET_NAME
GCP_STORAGE_ENCRYPTION_KEY=$ENCRYPTION_SECRET
GCP_STORAGE_OBJECT_KEY_PREFIX=/basefolder
USING_EPHEMERAL_DISKS=true
The first option sets the name of the bucket to use. The encryption key given with GCP_STORAGE_ENCRYPTION_KEY can be any UTF-8 string; the suggested value is 64 or more random ASCII characters (one way to generate such a key is sketched after this paragraph). GCP_STORAGE_OBJECT_KEY_PREFIX sets the optional prefix for all object keys and is empty by default. This option allows nodes to share a bucket, but requires each of them to write to a unique prefix. Note that there is a performance penalty when using a non-empty prefix, so we recommend leaving the prefix unset. If there are any ephemeral disks in the cluster, you must set the last option, USING_EPHEMERAL_DISKS, to true.
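For example, assuming OpenSSL is available, the following produces a suitable key; any method that yields 64 or more random ASCII characters works just as well. Encoding 48 random bytes as Base64 gives a 64-character ASCII string:
openssl rand -base64 48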
You can change the GCP_STORAGE_BUCKET setting to point to a fresh bucket at any point in time. From that point, Humio will write new files to that bucket while still reading from any previously configured buckets. Existing files already written to a previous bucket will not be copied to the new bucket. Humio will continue to delete files from the old buckets that match the file names that Humio would put there.
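For example, switching to a fresh bucket (the name below is hypothetical) only requires changing the bucket name:
GCP_STORAGE_BUCKET=humio-storage-fresh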
Use with Non-Default Endpoints
You can point Humio to your own hosting endpoint for bucket storage if you host a GCS-compatible service:
GCP_STORAGE_ENDPOINT_BASE=http://my-own-gcs:8080
MinIO in its default mode doesn't use MD5Sum checksums of incoming streams, which makes it incompatible with Humio's client. MinIO provides a workaround: start the server with the --compat option. For example:
./minio --compat server /data
Performance Tuning
There are a few options that can help in tuning Humio's performance related to using Google Cloud for bucket storage. Note that there may be financial costs associated with increasing these, as Google Cloud Storage is also billed based on the number of operations executed.
You can set the maximum number of files that Humio will concurrently download. If not set, Humio takes the number of hyperthreads and divides it by two to set this option. You might want to set it yourself, like so, but with different values:
GCP_STORAGE_DOWNLOAD_CONCURRENCY=8
GCP_STORAGE_UPLOAD_CONCURRENCY=8
The second option here is to set the maximum number of files that Humio will concurrently upload. If not set in the configuration file, Humio will take the number of hyperthreads and divide it by two to determine the value for this option.
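As a sketch of how that default is derived, on a Linux node you can preview the value Humio would choose, since nproc reports the number of available hyperthreads:
echo $(( $(nproc) / 2 ))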
The next option sets the chunk size for upload and download ranges. The maximum is 8 MB, which is also the default; the minimum value is 5 MB.
GCP_STORAGE_CHUNK_SIZE=8388608
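For example, to run at the 5 MB minimum instead of the default (5 * 1024 * 1024 = 5242880 bytes):
GCP_STORAGE_CHUNK_SIZE=5242880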
Use this next option to set whether you prefer that Humio fetch data files from the bucket when possible, even if another node in the Humio cluster has a copy. It is set to false by default. In some environments, it may be less expensive to transfer files this way: the transfer from the bucket may be billed at a lower cost than a transfer from a node in another region or another data center.
GCP_STORAGE_PREFERRED_COPY_SOURCE=false
Setting the preference doesn't guarantee that the bucket copy will be used. The cluster can still make internal replications directly when the file is not yet in a bucket.
Export to Bucket with Google Cloud Storage
By default, Humio allows downloading the results of a query to a file. This file is generated as an HTTP stream directly from Humio, and the download can be long-lasting, with long periods of no data being transmitted while Humio searches for rare hits in large data sets. This can cause issues for some networks and load balancers.
As an alternative, Humio allows exporting to Google Cloud Storage. The result of the query will be uploaded to the bucket storage provider and the user will be given a URL to download the file once the upload is complete.
As Humio uses signed URLs for downloads, the user does not need read access to the bucket. The following configuration must be set for exporting to Google Cloud Storage:
GCP_EXPORT_ACCOUNT_JSON_FILE=/path/to/GCS-project-example.json
GCP_EXPORT_BUCKET=$BUCKET_NAME
The first line specifies the GCP credentials to use when authenticating; the second specifies the bucket where exports are sent.