GCP Deployment Prerequisites
Before following this guide, make sure the following prerequisites and tooling are in place:
GCP Required Tool Components
The following tools are required to follow this architecture guide:
Terraform 1.5.7+
kubectl 1.27+
gcloud CLI 447.0.0+
You should also install the gcloud auth plugin (gke-gcloud-auth-plugin), which kubectl uses to authenticate to GKE clusters.
Helm v3+
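The minimum versions above can be checked with a small comparison helper. The following is a minimal sketch and not part of the LogScale tooling: the names MINIMUMS, parse_version, and meets_minimum are hypothetical, and it assumes you obtain each installed version string yourself (for example from the tool's own version output).

```python
import re

# Minimum versions taken from the list above; Helm only states "v3+".
MINIMUMS = {
    "terraform": "1.5.7",
    "kubectl": "1.27",
    "gcloud": "447.0.0",
    "helm": "3",
}


def parse_version(text):
    """Turn a string such as 'v1.27.3' or '447.0.0' into a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    have = parse_version(installed)
    need = parse_version(minimum)
    # Pad the shorter tuple with zeros so (1, 27) compares like (1, 27, 0).
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

For example, meets_minimum("v1.28.2", MINIMUMS["kubectl"]) is true, while meets_minimum("1.5.6", MINIMUMS["terraform"]) is false.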
LogScale on GCP Requirements
The following requirements exist for any LogScale deployment:
Bucket Storage
GCP provides NVMe storage in the form of local SSDs, which are attached directly to the virtual machine instances. Local SSDs offer high input/output operations per second (IOPS) and low latency, but their contents do not persist beyond the life of the instance. When using ephemeral instances, bucket storage is therefore required in a production environment, because it acts as the persistent storage for the cluster.
Kubernetes
The minimum Kubernetes version supported by the LogScale Operator can be found in the Version Matrix.
Strimzi Operator
You can install the Strimzi Operator using Helm.
LogScale relies on Kafka as a fault-tolerant event bus and internal cluster communication system. You must have an available Kafka cluster before deploying LogScale.
See the Deploying and Upgrading Strimzi guide for more information.
The recommended deployment uses rack awareness in the Kafka configuration (the topology.kubernetes.io/zone label) to spread replicas across different racks, data centers, or availability zones.
TLS
By default, the LogScale Operator uses cert-manager to create an internal certificate authority for use by the LogScale cluster. In addition, certificates for external connectivity can be provisioned through cert-manager's external issuer support. If LogScale is configured to expose its APIs using HTTPS, which is the default, LogScale assumes that Kafka connectivity will also use TLS; this is configurable. In environments that employ a service mesh implementing TLS or mTLS, TLS support can be disabled completely.
TopoLVM for Preparing NVMe Disks
HumioCluster resources assume that disks are already prepared on the underlying Kubernetes worker nodes. We use RAID 0 on the local SSDs (which GCP calls ephemeral local SSDs) in combination with bucket storage. RAID 0 provides no redundancy, but as long as Kafka is stable and bucket storage is working, using RAID 0 on the individual Kubernetes workers is fine. TopoLVM provides dynamic volume provisioning using LVM, making it easier to manage disk space for Kubernetes pods.
Workload Identity for Google Cloud Storage
Workload Identity allows us to associate a Kubernetes service account in Google Kubernetes Engine (GKE) with a specific Google Cloud service account. This removes the need to embed Google Cloud Storage (GCS) credentials directly in our app or pod configs, reducing the risk of exposure. Service account keys are long-lived credentials that could pose a security risk if compromised. With Workload Identity, there is no need to rotate service account keys manually: GKE manages the credentials automatically, reducing administrative overhead.
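To make the association concrete, the sketch below builds the two strings that tie the accounts together: the IAM member that is granted roles/iam.workloadIdentityUser on the Google service account, and the annotation placed on the Kubernetes service account. The helper names and the example project, namespace, and account names are hypothetical; the provided Terraform may construct these values differently.

```python
def workload_identity_member(project_id, namespace, ksa_name):
    """IAM member string that identifies a Kubernetes service account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def gsa_annotation(gsa_name, project_id):
    """Annotation pointing a Kubernetes service account at a Google service account."""
    email = f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    return {"iam.gke.io/gcp-service-account": email}


# Hypothetical example values, for illustration only.
member = workload_identity_member("example-project", "logging", "logscale-sa")
annotation = gsa_annotation("logscale-gcs", "example-project")
print(member)
print(annotation)
```

No key file appears anywhere in this mapping, which is the point: the pod obtains short-lived credentials through the binding instead of a stored service account key.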
Instance Sizing
The provided Terraform includes templates to create clusters of varying sizes. These templates are meant as a starting point; sizing will vary with the requirements of each deployment and its particular workload. By default, an extra small cluster is created that can ingest 1 TB per day.
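When comparing a workload against these sizes, it can help to convert a daily ingest volume into an average rate. The sketch below is a rough illustration only: the function name is hypothetical, it assumes decimal units (1 TB = 1,000,000 MB), and real traffic usually peaks well above the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_ingest_mb_per_second(tb_per_day):
    """Convert a daily ingest volume in TB to an average rate in MB/s.

    Assumes decimal units: 1 TB = 1,000,000 MB.
    """
    if tb_per_day < 0:
        raise ValueError("daily ingest must not be negative")
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


# The default extra small cluster size of 1 TB per day:
print(round(average_ingest_mb_per_second(1), 2))  # prints 11.57
```

So 1 TB per day corresponds to an average of roughly 11.6 MB per second of sustained ingest.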