How-To: Deploy LogScale with Operator on Google Cloud Platform (GCP)

Important

This material is deprecated as of LogScale version 1.217.0. For the latest material, please refer to the Google Cloud Platform (GCP) Reference Architecture docs.

LogScale can be deployed within Google Cloud Platform (GCP). This section provides a guide to the process and configuration within GCP.

This guide is specific to GCP deployments, but it builds on the reference architecture and structure covered in Kubernetes Reference Architecture; use that material to decide how running LogScale fits into your overall GCP deployment.

GCP Reference Architecture

The reference architecture diagram for a GCP deployment looks like the following diagram:

Diagram showing the architecture for a typical Google Cloud Platform deployment

This includes a number of distinct components described below.

Private GKE cluster

Private clusters offer increased isolation by utilizing private IP addresses for nodes and providing private or public endpoints for control plane nodes, which can be further isolated. With private clusters, we can still access Google APIs through Private Google Access. Within a private cluster, Pods are segregated from both inbound and outbound communication, establishing a cluster perimeter. The directional flows of these communications can be managed by exposing services through load balancing and Cloud NAT. To enable a private endpoint when you create a cluster, use the --enable-private-endpoint flag.
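
The provided Terraform creates the private GKE cluster for you. Purely as an illustration of the relevant gcloud flags (the cluster name, region, and control-plane CIDR below are placeholders), a private cluster with a private control-plane endpoint could be created along these lines:

shell
$ gcloud container clusters create logscale-private-gke \
    --region us-east1 \
    --enable-ip-alias \
    --enable-private-nodes \
    --enable-private-endpoint \
    --master-ipv4-cidr 172.16.0.0/28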

Bastion Host

By default the Terraform will create a bastion host. Bastion host creation can be disabled by setting the bastion_host_enabled variable to false on the command line or by adding it to the _override.tf file. A jump host plays a crucial role in enhancing the security and manageability of the GKE private cluster. The cluster nodes are not directly accessible from the public internet, which provides an inherent layer of security. The host serves as an intermediary server that allows authorised users to access the private cluster securely, acting as a single entry point for SSH access.

The Bastion host acts as a barrier between the public internet and the cluster's internal network, mitigating the risk of unauthorised access or attacks. Access to the bastion host can be tightly controlled using standard authentication mechanisms; in this setup, SSH key pairs are used to access the Bastion host and the cluster's private nodes, and users can generate their own SSH key pair.

The Bastion host can also be utilised for tasks like maintenance, troubleshooting, and updates. Once deployed, the host can be used as a proxy to run changes on the private cluster as described in Google Documentation.

By default, SSH sessions to GCP VMs time out after 10 minutes; the session can be extended by adding --ssh-flag="-ServerAliveInterval=60" to the gcloud compute ssh command.
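
For example, using the bastion SSH command emitted by the Terraform output (the hostname and project below are placeholders):

shell
$ gcloud compute ssh logscale-XXXX-bastion --project=google-projectid-XXXXXX --zone=us-central1-a \
    --tunnel-through-iap --ssh-flag="-ServerAliveInterval=60"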

The host has a dedicated SSH firewall rule that restricts access to the IP addresses that gcloud IAP uses for TCP forwarding, as per the Google Documentation.

Cert Manager and Let's Encrypt issuer

Cert Manager is a popular Kubernetes tool used for managing TLS certificates. It is combined with a Let's Encrypt issuer to automate the process of obtaining and renewing SSL/TLS certificates for the LogScale application. Cert Manager is responsible for certificate management within the GKE cluster; it can generate, renew, and keep track of TLS certificates. Let's Encrypt is a certificate authority that provides TLS/SSL certificates.
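
In this guide cert-manager is installed by the Terraform (null_resource.install_cert_manager, applied in a later step). Assuming it runs in the usual cert-manager namespace, a quick way to verify the installation and any configured issuers is:

shell
$ kubectl get pods -n cert-manager
$ kubectl get clusterissuers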

Cloud NAT

To enable outbound internet connectivity for the pods within the private GKE cluster, you can set up Cloud NAT in your VPC network. Cloud NAT acts as a gateway that translates the private IP addresses of the pods to public IP addresses, allowing them to access the internet. In this deployment it is configured for LogScale egress traffic.
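
The Terraform sets up Cloud NAT for you. As a rough manual sketch of what that involves (the router and NAT names, network, and region below are placeholders), the equivalent gcloud commands look like:

shell
$ gcloud compute routers create logscale-router --network=logscale-vpc --region=us-east1
$ gcloud compute routers nats create logscale-nat --router=logscale-router --region=us-east1 \
    --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges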

Component Version Requirements

Specific component version requirements for deploying LogScale, including the Humio Operator chart 0.22.0, Strimzi helm chart 0.37.0, cert-manager helm chart v1.13.1, and kubernetes version 1.28+. The deployment specifications also require LogScale Docker image version 1.142.1 or higher and supports x86_64 and amd64 CPU architectures.

Deployment Versions list

  • Humio Operator chart version 0.22.0

  • Strimzi helm chart version 0.37.0

  • Cert-manager helm chart version v1.13.1

  • kubernetes version 1.28+

    The minimum Kubernetes version supported by the Humio Operator can be seen in the Version Matrix

  • LogScale Docker image, minimum humio/humio-core:1.142.1

  • Humio Operator supports the x86_64 and amd64 CPU architectures.

GCP Deployment Prerequisites

Essential prerequisites and tooling requirements for deploying LogScale on Google Cloud Platform (GCP), including specific version requirements for tools like Terraform, kubectl, gcloud CLI, and Helm. Key infrastructure requirements cover bucket storage using NVMe SSDs, Kubernetes cluster configuration, Strimzi Operator for Kafka management, TLS security implementation, TopoLVM for disk preparation, Workload Identity for GCS authentication, and instance sizing recommendations for different deployment scales.

Before following this guide, there are some basic prerequisites and tooling:

GCP Required Tool Components

The following tools are required to follow this architecture guide: Terraform, kubectl, the gcloud CLI, and Helm.

LogScale on GCP Requirements

The following requirements exist for any LogScale deployment:

  • Bucket Storage

    GCP provides NVMe storage in the form of local SSDs, which are directly attached to the virtual machine instances. Local SSDs offer high Input/Output Operations Per Second (IOPS) and low latency. When utilizing ephemeral instances, bucket storage is required for a production environment as it acts as the persistent storage for the cluster.

  • Kubernetes

    The minimum Kubernetes version supported by the Humio Operator can be found in the Humio Operator Version Matrix.

  • Strimzi Operator

    You can install the Strimzi operator using Helm; a manual install sketch is shown after this list.

    LogScale relies on Kafka as a fault-tolerant event bus and internal cluster communication system. You must have an available Kafka cluster before deploying LogScale.

    See the Deploying and Upgrading Strimzi guide for more information.

    The recommended deployment uses rack awareness in the Kafka configuration (the topology.kubernetes.io/zone label) to spread replicas across different racks, data centers, or availability zones.

  • TLS

    By default the Humio Operator utilizes cert-manager to create an internal certificate authority for use by the LogScale cluster. In addition, support for provisioning certificates for external connectivity can be used in conjunction with cert-manager's external issuer support. If LogScale is configured to expose its APIs using HTTPS, which is the default, LogScale assumes Kafka connectivity will also utilize TLS; this is configurable. In environments that employ service meshes implementing TLS or mTLS, TLS support can be disabled completely.

  • TopoLVM for preparing NVMe disks

    The HumioCluster resource assumes disks are prepared on the underlying Kubernetes worker nodes. We use RAID 0 across the local SSDs (or, as GCP calls them, ephemeral local SSDs) in combination with bucket storage; as long as Kafka is stable and bucket storage is working, RAID 0 on the individual Kubernetes workers is fine. TopoLVM provides dynamic volume provisioning using LVM, making it easier to manage disk space for Kubernetes pods.

  • Workload Identity for Google Cloud Storage

    Workload Identity allows us to associate a Kubernetes service account in Google Kubernetes Engine (GKE) with a specific Google Cloud service account. This minimizes the need to embed GCS credentials directly in our app or pod configs, reducing the risk of exposure. Service account keys are long-lived credentials that, if compromised, could lead to security risks. With Workload Identity, there's no need to manually rotate service account keys; GKE manages the credentials automatically, reducing administrative overhead.

  • Instance Sizing

    The provided Terraform has templates to create clusters of varying sizes. These templates are meant as a starting point, and different deployments will require different sizing depending on the particular workload. By default an extra small cluster is created that can ingest 1 TB per day. See also Instance Sizing.
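
As referenced in the Strimzi Operator requirement above, a manual Helm install of the Strimzi operator would look roughly like the following sketch. The namespace and release name here are illustrative; in this guide the Terraform installs the operator for you via helm_release.strimzi_operator.

shell
$ helm repo add strimzi https://strimzi.io/charts/
$ helm repo update
$ helm install strimzi-operator strimzi/strimzi-kafka-operator \
    --namespace logging --create-namespace --version 0.37.0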

Instance Sizing

Assumptions:

  • 30 Day Retention NVME

  • 20% Overhead left on NVME

  • 10x Compression

  • GCS Bucket storage used for longer retention

  • LogScale does not provide a self-hosted Kubernetes solution for Kafka

Kafka clusters are separate from LogScale clusters to avoid resource contention and allow independent management.
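
As a rough worked example of these assumptions for the X-Small tier below: 1 TB/day of ingest over 30 days is 30 TB raw, which at 10x compression is about 3 TB of compressed data to keep on NVMe. The 3 × 3 TB of NVMe in the X-Small row then leaves room for replication and the 20% headroom, with GCS bucket storage holding anything retained longer.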

X-Small - 1 TB/Day Ingestion
Software Instances Machine Type/vCPU Memory Storage Total Storage
LogScale 3 n-standard-16 / 16 122 GB NVME 3 TB 9 TB
Kafka 3 n-standard-8 / 8 32 GB PD-SSD 500 GB 1.5 TB
ZooKeeper (up to LogScale 1.107) 3 4 8GB PD-SSD 50 GB 150 GB
KRaft controller (after LogScale 1.107) 3 4 16GB PD-SSD 64 GB 150 GB
Small - 3 TB/Day Ingestion
Software Instances Machine Type/vCPU Memory Storage Total Storage
LogScale 3 n2-highmem-16 / 16 128 GB NVME 6 TB 18 TB
Kafka 3 n-standard-8 / 8 32 GB PD-SSD 500 GB 1.5 TB
ZooKeeper (up to LogScale 1.107) 3 4 8GB PD-SSD 50 GB 150 GB
KRaft controller (after LogScale 1.107) 3 4 16GB PD-SSD 64 GB 150 GB
Medium - 5 TB/Day Ingestion
Software Instances Machine Type/vCPU Memory Storage Total Storage
LogScale 6 n-standard-32 / 32 128 GB NVME 6 TB (16x375GB) 36 TB
Kafka 6 n-standard-8 / 8 32 GB PD-SSD 1 TB 6 TB
ZooKeeper (up to LogScale 1.107) 3 4 8GB PD-SSD 50 GB 150 GB
KRaft controller (after LogScale 1.107) 3 4 16GB PD-SSD 64 GB 150 GB
Large - 10 TB/Day Ingestion
Software Instances Machine Type/vCPU Memory Storage Total Storage
LogScale 12 n-standard-32 / 32 128 GB NVME 6 TB (16x375GB) 72 TB
Kafka 6 n-standard-8 / 8 32 GB PD-SSD 1 TB 6 TB
ZooKeeper (up to LogScale 1.107) 3 4 8GB PD-SSD 50 GB 150 GB
KRaft controller (after LogScale 1.107) 3 4 16GB PD-SSD 64 GB 150 GB
X-Large - 30 TB/Day Ingestion
Software Instances Machine Type/vCPU Memory Storage Total Storage
LogScale 30 n-standard-64 / 64 256 GB NVME 7.5 TB (16x375GB) 225 TB
Kafka 9 n-standard-8 / 8 32 GB PD-SSD 1.5 TB 13.5 TB
ZooKeeper (up to LogScale 1.107) 3 4 8GB PD-SSD 50 GB 150 GB
KRaft controller (after LogScale 1.107) 3 4 16GB PD-SSD 64 GB 150 GB

Deploy GCP Resources and LogScale

Step-by-step instructions for deploying LogScale on Google Cloud Platform (GCP) using Terraform, including the setup of required GCP resources like Compute Engine, GCS, IAM, Kubernetes Engine, and VPC Networks. The deployment process covers cloning necessary repositories, configuring Terraform state storage, deploying GCP infrastructure, setting up cluster credentials, and installing LogScale components with proper licensing and public URL configuration.

To deploy the following Terraform you will need sufficient permissions; the Terraform requires access to create, modify, and delete resources in the following GCP services:

  • Compute Engine

  • GCS

  • IAM

  • Kubernetes Engine

  • VPC Networks

Deploy GCP Infrastructure
  1. Clone the logscale-gcp and logscale-gcp-components repositories:

    shell
    $ mkdir logscale-gcp-example
    $ cd logscale-gcp-example
    $ git clone https://github.com/CrowdStrike/logscale-gcp.git
    $ git clone https://github.com/CrowdStrike/logscale-gcp-components.git

  2. Create a bucket to store the Terraform state in the region where you will deploy the GKE cluster. By default the Terraform assumes the region is us-east1; this can be changed via the _override.tf file or by overriding it on the command line in later commands.

    shell
    $ gcloud storage buckets create gs://UNIQUE_PREFIX-logscale-terraform-state-v1 --location=us-east1

  3. Update the backend.tf files to set the bucket used for Terraform state to the bucket created in step 2. The files are located in the gcp directory for both the logscale-gcp and logscale-gcp-components repositories.

    hcl
    # Terraform State Bucket and Prefix
    terraform {
      backend "gcs" {
        bucket = "UNIQUE_PREFIX-logscale-terraform-state-v1"
        prefix = "logscale/gcp/terraform/tf.state"
      }
    }

  4. Deploy the GCP resources required for LogScale

    shell
    $ cd logscale-gcp/gcp
    $ terraform init
    $ terraform apply -var project_id=google-projectid-XXXXXX -var logscale_cluster_type=basic -var logscale_cluster_size=xsmall 
    # Example output:
    # Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
    # 
    # Outputs:
    # 
    # bastion_hostname = "logscale-XXXX-bastion"
    # bastion_ssh = "gcloud compute ssh logscale-XXXX-bastion --project=google-projectid-XXXXXX --zone=us-central1-a  --tunnel-through-iap"
    # bastion_ssh_proxy = "gcloud compute ssh logscale-XXXX-bastion --project=google-projectid-XXXXXXX --zone=us-central1-a  --tunnel-through-iap --ssh-flag=\"-4 -L8888:localhost:8888 -N -q -f\""
    # gce-ingress-external-static-ip = "24.14.19.33"
    # .......

  5. Once the Terraform is applied, the credentials for the GKE cluster must be downloaded. The Terraform output contains the gcloud command to download the credentials.

    shell
    $ terraform output | grep gke_credential_command
    # 
    # Example output:
    # gke_credential_command = "gcloud container clusters get-credentials logscale-XXXX-gke --region us-central1 --project google-projectid-XXXXXX"
    # Run the gcloud command
    $ gcloud container clusters get-credentials logscale-XXXX-gke --region us-central1 --project google-projectid-XXXXXX
    # Example output:
    # Fetching cluster endpoint and auth data.
    # kubeconfig entry generated for logscale-XXXX-gke

    # Next, configure kubectl to use the new kubeconfig entry
    $ kubectl config use-context gke_google-projectid-111111_us-central1_logscale-XXXX-gke

  6. By default the Terraform will create a bastion host to facilitate access to the GKE cluster. If you have disabled the bastion in favor of a VPN or other access method, you can skip this step.

    Grep the bastion SSH proxy command from the Terraform output.

    shell
    $ terraform output | grep bastion_ssh_proxy
    # Example output:
    # bastion_ssh_proxy = "gcloud compute ssh logscale-XXXX-bastion --project=google-projectid-111111 --zone=us-central1-a  --tunnel-through-iap --ssh-flag=\"-4 -L8888:localhost:8888 -N -q -f\""

    Run the command and set the HTTPS_PROXY environmental variable:

    shell
    $ gcloud compute ssh logscale-XXXX-bastion --project=google-projectid-111111 --zone=us-central1-a  --tunnel-through-iap --ssh-flag="-4 -L8888:localhost:8888 -N -q -f"
    # Use the SSH proxy in your terminal
    $ export HTTPS_PROXY=localhost:8888

  7. Verify connectivity to the GKE cluster by listing the pods.

    shell
    $ kubectl get pods -A
    # Example output:
    # NAMESPACE     NAME                                                             READY   STATUS    RESTARTS   AGE
    # gmp-system    collector-6rls7                                                  2/2     Running   0          46m
    # gmp-system    collector-bbwql                                                  2/2     Running   0          45m
    # gmp-system    collector-c9z8c                                                  2/2     Running   0          46m
    # gmp-system    collector-d4ltf                                                  2/2     Running   0          45m
    # gmp-system    collector-g77mx                                                  2/2     Running   0          46m
    # gmp-system    collector-mmmc7                                                  2/2     Running   0          45m
    # gmp-system    collector-qfpx4                                                  2/2     Running   0          46m
    # gmp-system    collector-rhm48                                                  2/2     Running   0          45m
    # gmp-system    collector-w77c8                                                  2/2     Running   0          46m
    # .......

  8. Next LogScale will be deployed using the logscale-gcp-components repository. Supplying the LogScale license key and public URL for the cluster is required.

    shell
    $ cd ../../logscale-gcp-components/gcp
    $ terraform init
    $ export TF_VAR_humiocluster_license="YOUR LICENSE KEY"
    $ terraform apply -var project_id=google-projectid-XXXXXX  -target null_resource.install_cert_manager -target=helm_release.strimzi_operator -target=null_resource.humio_operator_crds
    # Example output:
    # │ Warning: Applied changes may be incomplete
    # │ 
    # │ The plan was created with the -target option in effect, so some changes requested in the configuration may have been ignored and the output values may not be fully updated. Run the following command to verify that no other changes are pending:
    # │     terraform plan
    # │ 
    # │ Note that the -target option is not suitable for routine use, and is provided only for exceptional situations such as recovering from errors or mistakes, or when Terraform specifically suggests to use it as part of an error message.
    #
    # Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
    #
    # Apply the remaining terraform
    $ terraform apply -var project_id=google-projectid-XXXXXX -var logscale_cluster_type=basic -var logscale_cluster_size=xsmall -var public_url=logscale.mycompany.com 
    # Example output
    # kubernetes_manifest.humio_cluster_type_basic[0]: Creating...
    # kubernetes_manifest.humio_cluster_type_basic[0]: Creation complete after 3s
    # kubernetes_service.logscale_basic_nodeport[0]: Creating...
    # kubernetes_ingress_v1.logscale_basic_ingress[0]: Creating...
    # kubernetes_ingress_v1.logscale_basic_ingress[0]: Creation complete after 0s [id=logging/logscale-XXXX-basic-ingress]
    # kubernetes_service.logscale_basic_nodeport[0]: Creation complete after 0s [id=logging/logscale-XXXX-nodeport]
    # kubernetes_manifest.logscale_basic_ingress_backend[0]: Creating...
    # kubernetes_manifest.logscale_basic_ingress_backend[0]: Creation complete after 1s
    
    # Apply complete! Resources: 19 added, 0 changed, 0 destroyed.

  9. Check the status of the pods by running:

    shell
    $ kubectl get pods -n logging
    # Example output:
    # humio-operator-945b5845f-hld8b              1/1     Running   0          5m
    # logscale-XXXX-core-ckjksp                   3/3     Running   0          100s
    # logscale-XXXX-core-dninhg                   3/3     Running   0          100s
    # logscale-XXXX-core-zdevoe                   3/3     Running   0          99s
    # logscale-XXXX-strimzi-kafka-kafka-0         1/1     Running   0          6m
    # logscale-XXXX-strimzi-kafka-kafka-1         1/1     Running   0          6m
    # logscale-XXXX-strimzi-kafka-kafka-2         1/1     Running   0          6m
    # logscale-XXXX-strimzi-kafka-zookeeper-0     1/1     Running   0          6m
    # logscale-XXXX-strimzi-kafka-zookeeper-1     1/1     Running   0          6m
    # logscale-XXXX-strimzi-kafka-zookeeper-2     1/1     Running   0          6m
    # strimzi-cluster-operator-86948f6756-9nj4p   1/1     Running   0          6m

    Check the status of the HumioCluster by running:

    shell
    $ kubectl get humiocluster -n logging
    # NAME            STATE         NODES   VERSION
    # logscale-XXXX   Running

    Initially the cluster will be in the Bootstrapping state as it starts up; once all nodes have started it will move to the Running state.
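
If the HumioCluster stays in Bootstrapping for a while, generic kubectl commands (not specific to this Terraform) can help show what the pods and the operator are doing:

shell
$ kubectl get pods -n logging -w
$ kubectl describe humiocluster -n logging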

Accessing the Deployed GCP LogScale Instance

Learn how to access a deployed LogScale instance on Google Cloud Platform (GCP) by configuring DNS records and logging into the system. After obtaining the IP address from the ingress resource and setting up appropriate DNS records, users can access the LogScale cluster using admin credentials, where the password can be retrieved through a specific kubectl command.

To access the deployed cluster:

Configure DNS

To access the Humio cluster, a DNS record needs to be created that points to the IP address created by the ingress resource. The hostname will be the public URL specified in previous steps. Use kubectl to look at the ingress resource:

shell
$ kubectl -n logging get ingress
# Example output:
# NAME                           CLASS    HOSTS   ADDRESS       PORTS   AGE
# logscale-xxxxx-basic-ingress   <none>   *       24.14.19.33   80      20m

Configuring the DNS record depends on your DNS provider. In most cases an A record should be created that points the public URL hostname to the address shown on the ingress.
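
If the zone happens to be hosted in Cloud DNS, for example, the A record can be created with gcloud. The zone name below is a placeholder; the hostname and IP address come from the public_url variable and the ingress output above:

shell
$ gcloud dns record-sets create logscale.mycompany.com. \
    --zone=my-dns-zone --type=A --ttl=300 --rrdatas=24.14.19.33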

Logging In

Once the DNS records exist, you can open the public URL in a browser and log in. Since we are using static authentication mode, the username will be admin and the password can be obtained by running:

shell
$ kubectl get secret logscale-XXXX-static-users  -n logging -o=template --template={{.data.users}} | base64 -D
# Example output: 
# Depending on the shell this might include a % sign, omit it when you copy the password.
# admin:XXXXXXXXXXXX%

Note

This command uses base64 -D, but you may need to use base64 --decode if using Linux.

Testing GCP Deployment

Step-by-step instructions for testing a GCP deployment by sending data to a LogScale cluster, including the creation of repositories and ingest tokens. This page details how to configure and install Fluentbit in a Kubernetes environment to gather pod logs, and explains the verification process to ensure proper log ingestion through the LogScale UI.

Once deployed, the deployment should be tested:

Sending Data to the Cluster

To send data to the cluster, we will create a new repository, obtain the ingest token, and then configure fluentbit to gather logs from all the pods in our Kubernetes cluster and send them to LogScale.

  1. Create a repo using LogScale UI

    Click the Add new button and create a new repository

    Screenshot showing creation of a new repository
  2. Create an ingest token

    Go to the test repo you've created and in the settings tab, select Ingest tokens and create a new Ingest token with any available parsers.

    Screenshot showing creation of a new token

Ingest Logs to the Cluster

LogScale recommends using the Falcon LogScale Collector for ingesting data. The example below uses fluentbit to perform a simple connectivity test.

Now we'll install fluentbit into the Kubernetes cluster and configure the endpoint to point to our $INGRESS_HOSTNAME, and use the $INGEST_TOKEN that was just created.

shell
$ helm repo add humio https://humio.github.io/humio-helm-charts
$ helm repo update

Using a text editor, create a file named humio-agent.yaml and copy the following lines into it:

yaml
humio-fluentbit:
  enabled: true
  humioHostname: $INGRESS_ES_HOSTNAME
  es:
    tls: true
    port: 443
    inputConfig: |-
      [INPUT]
           Name             tail
           Path             /var/log/containers/*.log
           Parser           docker
           # The path to the DB file must be unique and
           # not conflict with another fluentbit running on the same nodes.
           DB               /var/log/flb_kube.db
           Tag              kube.*
           Refresh_Interval 5
           Mem_Buf_Limit    512MB
           Skip_Long_Lines  On
    resources:
      limits:
        cpu: 100m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 512Mi

Now configure this with helm:

shell
$ helm install test humio/humio-helm-charts \
    --namespace logging \
    --set humio-fluentbit.token=$INGEST_TOKEN \
    --values humio-agent.yaml

Verify logs are ingested:

  • Go to the LogScale UI and click on the repository you created earlier

  • In the search field, enter:

    logscale
    "kubernetes.container_name" = "humio-operator"
  • Verify you can see the Humio Operator logs
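
As an additional quick connectivity check that does not require Fluent Bit, a test event can be posted directly to LogScale's HEC-compatible ingest endpoint. The hostname and token are the values created above, and the payload is only an example:

shell
$ curl -X POST https://$INGRESS_HOSTNAME/api/v1/ingest/hec \
    -H "Authorization: Bearer $INGEST_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"event": "hello from the GCP deployment test"}'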

GCP Deployment Cleanup

The cleanup process for a GCP deployment includes infrastructure removal using Terraform destroy commands and manual deletion steps. The process involves executing specific commands from the logscale-gcp-components directory, removing DNS records, and deleting the GCS bucket used for the Terraform state.

  1. It's possible to run an individual kubectl delete on each resource, or you can run terraform destroy.

    You'll need to run the following commands:

    shell
    $ cd logscale-gcp-components/gcp
    $ terraform destroy
    Do you really want to destroy all resources?
      Terraform will destroy all your managed infrastructure, as shown above.
      There is no undo. Only 'yes' will be accepted to confirm.
     
      Enter a value: yes
  2. Delete the DNS records which were created for $INGRESS_HOSTNAME

  3. Delete the GCS bucket used for the Terraform state (for example, UNIQUE_PREFIX-logscale-terraform-state-v1 created in step 2 of the deployment) using the console, or from the command line as shown below
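
If you prefer the command line to the console for this final step, the Terraform state bucket created earlier can be removed, together with its contents, with:

shell
$ gcloud storage rm --recursive gs://UNIQUE_PREFIX-logscale-terraform-state-v1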