AWS Cloud Reference Architecture

The following explains how to quickly set up a LogScale cluster using the Humio Operator.

As part of the Quick Start, we will create AWS resources such as an MSK cluster, an EKS cluster and an S3 bucket using terraform, and then install the Humio Operator using helm. For production installations, follow the full Installation Guide and decide how running LogScale fits into your infrastructure.

Prerequisites
Tooling
Authentication & Permissions

Ensure you are logged in to AWS through the terminal and have the necessary permissions to create resources such as EKS and MSK clusters and S3 buckets. For additional AWS authentication options, see the authentication section of the terraform AWS provider documentation.

When authenticating with kubectl later in this guide, kubectl expects aws-iam-authenticator to be installed and uses the AWS authentication above.
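
Before continuing, it can help to confirm that the command-line tools used in this guide are on the PATH. The following is a minimal sketch: the missing_tools helper is hypothetical and not part of the quick-start repo, and the tool list is simply collected from the commands that appear in this guide.

```shell
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools referenced in this guide.
missing=$(missing_tools git terraform aws aws-iam-authenticator kubectl helm openssl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing >&2
fi
```

To confirm that AWS credentials are active in the current terminal, aws sts get-caller-identity prints the account and identity in use.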

Create AWS Resources

The following will create an EKS cluster with three nodes by default, an MSK cluster with three nodes by default, an S3 bucket where the LogScale data will be stored, and a number of dependent resources such as a VPC, subnets, security groups and an internet gateway.

First, clone the operator quick-start repo where the terraform quick start files are stored:

shell
$ git clone https://github.com/humio/humio-operator-quickstart
$ cd humio-operator-quickstart/aws

Note: review the default values in the variables.tf file. It is possible to override these, but be careful, as changing some may have undesirable effects. Overriding region is a common change, but changing instance types, for example, has downstream consequences, such as for the resources set on the HumioCluster.
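
One way to override a default without editing variables.tf is Terraform's TF_VAR_ environment variable convention. The sketch below assumes the repo declares a variable named region (suggested by the region output used later in this guide, but not confirmed here), and us-east-1 is only an example value.

```shell
# Assumption: variables.tf declares an input variable named "region".
# Terraform reads TF_VAR_<name> from the environment for input variable <name>.
export TF_VAR_region="us-east-1"

# The terraform init and terraform apply commands in the next step
# pick this value up from the environment.
```

If the region is overridden this way, the REGION value exported later for the HumioCluster spec should match it.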

And then init and apply terraform:

shell
$ terraform init
$ terraform apply

Once the terraform resources have been applied, configure kubectl to point to the newly created EKS cluster:

shell
$ export KUBECONFIG=$PWD/kubeconfig
$ aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name)

And then verify you can authenticate with the EKS cluster and see pods:

shell
$ kubectl get pods -A
Install Humio Operator Dependencies

It is necessary to have both cert-manager and the nginx-ingress controller if running the Humio Operator with TLS and/or ingress enabled.

Install Cert Manager
shell
$ kubectl create namespace cert-manager
$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update
$ helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.13.2

Once cert manager is installed, create a clusterissuer which will be used to issue the certs for our LogScale cluster:

shell
$ export MY_EMAIL=<your email address>

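Create the clusterissuer.yaml file with the following content, replacing $MY_EMAIL with the email address set above (kubectl does not expand environment variables in the file):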

yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $MY_EMAIL
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx

Next, execute the following:

shell
$ kubectl apply -f clusterissuer.yaml
Install the Nginx Ingress Controller
shell
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/aws/deploy.yaml
Install the Humio Operator

Now that you have authenticated with the EKS cluster, it's time to install the Humio Operator.

shell
$ kubectl create namespace logging
$ helm repo add humio-operator https://humio.github.io/humio-operator
$ helm repo update
$ helm install humio-operator humio-operator/humio-operator \
  --namespace logging

You can check the status of the Humio Operator pod by running:

shell
$ kubectl get pods -n logging
Prepare for Creating LogScale Cluster

Before creating a cluster, we need to set a number of attributes specific to the cluster. We will set these as environment variables and then reference them later when creating the HumioCluster spec.

First, generate an encryption key that will be used by LogScale to encrypt the data in the S3 bucket.

shell
$ kubectl create secret generic bucket-storage --from-literal="encryption-key=$(openssl rand -base64 64)" -n logging

Also create a developer user password, which we will use to log in once the LogScale cluster is up. By default, we will start LogScale in single-user mode.

shell
$ kubectl create secret generic developer-user --from-literal="password=$(openssl rand -base64 16)" -n logging

We will need the connection strings for Kafka and ZooKeeper, as well as the name of the S3 bucket and Role ARN which has access to write to the bucket. We can obtain those from terraform:

shell
$ export KAFKA_BROKERS=$(terraform output -raw bootstrap_brokers_tls)
$ export ZOOKEEPER_CONNECTION=$(terraform output -raw zookeeper_connect_string)
$ export ROLE_ARN=$(terraform output -raw oidc_role_arn)
$ export BUCKET_NAME=$(terraform output -raw s3_bucket_name)

Additionally, we'll need to set hostnames for the HTTP and Elasticsearch ingresses. Use your own domain here. In order to use ingress with Let's Encrypt certificates, a DNS record must be created later in this process.

shell
$ export INGRESS_HOSTNAME=humio-quickstart.example.com
$ export INGRESS_ES_HOSTNAME=humio-quickstart-es.example.com

Also set the region:

shell
$ export REGION=us-west-2

Add license secret:

shell
$ kubectl create secret generic humio-quickstart-license --namespace logging --from-literal=data=<license>
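
The HumioCluster spec in the next section is written by the shell, so any variable that was never exported ends up as an empty value in the file. A small guard can catch that first. This is a sketch only; require_vars is a hypothetical helper, not something the operator or the quick-start repo provides.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  status=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "required variable $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}

# Example call with the variables exported in this section:
# require_vars KAFKA_BROKERS ZOOKEEPER_CONNECTION ROLE_ARN BUCKET_NAME \
#   INGRESS_HOSTNAME INGRESS_ES_HOSTNAME REGION
```
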
Create a LogScale Cluster

Finally, we can configure a yaml file which contains the LogScale cluster (known as HumioCluster) spec. Run the following command to create a file named humiocluster.yaml with the desired HumioCluster spec:

shell
cat > humiocluster.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: humio-quickstart
  namespace: logging
spec:
  image: humio/humio-core:1.124.0
  license:
    secretKeyRef:
      name: humio-quickstart-license
      key: data
  nodeCount: 3
  targetReplicationFactor: 2
  storagePartitionsCount: 24
  digestPartitionsCount: 24
  extraKafkaConfigs: "security.protocol=SSL"
  tls:
    enabled: true
  autoRebalancePartitions: true
  hostname: ${INGRESS_HOSTNAME}
  esHostname: ${INGRESS_ES_HOSTNAME}
  ingress:
    enabled: true
    controller: nginx
    annotations:
      use-http01-solver: "true"
      cert-manager.io/cluster-issuer: letsencrypt-prod
      kubernetes.io/ingress.class: nginx
  resources:
    limits:
      cpu: "2"
      memory: 12Gi
    requests:
      cpu: "1"
      memory: 6Gi
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - humio
        topologyKey: kubernetes.io/hostname
  dataVolumeSource:
    hostPath:
      path: "/mnt/disks/vol1"
      type: "Directory"
  humioServiceAccountAnnotations:
    eks.amazonaws.com/role-arn: ${ROLE_ARN}
  environmentVariables:
    - name: S3_STORAGE_BUCKET
      value: ${BUCKET_NAME}
    - name: S3_STORAGE_REGION
      value: ${REGION}
    - name: LOCAL_STORAGE_PERCENTAGE
      value: "80"
    - name: LOCAL_STORAGE_MIN_AGE_DAYS
      value: "7"
    - name: S3_STORAGE_ENCRYPTION_KEY
      valueFrom:
        secretKeyRef:
          name: bucket-storage
          key: encryption-key
    - name: USING_EPHEMERAL_DISKS
      value: "true"
    - name: S3_STORAGE_PREFERRED_COPY_SOURCE
      value: "true"
    - name: SINGLE_USER_USERNAME
      value: "admin"
    - name: SINGLE_USER_PASSWORD
      valueFrom:
        secretKeyRef:
          name: developer-user
          key: password
    - name: "ZOOKEEPER_URL"
      value: ${ZOOKEEPER_CONNECTION}
    - name: "KAFKA_SERVERS"
      value: ${KAFKA_BROKERS}
EOF

And then apply it:

shell
$ kubectl apply -f humiocluster.yaml

Note

environmentVariables in HumioCluster is an array of corev1.EnvVar types, so each item in the array has the same capabilities as envvar-v1-core.

Validate the LogScale Cluster

Check the status of the HumioCluster by running:

shell
$ kubectl get humiocluster -n logging

Initially the cluster will be in the Bootstrapping state as it starts up. Once all nodes have started, it will move to the Running state.
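
Rather than re-running the command by hand, the state can be polled until it reaches Running. The following is a minimal sketch: wait_for_output is a hypothetical helper, and the .status.state field path in the example call is an assumption about the HumioCluster resource that should be checked against the operator version in use.

```shell
# Hypothetical helper: run a command repeatedly until its output equals
# the expected string, or give up after a number of attempts.
# Usage: wait_for_output <expected> <attempts> <delay-seconds> <command...>
wait_for_output() {
  expected=$1
  attempts=$2
  delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example call (assumes the state is exposed at .status.state):
# wait_for_output Running 60 10 kubectl get humiocluster humio-quickstart \
#   -n logging -o jsonpath='{.status.state}'
```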

Access the LogScale Cluster
Configure DNS

To access the HumioCluster and to allow cert-manager to generate a valid certificate for the cluster, DNS records must be added for $INGRESS_HOSTNAME and $INGRESS_ES_HOSTNAME that point to the NLB name of the ingress service. To get the NLB name of the ingress service, run:

shell
$ export INGRESS_SERVICE_HOSTNAME=$(kubectl get service ingress-nginx-controller -n ingress-nginx -o template --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")

Configuring the DNS record depends on your DNS provider. If using AWS Route53, create an Alias record which points both names directly to the INGRESS_SERVICE_HOSTNAME. For other providers, create a CNAME which points them to the INGRESS_SERVICE_HOSTNAME.

Logging In

Once the DNS records exist, you can open https://${INGRESS_HOSTNAME} in a browser and log in. Since we are using single-user authentication mode, the username is the value of SINGLE_USER_USERNAME in the cluster spec (admin in the example). The password can be obtained by running:

shell
$ kubectl get secret developer-user -n logging -o=template --template={{.data.password}} | base64 -D

Note: this command uses base64 -D, but you may need to use base64 --decode on Linux.
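
To avoid choosing the flag by hand, a small wrapper can try one spelling and fall back to the other. This is a sketch under the assumption that the local base64 accepts at least one of --decode or -D; b64decode is a hypothetical helper name.

```shell
# Hypothetical helper: decode base64 from stdin, trying --decode first
# and falling back to -D if the first form is rejected.
b64decode() {
  # Buffer stdin so the fallback can read the same input again.
  input=$(cat)
  printf '%s' "$input" | base64 --decode 2>/dev/null ||
    printf '%s' "$input" | base64 -D
}

# Example call with the secret from this section:
# kubectl get secret developer-user -n logging -o=template \
#   --template={{.data.password}} | b64decode
```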

Sending Data to the Cluster

To send data to the cluster, we will create a new repository, obtain the ingest token, and then configure fluentbit to gather logs from all the pods in our Kubernetes cluster and send them to LogScale.

Create Repo, Parser and Ingest Token

Create the repository using the Humio Operator. Using a simple text editor, create a file named humiorepository.yaml and copy the following lines into it:

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioRepository
metadata:
  name: quickstart-cluster-logs
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-logs
  description: "Cluster logs repository"
  retention:
    timeInDays: 30
    ingestSizeInGB: 50
    storageSizeInGB: 10
shell
$ kubectl apply -f humiorepository.yaml

Next, create a parser which will be assigned to the repository and later on to the ingest token. It is also possible to skip this step and rely on one of the built-in parsers. Using a simple text editor, create a file named humioparser.yaml and copy the following lines into it:

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioParser
metadata:
  name: quickstart-cluster-parser
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-parser
  repositoryName: quickstart-cluster-logs
  parserScript: |
    case {
      kubernetes.pod_name=/fluentbit/
        | /\[(?<@timestamp>[^\]]+)\]/
        | /^(?<@timestamp>.*)\[warn\].*/
        | parseTimestamp(format="yyyy/MM/dd' 'HH:mm:ss", field=@timestamp);
      parseJson();
      * | kvParse()
    }

Apply the changes:

shell
$ kubectl apply -f humioparser.yaml

Now create an Ingest Token using the Humio Operator, assigning it to the repository and parser that were created in the previous steps. Using a simple text editor, create a file named humioingesttoken.yaml and copy the following lines into it:

yaml
apiVersion: core.humio.com/v1alpha1
kind: HumioIngestToken
metadata:
  name: quickstart-cluster-ingest-token
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-ingest-token
  repositoryName: quickstart-cluster-logs
  parserName: quickstart-cluster-parser
  tokenSecretName: quickstart-cluster-ingest-token

Then update the configuration:

shell
$ kubectl apply -f humioingesttoken.yaml

Since we set tokenSecretName in the Ingest Token spec, the token content is stored as a secret in Kubernetes. We can then fetch the token:

shell
$ export INGEST_TOKEN=$(kubectl get secret quickstart-cluster-ingest-token -n logging -o template --template '{{.data.token}}' | base64 -D)

Note

This command uses base64 -D, but you may need to use base64 --decode on Linux.

Ingest Logs into the Cluster

Now we'll install fluentbit into the Kubernetes cluster and configure the endpoint to point to our $INGRESS_ES_HOSTNAME, and use the $INGEST_TOKEN that was just created.

shell
$ helm repo add humio https://humio.github.io/humio-helm-charts
$ helm repo update

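Using a simple text editor, create a file named humio-agent.yaml and copy the following lines into it, replacing $INGRESS_ES_HOSTNAME with the hostname set earlier (helm does not expand environment variables in values files):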

yaml
humio-fluentbit:
  enabled: true
  humioHostname: $INGRESS_ES_HOSTNAME
  es:
    tls: true
    port: 443
    inputConfig: |-
      [INPUT]
           Name             tail
           Path             /var/log/containers/*.log
           Parser           docker
           # The path to the DB file must be unique and
           # not conflict with another fluentbit running on the same nodes.
           DB               /var/log/flb_kube.db
           Tag              kube.*
           Refresh_Interval 5
           Mem_Buf_Limit    512MB
           Skip_Long_Lines  On
    resources:
      limits:
        cpu: 100m
        memory: 1024Mi
      requests:
        cpu: 100m
        memory: 512Mi
shell
$ helm install humio humio/humio-helm-charts \
  --namespace logging \
  --set humio-fluentbit.token=$INGEST_TOKEN \
  --values humio-agent.yaml
Verify Logs are Ingested
  • Go to the LogScale UI and click on the quickstart-cluster-logs repository

  • In the search field, enter "kubernetes.container_name" = "humio-operator" and click Run

  • Verify you can see the Humio Operator logs

Cleanup

It's possible to run an individual kubectl delete on each resource, but since we have created a dedicated EKS cluster, we will delete everything we just created by deleting the cluster resource and then running terraform destroy.

First, delete the cluster so pods no longer write to the S3 bucket:

shell
$ kubectl delete -f humiocluster.yaml

Prior to running terraform destroy, it will be necessary to ensure the S3 bucket that was created by terraform is emptied. The name of the S3 bucket can be obtained by running:

shell
$ terraform output s3_bucket_name

Now empty the S3 bucket either through the AWS console or CLI.
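
For the CLI route, a minimal sketch is shown below. The empty_bucket wrapper is hypothetical; it only adds a guard so that an empty bucket name never reaches aws s3 rm. If versioning is enabled on the bucket, older object versions would need to be removed separately.

```shell
# Hypothetical wrapper: delete every object in the named bucket,
# refusing to run when no bucket name is given.
empty_bucket() {
  bucket=$1
  if [ -z "$bucket" ]; then
    echo "bucket name is empty; refusing to run" >&2
    return 1
  fi
  aws s3 rm "s3://${bucket}" --recursive
}

# Example call using the terraform output from this section:
# empty_bucket "$(terraform output -raw s3_bucket_name)"
```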

Next we'll need to ensure the nginx-ingress-controller's service is removed. This way the NLB will be removed from AWS. If this is not done, terraform will get stuck deleting the subnets when performing a terraform destroy:

shell
$ kubectl delete -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/aws/deploy.yaml

Once the bucket has been emptied and the nginx-ingress-controller has been deleted, run:

shell
$ terraform destroy

Also delete the DNS records which were created for $INGRESS_HOSTNAME and $INGRESS_ES_HOSTNAME.