Best Practice: Humio Installation using AWS and Kubernetes

Last Updated: 2022-07-21

The following explains how to set up a Humio cluster using the Humio Operator.

As part of the Quick Start, we will create AWS resources, including:

  • MSK (Amazon's managed Kafka solution)

  • EKS (Amazon's managed Kubernetes solution)

  • S3 bucket (Amazon's object storage service used for large amounts of data)

We will perform the installation using a tool called terraform for automating the process as much as possible. Finally, we will install the Humio Operator using Helm. (Helm is a package manager for Kubernetes environments).

Sizing of the cluster (which provides input for editing the variables.tf and humiocluster.yaml files mentioned further down in this document) can be estimated using the following guide: AWS EKS

Prerequisites

The following tools are used during installation:

  • Terraform 1.1.0+ (Tool for automating the setup and configuration of cloud infrastructure)

  • kubectl 1.22+ (command line tool for configuring Kubernetes)

  • aws cli 2+ (Unified tool for managing AWS services)

  • Helm v3+ (Package manager for Kubernetes environments)

Authentication & Permissions

Ensure you are logged into AWS through the terminal and have the necessary permissions to create resources such as EKS and MSK clusters and S3 buckets. For additional AWS authentication options, see the authentication section of the Terraform AWS provider documentation.
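A quick way to confirm which AWS account and role your terminal session is using is the AWS CLI's caller-identity check (a generic AWS CLI command, not specific to this quick start):

shell
# Prints the account ID and the ARN of the user/role the CLI is authenticated as
aws sts get-caller-identity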

Create AWS Resources

The following will create an EKS cluster with three nodes by default, an MSK cluster with three nodes by default, an S3 bucket where the Humio data will be stored, and a number of dependent resources such as a VPC (Virtual Private Cloud), subnets, security groups and an internet gateway.

First, clone the operator quick-start repo where the terraform quick start files are stored:

shell
git clone https://github.com/humio/humio-operator-quickstart
cd humio-operator-quickstart/aws

Note

Review the default values in the variables.tf file. It's possible to override these, but be careful, as changing some may have undesirable effects. A common change is overriding region, but changing instance types, for example, has downstream consequences, such as when setting the resources for the HumioCluster.

Make sure to change the number of Humio nodes, Kafka nodes, and instance types based on your needs, typically the result of using the sizing guide: Humio Sizing AWS. (Remember to also modify the node count in the humiocluster.yaml file to reflect the number of Humio nodes you need.)
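As an illustration only, overrides can be placed in a terraform.tfvars file next to variables.tf. The variable names below are hypothetical; use the names actually defined in variables.tf:

shell
cat > terraform.tfvars <<EOF
# Hypothetical variable names -- check variables.tf for the real ones
region               = "us-west-2"
humio_instance_count = 3
kafka_broker_count   = 3
humio_instance_type  = "m5.2xlarge"
EOF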

And then init and apply terraform:

shell
terraform init
terraform apply

Creating the cluster can take some time. Good time for a coffee break!

Once the terraform resources have been applied, configure kubectl to point to the newly created EKS cluster:

shell
export KUBECONFIG=$PWD/kubeconfig
aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name)

And then verify you can authenticate with the EKS cluster and see pods:

shell
kubectl get pods -A

Output should look something like this

shell
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE
kube-system   aws-node-67wq8             1/1     Running   0          21m
kube-system   aws-node-ghbrb             1/1     Running   0          20m
kube-system   aws-node-pn8b6             1/1     Running   0          21m
kube-system   coredns-657694c6f4-dpgzm   1/1     Running   0          28m
kube-system   coredns-657694c6f4-nkxgk   1/1     Running   0          28m
kube-system   kube-proxy-h7gfh           1/1     Running   0          22m
kube-system   kube-proxy-mzhx9           1/1     Running   0          22m
kube-system   kube-proxy-trrs8           1/1     Running   0          22m

Install Humio Operator Dependencies

Both cert-manager and the nginx ingress controller are required if running the Humio Operator with TLS and/or ingress enabled. TLS encrypts traffic between the browser and Humio using certificates issued by cert-manager. The nginx ingress controller routes external traffic into the cluster and keeps it reachable when the underlying IP addresses change.

Install Cert Manager

Installing the certificates manager:

shell
helm upgrade --install cert-manager cert-manager \
 --repo https://charts.jetstack.io \
 --version v1.8.0 \
 --set installCRDs=true \
 --namespace cert-manager --create-namespace

Once cert-manager is installed, create a ClusterIssuer, which will be used to issue the certificates for our Humio cluster:

shell
export MY_EMAIL=your-email@example.com

Run the following command:

shell
cat > clusterissuer.yaml <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $MY_EMAIL
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF

Next, execute the following:

shell
kubectl apply -f clusterissuer.yaml

Install the Nginx Ingress Controller

shell
helm upgrade --install ingress-nginx ingress-nginx \
 --repo https://kubernetes.github.io/ingress-nginx \
 --namespace ingress-nginx --create-namespace

Install the Humio Operator

Now that you have authenticated with the EKS cluster, it's time to install the Humio Operator. The installCRDs option tells Helm to install the custom resource definitions (CRDs) for the resources used by Humio.

shell
helm upgrade --install humio-operator humio-operator \
 --repo https://humio.github.io/humio-operator \
 --set installCRDs=true \
 --namespace logging --create-namespace

You can check the status of the Humio Operator pod by running:

shell
kubectl get pods -n logging
NAME                             READY   STATUS    RESTARTS   AGE
humio-operator-548b8d587-mgd4b   1/1     Running   0          32s
Prepare for Creating Humio Cluster

Before creating a cluster, we need to set a number of attributes specific to the cluster. We will set these as environment variables and then reference them later when creating the HumioCluster spec.

First, generate an encryption key that will be used by Humio to encrypt the data in the S3 bucket.

shell
kubectl create secret generic bucket-storage --from-literal=encryption-key=$(openssl rand -hex 64) -n logging

Also create a developer user password, which we will use to log in once the Humio cluster is up. By default, we will start Humio in single-user mode.

shell
kubectl create secret generic developer-user --from-literal=password=$(openssl rand -hex 16) -n logging

We will need the connection strings for Kafka and Zookeeper, as well as the name of the S3 bucket and Role ARN which has access to write to the bucket. We can obtain those from terraform:

shell
export KAFKA_BROKERS=$(terraform output bootstrap_brokers_tls)
export ZOOKEEPER_CONNECTION=$(terraform output zookeeper_connect_string)
export ROLE_ARN=$(terraform output oidc_role_arn)
export BUCKET_NAME=$(terraform output s3_bucket_name)

Additionally, we'll need to set hostnames for the HTTP and Elasticsearch ingresses. Use your own domain here. In order to use ingress with Let's Encrypt certificates, a DNS record must be created later in this process.

shell
export INGRESS_HOSTNAME=humio-quickstart.example.com
export INGRESS_ES_HOSTNAME=humio-quickstart-es.example.com

Also set the region:

shell
export REGION=us-west-2

Note

If you adjusted the region in the variables.tf file then you must use the same one in the command above.

Add license secret:

shell
kubectl create secret generic humio-quickstart-license --namespace logging --from-literal=data=license

You should have obtained a valid Humio license before performing this step. Replace license with the one you received.
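If you prefer not to paste the license on the command line, the same secret can be created from a file; the filename below is just an example:

shell
# Assumes the license was saved locally as humio-license.txt (example filename)
kubectl create secret generic humio-quickstart-license --namespace logging --from-file=data=humio-license.txt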

Create a Humio Cluster

Finally, we can configure a yaml file which contains the HumioCluster spec. Run the following command to create a file named humiocluster.yaml with the desired HumioCluster spec:

shell
cat > humiocluster.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: humio-quickstart
  namespace: logging
spec:
  image: "humio/humio-core:1.38.0"
  license:
    secretKeyRef:
      name: humio-quickstart-license
      key: data
  nodeCount: 3
  targetReplicationFactor: 2
  storagePartitionsCount: 24
  digestPartitionsCount: 720
  extraKafkaConfigs: "security.protocol=SSL"
  tls:
    enabled: true
  autoRebalancePartitions: true
  hostname: ${INGRESS_HOSTNAME}
  esHostname: ${INGRESS_ES_HOSTNAME}
  ingress:
    enabled: true
    controller: nginx
    annotations:
      use-http01-solver: "true"
      cert-manager.io/cluster-issuer: letsencrypt-prod
      kubernetes.io/ingress.class: nginx
  resources:
    limits:
      cpu: "2"
      memory: 12Gi
    requests:
      cpu: "1"
      memory: 6Gi
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - humio
        topologyKey: kubernetes.io/hostname
  dataVolumeSource:
    hostPath:
      path: "/mnt/disks/vol1"
      type: "Directory"
  humioServiceAccountAnnotations:
    eks.amazonaws.com/role-arn: ${ROLE_ARN}
  environmentVariables:
  - name: S3_STORAGE_BUCKET
    value: ${BUCKET_NAME}
  - name: S3_STORAGE_REGION
    value: ${REGION}
  - name: LOCAL_STORAGE_PERCENTAGE
    value: "80"
  - name: LOCAL_STORAGE_MIN_AGE_DAYS
    value: "7"
  - name: S3_STORAGE_ENCRYPTION_KEY
    valueFrom:
      secretKeyRef:
        name: bucket-storage
        key: encryption-key
  - name: USING_EPHEMERAL_DISKS
    value: "true"
  - name: S3_STORAGE_PREFERRED_COPY_SOURCE
    value: "true"
  - name: SINGLE_USER_PASSWORD
    valueFrom:
      secretKeyRef:
        name: developer-user
        key: password
  - name: HUMIO_JVM_ARGS
    value: -Xss2m -Xms2g -Xmx6g -server -XX:MaxDirectMemorySize=6g -XX:+UseParallelGC -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=dontinline,com/humio/util/HotspotUtilsJ.dontInline -Xlog:gc+jni=debug:stdout -Dakka.log-config-on-start=on -Xlog:gc*:stdout:time,tags -Dzookeeper.client.secure=false
  - name: "ZOOKEEPER_URL"
    value: ${ZOOKEEPER_CONNECTION}
  - name: "KAFKA_SERVERS"
    value: ${KAFKA_BROKERS}
EOF

nodeCount should be set to the number of Humio nodes you decided on using the sizing guide (see Humio Sizing AWS).

And then apply it:

shell
kubectl apply -f humiocluster.yaml
Validate the Humio Cluster

Check the status of the HumioCluster by running:

shell
kubectl get humiocluster -n logging

Initially the cluster will go into the Bootstrapping state as it starts up; once all nodes have started, it will go into the Running state.
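To follow progress while the nodes come up, you can watch the HumioCluster resource and its pods using standard kubectl watches:

shell
kubectl get humiocluster humio-quickstart -n logging -w
kubectl get pods -n logging -w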

Access the Humio Cluster
Configure DNS

To access the HumioCluster, as well as allow cert-manager to generate a valid certificate for the cluster, DNS records must be added for $INGRESS_HOSTNAME and $INGRESS_ES_HOSTNAME that point to the NLB (Network Load Balancer) hostname of the ingress service. To get the NLB hostname of the ingress service, run:

shell
export INGRESS_SERVICE_HOSTNAME=$(kubectl get service ingress-nginx-controller -n ingress-nginx -o template --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")

Configuring the DNS record depends on your DNS provider. If using AWS Route53, create an Alias record which points both names directly to the INGRESS_SERVICE_HOSTNAME. For other providers, create a CNAME which points them to the INGRESS_SERVICE_HOSTNAME.
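For example, with Route53 the records could be created from the shell. This is only a sketch: the hosted zone ID is a placeholder you must replace, and it uses plain CNAME records rather than Alias records (Alias records additionally require the load balancer's own hosted zone ID):

shell
# Replace ZXXXXXXXXXXXXX with the hosted zone ID for your domain
aws route53 change-resource-record-sets \
 --hosted-zone-id ZXXXXXXXXXXXXX \
 --change-batch '{
   "Changes": [
     { "Action": "UPSERT", "ResourceRecordSet": { "Name": "'"$INGRESS_HOSTNAME"'", "Type": "CNAME", "TTL": 300,
         "ResourceRecords": [ { "Value": "'"$INGRESS_SERVICE_HOSTNAME"'" } ] } },
     { "Action": "UPSERT", "ResourceRecordSet": { "Name": "'"$INGRESS_ES_HOSTNAME"'", "Type": "CNAME", "TTL": 300,
         "ResourceRecords": [ { "Value": "'"$INGRESS_SERVICE_HOSTNAME"'" } ] } }
   ]
 }'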

Logging In

Once the DNS records exist, you can open https://${INGRESS_HOSTNAME} in a browser and log in. Since we are using single-user authentication mode, the username will be developer and the password can be obtained by running:

shell
kubectl get secret developer-user -n logging -o=template --template={{.data.password}} | base64 -D

Note

This command uses base64 -D; on Linux you may need to use base64 --decode instead.
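If you want to check reachability from the terminal first, Humio exposes a status endpoint; assuming DNS has propagated and the certificate has been issued, the following should return a small JSON document with the cluster version:

shell
curl -s https://${INGRESS_HOSTNAME}/api/v1/status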

Sending Data to the Cluster

To send data to the cluster, we will create a new Repository, obtain the ingest token, and then configure fluentbit (a tool for shipping logs) to gather logs from all the pods in our Kubernetes cluster and send them to Humio.

Create Repo, Parser and Ingest Token

Create the repository using the Humio Operator. Run the following command to create a file named humiorepository.yaml with the HumioRepository spec:

shell
cat > humiorepository.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioRepository
metadata:
  name: quickstart-cluster-logs
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-logs
  description: "Cluster logs repository"
  retention:
    timeInDays: 30
    ingestSizeInGB: 50
    storageSizeInGB: 10
EOF

shell
kubectl apply -f humiorepository.yaml

Next, create a parser which will be assigned to the repository and, later, to the ingest token. It is also possible to skip this step and rely on one of the built-in parsers. Run the following command to create a file named humioparser.yaml with the HumioParser spec:

shell
cat > humioparser.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioParser
metadata:
  name: quickstart-cluster-parser
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-parser
  repositoryName: quickstart-cluster-logs
  parserScript: |
    case {
      kubernetes.pod_name=/fluentbit/
        | /\[(?<@timestamp>[^\]]+)\]/
        | /^(?<@timestamp>.*)\[warn\].*/
        | parseTimestamp(format="yyyy/MM/dd' 'HH:mm:ss", field=@timestamp);
      parseJson();
      * | kvParse()
    }
EOF

shell
kubectl apply -f humioparser.yaml

Now create an Ingest Token using the Humio Operator, assign it to the repository, and use the parser created in the previous steps. Run the following command to create a file named humioingesttoken.yaml with the HumioIngestToken spec:

shell
cat > humioingesttoken.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioIngestToken
metadata:
  name: quickstart-cluster-ingest-token
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-ingest-token
  repositoryName: quickstart-cluster-logs
  parserName: quickstart-cluster-parser
  tokenSecretName: quickstart-cluster-ingest-token
EOF

shell
kubectl apply -f humioingesttoken.yaml

Since we set tokenSecretName in the Ingest Token spec, the token content is stored as a secret in Kubernetes. We can then fetch the token:

shell
export INGEST_TOKEN=$(kubectl get secret quickstart-cluster-ingest-token -n logging -o template --template '{{.data.token}}' | base64 -D)

Note

This command uses base64 -D; on Linux you may need to use base64 --decode instead.
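As an optional check that the token works before installing fluentbit, you can send a test event to Humio's unstructured ingest endpoint (note this uses the HTTP hostname, not the Elasticsearch one):

shell
curl -s -X POST https://${INGRESS_HOSTNAME}/api/v1/ingest/humio-unstructured \
 -H "Authorization: Bearer ${INGEST_TOKEN}" \
 -H "Content-Type: application/json" \
 -d '[{ "messages": ["quickstart test event"] }]'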

Ingest Logs into the Cluster

Now we'll install fluentbit into the Kubernetes cluster, configure the endpoint to point to our $INGRESS_ES_HOSTNAME, and use the $INGEST_TOKEN that was just created.

shell
helm repo add humio https://humio.github.io/humio-helm-charts
helm repo update

Run the following command to create a file named humio-agent.yaml with the fluentbit values:

shell
cat > humio-agent.yaml <<EOF
humio-fluentbit:
  enabled: true
  humioHostname: $INGRESS_ES_HOSTNAME
  es:
    tls: true
    port: 443
  inputConfig: |-
    [INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        Parser           docker
        # The path to the DB file must be unique and
        # not conflict with another fluentbit running on the same nodes.
        DB               /var/log/flb_kube.db
        Tag              kube.*
        Refresh_Interval 5
        Mem_Buf_Limit    512MB
        Skip_Long_Lines  On
  resources:
    limits:
      cpu: 100m
      memory: 1024Mi
    requests:
      cpu: 100m
      memory: 512Mi
EOF
shell
helm upgrade --install fluentbit humio-helm-charts \
 --repo https://humio.github.io/humio-helm-charts \
 --set humio-fluentbit.token=$INGEST_TOKEN \
 --namespace logging --create-namespace \
 --values humio-agent.yaml
Verify Logs are Ingested

Go to the Humio UI and click on the quickstart-cluster-logs repository.

In the search field, enter:

humio
"kubernetes.container_name" = "humio-operator"

Then click Run.

Verify you can see the Humio Operator logs.

Cleanup

If you set this up to test the installation and want to remove the Humio cluster you just built, use the following procedure to clean up:

It's possible to run an individual kubectl delete on each resource, but since we have created a dedicated EKS cluster, we will delete everything we just created by deleting the cluster resource and then running terraform destroy.

First, delete the cluster so pods no longer write to the S3 bucket:

shell
kubectl delete -f humiocluster.yaml

Prior to running terraform destroy, it will be necessary to ensure the S3 bucket that was created by terraform is emptied.

Empty the S3 bucket by using the following command:

shell
aws s3 rm s3://$BUCKET_NAME/ --recursive

Next, we'll need to remove the nginx-ingress-controller service so that the NLB is removed from AWS. If this is not done, terraform will get stuck deleting the subnets when performing a terraform destroy:

shell
helm -n ingress-nginx delete ingress-nginx

Once the bucket has been emptied and the nginx-ingress-controller has been deleted, run:

shell
terraform destroy

Also delete the DNS records which were created for $INGRESS_HOSTNAME and $INGRESS_ES_HOSTNAME.