Best Practice: Humio Installation using AWS and Kubernetes
Last Updated: 2022-07-21
The following explains how to set up a Humio cluster using the Humio Operator.
As part of the Quick Start, we will create AWS resources, including:
MSK (Amazon's managed Kafka solution)
EKS (Amazon's managed Kubernetes solution)
S3 bucket (Amazon's object storage service used for large amounts of data)
We will perform the installation using terraform to automate the process as much as possible. Finally, we will install the Humio Operator using Helm (a package manager for Kubernetes environments).
Sizing of the cluster (which will provide input to editing the variables.tf and humiocluster.yaml files mentioned further down in this document) can be estimated using the following guide: AWS EKS.
Prerequisites
The following tools are used during installation: the AWS CLI, terraform, kubectl, and Helm.
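To confirm they are available locally, you can print their versions (exact versions will vary with your environment):
aws --version
terraform version
kubectl version --client
helm version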
Authentication & Permissions
Ensure you are logged in to AWS through the terminal and have the necessary permissions to create resources such as EKS and MSK clusters and S3 buckets. For additional AWS authentication options, see the authentication section of the terraform AWS provider documentation.
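A quick way to confirm that your terminal session is authenticated is to ask AWS which identity it is using; the account and ARN returned should match the account in which you intend to create resources:
aws sts get-caller-identity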
Create AWS Resources
The following will create an EKS cluster with three nodes by default, an MSK cluster with three nodes by default, an S3 bucket where the Humio data will be stored, and a number of dependent resources such as a VPC (Virtual Private Cloud), subnets, security groups and an internet gateway.
First, clone the operator quick-start repo where the terraform quick start files are stored:
git clone https://github.com/humio/humio-operator-quickstart
cd humio-operator-quickstart/aws
Note
Review the default values in the variables.tf file. It's possible to overwrite these, but be careful, as changing some may have undesirable effects. A common change is overwriting the region, but changing instance types, for example, has downstream consequences, such as when setting the resources for the HumioCluster.
Make sure to change the number of Humio nodes, Kafka nodes and instance types based on your needs, typically as the result of using the sizing guide: Humio Sizing AWS (remember to also modify the node count in the humiocluster.yaml file to reflect the number of Humio nodes you need). A sketch of how to override variables follows this note.
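As a sketch of how an override might look (the variable name below is an assumption; check variables.tf for the actual names), values can be placed in a terraform.tfvars file instead of editing variables.tf directly:
# terraform.tfvars is read automatically by terraform.
# The variable name "region" is an example and must match variables.tf.
cat > terraform.tfvars <<EOF
region = "eu-west-1"
EOF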
And then init and apply terraform:
terraform init
terraform apply
Creating the cluster can take some time. Good time for a coffee break!
Once the terraform resources have been applied, configure kubectl to point to the newly created EKS cluster:
export KUBECONFIG=$PWD/kubeconfig
aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name)
And then verify you can authenticate with the EKS cluster and see pods:
kubectl get pods -A
Output should look something like this:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-node-67wq8 1/1 Running 0 21m
kube-system aws-node-ghbrb 1/1 Running 0 20m
kube-system aws-node-pn8b6 1/1 Running 0 21m
kube-system coredns-657694c6f4-dpgzm 1/1 Running 0 28m
kube-system coredns-657694c6f4-nkxgk 1/1 Running 0 28m
kube-system kube-proxy-h7gfh 1/1 Running 0 22m
kube-system kube-proxy-mzhx9 1/1 Running 0 22m
kube-system kube-proxy-trrs8 1/1 Running 0 22m
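You can also confirm that the EKS worker nodes have joined the cluster and are Ready:
kubectl get nodes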
Install Humio Operator Dependencies
It is necessary to have both cert-manager and the nginx ingress controller if running the Humio Operator with TLS and/or ingress enabled. TLS encrypts traffic between the browser and Humio using certificates issued via cert-manager (in this guide, from Let's Encrypt). The nginx ingress controller routes external traffic into the cluster so it remains reachable behind a stable load balancer address even as pod IPs change.
Install Cert Manager
Install cert-manager:
helm upgrade --install cert-manager cert-manager \
  --repo https://charts.jetstack.io \
  --version v1.8.0 \
  --set installCRDs=true \
  --namespace cert-manager --create-namespace
Once cert-manager is installed, create a ClusterIssuer, which will be used to issue the certificates for our Humio cluster:
export MY_EMAIL=<your email address>
Run the following command:
cat > clusterissuer.yaml <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $MY_EMAIL
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
Next, execute the following:
kubectl apply -f clusterissuer.yaml
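To confirm cert-manager has registered the issuer, check that it reports Ready:
kubectl get clusterissuer letsencrypt-prod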
Install the Nginx Ingress Controller
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace
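Before moving on, verify the ingress controller pod is running and that its LoadBalancer service has been assigned an external address (this can take a couple of minutes):
kubectl get pods -n ingress-nginx
kubectl get service ingress-nginx-controller -n ingress-nginx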
Install the Humio Operator
Now that you have authenticated with the EKS cluster, it's time to install the Humio Operator. The installCRDs setting tells Helm to install the custom resource definitions (CRDs) used by Humio.
helm upgrade --install humio-operator humio-operator \
--repo https://humio.github.io/humio-operator \
--set installCRDs=true \
--namespace logging --create-namespace
You can check the status of the Humio Operator pod by running:
kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
humio-operator-548b8d587-mgd4b 1/1 Running 0 32s
Prepare for Creating Humio Cluster
Before creating a cluster, we need to set a number of attributes specific to the cluster. We will set these as environment variables and then reference them later when creating the HumioCluster spec.
First, generate an encryption key that will be used by Humio to encrypt the data in the S3 bucket.
kubectl create secret generic bucket-storage --from-literal=encryption-key=$(openssl rand -hex 64) -n logging
Also create a developer user password which we will use to log in once the Humio cluster is up. By default we will start Humio in single-user mode.
kubectl create secret generic developer-user --from-literal=password=$(openssl rand -hex 16) -n logging
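Both secrets should now exist in the logging namespace:
kubectl get secrets bucket-storage developer-user -n logging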
We will need the connection strings for Kafka and Zookeeper, as well as the name of the S3 bucket and Role ARN which has access to write to the bucket. We can obtain those from terraform:
export KAFKA_BROKERS=$(terraform output bootstrap_brokers_tls)
export ZOOKEEPER_CONNECTION=$(terraform output zookeeper_connect_string)
export ROLE_ARN=$(terraform output oidc_role_arn)
export BUCKET_NAME=$(terraform output s3_bucket_name)
Additionally, we'll need to set hostnames for the HTTP and Elasticsearch ingresses. Use your own domain here. In order to use ingress with Let's Encrypt encryption, a DNS record must be created later in this process.
export INGRESS_HOSTNAME=humio-quickstart.example.com
export INGRESS_ES_HOSTNAME=humio-quickstart-es.example.com
Also set the region:
export REGION=us-west-2
Note
If you adjusted the region in the variables.tf file then you must use the same one in the command above.
Add license secret:
kubectl create secret generic humio-quickstart-license --namespace logging --from-literal=data=license
You should have obtained a valid Humio license before performing this step. Replace license with the one you received.
Create a Humio Cluster
Finally, we can configure a YAML file which contains the HumioCluster spec. Run the following command to create a file named humiocluster.yaml with the desired HumioCluster spec:
cat > humiocluster.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: humio-quickstart
  namespace: logging
spec:
  image: "humio/humio-core:1.38.0"
  license:
    secretKeyRef:
      name: humio-quickstart-license
      key: data
  nodeCount: 3
  targetReplicationFactor: 2
  storagePartitionsCount: 24
  digestPartitionsCount: 720
  extraKafkaConfigs: "security.protocol=SSL"
  tls:
    enabled: true
  autoRebalancePartitions: true
  hostname: ${INGRESS_HOSTNAME}
  esHostname: ${INGRESS_ES_HOSTNAME}
  ingress:
    enabled: true
    controller: nginx
    annotations:
      use-http01-solver: "true"
      cert-manager.io/cluster-issuer: letsencrypt-prod
      kubernetes.io/ingress.class: nginx
  resources:
    limits:
      cpu: "2"
      memory: 12Gi
    requests:
      cpu: "1"
      memory: 6Gi
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - humio
        topologyKey: kubernetes.io/hostname
  dataVolumeSource:
    hostPath:
      path: "/mnt/disks/vol1"
      type: "Directory"
  humioServiceAccountAnnotations:
    eks.amazonaws.com/role-arn: ${ROLE_ARN}
  environmentVariables:
    - name: S3_STORAGE_BUCKET
      value: ${BUCKET_NAME}
    - name: S3_STORAGE_REGION
      value: ${REGION}
    - name: LOCAL_STORAGE_PERCENTAGE
      value: "80"
    - name: LOCAL_STORAGE_MIN_AGE_DAYS
      value: "7"
    - name: S3_STORAGE_ENCRYPTION_KEY
      valueFrom:
        secretKeyRef:
          name: bucket-storage
          key: encryption-key
    - name: USING_EPHEMERAL_DISKS
      value: "true"
    - name: S3_STORAGE_PREFERRED_COPY_SOURCE
      value: "true"
    - name: SINGLE_USER_PASSWORD
      valueFrom:
        secretKeyRef:
          name: developer-user
          key: password
    - name: HUMIO_JVM_ARGS
      value: -Xss2m -Xms2g -Xmx6g -server -XX:MaxDirectMemorySize=6g -XX:+UseParallelGC -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=dontinline,com/humio/util/HotspotUtilsJ.dontInline -Xlog:gc+jni=debug:stdout -Dakka.log-config-on-start=on -Xlog:gc*:stdout:time,tags -Dzookeeper.client.secure=false
    - name: "ZOOKEEPER_URL"
      value: ${ZOOKEEPER_CONNECTION}
    - name: "KAFKA_SERVERS"
      value: ${KAFKA_BROKERS}
EOF
nodeCount should be set to the number of Humio nodes you decided on using the sizing guide (see Humio Sizing AWS).
And then apply it:
kubectl apply -f humiocluster.yaml
Validate the Humio Cluster
Check the status of the HumioCluster by running:
kubectl get humiocluster -n logging
Initially the cluster will go into the state Bootstrapping as it starts up; once all nodes have started, it will go into the state Running.
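You can watch the state change and the Humio pods come up with the following (press Ctrl-C to stop watching):
kubectl get humiocluster -n logging -w
kubectl get pods -n logging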
Access the Humio Cluster
Configure DNS
To access the HumioCluster as well as allow cert-manager to generate a valid certificate for the cluster, there must be a DNS record added for $INGRESS_HOSTNAME as well as $INGRESS_ES_HOSTNAME which point to the NLB (Network Load Balancer) name of the ingress service. To get the NLB name of the ingress service, run:
export INGRESS_SERVICE_HOSTNAME=$(kubectl get service ingress-nginx-controller -n ingress-nginx -o template --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")
Configuring the DNS record depends on your DNS provider. If using AWS Route53, create an Alias record which points both names directly to the INGRESS_SERVICE_HOSTNAME. For other providers, create a CNAME which points them to the INGRESS_SERVICE_HOSTNAME.
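As a minimal sketch for Route53 (using simple CNAME records rather than Alias records, and assuming HOSTED_ZONE_ID is set to the hosted zone ID for your domain), the records could be created with the AWS CLI:
# HOSTED_ZONE_ID is a placeholder for the Route53 hosted zone of your domain.
aws route53 change-resource-record-sets --hosted-zone-id $HOSTED_ZONE_ID \
  --change-batch '{
    "Changes": [
      {"Action": "UPSERT", "ResourceRecordSet": {"Name": "'"$INGRESS_HOSTNAME"'", "Type": "CNAME", "TTL": 300,
        "ResourceRecords": [{"Value": "'"$INGRESS_SERVICE_HOSTNAME"'"}]}},
      {"Action": "UPSERT", "ResourceRecordSet": {"Name": "'"$INGRESS_ES_HOSTNAME"'", "Type": "CNAME", "TTL": 300,
        "ResourceRecords": [{"Value": "'"$INGRESS_SERVICE_HOSTNAME"'"}]}}
    ]
  }'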
Logging In
Once the DNS records exist, you can open https://${INGRESS_HOSTNAME} in a browser and log in. Since we are using single-user authentication mode, the username will be developer and the password can be obtained by running:
kubectl get secret developer-user -n logging -o=template --template={{.data.password}} | base64 -D
Note
This command uses base64 -D; on Linux, use base64 --decode instead.
Sending Data to the Cluster
To send data to the cluster, we will create a new Repository, obtain the ingest token, and then configure fluentbit (a tool for shipping logs) to gather logs from all the pods in our Kubernetes cluster and send them to Humio.
Create Repo, Parser and Ingest Token
Create the Repository using the Humio Operator. Run the following command to create a file named humiorepository.yaml with the HumioRepository spec:
cat > humiorepository.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioRepository
metadata:
  name: quickstart-cluster-logs
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-logs
  description: "Cluster logs repository"
  retention:
    timeInDays: 30
    ingestSizeInGB: 50
    storageSizeInGB: 10
EOF
kubectl apply -f humiorepository.yaml
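The operator reconciles the resource against the Humio cluster; you can confirm it was created with:
kubectl get humiorepository -n logging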
Next, create a parser which will be assigned to the repository, and later on to the ingest token. It is also possible to skip this step and rely on one of the built-in parsers. Run the following command to create a file named humioparser.yaml with the HumioParser spec:
cat > humioparser.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioParser
metadata:
  name: quickstart-cluster-parser
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-parser
  repositoryName: quickstart-cluster-logs
  parserScript: |
    case {
      kubernetes.pod_name=/fluentbit/
        | /\[(?<@timestamp>[^\]]+)\]/
        | /^(?<@timestamp>.*)\[warn\].*/
        | parseTimestamp(format="yyyy/MM/dd' 'HH:mm:ss", field=@timestamp);
      parseJson();
      * | kvParse()
    }
EOF
kubectl apply -f humioparser.yaml
Now create an Ingest Token using the Humio Operator, assign it to the repository, and use the parser that was created in the previous steps. Run the following command to create a file named humioingesttoken.yaml with the HumioIngestToken spec:
cat > humioingesttoken.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioIngestToken
metadata:
  name: quickstart-cluster-ingest-token
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-ingest-token
  repositoryName: quickstart-cluster-logs
  parserName: quickstart-cluster-parser
  tokenSecretName: quickstart-cluster-ingest-token
EOF
kubectl apply -f humioingesttoken.yaml
Since we set tokenSecretName in the Ingest Token spec, the token content is stored as a secret in Kubernetes. We can then fetch the token:
export INGEST_TOKEN=$(kubectl get secret quickstart-cluster-ingest-token -n logging -o template --template '{{.data.token}}' | base64 -D)
Note
This command uses base64 -D; on Linux, use base64 --decode instead.
Ingest Logs into the Cluster
Now we'll install fluentbit into the Kubernetes cluster and configure the endpoint to point to our $INGRESS_ES_HOSTNAME, using the $INGEST_TOKEN that was just created.
helm repo add humio https://humio.github.io/humio-helm-charts
helm repo update
Run the following command to create a file named humio-agent.yaml with the fluentbit configuration:
cat > humio-agent.yaml <<EOF
humio-fluentbit:
  enabled: true
  humioHostname: $INGRESS_ES_HOSTNAME
  es:
    tls: true
    port: 443
  inputConfig: |-
    [INPUT]
        Name             tail
        Path             /var/log/containers/*.log
        Parser           docker
        # The path to the DB file must be unique and
        # not conflict with another fluentbit running on the same nodes.
        DB               /var/log/flb_kube.db
        Tag              kube.*
        Refresh_Interval 5
        Mem_Buf_Limit    512MB
        Skip_Long_Lines  On
  resources:
    limits:
      cpu: 100m
      memory: 1024Mi
    requests:
      cpu: 100m
      memory: 512Mi
EOF
helm upgrade --install fluentbit humio-helm-charts \
--repo https://humio.github.io/humio-helm-charts \
--set humio-fluentbit.token=$INGEST_TOKEN \
--namespace logging --create-namespace \
--values humio-agent.yaml
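The chart runs fluentbit as a DaemonSet so a pod is scheduled on each node. The exact resource names depend on the Helm release, but you can list them with:
kubectl get daemonsets -n logging
kubectl get pods -n logging | grep fluentbit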
Verify Logs are Ingested
Go to the Humio UI and click on the quickstart-cluster-logs repository. In the search field, enter:
"kubernetes.container_name" = "humio-operator"
Then run the search and verify you can see the Humio Operator logs.
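If you want to confirm ingest works independently of fluentbit, a single test event can be sent with curl to Humio's unstructured ingest API using the ingest token (the endpoint path below assumes the standard Humio HTTP ingest API):
curl -s https://$INGRESS_HOSTNAME/api/v1/ingest/humio-unstructured \
  -X POST \
  -H "Authorization: Bearer $INGEST_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"messages": ["quickstart test event"]}]'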
Cleanup
If you have set this up to test the installation and want to remove the Humio cluster you just built, use the following procedure to clean up:
It's possible to run an individual kubectl delete on each resource, but since we have created a dedicated EKS cluster, we will delete everything we just created by deleting the cluster resource and then running terraform destroy.
First, delete the cluster so pods no longer write to the S3 bucket:
kubectl delete -f humiocluster.yaml
Prior to running terraform destroy, it will be necessary to ensure the S3 bucket that was created by terraform is emptied.
Empty the S3 bucket by using the following command:
aws s3 rm s3://$BUCKET_NAME/ --recursive
Next we'll need to ensure the nginx-ingress-controller service is removed, so that the NLB is removed from AWS. If this is not done, terraform will get stuck deleting the subnets when performing a terraform destroy:
helm -n ingress-nginx delete ingress-nginx
Once the bucket has been emptied and the nginx-ingress-controller has been deleted, run:
terraform destroy
Also delete the DNS records which were created for $INGRESS_HOSTNAME and $INGRESS_ES_HOSTNAME.