AWS Cloud Reference Architecture
The following explains how to quickly set up a LogScale cluster using the Humio Operator.
As part of the Quick Start, we will create AWS resources such as MSK, EKS and an S3 bucket using Terraform, and then install the Humio Operator using Helm. For production installations, it is recommended to follow the full Installation Guide and decide how running LogScale fits into your infrastructure.
Prerequisites
Authentication & Permissions
Ensure you are logged in to AWS from the terminal and have the necessary permissions to create resources such as EKS and MSK clusters and S3 buckets. For additional AWS authentication options, see the authentication section of the Terraform AWS provider documentation.
When authenticating with kubectl later in this guide, kubectl will expect aws-iam-authenticator to be installed, and it will use the AWS authentication configured above.
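Before continuing, it can help to confirm that the command-line tools used in this guide are available. The following is a minimal sketch, not part of the quick-start repository: the helper name missing_tools and the tool list are assumptions based on the commands that appear later in this document.

```shell
#!/bin/sh
# Print the name of every tool given as an argument that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Tools referenced by the commands in this guide (assumed list).
REQUIRED_TOOLS="git aws terraform kubectl helm aws-iam-authenticator openssl base64"

# Word splitting on $REQUIRED_TOOLS is intentional.
missing=$(missing_tools $REQUIRED_TOOLS)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi

# To confirm which AWS identity the terminal is using (not executed here):
#   aws sts get-caller-identity
```

If the script prints nothing, all listed tools were found.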
Create AWS Resources
The following will create an EKS cluster with three nodes by default, an MSK cluster with three nodes by default, an S3 bucket where the LogScale data will be stored, and a number of dependent resources such as a VPC, subnets, security groups and an internet gateway.
First, clone the operator quick-start repo where the terraform quick start files are stored:
$ git clone https://github.com/humio/humio-operator-quickstart
$ cd humio-operator-quickstart/aws
Note: review the default values in the variables.tf file. It is possible to override these, but be careful, as changing some may have undesirable effects. A common change is overriding region, but changing instance types, for example, has downstream consequences, such as when setting the resources for the HumioCluster.
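One way to override a default without editing variables.tf is through Terraform's TF_VAR_ environment variables. The sketch below assumes the variable is declared under the name region in variables.tf, and uses us-west-2 purely as an example value. It also keeps the REGION variable used later in the HumioCluster spec in sync.

```shell
# Assumption: variables.tf declares an input variable named "region".
# Terraform reads TF_VAR_<name> environment variables as input variable values.
export TF_VAR_region=us-west-2

# Reuse the same value for the REGION variable referenced later in this guide,
# so the HumioCluster spec points at the region where the S3 bucket is created.
export REGION="$TF_VAR_region"
```

The same override can be passed on the command line with terraform apply -var="region=us-west-2".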
Then initialize and apply Terraform:
$ terraform init
$ terraform apply
Once the terraform resources have been applied, configure kubectl to point to the newly created EKS cluster:
$ export KUBECONFIG=$PWD/kubeconfig
$ aws eks --region $(terraform output -raw region) update-kubeconfig --name $(terraform output -raw cluster_name)
And then verify you can authenticate with the EKS cluster and see pods:
$ kubectl get pods -A
Install Humio Operator Dependencies
It is necessary to have both cert-manager and the nginx-ingress controller if running the Humio Operator with TLS and/or ingress enabled.
Install Cert Manager
$ kubectl create namespace cert-manager
$ helm repo add jetstack https://charts.jetstack.io
$ helm repo update
$ helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--version v1.13.2
Once cert manager is installed, create a clusterissuer which will be used to issue the certs for our LogScale cluster:
$ export MY_EMAIL=<your email address>
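Create the clusterissuer.yaml file with the following content, replacing $MY_EMAIL with your email address (the variable is not expanded when the file is written with a text editor):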
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $MY_EMAIL
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
Next, execute the following:
$ kubectl apply -f clusterissuer.yaml
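Because a file written by hand does not expand $MY_EMAIL, one option is to generate the manifest from the shell instead. This is a minimal sketch: the helper name render_clusterissuer is hypothetical and is not part of cert-manager or the quick-start repository. The manifest body is the same as the one shown above.

```shell
# Write the ClusterIssuer manifest to stdout with the given email address filled in.
render_clusterissuer() {
  email=$1
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ${email}
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
}

# Usage (not executed here):
#   render_clusterissuer "$MY_EMAIL" > clusterissuer.yaml
#   kubectl apply -f clusterissuer.yaml
```

Inspecting the generated file before applying it is a cheap way to catch an empty email value.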
Install the Nginx Ingress Controller
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/aws/deploy.yaml
Install the Humio Operator
Now that you have authenticated with the EKS cluster, it's time to install the Humio Operator.
$ kubectl create namespace logging
$ helm repo add humio-operator https://humio.github.io/humio-operator
$ helm repo update
$ helm install humio-operator humio-operator/humio-operator \
--namespace logging
You can check the status of the Humio Operator pod by running:
$ kubectl get pods -n logging
Prepare for Creating LogScale Cluster
Before creating a cluster, we need to set a number of attributes specific to the cluster. We will set these as environment variables and then reference them later when creating the HumioCluster spec.
First, generate an encryption key that will be used by LogScale to encrypt the data in the S3 bucket.
$ kubectl create secret generic bucket-storage --from-literal="encryption-key=$(openssl rand -base64 64)" -n logging
Also create a developer user password, which we will use to log in once the LogScale cluster is up. By default we will start LogScale in single-user mode.
$ kubectl create secret generic developer-user --from-literal="password=$(openssl rand -base64 16)" -n logging
We will need the connection strings for Kafka and ZooKeeper, as well as the name of the S3 bucket and Role ARN which has access to write to the bucket. We can obtain those from terraform:
$ export KAFKA_BROKERS=$(terraform output bootstrap_brokers_tls)
$ export ZOOKEEPER_CONNECTION=$(terraform output zookeeper_connect_string)
$ export ROLE_ARN=$(terraform output oidc_role_arn)
$ export BUCKET_NAME=$(terraform output s3_bucket_name)
Additionally, we'll need to set hostnames for the HTTP and Elasticsearch ingresses. Use your own domain here. In order to use ingress with Let's Encrypt certificates, a DNS record must be created later in this process.
$ export INGRESS_HOSTNAME=humio-quickstart.example.com
$ export INGRESS_ES_HOSTNAME=humio-quickstart-es.example.com
Also set the region. It should match the region in which Terraform created the resources:
$ export REGION=us-west-2
Add the license secret, replacing <license> with your license key:
$ kubectl create secret generic humio-quickstart-license --namespace logging --from-literal=data=<license>
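The HumioCluster spec in the next section is generated from the environment variables set above, so an unset variable would silently produce an empty value in the file. A minimal guard, assuming the hypothetical helper name require_vars, could look like this:

```shell
# Return non-zero and report every named variable that is unset or empty.
require_vars() {
  missing_count=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing_count=$((missing_count + 1))
    fi
  done
  [ "$missing_count" -eq 0 ]
}

# Usage (not executed here), with the variable names used in this guide:
#   require_vars KAFKA_BROKERS ZOOKEEPER_CONNECTION ROLE_ARN BUCKET_NAME \
#     INGRESS_HOSTNAME INGRESS_ES_HOSTNAME REGION || exit 1
```

Running the check immediately before generating humiocluster.yaml keeps the failure close to its cause.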
Create a LogScale Cluster
Finally, we can create a YAML file which contains the HumioCluster spec. Run the following command to create a file named humiocluster.yaml with the desired HumioCluster spec:
cat > humiocluster.yaml <<EOF
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
metadata:
  name: humio-quickstart
  namespace: logging
spec:
  image: humio/humio-core:1.124.0
  license:
    secretKeyRef:
      name: humio-quickstart-license
      key: data
  nodeCount: 3
  targetReplicationFactor: 2
  storagePartitionsCount: 24
  digestPartitionsCount: 24
  extraKafkaConfigs: "security.protocol=SSL"
  tls:
    enabled: true
  autoRebalancePartitions: true
  hostname: ${INGRESS_HOSTNAME}
  esHostname: ${INGRESS_ES_HOSTNAME}
  ingress:
    enabled: true
    controller: nginx
    annotations:
      use-http01-solver: "true"
      cert-manager.io/cluster-issuer: letsencrypt-prod
      kubernetes.io/ingress.class: nginx
  resources:
    limits:
      cpu: "2"
      memory: 12Gi
    requests:
      cpu: "1"
      memory: 6Gi
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - humio
        topologyKey: kubernetes.io/hostname
  dataVolumeSource:
    hostPath:
      path: "/mnt/disks/vol1"
      type: "Directory"
  humioServiceAccountAnnotations:
    eks.amazonaws.com/role-arn: ${ROLE_ARN}
  environmentVariables:
  - name: S3_STORAGE_BUCKET
    value: ${BUCKET_NAME}
  - name: S3_STORAGE_REGION
    value: ${REGION}
  - name: LOCAL_STORAGE_PERCENTAGE
    value: "80"
  - name: LOCAL_STORAGE_MIN_AGE_DAYS
    value: "7"
  - name: S3_STORAGE_ENCRYPTION_KEY
    valueFrom:
      secretKeyRef:
        name: bucket-storage
        key: encryption-key
  - name: USING_EPHEMERAL_DISKS
    value: "true"
  - name: S3_STORAGE_PREFERRED_COPY_SOURCE
    value: "true"
  - name: SINGLE_USER_USERNAME
    value: "admin"
  - name: SINGLE_USER_PASSWORD
    valueFrom:
      secretKeyRef:
        name: developer-user
        key: password
  - name: "ZOOKEEPER_URL"
    value: ${ZOOKEEPER_CONNECTION}
  - name: "KAFKA_SERVERS"
    value: ${KAFKA_BROKERS}
EOF
And then apply it:
$ kubectl apply -f humiocluster.yaml
Note: environmentVariables in HumioCluster is an array of corev1.EnvVar types, so each item in the array has the same capabilities as envvar-v1-core.
Validate the LogScale Cluster
Check the status of the HumioCluster by running:
$ kubectl get humiocluster -n logging
Initially the cluster will go into the state Bootstrapping as it starts up, but once all nodes have started it will go into the state Running.
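Rather than re-running the status command by hand, the transition to Running can be polled. The sketch below is a generic helper under stated assumptions: the name wait_for_output is hypothetical, and the usage comment assumes the HumioCluster resource exposes its state at .status.state, which should be checked against the installed operator version.

```shell
# Run a command repeatedly until its output equals the expected value.
# Arguments: expected output, number of attempts, delay in seconds, then the command.
wait_for_output() {
  expected=$1
  tries=$2
  delay=$3
  shift 3
  while [ "$tries" -gt 0 ]; do
    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Usage (not executed here; the jsonpath field is an assumption):
#   wait_for_output Running 60 10 \
#     kubectl get humiocluster humio-quickstart -n logging \
#     -o jsonpath='{.status.state}'
```

With 60 attempts and a 10 second delay, the helper gives up after roughly ten minutes and returns a non-zero status.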
Access the LogScale Cluster
Configure DNS
To access the HumioCluster, and to allow cert-manager to generate a valid certificate for the cluster, DNS records must be added for $INGRESS_HOSTNAME and $INGRESS_ES_HOSTNAME which point to the NLB name of the ingress service. To get the NLB name of the ingress service, run:
$ export INGRESS_SERVICE_HOSTNAME=$(kubectl get service ingress-nginx-controller -n ingress-nginx -o template --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")
Configuring the DNS record depends on your DNS provider. If using AWS Route53, create an Alias record which points both names directly to the INGRESS_SERVICE_HOSTNAME. For other providers, create a CNAME which points them to the INGRESS_SERVICE_HOSTNAME.
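For Route53, the Alias record can also be created from the command line. The following sketch only builds the change batch document. The helper name alias_change_batch is hypothetical, and the hosted zone IDs shown in the usage comments are placeholders that must be looked up for your own account and load balancer.

```shell
# Print a Route 53 change batch that upserts an alias A record.
# Arguments: record name, load balancer DNS name, load balancer hosted zone ID.
alias_change_batch() {
  record_name=$1
  lb_dns_name=$2
  lb_zone_id=$3
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "${record_name}",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "${lb_zone_id}",
          "DNSName": "${lb_dns_name}",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF
}

# Usage (not executed here; <lb-zone-id> and <your-zone-id> are placeholders):
#   alias_change_batch "$INGRESS_HOSTNAME" "$INGRESS_SERVICE_HOSTNAME" <lb-zone-id> > change.json
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id <your-zone-id> --change-batch file://change.json
# Repeat for $INGRESS_ES_HOSTNAME.
```

The load balancer's own hosted zone ID differs from the ID of your domain's zone; it is reported as CanonicalHostedZoneId by aws elbv2 describe-load-balancers.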
Logging In
Once the DNS records exist, you can open https://${INGRESS_HOSTNAME} in a browser and log in. Since we are using single-user authentication mode, the username will be the value of SINGLE_USER_USERNAME in the cluster spec (admin in the example). The password can be obtained by running:
$ kubectl get secret developer-user -n logging -o=template --template={{.data.password}} | base64 -D
Note, this command uses base64 -D, but you may need to use base64 --decode on Linux.
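To avoid choosing the flag by hand, a small wrapper can try both spellings. This is a sketch with a hypothetical helper name, decode_b64. It tries the long option first and falls back to the short -D form.

```shell
# Decode base64 from stdin, trying --decode first and falling back to -D.
decode_b64() {
  input=$(cat)
  printf '%s' "$input" | base64 --decode 2>/dev/null ||
    printf '%s' "$input" | base64 -D
}

# Usage (not executed here):
#   kubectl get secret developer-user -n logging \
#     -o=template --template={{.data.password}} | decode_b64
```

The input is buffered in a variable so that the fallback can re-read it after the first attempt fails.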
Sending Data to the Cluster
To send data to the cluster, we will create a new repository, obtain the ingest token, and then configure fluentbit to gather logs from all the pods in our Kubernetes cluster and send them to LogScale.
Create Repo, Parser and Ingest Token
Create the repository using the Humio Operator. Using a simple text editor, create a file named humiorepository.yaml and copy the following lines into it:
apiVersion: core.humio.com/v1alpha1
kind: HumioRepository
metadata:
  name: quickstart-cluster-logs
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-logs
  description: "Cluster logs repository"
  retention:
    timeInDays: 30
    ingestSizeInGB: 50
    storageSizeInGB: 10
Apply the changes:
$ kubectl apply -f humiorepository.yaml
Next, create a parser which will be assigned to the repository and later on to the ingest token. It is also possible to skip this step and rely on one of the built-in parsers. Using a simple text editor, create a file named humioparser.yaml and copy the following lines into it:
apiVersion: core.humio.com/v1alpha1
kind: HumioParser
metadata:
  name: quickstart-cluster-parser
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-parser
  repositoryName: quickstart-cluster-logs
  parserScript: |
    case {
      kubernetes.pod_name=/fluentbit/
        | /\[(?<@timestamp>[^\]]+)\]/
        | /^(?<@timestamp>.*)\[warn\].*/
        | parseTimestamp(format="yyyy/MM/dd' 'HH:mm:ss", field=@timestamp);
      parseJson();
      * | kvParse()
    }
Apply the changes:
$ kubectl apply -f humioparser.yaml
Now create an Ingest Token using the Humio Operator, assigning it to the repository and the parser that were created in the previous steps. Using a simple text editor, create a file named humioingesttoken.yaml and copy the following lines into it:
apiVersion: core.humio.com/v1alpha1
kind: HumioIngestToken
metadata:
  name: quickstart-cluster-ingest-token
  namespace: logging
spec:
  managedClusterName: humio-quickstart
  name: quickstart-cluster-ingest-token
  repositoryName: quickstart-cluster-logs
  parserName: quickstart-cluster-parser
  tokenSecretName: quickstart-cluster-ingest-token
Then update the configuration:
$ kubectl apply -f humioingesttoken.yaml
Since we set tokenSecretName in the Ingest Token spec, the token content is stored as a secret in Kubernetes. We can then fetch the token:
$ export INGEST_TOKEN=$(kubectl get secret quickstart-cluster-ingest-token -n logging -o template --template '{{.data.token}}' | base64 -D)
Note, this command uses base64 -D, but you may need to use base64 --decode on Linux.
Ingest Logs into the Cluster
Now we'll install fluentbit into the Kubernetes cluster, configure the endpoint to point to our $INGRESS_ES_HOSTNAME, and use the $INGEST_TOKEN that was just created.
$ helm repo add humio https://humio.github.io/humio-helm-charts
$ helm repo update
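Using a simple text editor, create a file named humio-agent.yaml and copy the following lines into it, replacing $INGRESS_ES_HOSTNAME with the hostname you set earlier (the variable is not expanded inside a hand-written file):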
humio-fluentbit:
  enabled: true
  humioHostname: $INGRESS_ES_HOSTNAME
  es:
    tls: true
    port: 443
  inputConfig: |-
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        Parser docker
        # The path to the DB file must be unique and
        # not conflict with another fluentbit running on the same nodes.
        DB /var/log/flb_kube.db
        Tag kube.*
        Refresh_Interval 5
        Mem_Buf_Limit 512MB
        Skip_Long_Lines On
  resources:
    limits:
      cpu: 100m
      memory: 1024Mi
    requests:
      cpu: 100m
      memory: 512Mi
Then install the chart with these values:
$ helm install humio humio/humio-helm-charts \
--namespace logging \
--set humio-fluentbit.token=$INGEST_TOKEN \
--values humio-agent.yaml
Verify Logs are Ingested
Go to the LogScale UI and click on the quickstart-cluster-logs repository.
In the search field, enter "kubernetes.container_name" = "humio-operator" and run the search.
Verify you can see the Humio Operator logs.
Cleanup
It's possible to run an individual kubectl delete on each resource, but since we have created a dedicated EKS cluster, we will delete everything we just created by deleting the cluster resource and then running terraform destroy.
First, delete the cluster so pods no longer write to the S3 bucket:
$ kubectl delete -f humiocluster.yaml
Prior to running terraform destroy, it will be necessary to ensure the S3 bucket that was created by terraform is emptied. The name of the S3 bucket can be obtained by running:
$ terraform output s3_bucket_name
Now empty the S3 bucket either through the AWS console or CLI.
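For the CLI route, the sketch below wraps aws s3 rm with a guard against an empty bucket name. The helper name empty_bucket is hypothetical. It only removes current objects, so if versioning is enabled on the bucket, older object versions would still need to be removed separately.

```shell
# Delete every object in the given bucket. Refuses to run without a bucket name.
empty_bucket() {
  bucket=$1
  if [ -z "$bucket" ]; then
    echo "bucket name is required" >&2
    return 1
  fi
  aws s3 rm "s3://${bucket}" --recursive
}

# Usage (not executed here):
#   empty_bucket "$(terraform output -raw s3_bucket_name)"
```

The -raw flag makes terraform print the bucket name without surrounding quotes, which is what the s3:// URL needs.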
Next we'll need to ensure the nginx-ingress-controller's service is removed. This way the NLB will be removed from AWS. If this is not done, terraform will get stuck deleting the subnets when performing a terraform destroy:
$ kubectl delete -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.2.0/deploy/static/provider/aws/deploy.yaml
Once the bucket has been emptied and the nginx-ingress-controller has been deleted, run:
$ terraform destroy
Also delete the DNS records which were created for $INGRESS_HOSTNAME and $INGRESS_ES_HOSTNAME.