Use Case: Migrating from Helm Chart to Operator

This guide describes how to migrate from an existing cluster running the LogScale Helm Chart to the LogScale Operator and LogScaleCluster custom resource definitions.

Prerequisites

Identify Method of Deployment

There are two different approaches to migration, depending on how the existing Helm Chart is deployed.

  • Using ephemeral nodes with bucket storage

  • Using PVCs

By default, the original Helm Chart uses PVCs. If the existing chart is deployed with the environment variable S3_STORAGE_BUCKET, then it is using ephemeral nodes with bucket storage.

Migrate Kafka & ZooKeeper

The LogScale Operator does not run Kafka and ZooKeeper alongside LogScale the way the LogScale Helm Charts do. To migrate to the Operator, LogScale must point to a Kafka and ZooKeeper deployment that is not managed by the LogScale Helm Chart. There are a number of open source operators for running Kafka and ZooKeeper.

If you are running on AWS, MSK is recommended for ease of use.

It is necessary to perform the Kafka and ZooKeeper migration before continuing with the migration to the operator. This can be done by taking these steps:

  1. Start up Kafka and ZooKeeper (not managed by the operator)

  2. Shut down LogScale nodes

  3. Reconfigure the values.yaml to use the new Kafka and ZooKeeper connection. For example:

yaml
humio-core:
  external:
    kafkaBrokers: 192.168.0.10:9092,192.168.1.10:9092,192.168.2.10:9092
    zookeeperServers: 192.168.0.20:2181,192.168.1.20:2181,192.168.2.20:2181
  4. Start LogScale back up
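
To apply the new connection settings and bring LogScale back up, upgrade the existing Helm release with the updated values, for example (release name humio, as used elsewhere in this guide):

shell
helm upgrade --values values.yaml humio humio/humio-helm-charts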

Migrating Using Ephemeral Nodes & Bucket Storage

When migrating to the Operator using ephemeral nodes and bucket storage, first install the Operator, but bring down the existing LogScale pods prior to creating the LogScaleCluster. Configure the new LogScaleCluster to use the same Kafka and ZooKeeper servers as the existing cluster. The Operator will create pods that assume the identity of the existing nodes and will pull data from bucket storage as needed.

  1. Install the Operator according to the Operator Installation Guide.

  2. Bring down the existing pods by changing the replicas of the LogScale StatefulSet to 0 (see the example after these steps).

  3. Create a LogScaleCluster by referring to the LogScaleCluster Resource. Ensure that this resource is configured the same as the existing chart's values.yaml file; see Special Considerations. Ensure that TLS is disabled for the LogScaleCluster, see Configuring TLS with Operator. Ensure that autoRebalancePartitions is set to false (the default).

  4. Validate that the new LogScale pods are running with the existing node identities and that they show up in the Cluster Administration page of the LogScale UI.

  5. Follow either Ingress Migration or Service Migration depending on whether you are using services or ingress to access the LogScale cluster.

  6. Modify the LogScale Helm Chart values.yaml so that it no longer manages LogScale. If using fluentbit, ensure es autodiscovery is turned off:

yaml
humio-core:
  enabled: false
humio-fluentbit:
  es:
    autodiscovery: false

And then run helm upgrade --values values.yaml humio humio/humio-helm-charts. This keeps fluentbit and/or metricbeat running if they are enabled. If they are not enabled, or you do not wish to keep them, you can uninstall the LogScale Helm Chart by running helm delete --purge humio, where humio is the name of the original Helm Chart release. Be careful to delete the original Helm Chart and not the Helm Chart used to install the Operator.

  7. Enable TLS
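
As referenced in step 2, a sketch for bringing down the existing pods. The StatefulSet name below is an assumption based on a Helm release named humio; verify it with kubectl get statefulsets.

shell
# scale the LogScale StatefulSet created by the Helm Chart down to 0 replicas
kubectl scale statefulset humio-humio-core --replicas=0 -n <namespace>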

Migrating Using PVCs

When migrating to the Operator using PVCs, install the Operator while the existing cluster is running and configure the new LogScaleCluster to use the same Kafka and ZooKeeper servers as the existing cluster (see the sketch after the steps below). The Operator will create new nodes as part of the existing cluster. From there, change the partition layout so that partitions are assigned only to the new nodes, and then uninstall the old Helm Chart.

  1. Install the Operator according to the Operator Installation Guide.

  2. Create a LogScaleCluster by referring to the LogScaleCluster Resource. Ensure that this resource is configured the same as the existing chart's values.yaml file; see Special Considerations. Ensure that TLS is disabled for the LogScaleCluster, see TLS. Ensure that autoRebalancePartitions is set to false (the default).

  3. Validate that the new LogScale pods are running and show up in the Cluster Administration page of the LogScale UI.

  4. Evict the old pods created by the Helm Chart. LogScale will then move data away from those pods.

  5. Wait until all new nodes contain all the data and the old nodes contain no data.

  6. Follow either Ingress Migration or Service Migration depending on whether you are using services or ingress to access the LogScale cluster.

  7. Modify the LogScale Helm Chart values.yaml so that it no longer manages LogScale. If using fluentbit, ensure es autodiscovery is turned off:

yaml
humio-core:
  enabled: false
humio-fluentbit:
  es:
    autodiscovery: false

And then run helm upgrade --values values.yaml humio humio/humio-helm-charts. This keeps fluentbit and/or metricbeat running if they are enabled. If they are not enabled, or you do not wish to keep them, you can uninstall the LogScale Helm Chart by running helm delete --purge humio, where humio is the name of the original Helm Chart release. Be careful to delete the original Helm Chart and not the Helm Chart used to install the Operator.

  8. Enable TLS.
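
As noted above, the new cluster must point at the same Kafka and ZooKeeper as the existing one. A minimal sketch of that part of the LogScaleCluster spec, assuming the connection strings are passed through the KAFKA_SERVERS and ZOOKEEPER_URL environment variables (addresses reused from the values.yaml example earlier in this guide):

yaml
spec:
  environmentVariables:
    - name: KAFKA_SERVERS
      value: "192.168.0.10:9092,192.168.1.10:9092,192.168.2.10:9092"
    - name: ZOOKEEPER_URL
      value: "192.168.0.20:2181,192.168.1.20:2181,192.168.2.20:2181"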

Service Migration

This section is only applicable if the method of accessing the cluster is via the service resources. If you are using ingress, refer to the Ingress Migration.

The LogScale Helm Chart manages three services: the HTTP service, the ES service, and a headless service required by the StatefulSet. All of these services are replaced by a single service named after the LogScaleCluster.

After migrating the pods, it will no longer be possible to access the cluster using any of the old services. Ensure that the new service in the LogScaleCluster is exposed the same way (e.g., type: LoadBalancer) and then begin using the new service to access the cluster.
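
For example, a quick way to confirm how the new service is exposed (assuming the LogScaleCluster is named my-cluster):

shell
kubectl get service my-cluster -n <namespace>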

Ingress Migration

This section is only applicable if the method of accessing the cluster is via the ingress resources. If you are using services, refer to the Service Migration.

When migrating using ingress, be sure to enable and configure the LogScaleCluster ingress using the same hostnames that the Helm Chart uses. See Ingress. As long as the ingress resources use the same ingress controller, they should migrate seamlessly as DNS will resolve to the same nginx controller. The ingress resources managed by the Helm Chart will be deleted when the Helm Chart is removed or when humio-core.enabled is set to false in the values.yaml.

If you wish to reuse the certificates that were generated for the old ingress resources, you must copy the old secrets to the new name format of <cluster name>-certificate and <cluster name>-es-certificate. It is possible to use a custom secret name for the certificates by setting spec.ingress.secretName and spec.ingress.esSecretName on the LogScaleCluster resource; however, you cannot simply point these at the existing secrets, as they are managed by the Helm Chart and will be deleted when the Helm Chart is removed or when humio-core.enabled is set to false in the values.yaml.
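
A sketch of copying an existing certificate secret to the new name format, using the secret name from the ingress example later in this guide (my-cluster-crt) and a LogScaleCluster named my-cluster; repeat for the ES certificate secret:

shell
# export the old secret, strip server-generated metadata, rename it, and re-create it
kubectl get secret my-cluster-crt -n <namespace> -o yaml \
  | sed -e 's/name: my-cluster-crt/name: my-cluster-certificate/' \
        -e '/resourceVersion:/d' -e '/uid:/d' -e '/creationTimestamp:/d' \
  | kubectl apply -f -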

Special Considerations

There are many situations when migrating from the LogScale Helm Chart to the Operator where the configuration does not transfer directly from the values.yaml to the LogScaleCluster resource. This section lists some common configurations with the original Helm Chart values.yaml and the replacement LogScaleCluster spec configuration. Only the relevant parts of the configuration are shown, starting from the top-level key of the relevant subset of the resource.

It is not necessary to migrate every one of the listed configurations; instead, use them as a reference for migrating only the configurations that are relevant to your cluster.

TLS

The LogScale Helm Chart supports TLS for Kafka communication but does not support TLS for LogScale-to-LogScale communication. This section refers to LogScale-to-LogScale TLS. For Kafka, see Extra Kafka Configs.

By default, TLS is enabled when creating a LogScaleCluster resource, and this is recommended. However, when performing a migration from the Helm Chart, TLS should be disabled; once the migration is complete, TLS can be enabled.

LogScale Helm Chart

Not supported

LogScaleCluster
yaml
spec:
  tls:
    enabled: false
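
After the migration is complete, TLS can be enabled again by flipping the same setting:

yaml
spec:
  tls:
    enabled: true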
Host Path

The Operator creates LogScale pods with a stricter security context than the LogScale Helm Charts. To support this stricter context, the hostPath.path (i.e., the path on the Kubernetes node that is mounted into the LogScale pods) must be group-owned by the nobody user, which is user ID 65534.
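
For example, on each Kubernetes node the path from the example below could be prepared like this (a sketch; the exact permission bits may vary with your setup):

shell
# give group ownership of the host path to gid 65534 (nobody) and allow group read/write
sudo chgrp -R 65534 /mnt/disks/vol1
sudo chmod -R g+rwx /mnt/disks/vol1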

LogScale Helm Chart
yaml
humio-core:
  primaryStorage:
    type: hostPath
  hostPath:
    path: /mnt/disks/vol1
    type: Directory
LogScaleCluster
yaml
spec:
  dataVolumeSource:
    hostPath:
      path: /mnt/disks/vol1
      type: Directory
Persistent Volumes

By default, the Helm Chart uses persistent volumes for storage of the LogScale data volume. This changed with the Operator, which now requires you to define the storage medium.

LogScale Helm Chart
yaml
humio-core:
  storageVolume:
    size: 50Gi
LogScaleCluster
yaml
spec:
  dataVolumePersistentVolumeClaimSpecTemplate:
    accessModes: [ReadWriteOnce]
    resources:
      requests:
        storage: 50Gi
Custom Storage Class for Persistent Volumes

LogScale Helm Chart

Create a storage class:

yaml
humio-core:
  storageClass:
    provisioner: kubernetes.io/gce-pd
    parameters:
      type: pd-ssd

Use a custom storage class:

yaml
humio-core:
  storageClassName: custom-storage-class-name
LogScaleCluster

Creating a storage class is no longer supported. First, create your storage class by following the official docs, then use the following configuration to reference it.

yaml
spec:
  dataVolumePersistentVolumeClaimSpecTemplate:
    storageClassName: my-storage-class
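
For reference, a minimal StorageClass manifest equivalent to the Helm Chart example above, sketched with the standard storage.k8s.io/v1 API (create it before referencing it in the spec):

yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-storage-class
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd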
Pod Resources
LogScale Helm Chart
yaml
humio-core:
  resources:
    limits:
      cpu: "4"
      memory: 6Gi
    requests:
      cpu: 2
      memory: 4Gi
LogScaleCluster
yaml
spec:
  resources:
    limits:
      cpu: "4"
      memory: 6Gi
    requests:
      cpu: 2
      memory: 4Gi
JVM Settings
LogScale Helm Chart
yaml
humio-core:
  jvm:
    xss: 2m
    xms: 256m
    xmx: 1536m
    maxDirectMemorySize: 1536m
    extraArgs: "-XX:+UseParallelGC"
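
LogScaleCluster

The Operator has no dedicated JVM section. A minimal sketch of the equivalent settings, assuming the LogScale container reads its JVM options from the HUMIO_JVM_ARGS environment variable:

yaml
spec:
  environmentVariables:
    - name: HUMIO_JVM_ARGS
      value: "-Xss2m -Xms256m -Xmx1536m -XX:MaxDirectMemorySize=1536m -XX:+UseParallelGC"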
Pod Anti-Affinity

It is highly recommended to have anti-affinity policies in place, and they are required when using hostPath for storage.

Note that the LogScale pod labels differ between the Helm Chart and the Operator. In the Helm Chart, the pod label used for anti-affinity is app=humio-core, while the Operator uses app.kubernetes.io/name=humio. If migrating PVCs, it is important to ensure that the new pods created by the Operator are not scheduled on the nodes that run the old pods created by the LogScale Helm Chart. To do this, ensure there is a matchExpressions entry with DoesNotExist on the app key, as shown in the example below.

LogScale Helm Chart
yaml
humio-core:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - humio-core
          topologyKey: kubernetes.io/hostname
LogScaleCluster
yaml
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - humio
              - key: app
                operator: DoesNotExist
          topologyKey: kubernetes.io/hostname
Service Type
LogScale Helm Chart
yaml
humio-core:
  service:
    type: LoadBalancer
LogScaleCluster
yaml
spec:
  humioServiceType: LoadBalancer
Ingress
LogScale Helm Chart
yaml
humio-core:
  ingress:
    enabled: true
    config:
      - name: general
        annotations:
          certmanager.k8s.io/acme-challenge-type: http01
          certmanager.k8s.io/cluster-issuer: letsencrypt-prod
          kubernetes.io/ingress.class: nginx
          kubernetes.io/tls-acme: "true"
        hosts:
          - host: my-cluster.example.com
            paths:
              - /
        tls:
          - secretName: my-cluster-crt
            hosts:
              - my-cluster.example.com
      - name: ingest-es
        annotations:
          certmanager.k8s.io/acme-challenge-type: http01
          cert-manager.io/cluster-issuer: letsencrypt-prod
          kubernetes.io/ingress.class: nginx
          kubernetes.io/tls-acme: "true"
        rules:
          - host: my-cluster-es.example.com
            http:
              paths:
                - path: /
                  backend:
                    serviceName: humio-humio-core-es
                    servicePort: 9200
        tls:
          - secretName: my-cluster-es-crt
            hosts:
              - my-cluster-es.example.com
      ...
LogScaleCluster
yaml
spec:
  hostname: "my-cluster.example.com"
  esHostname: "my-cluster-es.example.com"
  ingress:
    enabled: true
    controller: nginx
    # optional secret names. do not set these to the secrets created by the helm chart as they will be deleted when the
    # helm chart is removed
    # secretName: my-cluster-certificate
    # esSecretName: my-cluster-es-certificate
    annotations:
      use-http01-solver: "true"
      cert-manager.io/cluster-issuer: letsencrypt-prod
      kubernetes.io/ingress.class: nginx
Bucket Storage GCP
LogScale Helm Chart
yaml
humio-core:
  bucketStorage:
    backend: gcp
  env:
    - name: GCP_STORAGE_BUCKET
      value: "example-cluster-storage"
    - name: GCP_STORAGE_ENCRYPTION_KEY
      value: "example-random-encryption-string"
    - name: LOCAL_STORAGE_PERCENTAGE
      value: "80"
    - name: LOCAL_STORAGE_MIN_AGE_DAYS
      value: "7"
LogScaleCluster
yaml
spec:
  extraHumioVolumeMounts:
    - name: gcp-storage-account-json-file
      mountPath: /var/lib/humio/gcp-storage-account-json-file
      subPath: gcp-storage-account-json-file
      readOnly: true
  extraVolumes:
    - name: gcp-storage-account-json-file
      secret:
        secretName: gcp-storage-account-json-file
  environmentVariables:
    - name: GCP_STORAGE_ACCOUNT_JSON_FILE
      value: "/var/lib/humio/gcp-storage-account-json-file"
    - name: GCP_STORAGE_BUCKET
      value: "my-cluster-storage"
    - name: GCP_STORAGE_ENCRYPTION_KEY
      value: "my-encryption-key"
    - name: LOCAL_STORAGE_PERCENTAGE
      value: "80"
    - name: LOCAL_STORAGE_MIN_AGE_DAYS
      value: "7"
Bucket Storage S3

The S3 bucket storage configuration is the same, with the exception of how the environment variables are set.

LogScale Helm Chart
yaml
humio-core:
  env:
    - name: S3_STORAGE_BUCKET
      value: "example-cluster-storage"
    - name: S3_STORAGE_REGION
      value: "us-west-2"
    - name: S3_STORAGE_ENCRYPTION_KEY
      value: "example-random-encryption-string"
    - name: LOCAL_STORAGE_PERCENTAGE
      value: "80"
    - name: LOCAL_STORAGE_MIN_AGE_DAYS
      value: "7"
    - name: S3_STORAGE_PREFERRED_COPY_SOURCE
      value: "true"
LogScaleCluster
yaml
spec:
  environmentVariables:
    - name: S3_STORAGE_BUCKET
      value: "example-cluster-storage"
    - name: S3_STORAGE_REGION
      value: "us-west-2"
    - name: S3_STORAGE_ENCRYPTION_KEY
      value: "example-random-encryption-string"
    - name: LOCAL_STORAGE_PERCENTAGE
      value: "80"
    - name: LOCAL_STORAGE_MIN_AGE_DAYS
      value: "7"
    - name: S3_STORAGE_PREFERRED_COPY_SOURCE
      value: "true"
Ephemeral Nodes and Cluster Identity

There are three main parts to using ephemeral nodes: setting the USING_EPHEMERAL_DISKS environment variable, selecting ZooKeeper cluster identity, and configuring AWS Bucket Storage or Google Cloud Bucket Storage (described in the separate linked sections). In the Helm Chart, ZooKeeper identity is explicitly configured, but the Operator now defaults to using ZooKeeper for identity regardless of the ephemeral disks setting.

LogScale Helm Chart
yaml
humio-core:
  clusterIdentity:
    type: zookeeper
  env:
    - name: ZOOKEEPER_URL_FOR_NODE_UUID
      value: "$(ZOOKEEPER_URL)"
    - name: USING_EPHEMERAL_DISKS
      value: "true"
LogScaleCluster
yaml
spec:
  environmentVariables:
    - name: USING_EPHEMERAL_DISKS
      value: "true"
Cache Configuration

Cache configuration is no longer supported in the LogScale operator. It's recommended to use ephemeral nodes and bucket storage instead.

LogScale Helm Chart
yaml
humio-core:
  cache:
    localVolume:
      enabled: true
LogScaleCluster

Not supported

Authentication - OAuth Google
LogScale Helm Chart
yaml
humio-core:
  authenticationMethod: oauth
  oauthConfig:
    autoCreateUserOnSuccessfulLogin: true
    publicUrl: https://my-cluster.example.com
  env:
    - name: GOOGLE_OAUTH_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: humio-google-oauth-secret
          key: supersecretkey
    - name: GOOGLE_OAUTH_CLIENT_ID
      value: YOURCLIENTID
LogScaleCluster
yaml
spec:
  environmentVariables:
    - name: AUTHENTICATION_METHOD
      value: oauth
    - name: AUTO_CREATE_USER_ON_SUCCESSFUL_LOGIN
      value: "true"
    - name: PUBLIC_URL
      value: https://my-cluster.example.com
    - name: GOOGLE_OAUTH_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: humio-google-oauth-secret
          key: supersecretkey
    - name: GOOGLE_OAUTH_CLIENT_ID
      value: YOURCLIENTID
Authentication - OAuth Github
LogScale Helm Chart
yaml
humio-core:
  authenticationMethod: oauth
  env:
    - name: PUBLIC_URL
      value: https://my-cluster.example.com
    - name: GITHUB_OAUTH_CLIENT_ID
      value: client-id-from-github-oauth
    - name: GITHUB_OAUTH_CLIENT_SECRET
      value: client-secret-from-github-oauth
LogScaleCluster
yaml
spec:
  environmentVariables:
    - name: AUTHENTICATION_METHOD
      value: oauth
    - name: AUTO_CREATE_USER_ON_SUCCESSFUL_LOGIN
      value: "true"
    - name: PUBLIC_URL
      value: https://my-cluster.example.com
    - name: GITHUB_OAUTH_CLIENT_ID
      value: client-id-from-github-oauth
    - name: GITHUB_OAUTH_CLIENT_SECRET
      value: client-secret-from-github-oauth
Authentication - OAuth BitBucket
LogScale Helm Chart
yaml
humio-core:
  authenticationMethod: oauth
  env:
    - name: PUBLIC_URL
      value: https://my-cluster.example.com
    - name: BITBUCKET_OAUTH_CLIENT_ID
      value: client-id-from-bitbucket-oauth
    - name: BITBUCKET_OAUTH_CLIENT_SECRET
      value: client-secret-from-bitbucket-oauth
LogScaleCluster
yaml
spec:
  environmentVariables:
    - name: AUTHENTICATION_METHOD
      value: oauth
    - name: AUTO_CREATE_USER_ON_SUCCESSFUL_LOGIN
      value: "true"
    - name: PUBLIC_URL
      value: https://my-cluster.example.com
    - name: BITBUCKET_OAUTH_CLIENT_ID
      value: client-id-from-bitbucket-oauth
    - name: BITBUCKET_OAUTH_CLIENT_SECRET
      value: client-secret-from-bitbucket-oauth
Authentication - SAML

When using SAML, it is necessary to follow the Configuration & Authentication with SAML instructions, and once the IDP certificate is obtained, you must create a secret containing that certificate using kubectl. The secret name is slightly different for the LogScaleCluster vs the Helm Chart, as the LogScaleCluster secret must be prefixed with the cluster name.

Creating the secret:

Helm Chart:

shell
kubectl create secret generic idp-certificate --from-file=idp-certificate=./my-idp-certificate.pem -n <namespace>

LogScaleCluster:

shell
kubectl create secret generic <cluster-name>-idp-certificate --from-file=idp-certificate.pem=./my-idp-certificate.pem -n <namespace>
LogScale Helm Chart
yaml
humio-core:
  authenticationMethod: saml
  samlConfig:
    publicUrl: https://my-cluster.example.com
    idpSignOnUrl: https://accounts.google.com/o/saml2/idp?idpid=idptoken
    idpEntityId: https://accounts.google.com/o/saml2/idp?idpid=idptoken
  env:
    - name: GOOGLE_OAUTH_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: humio-google-oauth-secret
          key: supersecretkey
    - name: GOOGLE_OAUTH_CLIENT_ID
      value: YOURCLIENTID
LogScaleCluster
yaml
spec:
  environmentVariables:
    - name: AUTHENTICATION_METHOD
      value: saml
    - name: AUTO_CREATE_USER_ON_SUCCESSFUL_LOGIN
      value: "true"
    - name: PUBLIC_URL
      value: https://my-cluster.example.com
    - name: SAML_IDP_SIGN_ON_URL
      value: https://accounts.google.com/o/saml2/idp?idpid=idptoken
    - name: SAML_IDP_ENTITY_ID
      value: https://accounts.google.com/o/saml2/idp?idpid=idptoken
Authentication - By Proxy
LogScale Helm Chart
yaml
humio-core:
  authenticationMethod: byproxy
  authByProxyConfig:
    headerName: name-of-http-header
LogScaleCluster
yaml
spec:
  environmentVariables:
    - name: AUTHENTICATION_METHOD
      value: byproxy
    - name: AUTH_BY_PROXY_HEADER_NAME
      value: name-of-http-header
Authentication - Single User

The Helm Chart generated a password for the developer user when using single-user mode. The Operator does not do this, so you must supply your own password. This can be done via a plain text environment variable or via a Kubernetes secret that is referenced by an environment variable. If supplying a secret, you must populate this secret prior to creating the LogScaleCluster resource, otherwise the pods will fail to start.

LogScale Helm Chart
yaml
humio-core:
  authenticationMethod: single-user
LogScaleCluster

Note that the AUTHENTICATION_METHOD defaults to single-user.

By setting a password using an environment variable plain text value:

yaml
spec:
  environmentVariables:
    - name: "SINGLE_USER_PASSWORD"
      value: "MyVeryS3cretPassword"

By setting a password using an environment variable secret reference:

yaml
spec:
  environmentVariables:
    - name: "SINGLE_USER_PASSWORD"
      valueFrom:
        secretKeyRef:
          name: developer-user-password
          key: password
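
If using the secret reference, the developer-user-password secret shown above can be created ahead of time, for example:

shell
kubectl create secret generic developer-user-password --from-literal=password=MyVeryS3cretPassword -n <namespace>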
Extra Kafka Configs
LogScale Helm Chart
yaml
humio-core:
  extraKafkaConfigs: "security.protocol=SSL"
LogScaleCluster
yaml
spec:
  extraKafkaConfigs: "security.protocol=SSL"
Prometheus

The LogScale Helm Chart supported setting the prometheus.io/port and prometheus.io/scrape annotations on the LogScale pods. The Operator no longer supports this.

LogScale Helm Chart
yaml
humio-core:
  prometheus:
    enabled: true
LogScaleCluster

Not supported

Pod Security Context
LogScale Helm Chart
yaml
humio-core:
  podSecurityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
LogScaleCluster
yaml
spec:
  podSecurityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
Container Security Context
LogScale Helm Chart
yaml
humio-core:
  containerSecurityContext:
    capabilities:
      add: ["SYS_NICE"]
LogScaleCluster
yaml
spec:
  containerSecurityContext:
    capabilities:
      add: ["SYS_NICE"]
Initial Partitions

The Helm Chart accepted both ingest.initialPartitionsPerNode and storage.initialPartitionsPerNode. The Operator no longer supports the per-node setting, so it's up to the administrator to set the initial partitions such that they are divisible by the node count.

LogScale Helm Chart
yaml
humio-core:
  ingest:
    initialPartitionsPerNode: 4
  storage:
    initialPartitionsPerNode: 4
LogScaleCluster

Assuming a three-node cluster (4 partitions per node × 3 nodes = 12):

yaml
spec:
  environmentVariables:
    - name: "INGEST_QUEUE_INITIAL_PARTITIONS"
      value: "12"
    - name: "DEFAULT_PARTITION_COUNT"
      value: "12"
Log Storage

The Helm Chart supports the use of separate storage for logs. This is not supported by the Operator, which instead defaults to running LogScale with the environment variable LOG4J_CONFIGURATION=log4j2-json-stdout.xml, writing logs to stdout in JSON format.

LogScale Helm Chart

See the log storage settings under the humio-core section of the chart's values.yaml.
LogScaleCluster

Not supported