LogScale on Bare Metal - Deploying Amazon Managed Streaming for Apache Kafka (MSK)

When deploying LogScale within Amazon Web Services (AWS), Amazon Managed Streaming for Apache Kafka (MSK) can be used as an alternative to a self-managed Apache Kafka cluster.

See the Amazon AWS MSK documentation for more information on this Amazon service.

Pre-Requisites

There are a couple of pre-requisites to using Amazon MSK with LogScale. First, ensure that the AWS CLI tools are installed and configured. Also confirm the Access key and Secret key that will be used to connect. These are needed to create the custom configuration for your MSK instance.
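
The AWS CLI pre-requisite can be checked from a terminal before going further. The following is a minimal sketch, not an official procedure: the function name check_aws_cli is made up for this example, and it assumes that a successful call to aws sts get-caller-identity is an acceptable test that the Access key and Secret key are configured.

```shell
#!/bin/sh
# check_aws_cli is a hypothetical helper name used only in this sketch.
check_aws_cli() {
  # Confirm the aws executable can be found.
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI not found on PATH" >&2
    return 1
  fi
  # Print the installed CLI version.
  aws --version || return 1
  # Fails when no valid Access key and Secret key are configured;
  # on success it prints the AWS account ID the keys belong to.
  aws sts get-caller-identity --query 'Account' --output text
}

check_aws_cli || echo "Fix the AWS CLI setup before continuing." >&2
```

If the last command prints an account ID, the CLI is installed and the credentials were accepted.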

A Virtual Private Cloud (VPC) must be set up for your Availability Zones on AWS, with a subnet for each Kafka broker. For more information, see Getting Started Using Amazon MSK.

Custom MSK Configuration

Read the documentation on how to add a custom configuration file to MSK. If no configuration file is supplied, MSK applies its own default configuration, which changes some parameters from the normal Kafka defaults. Because LogScale requires certain Kafka configuration parameters to be set, a custom configuration file for MSK must be created.

  1. Create a file named kafka.properties and add the following values to it:

    ini
    replica.fetch.max.bytes=104857600
    message.max.bytes=104857600
    compression.type=producer
    unclean.leader.election.enable=false

    The full list of other MSK parameters that can be used can be found in AWS documentation.

  2. Create the configuration in MSK from this file. The name and description can be anything, but the name can't contain spaces.

    shell
    $ aws kafka create-configuration \
      --name "LogScale-MSK-Configuration" \
      --description "Custom LogScale configuration for MSK" \
      --kafka-versions "2.3.1" \
      --server-properties file://config-file-path

  3. A success message similar to this should be returned:

    json
    {
      "Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/LogScale-MSK-Configuration/abcdabcd-abcd-1234-abcd-abcd123e8e8e-1",
      "CreationTime": "2019-05-21T00:54:23.591Z",
      "Description": "Custom LogScale Configuration for MSK",
      "KafkaVersions": ["2.3.1"],
      "LatestRevision": {
        "CreationTime": "2019-05-21T00:54:23.591Z",
        "Description": "Custom LogScale Configuration for MSK",
        "Revision": 1
      },
      "Name": "LogScale-MSK-Configuration"
    }

The MSK custom configuration should now be available for use later in the deployment process.
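
Step 1 can also be scripted so that the file is created the same way every time. The sketch below writes kafka.properties with the four values listed above, checks that each key is present, and prints a file:// reference that could be passed to --server-properties in step 2. The check loop and the variable name missing are additions for this example only.

```shell
#!/bin/sh
# Write the broker settings from step 1 (104857600 bytes = 100 MiB).
cat > kafka.properties <<'EOF'
replica.fetch.max.bytes=104857600
message.max.bytes=104857600
compression.type=producer
unclean.leader.election.enable=false
EOF

# Confirm that every expected key is present before uploading the file.
missing=0
for key in replica.fetch.max.bytes message.max.bytes compression.type \
           unclean.leader.election.enable; do
  if ! grep -q "^${key}=" kafka.properties; then
    echo "missing setting: ${key}" >&2
    missing=1
  fi
done

# Absolute file:// reference for the --server-properties option in step 2.
echo "file://$(pwd)/kafka.properties"
```

Using an absolute path avoids depending on the directory from which the aws command is later run.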

Creating MSK Cluster using AWS Console

Figure 5. Creating MSK Cluster using AWS Console


To create a cluster:

  1. Log in to the console, go to the AWS MSK Service and then click on Create Cluster.

  2. Give the Cluster any name. Pick the VPC created for this MSK Cluster. See the Amazon MSK page for setting up the VPC.

  3. Select the Kafka version. Version 2.4.0 is the minimum recommended for LogScale.

  4. Select the Availability Zones and a subnet for each one. The minimum number of Availability Zones recommended is three.

  5. Then add the custom configuration file (see Custom MSK Configuration). Select Use a Custom Configuration and select the name of the configuration file uploaded.

    Figure 6. AWS MSK Configuration


  6. Next, create the brokers. Kafka brokers use m5 instance types. Specifications for these can be found on the Instance Types page under the m5 tab. Define the number of brokers for each Availability Zone.

  7. Optionally, add tags to identify the cluster. For more information on tagging, see AWS Tagging Strategy.

  8. Define how much storage each broker will have. MSK uses Amazon Elastic Block Store (EBS). The amount of storage should correlate to how much data is going to be ingested. Once created, the storage cannot be decreased.

  9. If the AWS instance is part of a cluster, enabling encryption is recommended. Encryption between clients and brokers is possible, but requires some additional steps. For more information, see Configuring Encryption. If plaintext brokers have been selected, these will be available on port 9092; brokers using TLS will be accessible on port 9094.

  10. If TLS Client authentication is required, see Mutual TLS Authentication for more information.

  11. Basic monitoring is available for free, but enhanced monitoring costs extra. More information can be found in Monitoring an Amazon MSK Cluster.

  12. When selecting the security group, it's important to note that the LogScale instance must be able to connect to MSK. This can be done either by adding inbound and outbound rules for the IP address of the LogScale instance or, if LogScale is running on AWS, by adding both to the same security group.

Once these steps have been completed, click on the button to create the cluster.
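
The console steps above also have a rough command-line equivalent in aws kafka create-cluster. The following is a sketch under stated assumptions, not part of the documented procedure: the subnet and security group IDs, the cluster name, the instance size and the volume size are all placeholders, and CONFIG_ARN and KAFKA_VERSION are hypothetical environment variables that must be set before the command is actually run. CONFIG_ARN is the "Arn" value returned when the custom configuration was created.

```shell
#!/bin/sh
# Broker settings (steps 4, 6, 8 and 12). Every ID below is a placeholder.
cat > broker-node-group-info.json <<'EOF'
{
  "InstanceType": "kafka.m5.large",
  "ClientSubnets": ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
  "SecurityGroups": ["sg-dddd4444"],
  "StorageInfo": { "EbsStorageInfo": { "VolumeSize": 1000 } }
}
EOF

# Only call AWS when both variables are set and the CLI is installed.
if [ -n "${CONFIG_ARN:-}" ] && [ -n "${KAFKA_VERSION:-}" ] \
   && command -v aws >/dev/null 2>&1; then
  # One broker per subnet here; the broker count must be a multiple
  # of the number of subnets.
  aws kafka create-cluster \
    --cluster-name "logscale-msk" \
    --kafka-version "$KAFKA_VERSION" \
    --number-of-broker-nodes 3 \
    --broker-node-group-info file://broker-node-group-info.json \
    --configuration-info "Arn=${CONFIG_ARN},Revision=1"
else
  echo "Set CONFIG_ARN and KAFKA_VERSION to create the cluster." >&2
fi
```

The Kafka version passed here has to be one of the versions the custom configuration was created for, so it is left as a variable rather than hard-coded.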

Configuring LogScale for MSK

Once the MSK cluster has been created, deploy LogScale, specifying the correct Kafka and ZooKeeper host information. To find the Kafka and ZooKeeper host information, open the MSK cluster in the AWS Console and click View Client Information. PLAINTEXT brokers are listed on port 9092 and TLS brokers on port 9094.

When running LogScale on AWS EC2 instances you should ensure that the security group rules allow LogScale to access MSK and vice versa, by adding both to the same security group. See the MSK documentation for more information.
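
Whether the security group rules actually let a LogScale host reach the brokers can be tested from that host. The sketch below is one possible check, assuming a netcat (nc) build that supports the -z and -w options. The function names split_endpoints and check_endpoints are made up for this example, and KAFKA_SERVERS is expected to hold the comma-separated broker list in host:port form.

```shell
#!/bin/sh
# Split a comma-separated host:port list into one "host port" pair per line.
split_endpoints() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/:\([0-9]*\)$/ \1/'
}

# Report whether each endpoint accepts a TCP connection within 5 seconds.
check_endpoints() {
  split_endpoints "$1" | while read -r host port; do
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
      echo "reachable: $host:$port"
    else
      echo "unreachable: $host:$port"
    fi
  done
}

# Run the check only when a broker list has been provided.
if [ -n "${KAFKA_SERVERS:-}" ]; then
  check_endpoints "$KAFKA_SERVERS"
fi
```

An "unreachable" result for every broker usually points at the security group or subnet routing rather than at Kafka itself.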

The configuration information will need to be added to the LogScale server.conf configuration file. For example:

ini
KAFKA_SERVERS=b-1.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092,b-2.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092,b-3.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092

ZOOKEEPER_URL=z-2.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:2181,z-3.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:2181,z-1.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:2181
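
The same host information shown under View Client Information can be read with the AWS CLI and turned into these two lines. This is a sketch rather than a required step: render_kafka_settings is a hypothetical helper name, and CLUSTER_ARN is a placeholder variable that must hold the ARN of your MSK cluster.

```shell
#!/bin/sh
# Print the two server.conf lines from a broker list and a ZooKeeper list.
render_kafka_settings() {
  printf 'KAFKA_SERVERS=%s\nZOOKEEPER_URL=%s\n' "$1" "$2"
}

# Query AWS only when a cluster ARN is set and the CLI is installed.
if [ -n "${CLUSTER_ARN:-}" ] && command -v aws >/dev/null 2>&1; then
  # Plaintext broker endpoints (port 9092).
  brokers=$(aws kafka get-bootstrap-brokers --cluster-arn "$CLUSTER_ARN" \
    --query 'BootstrapBrokerString' --output text)
  # ZooKeeper endpoints (port 2181).
  zookeeper=$(aws kafka describe-cluster --cluster-arn "$CLUSTER_ARN" \
    --query 'ClusterInfo.ZookeeperConnectString' --output text)
  render_kafka_settings "$brokers" "$zookeeper"
fi
```

For TLS brokers, the BootstrapBrokerStringTls field of the same response holds the port 9094 endpoints.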

Configuring Encryption

It's possible to enable encrypted connections between LogScale and Kafka brokers. To do this a file must be created on each LogScale node that contains this parameter:

ini
security.protocol=SSL

In the LogScale configuration file, add the parameter EXTRA_KAFKA_CONFIGS_FILE, set to the path of the file that contains this configuration line. Ensure that the KAFKA_SERVERS LogScale parameter points to the Kafka brokers that are using TLS, which should be on port 9094.
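
The two changes described above can be scripted on each LogScale node. The following is a minimal sketch: CONF_DIR and the file name kafka-client.properties are assumptions made for this example, and the default value points at a scratch directory rather than a real LogScale installation.

```shell
#!/bin/sh
# CONF_DIR is a hypothetical location; set it to the directory
# that holds server.conf on the LogScale node.
CONF_DIR="${CONF_DIR:-./logscale-conf}"
mkdir -p "$CONF_DIR"
CONF_DIR=$(cd "$CONF_DIR" && pwd)   # resolve to an absolute path

# Extra Kafka client settings file (the file name is an assumption).
printf 'security.protocol=SSL\n' > "$CONF_DIR/kafka-client.properties"

# Point LogScale at the file, adding the line only if it is not there yet.
touch "$CONF_DIR/server.conf"
if ! grep -q '^EXTRA_KAFKA_CONFIGS_FILE=' "$CONF_DIR/server.conf"; then
  printf 'EXTRA_KAFKA_CONFIGS_FILE=%s\n' \
    "$CONF_DIR/kafka-client.properties" >> "$CONF_DIR/server.conf"
fi
```

The script does not touch KAFKA_SERVERS, so the broker ports still need to be changed to 9094 by hand.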

Once LogScale has been started, visit the administration dashboard to ensure the MSK Kafka Brokers and ZooKeeper nodes are visible.