LogScale on Bare Metal - Deploying Amazon Managed Streaming for Apache Kafka (MSK)
When deploying LogScale within Amazon Web Services (AWS), Amazon Managed Streaming for Apache Kafka (MSK) can be used as an alternative to running your own Apache Kafka cluster.
See the Amazon AWS MSK documentation for more information on this Amazon service.
Pre-Requisites
There are a couple of pre-requisites to using Amazon MSK with LogScale. First, ensure that the AWS CLI tools are installed and configured, and confirm the Access key and Secret key that will be used to connect. These are required in order to upload the custom configuration for your MSK instance.
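As a quick check of the CLI setup, the following commands can be used; this is a minimal sketch assuming the default AWS CLI profile is being configured.
shell
$ # Enter the Access key, Secret key, default region, and output format when prompted
$ aws configure
$ # Confirm which AWS account and IAM identity the credentials resolve to
$ aws sts get-caller-identity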
A Virtual Private Cloud (VPC) must be set up for your Availability Zones on AWS, with a subnet for each Kafka broker. For more information, see Getting Started Using Amazon MSK.
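For reference, a minimal sketch of creating the VPC and one subnet per broker with the AWS CLI is shown below; the CIDR blocks, VPC ID, and Availability Zone names are placeholder assumptions, and the console workflow in Getting Started Using Amazon MSK achieves the same result.
shell
$ # Create a VPC for the MSK cluster (example CIDR block)
$ aws ec2 create-vpc --cidr-block 10.0.0.0/16
$ # Create one subnet per Kafka broker, each in its own Availability Zone
$ aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 --cidr-block 10.0.1.0/24 --availability-zone us-east-1a
$ aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 --cidr-block 10.0.2.0/24 --availability-zone us-east-1b
$ aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 --cidr-block 10.0.3.0/24 --availability-zone us-east-1c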
Custom MSK Configuration
Read the documentation on how to add a custom configuration file to MSK. If no configuration file is supplied, MSK applies its own defaults, which differ from the normal Kafka defaults for several parameters. Because LogScale requires certain Kafka configuration parameters to be set, a custom configuration file for MSK must be created.
Create a file named kafka.properties and add the following values to it:
ini
replica.fetch.max.bytes=104857600
message.max.bytes=104857600
compression.type=producer
unclean.leader.election.enable=false
The full list of MSK parameters that can be customized can be found in the AWS documentation.
Next, register the configuration with MSK. The name and description can be anything, but the name can't contain spaces.
shell
$ aws kafka create-configuration \
    --name "LogScale-MSK-Configuration" \
    --description "Custom LogScale configuration for MSK" \
    --kafka-versions "2.3.1" \
    --server-properties file://config-file-path
A success message similar to this should be returned:
json
{
    "Arn": "arn:aws:kafka:us-east-1:123456789012:configuration/LogScale-MSK-Configuration/abcdabcd-abcd-1234-abcd-abcd123e8e8e-1",
    "CreationTime": "2019-05-21T00:54:23.591Z",
    "Description": "Custom LogScale Configuration for MSK",
    "KafkaVersions": ["2.3.1"],
    "LatestRevision": {
        "CreationTime": "2019-05-21T00:54:23.591Z",
        "Description": "Custom LogScale Configuration for MSK",
        "Revision": 1
    },
    "Name": "LogScale-MSK-Configuration"
}
The MSK custom configuration should now be available for use later in the deployment process.
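To confirm the configuration was registered, it can be listed or inspected with the AWS CLI; the ARN below is the example value from the output above.
shell
$ # List all custom MSK configurations in the account
$ aws kafka list-configurations
$ # Show the details of a single configuration by its ARN
$ aws kafka describe-configuration \
    --arn "arn:aws:kafka:us-east-1:123456789012:configuration/LogScale-MSK-Configuration/abcdabcd-abcd-1234-abcd-abcd123e8e8e-1"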
Creating MSK Cluster using AWS Console
Figure 5. Creating MSK Cluster using AWS Console
To create a cluster:
Log in to the console, go to the AWS MSK service, and then click Create Cluster.
Give the cluster a name and pick the VPC created for this MSK cluster. See the Amazon MSK page for details on setting up the VPC.
Select the Kafka version; version 2.4.0 is the minimum recommended for LogScale.
Select the Availability Zones and a subnet for each one. The minimum number of Availability Zones recommended is three.
Then add the custom configuration (see Custom MSK Configuration). Select Use a Custom Configuration and choose the name of the configuration uploaded earlier.
Figure 6. AWS MSK Configuration
Next, create the brokers. Kafka brokers use m5 instance types; specifications for these can be found on the Instance Types page under the m5 tab. Define the number of brokers for each Availability Zone. Optionally, add tags to identify the cluster. For more information on tagging, see AWS Tagging Strategy.
Define how much storage each broker will have. MSK uses AWS Elastic Block Store (EBS). The amount of storage should correlate with how much data is going to be ingested. Once created, the storage cannot be decreased.
If the AWS instance is part of a cluster, enabling encryption is recommended. Encryption between clients and brokers is possible but requires some additional steps; for more information, see Configuring Encryption. If plaintext brokers have been selected, these will be available on port 9092; brokers using TLS will be accessible on port 9094. If TLS client authentication is required, see Mutual TLS Authentication for more information.
Basic monitoring is available for free, but enhanced monitoring costs extra. More information can be found in Monitoring an Amazon MSK Cluster.
When selecting the security group, note that the LogScale instance must be able to connect to MSK. This can be done either by adding inbound and outbound rules for the IP of the LogScale instance or, if LogScale is running on AWS, by adding both to the same security group; a sketch of the corresponding CLI calls is shown after these steps.
Once these steps have been completed, click on the button to create the cluster.
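As a sketch of the shared security group approach mentioned above, the following commands allow inbound traffic on the plaintext and TLS broker ports; the security group IDs are placeholder assumptions, with --group-id being the group attached to the MSK cluster and --source-group the group used by the LogScale instances.
shell
$ # Allow LogScale instances to reach the brokers on the plaintext port
$ aws ec2 authorize-security-group-ingress --group-id sg-0aaa1111bbb2222cc --protocol tcp --port 9092 --source-group sg-0ddd3333eee4444ff
$ # Repeat for the TLS port if TLS brokers are used
$ aws ec2 authorize-security-group-ingress --group-id sg-0aaa1111bbb2222cc --protocol tcp --port 9094 --source-group sg-0ddd3333eee4444ff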
Configuring LogScale for MSK
Once the MSK cluster has been created, deploy LogScale, specifying the correct Kafka and ZooKeeper host information. To find the Kafka and ZooKeeper host information, visit the MSK installation in the AWS Console and click View Client Information. The PLAINTEXT Kafka hosts will be on port 9092; TLS brokers will be on port 9094.
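The same connection information can also be retrieved with the AWS CLI; a minimal sketch, using a placeholder cluster ARN, is shown below.
shell
$ # Returns the plaintext (port 9092) and TLS (port 9094) bootstrap broker strings
$ aws kafka get-bootstrap-brokers --cluster-arn "arn:aws:kafka:us-east-2:123456789012:cluster/test-msk-cluster/11111111-2222-3333-4444-555555555555-1"
$ # Returns cluster details, including the ZooKeeper connection string
$ aws kafka describe-cluster --cluster-arn "arn:aws:kafka:us-east-2:123456789012:cluster/test-msk-cluster/11111111-2222-3333-4444-555555555555-1"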
When running LogScale on AWS EC2 instances, ensure that the security group rules allow LogScale to access MSK and vice versa, for example by adding both to the same security group. See the MSK documentation for more information.
The configuration information will need to be added to the LogScale server.conf configuration file.
For example:
KAFKA_SERVERS=b-1.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092,b-2.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092,b-3.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9092
Configuring Encryption
It's possible to enable encrypted connections between LogScale and the Kafka brokers. To do this, a file must be created on each LogScale node that contains this parameter:
security.protocol=SSL
In the LogScale configuration file, add the parameter EXTRA_KAFKA_CONFIGS_FILE, pointing to the name of the file with this configuration line. Ensure that the KAFKA_SERVERS LogScale parameter points to the Kafka brokers that are using TLS, which should be on port 9094.
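Putting this together, a minimal sketch of the two files on a LogScale node might look as follows; the path /opt/humio/kafka-ssl.properties and the broker hostnames (taken from the earlier example, switched to port 9094) are assumptions for illustration.
shell
$ # Extra Kafka client settings enabling TLS towards the brokers (path is an example)
$ cat /opt/humio/kafka-ssl.properties
security.protocol=SSL
$ # Relevant lines from server.conf: point at the TLS brokers and the extra Kafka settings file
$ grep -E 'KAFKA_SERVERS|EXTRA_KAFKA_CONFIGS_FILE' server.conf
EXTRA_KAFKA_CONFIGS_FILE=/opt/humio/kafka-ssl.properties
KAFKA_SERVERS=b-1.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9094,b-2.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9094,b-3.test-msk-cluster.luq8jf.c3.kafka.us-east-2.amazonaws.com:9094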
Once LogScale has been started, visit the administration dashboard to ensure the MSK Kafka broker nodes are visible.