Integrating Amazon MSK with ClickHouse
Prerequisites
- You are familiar with Amazon MSK and the Confluent Platform, specifically Kafka Connect. We recommend the Amazon MSK Getting Started guide and the MSK Connect guide.
- The MSK broker is publicly accessible. See the Public Access section of the Developer Guide.
- If you wish to allow-list the static IPs for ClickPipes, they can be found here.
The official Kafka connector from ClickHouse with Amazon MSK
Steps
- Create an MSK instance.
- Create and assign an IAM role.
- Download the connector jar file from the ClickHouse Connect Sink release page.
- Install the downloaded jar file on the Custom plugin page of the Amazon MSK console.
- If the connector communicates with a public ClickHouse instance, enable internet access.
- Provide a topic name, the ClickHouse instance hostname, and the password in the connector configuration:
connector.class=com.clickhouse.kafka.connect.ClickHouseSinkConnector
tasks.max=1
topics=<topic_name>
ssl=true
security.protocol=SSL
hostname=<hostname>
database=<database_name>
password=<password>
ssl.truststore.location=/tmp/kafka.client.truststore.jks
port=8443
value.converter.schemas.enable=false
value.converter=org.apache.kafka.connect.json.JsonConverter
exactlyOnce=true
username=default
schemas.enable=false
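With value.converter set to org.apache.kafka.connect.json.JsonConverter and schemas.enable=false, each Kafka record is expected to be a plain JSON object, and the connector maps its keys to the columns of the target ClickHouse table. The following is a minimal sketch of producing such records with the confluent-kafka Python client; the broker address, topic name, and field names are placeholder assumptions, and a publicly accessible MSK cluster will typically also require SASL/SCRAM or IAM authentication on top of TLS.

import json
from confluent_kafka import Producer

# Placeholder MSK bootstrap address; public MSK clusters usually also
# require SASL/SCRAM or IAM authentication in addition to TLS.
producer = Producer({
    "bootstrap.servers": "b-1.example.kafka.us-east-1.amazonaws.com:9094",
    "security.protocol": "SSL",
})

# A flat JSON object whose keys should match the columns of the target table.
event = {"id": 42, "ts": "2024-01-01 00:00:00", "message": "hello"}
producer.produce("<topic_name>", value=json.dumps(event).encode("utf-8"))
producer.flush()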
Performance tuning
One way to increase performance is to adjust the batch size and the number of records fetched from Kafka by adding the following to the worker configuration:
consumer.max.poll.records=[NUMBER OF RECORDS]
consumer.max.partition.fetch.bytes=[NUMBER OF RECORDS * RECORD SIZE IN BYTES]
The specific values you use will vary based on the desired number of records and the average record size. For example, the default values are:
consumer.max.poll.records=500
consumer.max.partition.fetch.bytes=1048576
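As a hypothetical illustration, if your records average about 2 KiB and you want the connector to fetch 1,000 records per poll, you would set consumer.max.poll.records=1000 and consumer.max.partition.fetch.bytes=2048000 (1,000 × 2,048 bytes).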
You can find more details on these settings, along with other considerations, in the official Kafka and Amazon MSK documentation.