After you start working with Kafka, you will soon find yourself asking the question, “how can I generate test data into my Kafka cluster?” Well, I’m here to show you that you have many options for generating test data in Kafka. In this post and demonstration video, we’ll cover a few of the ways you can generate test data into your Kafka cluster.
Now, before we begin, let’s cover a possible edge case. If you are wondering about test data in Kafka Streams applications, you might find my previous post on testing Kafka Streams helpful. Well, also, I might find it helpful if you read it and comment on it too.
With that out of the way, let’s go through a few of your options. I’ll cover ways to generate test data in Kafka from both Apache Kafka and Confluent Platform.
Table of Contents
- Kafka Test Data Screencast
- Part 1 with Kafkacat
- Test Data with Apache Kafka Connect Options
- Kafka Test Data in Confluent Platform
- Resources and Helpful References
Kafka Test Data Screencast
Check out the screencast below to see a demo of examples using kafkacat, the Kafka Connect Datagen and Voluble connectors, and finally, ksql-datagen.
Part 1 with Kafkacat
Our first example utilizes kafkacat, which is freely available at https://github.com/edenhill/kafkacat/
Here are the steps (more or less) in the above screencast:
- Start Zookeeper and Kafka on localhost
- kafkacat is installed and in my path
- cat /var/log/system.log | kafkacat -b localhost:9092 -t syslog
- kafkacat -b localhost:9092 -t syslog -J
- curl -s "http://api.openweathermap.org/data/2.5/weather?q=Minneapolis,USA&APPID=my-key-get-your-own" | kafkacat -b localhost:9092 -t minneapolis_weather -P
- kafkacat -b localhost:9092 -t minneapolis_weather
- Show other fun, good time resources such as Mockaroo and JSON-server
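If you'd rather not depend on a system log or a weather API key, the same pipe-into-kafkacat pattern from the steps above works with any command that prints one record per line. Here is a minimal sketch of that idea. The `gen_weather` helper and its field names are made up for illustration, and the broker address and topic are simply the ones used in the steps above.

```shell
# Hypothetical helper (not part of kafkacat): print N JSON records, one per line.
gen_weather() {
  awk -v n="$1" 'BEGIN {
    srand()  # seed from the clock so each run produces different values
    for (i = 1; i <= n; i++) {
      printf "{\"id\": %d, \"city\": \"Minneapolis\", \"temp_f\": %d}\n", i, int(rand() * 100)
    }
  }'
}

# Preview a few records locally before sending anything to Kafka
gen_weather 3

# To produce them, pipe into kafkacat in producer mode (-P), as in the steps above:
#   gen_weather 100 | kafkacat -b localhost:9092 -t minneapolis_weather -P
```

Since kafkacat in producer mode treats each line on stdin as one message, 100 lines in should mean 100 messages on the topic.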
Test Data with Apache Kafka Connect Options
There are a couple of available Kafka Connect source connectors to assist in generating test data into Kafka. There is the Kafka Connect Datagen connector which has been around for a while. The Datagen connector includes two quickstart schemas to ahh, well, you know, get you started quickly. See the Reference section below for the link.
In the screencast, I showed how both connectors are already installed.
Next, run some commands such as
- confluent local config datagen-pageviews -- -d ./share/confluent-hub-components/confluentinc-kafka-connect-datagen/etc/connector_pageviews.config (your path might be different)
- kafkacat -b localhost:9092 -t pageviews
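If you are not using the Confluent CLI, a Datagen connector can also be created through the Kafka Connect REST API. The sketch below is an assumption-heavy illustration rather than a copy of the bundled config: the property names are the ones I know from the kafka-connect-datagen README, while the file name `datagen-pageviews.json` and the `create_connector` wrapper are hypothetical. Compare the values against the config file that ships with your install.

```shell
# Write a Datagen connector definition in the JSON shape the Connect REST API expects.
# The file name is arbitrary; check property values against your installed connector.
cat > datagen-pageviews.json <<'EOF'
{
  "name": "datagen-pageviews",
  "config": {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "kafka.topic": "pageviews",
    "quickstart": "pageviews",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "max.interval": "100",
    "tasks.max": "1"
  }
}
EOF

# Hypothetical wrapper: POST a definition file to a Connect worker.
# Assumes the worker listens on the default REST port 8083 unless a URL is passed.
create_connector() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data @"$1" "${2:-http://localhost:8083}/connectors"
}

# With a Connect worker running:
#   create_connector datagen-pageviews.json
#   curl -s http://localhost:8083/connectors/datagen-pageviews/status
```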
Next, we switched to another option for Kafka Connect based mock (or stub) data generation: a connector called Voluble. I like how it integrates the Java Faker project and supports creating cross-topic relationships, as seen in these examples:
'genkp.users.with' = '#{Name.full_name}'
'genvp.users.with' = '#{Name.blood_group}'
'genkp.publications.matching' = 'users.key'
'genv.publications.title.with' = '#{Book.title}'
See how users.key is referenced in the above example. Anyhow, much more documentation is available from the GitHub repo in the link below.
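To put those four lines in context, here is a sketch of what a complete Voluble properties file might look like. The connector class name is taken from my reading of the Voluble README, and the file name `voluble-users.properties` and connector name `voluble-users` are hypothetical. This is not the sample file from my repo, and depending on your Connect worker defaults you may need converter settings as well.

```shell
# Write a small Voluble connector config in Java properties format.
# Lines starting with "#" inside the file are properties comments.
cat > voluble-users.properties <<'EOF'
name=voluble-users
connector.class=io.mdrogalis.voluble.VolubleSourceConnector
# Key and value for each record in the "users" topic, generated by Java Faker
genkp.users.with=#{Name.full_name}
genvp.users.with=#{Name.blood_group}
# Each "publications" key is taken from a key already produced to "users"
genkp.publications.matching=users.key
genv.publications.title.with=#{Book.title}
EOF

# Show the file that was written
cat voluble-users.properties
```

The `matching` line is what creates the cross-topic relationship: publications keys are drawn from keys Voluble has already produced to users, so the two topics should be joinable on key.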
Steps with Voluble
- Listed topics
- kafka-topics --list --bootstrap-server localhost:9092
- Then, I loaded using a sample properties file found in my GitHub repo. See the Resources below.
- confluent local load voluble-source -- -d voluble-source.properties (bonus points and a chance to join me on a future Big Time TV Show if you post how to load it in vanilla Kafka in the comments below.)
- kafka-topics --list --bootstrap-server localhost:9092
- kafkacat -b localhost:9092 -t owners
Kafka Test Data in Confluent Platform
If you are a user of the Confluent Platform, you have an easy button available from the CLI with the ksql-datagen tool. It has a couple of quickstart schemas to get you rolling quickly, as shown in the following screencast.
Quickly, let’s run through the following commands:
- ksql-datagen quickstart=orders format=avro topic=orders maxInterval=100
- confluent local consume orders -- --value-format avro --from-beginning
- kafkacat -b localhost:9092 -t orders -s avro -r http://localhost:8081
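Beyond the quickstart schemas, ksql-datagen can also, as far as I know, take your own Avro schema annotated with `arg.properties` hints (the same Avro Random Generator convention the Datagen connector uses). The sketch below is an illustration under that assumption: the schema file name, record name, and field names are all made up, and the `schema=` and `key=` arguments should be checked against the ksql-datagen version you have installed.

```shell
# Write a hypothetical Avro schema with generation hints for each field.
cat > orders_custom.avsc <<'EOF'
{
  "namespace": "example",
  "name": "orders_custom",
  "type": "record",
  "fields": [
    {"name": "orderid", "type": {"type": "string", "arg.properties": {"regex": "order_[1-9][0-9]?"}}},
    {"name": "units", "type": {"type": "int", "arg.properties": {"range": {"min": 1, "max": 10}}}}
  ]
}
EOF

# Show the schema that was written
cat orders_custom.avsc

# With Kafka running on localhost, something like this should produce records:
#   ksql-datagen schema=orders_custom.avsc format=json topic=orders_custom key=orderid maxInterval=100
#   kafkacat -b localhost:9092 -t orders_custom
```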
Resources and Helpful References
- Read Martin Kleppmann https://martin.kleppmann.com/2015/08/05/kafka-samza-unix-philosophy-distributed-data.html
- Mockaroo https://mockaroo.com
- JSON-server https://github.com/typicode/json-server
- Kafka data generation resources (in no particular order)
- https://www.confluent.io/blog/easy-ways-generate-test-data-kafka/
- https://github.com/confluentinc/kafka-connect-datagen
- https://github.com/MichaelDrogalis/voluble
- Voluble example properties file https://github.com/tmcgrath/kafka-connect-examples/tree/master/voluble
- https://rmoff.net/2018/05/10/quick-n-easy-population-of-realistic-test-data-into-kafka/
- New tutorial: Kafka Streaming Test Data for Joins
Featured image credit https://pixabay.com/photos/still-life-bottles-color-838387/