Kafka Test Data Generation Examples

November 14, 2022 / April 7, 2020 by Todd M

After you start working with Kafka, you will soon find yourself asking the question, “how can I generate test data into my Kafka cluster?” Well, I’m here to show you that you have many options for generating test data in Kafka. In this post and demonstration video, we’ll cover a few of the ways you can generate test data into your Kafka cluster.

Now, before we begin, let’s cover a possible edge case. If you are wondering about test data in Kafka Streams applications, you might find my previous post on testing Kafka Streams helpful. Well, also, I might find it helpful if you read it and comment on it too.

With that out of the way, let’s go through a few of your options. I’ll cover ways to generate test data in Kafka from both Apache Kafka and Confluent Platform.

Kafka Test Data Screencast

Check out the screencast below to see a demo of examples using kafkacat, the Datagen and Voluble Kafka connectors, and finally, ksql-datagen.

Part 1 with Kafkacat

Our first example uses kafkacat, which is freely available at https://github.com/edenhill/kafkacat/

Here are the steps (more or less) in the above screencast

Start Zookeeper and Kafka on localhost
kafkacat is installed and in my path
cat /var/log/system.log | kafkacat -b localhost:9092 -t syslog

kafkacat -b localhost:9092 -t syslog -J
curl -s "http://api.openweathermap.org/data/2.5/weather?q=Minneapolis,USA&APPID=my-key-get-your-own" |\
kafkacat -b localhost:9092 -t minneapolis_weather -P
kafkacat -b localhost:9092 -t minneapolis_weather

Show other fun, good time resources such as Mockaroo and JSON-server
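
If you don't have a log file or an API key handy, a few lines of shell can stand in as the data source. What follows is only a minimal sketch: gen_events is a hypothetical helper (it is not part of kafkacat), and the field names and the sensor_readings topic are made up for illustration. It prints one JSON event per line, which is the shape kafkacat expects on stdin in producer mode.

```shell
# gen_events is a hypothetical helper, not part of kafkacat.
# Prints COUNT newline-delimited JSON events on stdout (default 10).
# Values are derived from the counter, so every run produces the same data.
gen_events() {
  count="${1:-10}"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '{"id":%d,"sensor":"sensor-%d","reading":%d}\n' \
      "$i" "$(( i % 3 ))" "$(( i * 37 % 100 ))"
    i=$(( i + 1 ))
  done
}

# Usage (assumes a broker on localhost:9092 and kafkacat on the PATH):
#   gen_events 100 | kafkacat -b localhost:9092 -t sensor_readings -P
```

Because the values are deterministic, you can replay the same events into a fresh topic and compare results between runs.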

Test Data with Apache Kafka Connect Options

There are a couple of available Kafka Connect source connectors to assist in generating test data into Kafka. There is the Kafka Connect Datagen connector, which has been around for a while. The Datagen connector includes quickstart schemas (pageviews and users, for example) to ahh, well, you know, get you started quickly. See the Resources section below for the link.

In the screencast, I showed how both connectors (Datagen and Voluble) are already installed.

Next, run some commands such as

confluent local config datagen-pageviews -- -d ./share/confluent-hub-components/confluentinc-kafka-connect-datagen/etc/connector_pageviews.config (your path might be different)
kafkacat -b localhost:9092 -t pageviews

Next, we switched to another Kafka Connect option for generating mock (or stub) data: a connector called Voluble. I like how it integrates the Java Faker project and supports creating cross-topic relationships, as seen in these examples:

'genkp.users.with' = '#{Name.full_name}'
'genvp.users.with' = '#{Name.blood_group}'

'genkp.publications.matching' = 'users.key'
'genv.publications.title.with' = '#{Book.title}'

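See how users.key is referenced in the above example: keys generated for the publications topic are drawn from keys already generated for the users topic. Anyhow, much more documentation is available from the GitHub repo in the link below.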

Steps with Voluble

Listed topics with kafka-topics --list --bootstrap-server localhost:9092
Then, I loaded the connector using a sample properties file found in my GitHub repo. See the Resources below.
confluent local load voluble-source -- -d voluble-source.properties (bonus points and a chance to join me on a future Big Time TV Show if you post how to load it in vanilla Kafka in the comments below. )

kafka-topics --list --bootstrap-server localhost:9092
kafkacat -b localhost:9092 -t owners
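
For a vanilla Apache Kafka setup without the confluent CLI, one possible route is the Kafka Connect REST API. The sketch below is not what the screencast does. It assumes a Connect worker running in distributed mode on its default port 8083 with the Voluble jar on its plugin path. voluble_payload is a hypothetical helper, and the settings inside it are simply the example settings shown earlier, not the contents of the sample properties file.

```shell
# voluble_payload is a hypothetical helper. It prints the JSON body that the
# Kafka Connect REST API expects when creating a connector.
# The connector name defaults to voluble-source (an assumption).
voluble_payload() {
  name="${1:-voluble-source}"
  cat <<EOF
{
  "name": "${name}",
  "config": {
    "connector.class": "io.mdrogalis.voluble.VolubleSourceConnector",
    "genkp.users.with": "#{Name.full_name}",
    "genvp.users.with": "#{Name.blood_group}",
    "genkp.publications.matching": "users.key",
    "genv.publications.title.with": "#{Book.title}"
  }
}
EOF
}

# Usage (assumes a Connect worker listening on localhost:8083):
#   voluble_payload | curl -s -X POST -H "Content-Type: application/json" \
#     --data @- http://localhost:8083/connectors
```

If you run Connect in standalone mode instead, passing the properties file to connect-standalone.sh along with the worker properties should achieve the same thing.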

Kafka Test Data in Confluent Platform

If you are a user of the Confluent Platform, you have an easy button available from the CLI with the ksql-datagen tool. It has a couple of quickstart schemas to get you rolling quickly, as shown in the following screencast.

Quickly, let’s run through the following commands

ksql-datagen quickstart=orders format=avro topic=orders maxInterval=100
confluent local consume orders -- --value-format avro --from-beginning

kafkacat -b localhost:9092 -t orders -s avro -r http://localhost:8081

Resources and Helpful References

Read Martin Kleppmann https://martin.kleppmann.com/2015/08/05/kafka-samza-unix-philosophy-distributed-data.html
Mockaroo https://mockaroo.com

JSON-server https://github.com/typicode/json-server
Kafka data generation resources (in no particular order)
- https://www.confluent.io/blog/easy-ways-generate-test-data-kafka/
- https://github.com/confluentinc/kafka-connect-datagen
- https://github.com/MichaelDrogalis/voluble
- Voluble example properties file https://github.com/tmcgrath/kafka-connect-examples/tree/master/voluble
- https://rmoff.net/2018/05/10/quick-n-easy-population-of-realistic-test-data-into-kafka/
New tutorial: Kafka Streaming Test Data for Joins

Featured image credit https://pixabay.com/photos/still-life-bottles-color-838387/


