Kafka Test Data Generation Examples


After you start working with Kafka, you will soon find yourself asking the question, “how can I generate test data into my Kafka cluster?”  Well, I’m here to show you the many options you have for generating test data in Kafka.  In this post and demonstration video, we’ll cover a few of the ways you can generate test data into your Kafka cluster.

Now, before we begin, let’s cover a possible edge case.  If you are wondering about test data in Kafka Streams applications, you might find my previous post on testing Kafka Streams helpful. Well, I might also find it helpful if you read it and comment on it too.

With that out of the way, let’s go through a few of your options.  I’ll cover ways to generate test data in Kafka from both Apache Kafka and Confluent Platform.


Kafka Test Data Screencast

Check out the screencast below to see a demo of examples using kafkacat, the Kafka Connect Datagen and Voluble connectors, and finally, ksql-datagen.

Part 1 with Kafkacat

Our first example utilizes kafkacat, which is freely available at https://github.com/edenhill/kafkacat/

Here are the steps (more or less) in the above screencast; a sketch for generating a continuous feed follows the list:

  1. Start Zookeeper and Kafka on localhost
  2. kafkacat is installed and in my path
  3. cat /var/log/system.log | kafkacat -b localhost:9092 -t syslog
  4. kafkacat -b localhost:9092 -t syslog -J
  5. curl -s "http://api.openweathermap.org/data/2.5/weather?q=Minneapolis,USA&APPID=my-key-get-your-own" |\
    kafkacat -b localhost:9092 -t minneapolis_weather -P
  6. kafkacat -b localhost:9092 -t minneapolis_weather
  7. Show other fun, good-time resources such as Mockaroo and JSON-server
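
By the way, if you want a continuous feed rather than a one-shot pipe, a simple shell loop does the trick.  Here is a minimal sketch, assuming a made-up fake_readings topic; kafkacat in produce mode (-P) reads one message per line from stdin:

# emit one made-up JSON reading per second, forever
while true; do
  echo "{\"ts\": $(date +%s), \"temp\": $((RANDOM % 40))}"
  sleep 1
done | kafkacat -b localhost:9092 -t fake_readings -P

Ctrl-C stops it once your topic has enough data.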

Test Data with Apache Kafka Connect Options

There are a couple of Kafka Connect source connectors available to assist in generating test data into Kafka.   First, there is the Kafka Connect Datagen connector, which has been around for a while.  The Datagen connector includes two quickstart schemas to, ahh, well, you know, get you started quickly.  See the Reference section below for the link.

In the screencast, I showed that both connectors were already installed.

Next, run some commands such as the following (a sample config sketch follows the list):

  1. confluent local config datagen-pageviews -- -d ./share/confluent-hub-components/confluentinc-kafka-connect-datagen/etc/connector_pageviews.config (your path might be different)
  2. kafkacat -b localhost:9092 -t pageviews
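
In case that bundled config file isn’t in the same spot on your machine, here is a minimal sketch of what the pageviews config can look like, written from memory; double-check the property names against the Datagen documentation linked in the Resources section:

{
  "name": "datagen-pageviews",
  "config": {
    "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
    "kafka.topic": "pageviews",
    "quickstart": "pageviews",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "max.interval": 100,
    "iterations": 10000000,
    "tasks.max": "1"
  }
}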

Next, we switched to another option for Kafka Connect based mock (or stub) data generation in Kafka: a connector called Voluble.  I like how it integrates the Java Faker project and provides support for creating cross-topic relationships, as seen in the examples below.

'genkp.users.with' = '#{Name.full_name}'
'genvp.users.with' = '#{Name.blood_group}'

'genkp.publications.matching' = 'users.key'
'genv.publications.title.with' = '#{Book.title}'

See how users.key is referenced in the above example.  Anyhow, there is much more documentation available in the GitHub repo linked below.
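
To put those generator expressions in context, here is a rough sketch of a complete voluble-source.properties.  I believe the connector class is io.mdrogalis.voluble.VolubleSourceConnector, but verify that (and the exact property names) against the Voluble README:

name=voluble-source
connector.class=io.mdrogalis.voluble.VolubleSourceConnector
tasks.max=1
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
genkp.users.with=#{Name.full_name}
genvp.users.with=#{Name.blood_group}
genkp.publications.matching=users.key
genv.publications.title.with=#{Book.title}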

Steps with Voluble

  1. Listed topics with kafka-topics --list --bootstrap-server localhost:9092
  2. Then, I loaded Voluble using a sample properties file found in my GitHub repo.  See the Resources below.
  3. confluent local load voluble-source -- -d voluble-source.properties (bonus points and a chance to join me on a future Big Time TV Show if you post how to load it in vanilla Kafka in the comments below; one possible approach is sketched after this list)
  4. kafka-topics --list --bootstrap-server localhost:9092
  5. kafkacat -b localhost:9092 -t owners
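
And for those of you chasing the bonus points: in vanilla Kafka you would run Kafka Connect yourself (connect-distributed.sh, for example) and register the connector over the Connect REST API, which listens on port 8083 by default.  A rough sketch, reusing the sample config from above:

curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors \
  -d '{
    "name": "voluble-source",
    "config": {
      "connector.class": "io.mdrogalis.voluble.VolubleSourceConnector",
      "genkp.owners.with": "#{Name.full_name}",
      "genvp.owners.with": "#{Name.blood_group}"
    }
  }'

The same endpoint works against Confluent Platform too, since it is plain Kafka Connect underneath.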

Kafka Test Data in Confluent Platform

If you are a user of the Confluent Platform, you have an easy button available from the CLI with the ksql-datagen tool.  It has a couple of quickstart schemas to get you rolling quickly, as shown in the following screencast.

Quickly, let’s run through the following commands (a custom-schema variation follows the list):

  1. ksql-datagen quickstart=orders format=avro topic=orders maxInterval=100
  2. confluent local consume orders -- --value-format avro --from-beginning
  3. kafkacat -b localhost:9092 -t orders -s avro -r http://localhost:8081
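
The quickstarts are not your only option, either.  As I recall, ksql-datagen can also generate data from your own Avro schema file; treat the line below as a sketch to verify against the ksql-datagen docs, and note that orders_custom.avro and the orderid key field are hypothetical:

ksql-datagen schema=./orders_custom.avro format=json topic=orders_custom key=orderid maxInterval=500 iterations=1000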

Resources and Helpful References

Featured image credit https://pixabay.com/photos/still-life-bottles-color-838387/
