Kafka and Dead Letter Queues Support? Yes and No


In this post, let’s answer the question of Kafka and Dead Letter Queues. But first, let’s start with an overview.

A dead letter queue (DLQ) is a queue, or a topic in Kafka, used to hold messages that cannot be processed successfully. DLQs originated in the traditional messaging systems that were popular before the arrival of Apache Kafka.

In traditional message queue systems such as RabbitMQ, Amazon SQS, and others, messages are routed from a sender to a receiver through a queue.

If the receiver is unable to process the message, or if the message cannot be routed to its intended destination, it is placed in the dead letter queue.

The purpose of a dead letter queue is to store messages that cannot be processed, so they can be examined and potentially resolved at a later time without blocking. Systems implement DLQs so that message processing can continue rather than stop because of one or a few bad messages.

The concept of a dead letter queue is also known as the “poison message” pattern.

In this post, let’s examine the concept of dead letter queues with its BK ancestry and how to apply it to the AK world.

Now, you may have noticed I just used two acronyms, BK and AK, which stand for “Before Kafka” and “After Kafka”.

You may be wondering, how or why do I do that?  I’m not exactly sure either, but since I’m the big shot boss around here, I can freestyle. 

Like Randy “Macho Man” Savage would say, oooh yeah.



Does Kafka support Dead Letter Queues?

Yes and No. Kafka does support the concept of a dead letter queue, so yes. But to some, the answer could be no, because DLQs need to be manually implemented in Kafka Producers and Consumers. There is an example of a DLQ in a Kafka Consumer below.

Now, there is first-class support for Dead Letter Queues in Kafka Connect, which will be explored later as well.


What’s an example of Dead Letter Queue in a Kafka Consumer?

The following Kafka Python client shows an example of implementing a DLQ in a Consumer. It is for illustrative purposes only and should be easily transferable to a Producer; i.e., send to a DLQ in the exception handling block.

from kafka import KafkaConsumer, KafkaProducer

# Connect to the Kafka broker and consume messages from the "my-topic" topic
consumer = KafkaConsumer("my-topic", group_id="my-consumer-group")

# Producer used to forward unprocessable messages to the dead letter queue topic
producer = KafkaProducer()

# Set the topic to use for the dead letter queue
dead_letter_queue_topic = "failed-messages-for-randy-the-macho-man"

for message in consumer:
    try:
        # process_message is a placeholder for your processing logic
        process_message(message)
    except Exception:
        # Message cannot be processed, so send it to the dead letter queue topic
        producer.send(dead_letter_queue_topic, message.value)

Hopefully, it goes without saying that the dead_letter_queue_topic has already been created, or that your Kafka cluster has been configured to allow auto-topic creation.
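If not, here is a minimal sketch of creating the DLQ topic up front with the kafka-python admin client. The bootstrap server address, partition count, and replication factor shown are assumptions to adjust for your cluster.

from kafka.admin import KafkaAdminClient, NewTopic

# Assumes a broker at localhost:9092; adjust for your cluster
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Partition count and replication factor are illustrative assumptions
admin.create_topics([
    NewTopic(name="failed-messages-for-randy-the-macho-man",
             num_partitions=1,
             replication_factor=1)
])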

Also, note this is a simplified example and does not consider whether the exception is retryable or not. See the DLQ tradeoff question below.

Does Kafka Streams support Dead Letter Queues?

Yes and No. It’s not as convenient as the DLQ support in Kafka Connect, which is covered next, but it is possible, similar to the Producer/Consumer approach above.

See KIP-161, which introduced the default.deserialization.exception.handler configuration option.
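For example, setting the built-in LogAndContinueExceptionHandler will log and skip records that fail deserialization rather than crash the application:

default.deserialization.exception.handler=org.apache.kafka.streams.errors.LogAndContinueExceptionHandler

Note this handler skips the bad record rather than forwarding it anywhere; routing failed records to a DLQ topic would require implementing a custom DeserializationExceptionHandler.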


There is a Java code example in the Apache Kafka docs at https://kafka.apache.org/20/documentation/streams/developer-guide/config-streams#default-deserialization-exception-handler as well.

Is there Dead Letter Queue support in Kafka Connect?

Yes, Kafka Connect does support the dead letter queue construct. Similar to what has already been described, in Kafka Connect a dead letter queue is also a separate topic. It is used to hold messages that could not be processed or delivered by a connector.

As I’m sure you are already aware, connectors are used to transfer data between Kafka and external systems. Examples of these sources and sinks are databases, file systems, SaaS APIs, etc.

When a connector is unable to process a message for any reason, the connector can send the troublesome message to a DLQ topic.

Again, the intention is the same. Sending to a DLQ topic allows the connector to continue processing. In addition, it provides a mechanism by which an administrator can examine the messages at a later time and determine why they could not be processed.

It’s easier to use a dead letter queue in Kafka Connect. You can configure the connector to send failed messages to the designated DLQ topic. The errors.deadletterqueue.topic.name and errors.tolerance configuration properties are used in the connector’s configuration file. For example:

errors.tolerance=all
errors.deadletterqueue.topic.name=failed-messages-for-randy-review

When writing to a DLQ, the connector can also write error context to the message headers to help diagnose the issue. To enable this, set the errors.deadletterqueue.context.headers.enable configuration property to true.
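Putting these together, the error-handling portion of a sink connector configuration might look like this (the topic name is carried over from the earlier example):

errors.tolerance=all
errors.deadletterqueue.topic.name=failed-messages-for-randy-review
errors.deadletterqueue.context.headers.enable=true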

Any tradeoffs to consider with Dead Letter Queues and Kafka?

Yes. We need to consider Kafka retries and how they can affect DLQs. Brokers that are leaders of partitions can go offline. This is expected and accounted for in Kafka. In situations such as this and others, it makes more sense to retry than to immediately send to a DLQ.
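To make this concrete, here is a minimal sketch extending the earlier consumer loop with retries before falling back to the DLQ. The retry count, backoff, and the RetryableError class are illustrative assumptions, not part of any Kafka client API; consumer, producer, process_message, and dead_letter_queue_topic come from the earlier example.

import time

MAX_RETRIES = 3  # illustrative assumption; tune for your workload

class RetryableError(Exception):
    """Hypothetical marker for transient, retry-worthy failures."""

for message in consumer:
    for attempt in range(MAX_RETRIES):
        try:
            process_message(message)
            break  # success, move on to the next message
        except RetryableError:
            # Transient failure, back off and try again
            time.sleep(2 ** attempt)
        except Exception:
            # Non-retryable failure, send straight to the DLQ
            producer.send(dead_letter_queue_topic, message.value)
            break
    else:
        # Retries exhausted, give up and send to the DLQ
        producer.send(dead_letter_queue_topic, message.value)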


In Kafka Connect, we can control the retry approach with the errors.retry.timeout and errors.retry.delay.max.ms configuration properties. At the time of this writing, the default for the errors.retry.timeout config is 0, which means no retries are attempted by default.
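For example, to retry failed operations for up to 10 minutes with at most 30 seconds between attempts (both values in milliseconds and purely illustrative):

errors.retry.timeout=600000
errors.retry.delay.max.ms=30000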

Further Resources

For a much deeper dive into error handling in Kafka Connect, including DLQs, see this excellent Confluent blog post.

See Kafka Connect tutorials for more on Kafka Connect and Kafka Streams tutorials for more on Kafka Streams.

About Todd M

Todd has held multiple software roles over his 20-year career. For the last 5 years, he has focused on helping organizations move from batch to data streaming. In addition to the free tutorials, he provides consulting and coaching for Data Engineers, Data Scientists, and Data Architects. Feel free to reach out directly or to connect on LinkedIn.
