Why Kafka Connect and Why Not?


Apache Kafka Connect is a framework for data integration between Apache Kafka and other systems, such as databases, message brokers, and file systems. A connector that moves data INTO Kafka is called a “Source”, while a connector that moves data OUT OF Kafka is called a “Sink”.

Much more is covered on the Kafka Connect Tutorials landing page, but in short:

Kafka Connect can be run as a distributed or standalone service. In either case, it can be started and stopped independently of the Kafka brokers. Kafka Connect works by deploying connectors to transfer data between Kafka and the other systems. These connectors can be either pre-built or custom-developed, depending on your specific integration needs. They can also be either proprietary or free-to-run open source, which is covered below.

Kafka Connect is designed to be highly scalable and fault-tolerant, and it can scale out by running on a cluster of workers. Kafka Connect sink and source connectors may automatically recover from failures. The framework includes features such as automatic offset management, configurable data transformation, monitoring, and a consistent REST API for management. There is support for a wide range of data sources and sinks.
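
As a rough illustration of the monitoring side of the REST API, the sketch below reads a connector status document and lists the restart paths for any failed tasks. The sample payload, the connector name `orders-source`, and the helper name are made up for illustration. The `/connectors/{name}/status` and `/connectors/{name}/tasks/{id}/restart` endpoints are part of the Connect REST API, but check the documentation for your version before relying on the exact response shape.

```python
import json

# Hypothetical response body from GET /connectors/orders-source/status.
SAMPLE_STATUS = """
{
  "name": "orders-source",
  "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
  "tasks": [
    {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    {"id": 1, "state": "FAILED", "worker_id": "10.0.0.2:8083", "trace": "..."}
  ],
  "type": "source"
}
"""


def failed_task_restart_paths(status):
    """Return the REST path that would restart each FAILED task."""
    name = status["name"]
    return [
        f"/connectors/{name}/tasks/{task['id']}/restart"
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


restart_paths = failed_task_restart_paths(json.loads(SAMPLE_STATUS))
print(restart_paths)
```

Each returned path would be sent as a `POST` to a Connect worker; the sketch only builds the paths and does not contact a cluster.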

Why Kafka Connect?

Here are some reasons why you might want to use Kafka Connect:

  1. Scalability: Kafka Connect is designed to scale horizontally by running multiple instances of the connector worker processes in distributed mode. This allows it to handle large volumes of data and to recover from failures without requiring manual intervention.
  2. Reliability: Kafka Connect is designed to be fault-tolerant and to recover from failures automatically. In distributed mode, it stores running state in Kafka itself, so if a connector worker process fails, another worker can pick up where it left off by utilizing the existing Consumer Group mechanisms for rebalancing.
  3. Ease of use: Kafka Connect makes it easy to get started with streaming data between systems. It provides a simple REST API for creating and configuring connectors, and a large number of pre-built connectors are available for common systems.
  4. Flexibility: Kafka Connect can be used to stream data between a wide variety of sources and sinks, including databases, message brokers, and file systems. It is also possible to write custom connectors for systems that are not supported out of the box.
  5. Integration with Kafka: Because Kafka Connect is built on top of Kafka, it is easy to integrate with other Kafka-based tools and systems. This makes it a good choice for building data pipelines that involve multiple systems and tools.
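
To make the REST API point more concrete, here is a minimal sketch of building the request that creates a connector. It assumes a Connect worker listening on `localhost:8083` (the usual default REST port) and uses the `FileStreamSourceConnector` demo connector from Apache Kafka, which newer releases may require you to add to the plugin path yourself. The connector name, file path, and topic are hypothetical.

```python
import json
import urllib.request

# Assumption: a Connect worker is reachable on the default REST port.
CONNECT_URL = "http://localhost:8083"


def build_create_connector_request(name, config, base_url=CONNECT_URL):
    """Build (but do not send) a POST /connectors request."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


file_source_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",  # hypothetical input file
    "topic": "file-lines",     # hypothetical destination topic
}

request = build_create_connector_request("file-source-demo", file_source_config)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would submit it to the worker. The sketch stops short of that so it can be read and run without a cluster.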

Why Not Kafka Connect?

There are a few situations in which Kafka Connect may not be the most appropriate solution for data integration:

  1. Complex data transformation: Kafka Connect is primarily designed for moving data between systems, so it may not be well-suited for complex data transformation tasks. While Kafka Connect does support some basic transformations, it is not as powerful as a full-fledged stream processing platform like Apache Flink or Apache Spark.
  2. Limited data sources: Connectors exist for a finite set of data sources and sinks, so if you need to integrate with a system that is not currently supported, you may need to look for a different solution or write your own connector.
  3. Cost: Some Kafka Connect Sinks or Sources are proprietary and require a license. This is covered in more detail in the next section.
  4. Limited Need: Kafka Connect may be overkill if you have a limited number of integrations. If you only need to move data between a few systems and don’t need the scalability and fault tolerance that Kafka Connect provides, you might be better off using a simpler solution like a script or a standalone tool.
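
To give a sense of what the basic transformations mentioned in the first point look like, the sketch below adds a chain of Single Message Transforms (SMTs) to a connector configuration. The helper `with_transforms`, the aliases, and the field names are hypothetical. `InsertField$Value` and `MaskField$Value` are transforms bundled with Apache Kafka, and the `transforms` / `transforms.<alias>.type` keys follow the documented configuration convention.

```python
def with_transforms(base_config, transforms):
    """Return a copy of base_config with an SMT chain added.

    transforms is a list of (alias, smt_class, params) tuples,
    applied to each record in the order given.
    """
    config = dict(base_config)
    config["transforms"] = ",".join(alias for alias, _, _ in transforms)
    for alias, smt_class, params in transforms:
        config[f"transforms.{alias}.type"] = smt_class
        for key, value in params.items():
            config[f"transforms.{alias}.{key}"] = value
    return config


config = with_transforms(
    {"name": "demo-sink", "tasks.max": "1"},  # hypothetical base config
    [
        # Add a constant field to every record value.
        ("addSource", "org.apache.kafka.connect.transforms.InsertField$Value",
         {"static.field": "source_system", "static.value": "crm"}),
        # Blank out a sensitive field in every record value.
        ("maskSsn", "org.apache.kafka.connect.transforms.MaskField$Value",
         {"fields": "ssn"}),
    ],
)
for key in sorted(config):
    print(f"{key}={config[key]}")
```

Each transform operates on one record at a time, which is why joins, aggregations, and windowing are better handled in a stream processing platform.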

How to determine if a Kafka Connect Sink or Source Connector is proprietary?

There are a few ways you can determine whether a Kafka Connect connector is proprietary or not:

  1. Check the documentation: Many connector providers state in the documentation whether a connector is proprietary or not.
  2. Check the license: The license of a connector will provide a clear indication as to whether it is proprietary or not. For example, connectors that are released under open-source licenses such as Apache License 2.0 or GPL are generally not proprietary, while connectors that are released under proprietary licenses such as the Confluent Software Evaluation License will eventually require a paid license.
  3. Check the source code: If you have access to the source code of a connector, you can often determine whether it is proprietary or not by examining the license distributed with the code.
  4. Check the provider: Some connector providers, such as Confluent, offer both proprietary and open-source connectors. In these cases, you can check the provider’s website to see which connectors they offer and whether they are proprietary or not.
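
The license check can be partly automated. The sketch below is a rough keyword heuristic, not a substitute for reading the license: it looks for a license file inside a connector's plugin directory and classifies the text. The marker strings, function names, and the assumption that the file name starts with `LICENSE` are all illustrative choices.

```python
from pathlib import Path

# Illustrative markers only; extend them for the licenses you care about.
OPEN_SOURCE_MARKERS = ("apache license", "gnu general public license")
PROPRIETARY_MARKERS = ("confluent software evaluation license",)


def classify_license_text(text):
    """Classify license text as 'proprietary', 'open-source', or 'unknown'."""
    normalized = " ".join(text.lower().split())  # collapse line wraps
    if any(marker in normalized for marker in PROPRIETARY_MARKERS):
        return "proprietary"
    if any(marker in normalized for marker in OPEN_SOURCE_MARKERS):
        return "open-source"
    return "unknown"


def classify_connector_dir(plugin_dir):
    """Classify the first LICENSE/LICENCE file found under plugin_dir."""
    for path in sorted(Path(plugin_dir).rglob("*")):
        if path.is_file() and path.name.lower().startswith(("license", "licence")):
            return classify_license_text(path.read_text(errors="ignore"))
    return "unknown"


print(classify_license_text("Apache License\nVersion 2.0, January 2004"))
print(classify_license_text("Confluent Software\nEvaluation License"))
```

An `unknown` result means the heuristic found nothing it recognizes, so the documentation and the provider's website remain the next places to look.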

About Todd M

Todd has held multiple software roles over his 20-year career. For the last 5 years, he has focused on helping organizations move from batch to data streaming. In addition to the free tutorials, he provides consulting and coaching for Data Engineers, Data Scientists, and Data Architects. Feel free to reach out directly or to connect on LinkedIn.
