Kafka Tutorials Overview
This is the Kafka tutorial landing page, with brief descriptions and links to specific Kafka tutorials covering components such as Kafka Connect, Kafka architecture, Kafka Streams, and Kafka monitoring and operations. We’ll start with a short background on the what and why of Kafka, followed by a list of Kafka tutorials and examples organized by component.
What is Apache Kafka?
Apache Kafka is an open-source, distributed, scalable publish-subscribe messaging system maintained by the Apache Software Foundation. Written in Scala and Java, it was initially developed at LinkedIn, open-sourced in 2011, and later became a top-level Apache project.
The project aims to provide a unified, low-latency platform for handling real-time data feeds. It is increasingly valuable for enterprise infrastructures that require integration between systems: each system can publish to or subscribe to particular Kafka topics based on its needs.
Kafka was heavily influenced by the concept of transaction logs. Transaction logs are an often overlooked but essential backbone of numerous enterprise systems, including databases, fault-tolerant replication, web servers, and e-commerce platforms. Apache Kafka is a massively scalable message queue structured as a distributed transaction log.
Why Apache Kafka?
Kafka is designed to work as a real-time distributed log. A real-time, distributed log is the foundational element required when implementing streaming architectures.
Kafka organizes data under particular topics. Data producers write to topics as “publishers”; consumers, or “subscribers”, are configured and programmed to read from one or more topics.
Topic messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka has a cluster-centric design offering strong durability and fault-tolerance guarantees.
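As a small sketch of this publish/subscribe flow, the console tools that ship with Kafka can create a replicated topic and then produce to and consume from it. The topic name, partition and replica counts, and the localhost:9092 broker address below are illustrative assumptions, not values from this tutorial.

```shell
# Create a topic whose partitions are replicated across 3 brokers
bin/kafka-topics.sh --create --topic orders \
  --partitions 3 --replication-factor 3 \
  --bootstrap-server localhost:9092

# Publish: type messages on stdin, one message per line
bin/kafka-console-producer.sh --topic orders \
  --bootstrap-server localhost:9092

# Subscribe: read the topic from the beginning
bin/kafka-console-consumer.sh --topic orders --from-beginning \
  --bootstrap-server localhost:9092
```

Because the topic is persisted to disk and replicated, a consumer started later can still replay every message from the beginning.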
If you are new to Kafka, start with the following tutorials. Each is approximately a 3–5 minute read and presents Kafka from a high-level perspective. Understanding the Kafka principles covered in this section will put you in the best position to proceed further if you choose to do so.
Apache Kafka Architecture
- Kafka Delivery Guarantees
- Kafka Zookeeper
- Kafka Topic Internals
- Kafka Brokers
Kafka Tutorials and Examples
What is Kafka Connect?
Kafka Connect is a framework for integrating Kafka with external systems such as files, databases, Hadoop clusters, and their cloud-based equivalents. It is an open-source component of Apache Kafka.
Kafka Connect is used to move data in and out of Kafka without writing your own Kafka producer and consumer code.
Kafka Connect key concepts include source and sink connectors as well as standalone or distributed execution modes.
A source connector ingests data from an external system into Kafka topics, while a sink connector delivers data from Kafka topics to a desired destination.
Kafka Connect can run either as a standalone, isolated process or distributed across multiple workers.
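As a minimal sketch of standalone mode, a worker properties file plus a connector properties file can run the FileStreamSource connector that ships with Kafka. The file names, paths, topic, and broker address here are illustrative assumptions:

```properties
# connect-standalone.properties -- worker settings
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
offset.storage.file.filename=/tmp/connect.offsets

# file-source.properties -- a source connector that tails a file into a topic
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/input.txt
topic=file-lines
```

```shell
# Launch a standalone worker running the connector
bin/connect-standalone.sh connect-standalone.properties file-source.properties
```

Swapping FileStreamSource for a sink connector class (and its configuration) moves data in the opposite direction, with no producer or consumer code written by hand.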
Kafka Connect Tutorials and Examples
- Kafka Connect MySQL examples of source and sink
- Kafka Connect S3 examples of sources and sinks
- GCP Kafka Connect Example Google Cloud Storage (GCS)
- Azure Kafka Connect Blob Storage Examples
What is Kafka Streams?
Kafka Streams is a client library for building applications, such as stream processors, that move data in or out of Kafka. APIs are available for both Java and Scala.
Any application, whether written in Java or Scala, that uses the Kafka Streams client library is considered a Kafka Streams application. You can run a Kafka Streams application as a standalone single instance or scaled across multiple instances. No resource manager such as YARN is required.
The logic of a Kafka Streams application is defined through a processor topology: a graph of stream processors (the nodes) connected by streams (the edges).
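As a small sketch of a processor topology built with the Kafka Streams Java DSL (the topic names are illustrative, and the kafka-streams dependency is assumed to be on the classpath):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

public class TopologySketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Source node: reads records from an input topic
        KStream<String, String> lines = builder.stream("input-topic");

        // Processor node: transforms each record's value
        lines.mapValues(v -> v.toUpperCase())
             // Sink node: writes the transformed records to an output topic
             .to("output-topic");

        // The resulting graph of processors (nodes) and streams (edges)
        Topology topology = builder.build();
        System.out.println(topology.describe());
    }
}
```

Printing `topology.describe()` shows the source, processor, and sink nodes and how the streams connect them, which makes the graph structure of the application concrete before it is deployed.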
For a thorough description of Kafka Streams, see What is Kafka Streams.
Kafka Streams Tutorials and Examples in Scala
The following Kafka Streams tutorials are hands-on and include source code, descriptions, research on the particular subject, and usually a screencast demonstrating how to run the examples.
- Kafka Streams with Scala Example
- Kafka Streams Testing with Scala
- Kafka Streams – Examples of Joins
- Kafka Streams Transformations
- GlobalKTable vs KTable in Kafka Streams
Apache Kafka Operations
Monitoring coming soon
Comparisons
More coming soon, but to start us off:
- Kafka vs Kinesis
- Kafka Streams vs Spark Streaming
Featured image adapted from https://flic.kr/p/bGR8bZ