Kafka Tutorials

Kafka Tutorials Overview

This is the Kafka tutorial landing page, with brief descriptions and links to specific Kafka tutorials on components such as Kafka Connect, Kafka architecture, Kafka Streams, and Kafka monitoring and operations.  We’ll start with a short background on the what and why of Kafka.  Then, a list of Kafka tutorials and examples organized by component is provided.

What is Apache Kafka?

Apache Kafka is an open-source, distributed, scalable publish-subscribe messaging system maintained by the Apache Software Foundation.  The code is written in Scala and Java and was initially developed at LinkedIn.  It was open-sourced in 2011 and later became a top-level Apache project.

The project aims to provide a unified, low-latency platform for handling real-time data feeds.  It is increasingly valuable for enterprise infrastructures that require integration between systems.  Systems that need to integrate can publish or subscribe to particular Kafka topics based on their needs.

Kafka was heavily influenced by the concept of transaction logs.  Transaction logs are an often overlooked but essential backbone component of numerous enterprise systems such as databases, fault-tolerant replication, web servers, and e-commerce platforms.  Apache Kafka is a massively scalable message queue constructed like a distributed transaction log.

Why Apache Kafka?

Kafka is designed to work as a real-time distributed log.  A real-time distributed log is a required, foundational element when implementing streaming architectures.

Kafka organizes data under particular topics.  Data producers write to topics as “publishers”.  Consumers, or “subscribers”, are configured and programmed to read from topics.

Topic messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka has a cluster-centric design which offers strong durability and fault-tolerance guarantees.
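To make the publish and subscribe idea concrete, here is a minimal in-memory sketch of a topic modeled as an append-only log.  It is not the Kafka client API: the names `TopicLog`, `Subscriber`, `append`, and `poll` are hypothetical and chosen only for illustration, and disk persistence, partitions, and replication are left out entirely.

```python
class TopicLog:
    """Toy stand-in for a single-partition topic (illustrative only)."""

    def __init__(self):
        # Records are only ever appended; existing entries are never changed.
        self._records = []

    def append(self, value):
        """Publish a record and return the offset it was stored at."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return every record at or after the given offset."""
        return self._records[offset:]


class Subscriber:
    """Reads from a TopicLog and remembers its own position (offset)."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        """Return records not yet seen by this subscriber, then advance."""
        records = self._log.read_from(self.offset)
        self.offset += len(records)
        return records


orders = TopicLog()
orders.append("order-1")
orders.append("order-2")

billing = Subscriber(orders)
shipping = Subscriber(orders)

first_batch = billing.poll()      # billing sees both existing records
orders.append("order-3")
second_batch = billing.poll()     # billing sees only the new record
shipping_batch = shipping.poll()  # shipping reads independently from offset 0
```

Because each subscriber tracks its own offset, reading a record does not remove it from the log, which is the main way this model differs from a traditional queue.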

 

Apache Kafka

If you are new to Kafka, start with the following.  Each tutorial is approximately a 3-5 minute read and presents Kafka from a high-level perspective.  Understanding the Kafka principles presented in this section will put you in the best position to proceed to the more detailed tutorials.

 

Apache Kafka Architecture

 

Kafka Tutorials and Examples

 

Kafka Connect

What is Kafka Connect?

Kafka Connect is a framework for Kafka used to interact with external systems such as files, databases, Hadoop clusters, and their cloud-based equivalents.  It’s an open-source component of Apache Kafka.

Kafka Connect is used to move data in and out of Kafka without writing your own Kafka producer and consumer code.

Kafka Connect key concepts include source and sink connectors as well as standalone or distributed execution modes.

Connectors

A source connector ingests data from an external system into Kafka topics, while a sink connector delivers data from Kafka topics to the desired destination.
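The direction of data flow for each connector type can be sketched in a few lines.  This is a conceptual model only, not the Kafka Connect API: the functions `run_source` and `run_sink` are hypothetical, a plain list stands in for a Kafka topic, and in-memory text buffers stand in for the external systems.

```python
import io


def run_source(external_lines, topic):
    """Source side: copy each line from the external system into the topic."""
    for line in external_lines:
        topic.append(line.rstrip("\n"))


def run_sink(topic, destination):
    """Sink side: copy each record from the topic out to the destination."""
    for record in topic:
        destination.write(record + "\n")


source_file = io.StringIO("alpha\nbeta\n")  # stands in for an external file
topic = []                                  # stands in for a Kafka topic
sink_file = io.StringIO()                   # stands in for the destination

run_source(source_file, topic)
run_sink(topic, sink_file)
```

In real Kafka Connect the connectors are configured rather than hand-written, which is what removes the need for custom producer and consumer code.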

Execution Modes

Kafka Connect can be run either as a standalone, isolated process or distributed across multiple workers.

Kafka Connect Tutorials and Examples

 

Kafka Streams

Kafka Streams is a client library used for building applications, such as stream processors, that read data from Kafka topics, process it, and write results back to Kafka.  It can be used from Java or Scala.

Any application (whether written in Java or Scala) that uses the Kafka Streams client library is considered a Kafka Streams application.  You can run a Kafka Streams application as a standalone single instance or across multiple instances.  No separate resource manager such as YARN is required.

The logic of a Kafka Streams application is defined through a processor topology, which is a graph of stream processors (the nodes) connected by streams (the edges).
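The idea of a processor topology can be illustrated with a small sketch.  This is not the Kafka Streams API (which targets Java and Scala); the `Processor` class and its `to` and `process` methods are hypothetical names used only to show nodes connected by edges, with records flowing from a source node to a sink node.

```python
class Processor:
    """A node in the topology; its outputs flow along edges to child nodes."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn          # takes one record, returns a list of output records
        self.children = []    # downstream nodes (the outgoing edges)

    def to(self, child):
        """Connect this node to a downstream node and return that node."""
        self.children.append(child)
        return child

    def process(self, record):
        """Apply this node's function and forward each output downstream."""
        for output in self.fn(record):
            for child in self.children:
                child.process(output)


results = []


def collect(record):
    """Sink function: store the record and emit nothing further."""
    results.append(record)
    return []


source = Processor("source", lambda r: [r])
drop_blank = Processor("drop-blank", lambda r: [r] if r.strip() else [])
uppercase = Processor("uppercase", lambda r: [r.upper()])
sink = Processor("sink", collect)

# Build the graph: source -> drop-blank -> uppercase -> sink
source.to(drop_blank).to(uppercase).to(sink)

for record in ["hello", "  ", "kafka"]:
    source.process(record)
```

Each node handles one record at a time and knows nothing about the rest of the graph, so the wiring can change without touching the processing functions.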

For a thorough description of Kafka Streams, see What is Kafka Streams.

Kafka Streams Tutorials and Examples

The following Kafka Streams tutorials are intended to be hands-on.  They include source code, descriptions, and research on the particular subject, and usually a screencast demonstrating how to run the examples.

 

Apache Kafka Operations

Monitoring coming soon

 

Comparisons

More coming soon, but to start us off

 

Featured image adapted from https://flic.kr/p/bGR8bZ