Spark Streaming with Scala: Getting Started Guide

Spark Streaming enables scalable, fault-tolerant, high-throughput processing of real-time data streams from sources such as Kafka and Kinesis. It is an extension of the core Spark API. Scala is a programming language designed to run on the Java Virtual Machine (JVM). It is a statically typed language that …

Spark Streaming Testing with Scala by Example

Stream processing applications built with Apache Spark Streaming give organizations the ability to ingest and analyze real-time data from sources such as Kafka and Kinesis. However, like any complex distributed system, a Spark Streaming application needs thorough testing to verify that it behaves correctly and to catch bugs before they reach production. Comprehensive Spark Streaming testing …
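
One common way to make streaming code easier to test is to keep record-level and per-batch logic in plain Scala functions and call them from the DStream pipeline (for example, `lines.flatMap(WordLogic.tokenize)`). The sketch below assumes that approach; `WordLogic`, `tokenize`, `countWords`, and `WordLogicSpec` are hypothetical names, and word counting merely stands in for whatever logic a real job would run. The logic is then checked with ordinary assertions, with no `StreamingContext` required.

```scala
// Hypothetical example: streaming logic kept in pure functions so it can be
// unit-tested without starting a SparkContext or StreamingContext.
object WordLogic {
  // Splits a line into lowercase words and drops empty tokens.
  // In a DStream job this could be passed to lines.flatMap(WordLogic.tokenize).
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty)

  // Counts words in one batch with plain Scala collections, mirroring what
  // flatMap + map + reduceByKey would compute over a single micro-batch.
  def countWords(batch: Seq[String]): Map[String, Int] =
    batch.flatMap(tokenize).groupBy(identity).map { case (word, ws) => word -> ws.size }
}

// Plain assertions standing in for a test framework.
object WordLogicSpec {
  def main(args: Array[String]): Unit = {
    assert(WordLogic.tokenize("Hello, Spark  streaming!") == Seq("hello", "spark", "streaming"))
    assert(WordLogic.tokenize("   ").isEmpty)
    assert(WordLogic.countWords(Seq("a b a", "B c")) == Map("a" -> 2, "b" -> 2, "c" -> 1))
    println("all tests passed")
  }
}
```

This does not replace integration tests against a running `StreamingContext` (Spark's `queueStream` can feed test RDDs into one), but it keeps most of the logic checkable in fast unit tests.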

Spark Structured Streaming with Kafka Example – Part 1

In this post, let’s explore an example of updating an existing Spark Streaming application to the newer Spark Structured Streaming API. We will start simple and then move on to more advanced Kafka Spark Structured Streaming examples. My original Kafka Spark Streaming post is three years old now. On the Spark side, the data abstractions have evolved …

Spark Streaming with Kafka Example

Spark Streaming with Kafka has become so common in data pipelines that it’s difficult to find one without the other. This tutorial presents an example of consuming a Kafka stream from Spark. In this example, we’ll feed weather data into Kafka and then process that data with Spark Streaming in Scala. As the data …