Spark Streaming with Scala

Let’s start Apache Spark Streaming with Scala in small steps to build up our skills and confidence.  Small steps create the forward momentum we need when learning something new, and the quickest way to gain that momentum in software development is to run code that works without error.  Right?  I mean, right!?  This is pure software psychology.  Dropping pearls of wisdom here, folks, pearls I tell you, pearls.

In this post, we’re going to set up and run Apache Spark Streaming with Scala code.  Then, we should be confident in taking the next step to Part 2 of learning Apache Spark Streaming.

Before we begin, I assume you already have a high-level understanding of Apache Spark Streaming.  If not, check out the Spark Streaming tutorials or the Spark Streaming with Scala section of this site.

Spark Streaming with Scala Overview

Spark comes with some great examples and convenient scripts for running Streaming code.  Let’s make sure you can run these examples.  In case it helps, I made a screencast of me running through these steps; there’s a link to it below.

Running the NetworkWordCount example out-of-the-box

  1. Open a shell (or a command prompt on Windows) and go to your Spark root directory.
  2. Start Spark Master:  sbin/  **
  3. Start a Worker: sbin/ spark://todd-mcgraths-macbook-pro.local:7077
  4. Start netcat on port 9999: nc -lk 9999  (*** Windows users:  Let me know in page comments if this works well on Windows)
  5. Run the network word count using the handy run-example script: bin/run-example streaming.NetworkWordCount localhost 9999

** Windows users, please adjust accordingly; i.e. sbin/start-master.cmd instead of sbin/
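
In case it helps to see what the example is doing with the lines you type into netcat, here is a minimal sketch of the same counting logic in plain Scala collections, with no Spark involved.  The object name WordCountSketch and the function countWords are my own names for illustration, not part of the Spark example; the real example applies the equivalent split, pair-up and sum steps to each batch of a stream.

```scala
object WordCountSketch {
  // Hypothetical helper: count the words in one batch of lines.
  // Splits each line on a single space, then groups identical words
  // and counts how many times each one occurred.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // Pretend these two lines arrived from netcat in the same batch.
    val batch = Seq("hello spark", "hello streaming")
    countWords(batch).toSeq.sortBy(_._1).foreach {
      case (word, count) => println(s"($word,$count)")
    }
  }
}
```

With the sample batch above, "hello" is counted twice and the other two words once each.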

Here’s a screencast of me running these steps

Making and Running Our Own NetworkWordCount

OK, that’s good.  We’ve succeeded in running the Scala Spark Streaming NetworkWordCount example, but what about running our own Spark Streaming program in Scala?  Let’s take another step towards that goal.  In this step, we’re going to set up our own Scala/SBT project, then compile, package and deploy a modified NetworkWordCount.  Again, I made a screencast of the following steps; there’s a link to it below.

  1. Choose or create a new directory for a new Spark Streaming Scala project.
  2. Create the directory layout SBT expects: src/main/scala
  3. Create a Scala source file called NetworkWordCount.scala in the src/main/scala directory
  4. Copy-and-paste the NetworkWordCount.scala code from the Spark examples directory into the file created in the previous step
  5. Remove or comment out the package declaration and the StreamingExamples references
  6. Change AppName to “MyNetworkWordCount”
  7. Create a build.sbt file (source code below)
  8. Run sbt compile to smoke test, then sbt package to build the jar used in the next step
  9. Deploy: ~/Development/spark-1.5.1-bin-hadoop2.4/bin/spark-submit --class "NetworkWordCount" --master spark://todd-mcgraths-macbook-pro.local:7077 target/scala-2.11/streaming-example_2.11-1.0.jar localhost 9999
  10. Start netcat on port 9999: nc -lk 9999  and start typing
  11. Check things out in the Spark UI
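
For reference, here is a rough sketch of what NetworkWordCount.scala might look like after steps 5 and 6, assuming the Spark 1.5.1 version of the example as a starting point.  The parseArgs helper is my own addition for illustration and is not in the Spark example, which checks the arguments inline.  Your copy may differ slightly depending on the Spark version you copied from.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.Try

// No package declaration and no StreamingExamples call (step 5).
object NetworkWordCount {

  // Hypothetical helper, not in the Spark example:
  // returns (host, port) only if both arguments are present and the port is a number.
  def parseArgs(args: Array[String]): Option[(String, Int)] =
    if (args.length < 2) None
    else Try(args(1).toInt).toOption.map(port => (args(0), port))

  def main(args: Array[String]): Unit = {
    val (host, port) = parseArgs(args).getOrElse {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      sys.exit(1)
    }

    // App name changed as in step 6; the master URL comes from spark-submit.
    val sparkConf = new SparkConf().setAppName("MyNetworkWordCount")
    // Process the stream in 1 second batches.
    val ssc = new StreamingContext(sparkConf, Seconds(1))

    // Read lines of text from the socket that netcat is serving.
    val lines = ssc.socketTextStream(host, port, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the object has no package, the class name passed to spark-submit is simply NetworkWordCount.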

build.sbt source

name := "streaming-example"

version := "1.0"

scalaVersion := "2.11.4"

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.5.1",
    "org.apache.spark" %% "spark-streaming" % "1.5.1"
)

If you watched the video, notice this has been corrected to “streaming-example” and not “steaming-example” 🙂

Spark Streaming With Scala Part 1 Conclusion

At this point, I hope you were successful in running both Spark Streaming examples in Scala.  If so, you should be more confident when we continue to explore Spark Streaming in Part 2.   If you have any questions, feel free to add comments below.

You may also find the following landing page helpful for more information on Spark and Spark with Scala and Python.

Spark Tutorial
