
How do you deploy a Scala program to a Spark cluster? In this tutorial, we'll cover how to build, deploy, and run a Scala driver program on a Spark cluster. The focus will be on a simple example in order to gain confidence and set the foundation for more advanced examples in the future. To keep things interesting, we're going to mix in some SBT and the Sublime Text 3 editor for fun.
This post assumes Scala and SBT experience, but if you don't have it, this is a chance to gain further understanding of the Scala language and the simple build tool (SBT).
Requirements to Deploy to Spark Cluster
- You need to have SBT installed
- Make sure your Spark cluster master and at least one worker are running; a rough sketch of the start commands follows this list. Refer to the previous post, Running a Local Standalone Spark Cluster, if you run into any issues.
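If the master and worker are not running yet, starting them from the Spark installation directory looks roughly like this (the master URL here is an assumption; use the URL shown in your master's log or web UI):
sbin/start-master.sh
sbin/start-slave.sh spark://<master-host>:7077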
Deploy Scala Program to Spark Cluster Steps
1. Create a directory for the project: mkdir sparksample
2. Create some directories for SBT:
cd sparksample
mkdir project
mkdir -p src/main/scala
Ok, so you should now be in the sparksample directory and have project/ and src/ dirs.
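At this point the project layout should look like this:
sparksample/
  project/
  src/
    main/
      scala/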
(3. We're going to sprinkle this Spark tutorial with the Sublime Text 3 editor and SBT plugins, so this step isn't necessary for deploying a Scala program to a Spark cluster. It's optional.)
In any text editor, create a plugins.sbt file in the project/ directory.
Add the Sublime plugin as described at https://github.com/orrsella/sbt-sublime.
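For reference, project/plugins.sbt needs only a single addSbtPlugin line, roughly like the following (the organization and version shown here are assumptions; copy the exact line from the plugin's README):
addSbtPlugin("com.orrsella" % "sbt-sublime" % "1.1.2")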
4. Create an SBT build file in the root directory. For this tutorial, the root directory is sparksample/. Name the file "sparksample.sbt" with the following content:
name := "Spark Sample"
version := "1.0"
scalaVersion := "2.10.3"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.1"
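As a side note, the %% operator tells SBT to append the project's Scala binary version to the artifact name, so with scalaVersion 2.10.3 the dependency line above resolves to the same artifact as:
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.1"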
5. Create a file named SparkPi.scala in the src/main/scala directory. Because this is an introductory tutorial, let’s keep things simple and cut-and-paste this code from the Spark samples. The code is:
import scala.math.random
import org.apache.spark._
/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi")
    val spark = new SparkContext(conf)
    // Number of partitions to spread the work over; defaults to 2
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // Sample n random points in the square [-1, 1] x [-1, 1] and count how many land inside the unit circle
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
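A quick note on that last println: the points (x, y) are drawn uniformly from the square [-1, 1] x [-1, 1], which has area 4, while the unit circle inside it has area pi. The fraction of points landing inside the circle therefore approximates pi / 4, so multiplying count / n by 4.0 gives the estimate of pi.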
6. Start SBT from a command prompt: sbt
Running sbt may trigger downloads of many third-party library jars, depending on whether you've run something similar with SBT before and whether your local cache already has the files.
(If you want to continue with the Sublime example, run the 'gen-sublime' command from the SBT console and open the generated Sublime project. You can then edit the sample Scala code from step 5 in Sublime.)
7. In the SBT console, run 'package' to create a jar. The jar will be created in the target/scala-2.10/ directory. Note the name of the generated jar; if you followed the previous sparksample.sbt step exactly, the filename will be spark-sample_2.10-1.0.jar
8. Exit SBT, or in a different terminal window, call the "spark-submit" script with the appropriate --master argument value. For example:
../spark-1.6.1-bin-hadoop2.4/bin/spark-submit --class "SparkPi" --master spark://todd-mcgraths-macbook-pro.local:7077 target/scala-2.10/spark-sample_2.10-1.0.jar
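As an aside, if you just want to sanity-check the jar before submitting it to the cluster, the same command can be run with a local master in place of the standalone master URL:
../spark-1.6.1-bin-hadoop2.4/bin/spark-submit --class "SparkPi" --master local[2] target/scala-2.10/spark-sample_2.10-1.0.jar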
So, in this example, it's safe to presume I have the following directory structure:
parentdir
-spark-1.6.1-bin-hadoop2.4
-sparksample
We can assume this because I'm running ../spark-1.6.1-bin-hadoop2.4/bin/spark-submit from the sparksample directory.
9. You should see the output "Pi is roughly…" and if you go to the Spark UI, you should see "Spark Pi" in the Completed Applications section:

Conclusion
That's it. You've built, deployed, and run a Scala driver program on a Spark cluster. Simple, I know, but with this experience you are in a good position to move on to more complex examples and use cases. Let me know if you have any questions in the comments below.
Screencast
Here’s a screencast of the steps above:
Further Reference
- http://spark.apache.org/docs/latest/submitting-applications.html
There are other tutorials which may be of interest, including the tutorial on deploying to a Spark cluster with third-party jar dependencies.
Also, if you are just getting started with Scala and Spark, check out the Scala for Spark course.
And finally, make sure to bookmark the Spark Tutorials with Scala page for the latest Scala Spark Tutorials.