How to Debug Scala Spark in IntelliJ


Have you struggled to configure debugging in IntelliJ for your Spark programs?  Yeah, me too.  Debugging plain Scala code was easy, but when I moved to Spark, things didn’t work as expected.  So, in this tutorial, let’s cover debugging Scala-based Spark programs in IntelliJ.  We’ll go through a few examples and utilize the occasional help of SBT.  These instructions and screencasts will hopefully allow you to start debugging Spark apps in IntelliJ, and help me remember in the future.

We’ll break the topic of debugging Scala-based Spark programs into two sections:

  1. Local Spark Debugging
  2. Remote Spark Debugging

As you’ll see in this tutorial, there are a few different options to choose from, depending on your Scala debug needs and whether you wish to debug Spark running locally in standalone mode or a Spark job running on your cluster.  It is assumed that you are already familiar with the concepts and value of debugging your Spark Scala code, but we’ll quickly review a few key concepts before diving into the screencast examples.

Scala Breakpoints and Debuggers

First, let’s cover some background.  What’s the difference between Breakpoints and Debuggers?

  • A breakpoint is a marker that you can set to specify where execution should pause when you are running your application
  • Breakpoints are stored in IntelliJ (not in your application’s code)
  • A debugger is the tool that attaches to your running application, pauses execution at your breakpoints, and lets you inspect variables and step through the code
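
To make this concrete, here’s a minimal sketch of a Spark app you could set breakpoints in.  It’s a hypothetical example (not part of the sample project) with illustrative names; note the two different places a breakpoint can land: driver-side code and code inside a transformation.

import org.apache.spark.sql.SparkSession

object DebugExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("debug-example")
      .master("local[*]") // local mode keeps everything in one JVM, so breakpoints inside lambdas hit too
      .getOrCreate()

    val counts = spark.sparkContext
      .parallelize(Seq("spark", "scala", "spark"))
      .map(word => (word, 1)) // a breakpoint on this line pauses inside each task
      .reduceByKey(_ + _)
      .collect() // a breakpoint here pauses on the driver

    counts.foreach(println) // inspect `counts` in the debugger before it prints
    spark.stop()
  }
}

Run this with Debug instead of Run, and IntelliJ will stop at whichever of those breakpoints you set.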

We’ll go through a few examples in this Scala Spark Debugging tutorial, but first, let’s get the requirements out of the way.

Debug Scala Spark Requirements

Portions of this tutorial require SBT, and I’ve provided a sample project for you to pull from GitHub.  See the Resources section below.


And here’s a shocker, another requirement is IntelliJ.  I know, I know, I caught you off guard there, right?

The ultimate goal will be to use your own code and environment.  I know.  Totally get it.  Pull the aforementioned repo if you want to take a slow approach to debugging Scala in Spark.  But feel free to translate the following to your own code right away as well.  Hey man, I’m not telling you what to do.  You do what you do.  Ain’t no stoppin’ you now.  Sing it with me. 🙂

Oh, I nearly forgot, one more thing.  If you are entirely new to using IntelliJ for building Scala-based Spark apps, you might wish to check out my previous tutorial on Scala Spark IntelliJ.  I’ll mention it in the Resources section below as well.

Scala Breakpoints in IntelliJ Debugger Example

Let’s go through some examples.  Depending on your version of IntelliJ, you may not have the second option as I mentioned in the screencast.  But the first one has been working for me for a couple of years.  Don’t just stand there, let’s get to it.

Spark Debug Breakpoints in Scala Config Highlights

In the screencast above, there are two options covered.  One or both may work in your environment.  In part 1, we utilize the SBT configuration of the `intellijRunner` project seen in the `build.sbt` file:

lazy val intellijRunner = project.in(file("intellijRunner"))
  .dependsOn(RootProject(file(".")))
  .settings(
    scalaVersion := "2.11.11",
    // pull the Spark dependencies in at "compile" scope so they land on the run/debug classpath
    libraryDependencies ++= sparkDependencies.map(_ % "compile")
  )
  .disablePlugins(sbtassembly.AssemblyPlugin)
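
That snippet references `sparkDependencies`, which is defined elsewhere in the sample project’s `build.sbt`.  If you’re adapting this to your own build, a minimal sketch (artifact names and versions are illustrative, not prescriptive) might look like:

// the Spark artifacts the main project depends on at "provided" scope
lazy val sparkDependencies = Seq(
  "org.apache.spark" %% "spark-core" % "2.2.0",
  "org.apache.spark" %% "spark-sql"  % "2.2.0"
)

The main project typically marks these as “provided” so they stay out of the assembly jar (the cluster supplies them at runtime), which is exactly why a plain run configuration can fail with `ClassNotFoundException`.  The `intellijRunner` project re-adds them at “compile” scope just for running and debugging inside IntelliJ.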

I showed how to use this `val` from within IntelliJ as the classpath module of the run/debug configuration, which is one option for debugging Scala-based Spark code in IntelliJ.


Next, I showed a checkbox option from within IntelliJ itself for including dependencies with “Provided” scope.  As I mentioned in the screencast, this second option was new to me.  It must have appeared in newer versions of the Scala plugin or IntelliJ; it wasn’t an option when I first started debugging Spark in IntelliJ.

Let me know in the comments below if you run into issues.  Again, there is a sample project to download from Github in the Resources section below.

Remote Spark Debug Example

In this next screencast, I show how to set up remote debugging of Scala-based Spark code from IntelliJ.  In other words, how do you run remotely deployed Scala programs in the debugger?  Is this even an option?  Well, yes, it is.  Now, “remote” might mean your Spark code running on a cluster in your cloud environment.  Or, it might be Spark code that has been deployed to a cluster running on the same machine as IntelliJ.  Either scenario applies; you just need to modify the variables for your own situation.

In this screencast, I’ll show you the concepts and a working example.  Now, your environment might vary in security access, hostnames, etc., so try to stay focused on the key concepts of remote debugging Scala programs in IntelliJ.

Remote Spark Debug Configuration Notes

As you saw, the key in this example is setting the `SPARK_SUBMIT_OPTS` variable:

export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005

Next, we configured a remote debug configuration in IntelliJ based on the `SPARK_SUBMIT_OPTS` values, such as `address`.
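
For reference on what those JDWP options mean: `suspend=y` makes the driver JVM pause at startup until a debugger attaches, and `address=5005` is the port the IntelliJ remote configuration connects to.  A typical flow then looks something like this sketch, where the class name, master URL, and jar path are placeholders for your own:

# with SPARK_SUBMIT_OPTS exported as above, the driver JVM waits on port 5005
spark-submit \
  --class com.example.DebugExample \
  --master spark://your-master-host:7077 \
  target/scala-2.11/your-spark-app.jar
# now attach IntelliJ's remote debug configuration (host = the machine running spark-submit, port = 5005)

Note that `SPARK_SUBMIT_OPTS` applies to the JVM that `spark-submit` launches, so in the default client deploy mode you’re debugging the driver.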

Hope this helps!  Let me know if you have any questions, suggestions for improvement, or any free beer and tacos in the comments below.


Further Resources

