Spark Performance Monitoring Tools – A List of Options


Which Spark performance monitoring tools are available to monitor the performance of your Spark cluster?  Let’s find out.  Before we address this question, I assume we already know Spark includes monitoring through the Spark UI.  In addition, Spark includes support for monitoring and performance debugging through the History Server as well as through the Java Metrics library.  But are there other Spark performance monitoring tools available?  In this short post, let’s list a few more options to consider.

Sparklint

https://github.com/groupon/sparklint

Developed at Groupon, Sparklint uses Spark metrics and a custom Spark event listener.  It is easily attached to any Spark job, can run standalone against historical event logs, or can be configured to use an existing Spark History Server.  It presents good-looking charts through a web UI for analysis and provides a resource-focused view of the application runtime.

Presentation: Spark Summit 2017 Presentation on Sparklint

Dr. Elephant

https://github.com/linkedin/dr-elephant

From LinkedIn, Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark. Dr. Elephant gathers metrics, runs analysis on these metrics, and presents them back in a simple way for easy consumption. The goal is to improve developer productivity and increase cluster efficiency by making it easier to tune jobs.

“It analyzes the Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed, and then uses the results to make suggestions about how to tune the job to make it perform more efficiently.”

Presentation: Spark Summit 2017 Presentation on Dr. Elephant

SparkOscope

https://github.com/ibm-research-ireland/sparkoscope

Born from IBM Research in Dublin, SparkOscope was developed to better understand Spark resource utilization.  One of the reasons it was built was to “address the inability to derive temporal associations between system-level metrics (e.g. CPU utilization) and job-level metrics (e.g. stage ID)”.  For example, the authors were not able to trace the root cause of a peak in HDFS reads or CPU usage back to the Spark application code.  SparkOscope was developed to overcome these limitations.

SparkOscope extends (augments) the Spark UI and History server.

SparkOscope dependencies include the Hyperic Sigar library and HDFS.

Presentation: Spark Summit 2017 Presentation on SparkOscope

History Server

Don’t forget about the Spark History Server.  I wrote up a tutorial on Spark History Server recently.

Metrics

Spark’s support for the Java Metrics library, available at http://metrics.dropwizard.io/, is what facilitates many of the Spark performance monitoring options above.  It also provides a way to integrate with external monitoring tools such as Ganglia and Graphite.  There is a short tutorial on integrating Spark with Graphite presented on this site.
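To give a quick taste of what that looks like (the Graphite tutorial walks through it end to end), pointing Spark’s metrics system at a Graphite backend only takes a handful of sink properties in conf/metrics.properties.  The host and port values below are placeholders you would replace with your own Graphite endpoint:

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=your-graphite-host
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds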

Conclusion

Hopefully, this list of Spark performance monitoring tools presents you with some options to explore.  Let me know if I missed any other options or if you have any opinions on the ones above.  Thank you and good night.

 

Featured image https://flic.kr/p/e4rCVb

Spark Tutorial – Performance Monitoring with History Server


In this Apache Spark tutorial, we will explore the performance monitoring benefits of using the Spark History Server.  This Spark tutorial will review a simple Spark application without the History Server and then revisit the same Spark app with the History Server.  We will walk through all the necessary steps to configure the Spark History Server for measuring performance metrics.  At the end of this post, there is a screencast of me going through all the tutorial steps.

What is the Spark History Server?

The Spark History server allows us to review Spark application metrics after the application has completed.  Without the History Server, the only way to obtain performance metrics is through the Spark UI while the application is running.  Don’t worry if this doesn’t make sense yet.  I’m going to show you in examples below.

The Spark History server is bundled with Apache Spark distributions by default.  The steps we take to configure and run it in this tutorial should be applicable to various distributions.

Spark Tutorial Overview

In this Spark tutorial on performance metrics with the Spark History Server, we will run through the following steps:

  1. Run a Spark application without History Server
  2. Review Spark UI
  3. Update Spark configuration to enable History Server
  4. Start History Server
  5. Re-run Spark application
  6. Review Performance Metrics in History Server
  7. Boogie

 

Step 1 Spark App without History

To start, we’re going to run a simple example in a default Spark 2 cluster.  The Spark app example is based on a Spark 2 GitHub repo found at https://github.com/tmcgrath/spark-2.  But the Spark application really doesn’t matter.  It can be anything that we run to show a before and after perspective.

This will give us the before picture.  Or, in other words, this will show what your life is like without the History Server. To run this Spark app, clone the repo and run `sbt assembly` to build the Spark deployable jar.  If you have any questions on how to do this, leave a comment at the bottom of this page.  Again, the screencast below might answer any questions you have as well.

The entire `spark-submit` command I run in this example is:

`spark-submit --class com.supergloo.Skeleton --master spark://tmcgrath-rmbp15.local:7077 ./target/scala-2.11/spark-2-assembly-1.0.jar`

but again, the Spark application doesn’t really matter.
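If it helps to picture one, here is a minimal sketch in Scala of the kind of app this could be.  To be clear, this is not the actual source of the `com.supergloo.Skeleton` class in the repo above; it is just a hypothetical stand-in that does a trivial bit of work and exits:

package com.supergloo

import org.apache.spark.sql.SparkSession

object Skeleton {
  def main(args: Array[String]): Unit = {
    // master, deploy options, etc. come from the spark-submit command line
    val spark = SparkSession.builder().appName("Skeleton").getOrCreate()

    // do a trivial bit of work so the job shows up with a stage or two in the UI
    val count = spark.range(0, 1000000).count()
    println(s"Counted $count rows")

    spark.stop()
  }
}

Any job that runs to completion works here, because the point of this step is simply to end up with a finished application whose metrics we cannot yet inspect.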

Step 2 Spark UI Review

After we run the application, let’s review the Spark UI.  As we will see, the application is listed under completed applications.

spark tutorial completed applications

If we click this link, we are unable to review any performance metrics of the application.  Without access to the perf metrics, we won’t be able to establish a performance baseline.  Also, we won’t be able to analyze the areas of our code which could be improved.  So, we are left with the option of guessing at how we can improve.  Guessing is not an optimal place to be.  Let’s use the History Server to improve our situation.

Step 3 Update Spark Configuration for History Server

Spark is not configured for the History Server by default.  We need to make a few changes.  For this tutorial, we’re going to make the minimal amount of changes needed to highlight the History Server.  I’ll call out areas which should be addressed if you are deploying the History Server in a production or closer-to-production environment.

We’re going to update the conf/spark-defaults.conf in this tutorial.  In a default Spark distro, this file is called spark-defaults.conf.template.  Just copy the template file to a new file called spark-defaults.conf if you have not done so already.
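For example, from the Spark root directory on a *nix machine:

`cp conf/spark-defaults.conf.template conf/spark-defaults.conf`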

Next, update three configuration variables:

  • Set `spark.eventLog.enabled` to true
  • Set `spark.eventLog.dir` to a directory **
  • Set `spark.history.fs.logDirectory` to a directory **

 

** In this example, I set the directories to a directory on my local machine.  You will want to set this to a distributed file system (S3, HDFS, DSEFS, etc.) if you are enabling History server outside your local environment.
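For reference, here is roughly what those three lines might look like in conf/spark-defaults.conf on a local setup.  The file:///tmp/spark-events path is only an example; create that directory (or point at one which already exists) before running anything:

spark.eventLog.enabled true
spark.eventLog.dir file:///tmp/spark-events
spark.history.fs.logDirectory file:///tmp/spark-events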

Step 4 Start History Server

Consider this the easiest step in the entire tutorial.  All we have to do now is run `start-history-server.sh` from your Spark `sbin` directory.  It should start up in just a few seconds and you can verify by opening a web browser to http://localhost:18080/
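In other words, from the Spark root directory it is just:

`./sbin/start-history-server.sh`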

The most common error is the events directory not being available.  If you discover any issues during history server startup, verify the events log directory is available.

Step 5 Rerun the Spark Application

Ok, this should be another easy one.  Let’s just rerun the Spark app from Step 1.  There is no need to rebuild or change how we deploy, because we updated the default configuration in the spark-defaults.conf file previously.  We’re all set, so let’s just re-run it.

Step 6 Review Spark Application Performance Metrics in History Server

Alright, the moment of truth…. drum roll, please…

Refresh http://localhost:18080/ and you will see the completed application.  Click around, you history-server-running-person-of-the-world, you!  You are now able to review the Spark application’s performance metrics even though it has completed.

Step 7 Boogie

That’s right.  Let’s boogie down.  This means, let’s dance and celebrate.  Now, don’t celebrate like you just won the lottery… don’t celebrate that much!  But a little dance and a little celebration cannot hurt.  Yell “whoooo hoooo” if you are unable to do a little dance.  If you can’t dance or yell a bit, then I don’t know what to tell you, bud.

In any case, now that you have the Spark History Server running, you’re able to review Spark performance metrics of a completed application.  And just in case you forgot, you were not able to do this before.  But now you can.  Slap yourself on the back, kid.

Conclusion

I hope this Spark tutorial on performance monitoring with History Server was helpful.  See the screencast below in case you have any questions.  If you still have questions, let me know in the comments section below.


Screencast

Can’t get enough of my Spark tutorials?  Well, if so, the following is a screencast of me running through most of the steps above.

Spark Tutorial – Performance Metrics with History Server


Spark Performance Monitoring with Metrics, Graphite and Grafana


Spark is distributed with the Metrics Java library, which can greatly enhance your ability to diagnose issues with your Spark jobs.  In this post, we’ll cover how to configure Metrics to report to a Graphite backend and view the results with Grafana.

Optional, 20 Second Background

If you already know about Metrics, Graphite and Grafana, you can skip this section.  But for those of you that do not, here is some quick background on these tools.

Metrics describes itself this way: “Metrics provides a powerful toolkit of ways to measure the behavior of critical components in your production environment”.  Similar to other open source applications, such as Apache Cassandra, Spark is deployed with Metrics support.  In this post, we’re going to configure Metrics to report to a Graphite backend.  Graphite, in turn, is described this way: “Graphite is an enterprise-ready monitoring tool that runs equally well on cheap hardware or Cloud infrastructure”.  Finally, we’re going to view the metric data collected in Graphite from Grafana, which is “the leading tool for querying and visualizing time series and metrics”.

This post is just one approach to how Metrics can be utilized for Spark monitoring.  Metrics is flexible and can be configured to report to other backends besides Graphite.  Check out the Metrics docs for more.  Link in the References section below.

Sample App Requirements

  1. Spark
  2. Cassandra

Overview

We’re going to move quickly.  I assume you already have Spark downloaded and running.  We’re going to configure your Spark environment to use Metrics reporting to a Graphite backend.  We’ll download a sample application to use to collect metrics.  Finally, for illustrative purposes and to keep things moving quickly, we’re going to use a hosted Graphite/Grafana service.  YMMV.  Please adjust accordingly.

Outline

  1. Sign up for Graphite/Grafana service
  2. Configure Metrics
  3. Clone and run sample application with Spark Components
  4. Confirm Graphite and Configure Grafana
  5. Eat, drink, be merry

Let’s do this.

1. Sign up for Graphite/Grafana Service

Sign up for a free trial account at http://hostedgraphite.com.  At the time of this writing, they do NOT require a credit card during sign up.   After signing up/logging in, you’ll be at the “Overview” page where you can retrieve your API Key as shown here

Spark Performance Monitoring

Done.  Movin on.

2. Configure Metrics

Go to your Spark root dir and enter the conf/ directory.  There should be a `metrics.properties.template` file present.  Copy this file to create a new one.  For example, on a *nix based machine: `cp metrics.properties.template metrics.properties`.  Open `metrics.properties` in a text editor and do two things:

2.1 Uncomment lines at the bottom of the file

master.source.jvm.class=org.apache.spark.metrics.source.JvmSource

worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource

driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource

executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

 

2.2 Add the following lines and update the `*.sink.graphite.prefix` with your API Key from the previous step

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink

*.sink.graphite.host=carbon.hostedgraphite.com

*.sink.graphite.port=2003

*.sink.graphite.period=10

*.sink.graphite.unit=seconds

*.sink.graphite.prefix=<your-api-key-from-previous-step>

Kapow. Done. Moving on.

3. Clone and run sample application

We’re going to use Killrweather for the sample app.  It requires a Cassandra backend.  If you don’t have Cassandra installed yet, do that first.  Don’t complain, it’s simple.

3.1 Clone Killrweather

`git clone https://github.com/killrweather/killrweather.git`

3.2 Switch to `version_upgrade` branch *

`cd killrweather`

`git checkout version_upgrade`

* We’re using the version_upgrade branch because the Streaming portion of the app has been extracted into its own module.

3.3 Prepare Cassandra

To prepare Cassandra, we run two `cql` scripts within `cqlsh`.  Super easy if you are familiar with Cassandra.  And if not, watch the screencast mentioned in the References section below to see me go through the steps.  In essence, start `cqlsh` from the killrweather/data directory and then run

 cqlsh> source 'create-timeseries.cql';
 cqlsh> source 'load-timeseries.cql';

3.4 Start-up app

`sbt app/run`

3.5 Package Streaming Jar to deploy to Spark

`sbt streaming/package`

3.6 Deploy JAR

Example from the killrweather/killrweather-streaming directory:

`~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-submit --master spark://tmcgrath-rmbp15.local:7077 --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.3,datastax:spark-cassandra-connector:1.6.1-s_2.10 --class com.datastax.killrweather.WeatherStreaming --properties-file=conf/application.conf target/scala-2.10/streaming_2.10-1.0.1-SNAPSHOT.jar`

At this point, metrics should be recorded in hostedgraphite.com.  Let’s go there now.

4. Confirm Graphite and Configure Grafana

Let’s go back to hostedgraphite.com and confirm we’re receiving metrics.  There are a few ways to do this, as shown in the screencast available in the References section of this post.  One way to confirm is to go to Metrics -> Metrics Traffic as shown here:

spark-performance-monitor-with-graphite

 

Once metrics receipt is confirmed, go to Dashboard -> Grafana

spark-performance-monitor

At this point, I believe it will be more efficient to show you examples of how to configure Grafana rather than describe it.  Check out this short screencast.
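Before the screencast, though, here is one rough, hypothetical example of the kind of Graphite target a Grafana graph panel can plot once the JVM sources above are reporting.  The application ID segment below is made up (it changes with every run), and your metric paths may also carry the prefix configured in metrics.properties, so browse the metrics Hosted Graphite has actually received to confirm the exact names:

`app-20170101010101-0001.driver.jvm.heap.used`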

Eat, Drink, Be Merry

Seriously.  Do that.  Eat, drink and be merry.  Because, as far as I know, we get one go around.  So, make sure to enjoy the ride when you can.  Hopefully, this ride worked for you and you can celebrate a bit.  And if not, leave questions or comments below.

References

Screencast of key steps from this tutorial

Spark Performance Monitoring with Metrics, Graphite and Grafana

Notes

You can also specify the Metrics configuration on a more granular, per-job basis during spark-submit; e.g.

`~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-submit --master spark://tmcgrath-rmbp15.local:7077 --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.3,datastax:spark-cassandra-connector:1.6.1-s_2.10 --class com.datastax.killrweather.WeatherStreaming --properties-file=conf/application.conf --conf spark.metrics.conf=metrics.properties --files ~/Development/spark-1.6.3-bin-hadoop2.6/conf/metrics.properties target/scala-2.10/streaming_2.10-1.0.1-SNAPSHOT.jar`