Spark Performance Monitoring Tools – A List of Options


Which Spark performance monitoring tools are available to monitor the performance of your Spark cluster? In this tutorial, we’ll find out. But before we address this question, I assume you already know that Spark includes monitoring through the Spark UI, and that it supports monitoring and performance debugging through the Spark History Server and the Java Metrics library.

But are there other Spark performance monitoring tools available? In this short post, let’s list a few more options to consider.

Sparklint

https://github.com/groupon/sparklint

Developed at Groupon, Sparklint uses Spark metrics and a custom Spark event listener. It is easily attached to any Spark job, and it can also run standalone against historical event logs or be configured to use an existing Spark History Server. It presents good-looking charts through a web UI for analysis and also provides a resource-focused view of the application runtime.
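To make the idea of a resource-focused view more concrete, here is a minimal Python sketch (not Sparklint's code) of the kind of calculation such a view rests on. Given task start and end times, as a listener could collect from `onTaskStart`/`onTaskEnd` events, and the total number of executor cores, it reports what fraction of the available core time was spent running tasks, both overall and per time bucket. `TaskSpan`, `core_utilization`, and `utilization_by_bucket` are hypothetical names, and the sketch assumes each task occupies exactly one core.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskSpan:
    """Hypothetical record of one task's run time, in milliseconds."""
    start_ms: int
    end_ms: int


def _check_window(total_cores: int, start_ms: int, end_ms: int) -> None:
    if total_cores <= 0 or end_ms <= start_ms:
        raise ValueError("need a positive core count and a non-empty time window")


def core_utilization(tasks: List[TaskSpan], total_cores: int,
                     app_start_ms: int, app_end_ms: int) -> float:
    """Fraction of available core time spent running tasks over the whole app.

    Assumes one core per task and that every task falls inside the app window.
    """
    _check_window(total_cores, app_start_ms, app_end_ms)
    busy_ms = sum(t.end_ms - t.start_ms for t in tasks)
    return busy_ms / (total_cores * (app_end_ms - app_start_ms))


def utilization_by_bucket(tasks: List[TaskSpan], total_cores: int,
                          app_start_ms: int, app_end_ms: int,
                          bucket_ms: int) -> List[float]:
    """Core utilization per fixed-width time bucket (the last bucket may be shorter)."""
    _check_window(total_cores, app_start_ms, app_end_ms)
    if bucket_ms <= 0:
        raise ValueError("bucket_ms must be positive")
    result = []
    lo = app_start_ms
    while lo < app_end_ms:
        hi = min(lo + bucket_ms, app_end_ms)
        # Only the part of each task that overlaps [lo, hi) counts toward this bucket.
        busy_ms = sum(max(0, min(t.end_ms, hi) - max(t.start_ms, lo)) for t in tasks)
        result.append(busy_ms / (total_cores * (hi - lo)))
        lo = hi
    return result


tasks = [TaskSpan(0, 10_000), TaskSpan(0, 5_000), TaskSpan(2_000, 4_000)]
print(core_utilization(tasks, 4, 0, 10_000))              # 0.425
print(utilization_by_bucket(tasks, 4, 0, 10_000, 5_000))  # [0.6, 0.25]
```

The overall figure hides when cores sat idle; the bucketed view shows that in this example most of the idle core time falls in the second half of the run.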

Presentation: Spark Summit 2017 Presentation on Sparklint

Dr. Elephant

https://github.com/linkedin/dr-elephant

From LinkedIn, Dr. Elephant is a performance monitoring tool for Hadoop and Spark. It gathers metrics, runs analysis on them, and presents the results in a simple way for easy consumption. The goal is to improve developer productivity and increase cluster efficiency by making it easier to tune jobs.

“It analyzes the Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed, and then uses the results to make suggestions about how to tune the job to make it perform more efficiently.”
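The heuristic approach described in the quote can be sketched in a few lines. The Python below is a hypothetical illustration, not Dr. Elephant's implementation or API: each heuristic maps job metrics to a severity level using configurable thresholds, heuristics are plain functions that can be plugged into a list, and the job's overall severity is the worst individual result. The severity names, metric keys (`allocated_mb`, `peak_used_mb`, `max_task_ms`, `median_task_ms`), and threshold values are all assumptions chosen for the example.

```python
from enum import IntEnum
from typing import Callable, Dict, List, Sequence, Tuple


class Severity(IntEnum):
    # Ordered so that max() picks the worst result.
    NONE = 0
    LOW = 1
    MODERATE = 2
    SEVERE = 3
    CRITICAL = 4


Metrics = Dict[str, float]
Finding = Tuple[str, Severity, str]
Heuristic = Callable[[Metrics], Finding]


def grade(value: float, thresholds: Sequence[float]) -> Severity:
    """Map a 'higher is worse' value to a severity, using ascending
    thresholds for LOW, MODERATE, SEVERE and CRITICAL."""
    level = Severity.NONE
    for sev, limit in zip(list(Severity)[1:], thresholds):
        if value >= limit:
            level = sev
    return level


def unused_memory(thresholds: Sequence[float] = (0.3, 0.5, 0.7, 0.9)) -> Heuristic:
    """Hypothetical rule: flag executors that were given far more memory than they used."""
    def run(m: Metrics) -> Finding:
        unused = 1.0 - m["peak_used_mb"] / m["allocated_mb"]
        return ("Unused executor memory", grade(unused, thresholds),
                f"{unused:.0%} of allocated memory was never used")
    return run


def task_skew(thresholds: Sequence[float] = (2, 4, 8, 16)) -> Heuristic:
    """Hypothetical rule: flag stages where the slowest task dwarfs the median task."""
    def run(m: Metrics) -> Finding:
        ratio = m["max_task_ms"] / m["median_task_ms"]
        return ("Task skew", grade(ratio, thresholds),
                f"slowest task took {ratio:.1f}x the median")
    return run


def analyze(metrics: Metrics, heuristics: List[Heuristic]) -> Tuple[Severity, List[Finding]]:
    """Run every plugged-in heuristic; the overall severity is the worst finding."""
    findings = [h(metrics) for h in heuristics]
    overall = max((sev for _, sev, _ in findings), default=Severity.NONE)
    return overall, findings


job = {"allocated_mb": 8192, "peak_used_mb": 2048,
       "max_task_ms": 30_000, "median_task_ms": 10_000}
overall, findings = analyze(job, [unused_memory(), task_skew()])
for name, sev, detail in findings:
    print(f"{name}: {sev.name} ({detail})")
print("Overall:", overall.name)
```

Because thresholds are constructor arguments, a team could tighten or relax a rule without touching its logic, which is roughly what "configurable" buys in this kind of design.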

Presentation: Spark Summit 2017 Presentation on Dr. Elephant

SparkOscope

https://github.com/ibm-research-ireland/sparkoscope

Born at IBM Research in Dublin, SparkOscope was developed to better understand Spark resource utilization. One motivation was to “address the inability to derive temporal associations between system-level metrics (e.g. CPU utilization) and job-level metrics (e.g. stage ID)”. For example, the authors were not able to trace the root cause of a peak in HDFS reads or CPU usage back to the Spark application code. SparkOscope was developed to overcome these limitations.
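The temporal association SparkOscope targets comes down to lining up two time series: system-level samples (such as CPU utilization) and the time windows during which each stage ran. This hypothetical Python sketch, not SparkOscope's implementation, shows the basic join: it finds the stages active at a metric peak and averages a metric over each stage's window. `StageWindow` and the function names are illustrative, and the sketch assumes both data sources share the same millisecond clock.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp_ms, metric value), e.g. CPU %


@dataclass
class StageWindow:
    """Hypothetical record of when a stage ran (same clock as the samples)."""
    stage_id: int
    start_ms: int
    end_ms: int


def stages_active_at(ts_ms: int, stages: List[StageWindow]) -> List[int]:
    """IDs of stages whose [start, end) window contains the timestamp."""
    return [s.stage_id for s in stages if s.start_ms <= ts_ms < s.end_ms]


def attribute_peak(samples: List[Sample],
                   stages: List[StageWindow]) -> Tuple[int, float, List[int]]:
    """Find the highest sample and the stages running when it was taken."""
    ts, value = max(samples, key=lambda s: s[1])
    return ts, value, stages_active_at(ts, stages)


def mean_per_stage(samples: List[Sample],
                   stages: List[StageWindow]) -> Dict[int, float]:
    """Average metric value over each stage's window (stages with no samples are skipped)."""
    result = {}
    for s in stages:
        values = [v for ts, v in samples if s.start_ms <= ts < s.end_ms]
        if values:
            result[s.stage_id] = sum(values) / len(values)
    return result


stages = [StageWindow(0, 0, 4_000), StageWindow(1, 3_000, 8_000)]
cpu = [(1_000, 20.0), (3_500, 95.0), (6_000, 60.0), (7_000, 40.0)]
print(attribute_peak(cpu, stages))  # (3500, 95.0, [0, 1])
print(mean_per_stage(cpu, stages))  # {0: 57.5, 1: 65.0}
```

Because stages can overlap, a peak may map to more than one stage, as in the example; narrowing it down further would need finer-grained data such as per-executor samples.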

SparkOscope extends the Spark UI and History Server.

SparkOscope's dependencies include the Hyperic Sigar library and HDFS.

Presentation: Spark Summit 2017 Presentation on SparkOscope

History Server

Don’t forget about the Spark History Server. I recently wrote a tutorial on the Spark History Server.
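Beyond its web UI, the History Server exposes a REST API under `/api/v1` (served on port 18080 by default), which is handy for scripting. The sketch below lists completed applications from the `/api/v1/applications` endpoint and ranks the slowest attempts by their `duration` field (milliseconds). The server URL you pass in is up to you, and the field names reflect recent Spark versions, so check the REST API section of the Spark monitoring documentation for the version you run.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_completed_apps(history_url: str) -> List[Dict[str, Any]]:
    """GET /api/v1/applications?status=completed from a Spark History Server."""
    url = f"{history_url.rstrip('/')}/api/v1/applications?status=completed"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def slowest_attempts(apps: List[Dict[str, Any]],
                     top_n: int = 3) -> List[Tuple[str, str, int]]:
    """(app id, app name, duration in ms) for the longest completed attempts."""
    rows = []
    for app in apps:
        for attempt in app.get("attempts", []):
            if attempt.get("completed"):
                rows.append((app["id"], app.get("name", ""),
                             int(attempt.get("duration", 0))))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top_n]
```

For example, `slowest_attempts(fetch_completed_apps("http://localhost:18080"))` returns the three longest completed runs, which are natural candidates to open in the History Server UI first.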

Metrics

Spark’s support for the Metrics Java library (http://metrics.dropwizard.io/) facilitates many of the Spark performance monitoring options above. It also provides a way to integrate with external monitoring tools such as Ganglia and Graphite. There is a short tutorial on integrating Spark with Graphite on this site.
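As one example of wiring up such an integration, Spark's `GraphiteSink` (`org.apache.spark.metrics.sink.GraphiteSink`) can be enabled in `metrics.properties` or, in recent Spark versions, through Spark properties prefixed with `spark.metrics.conf.`. This Python sketch builds those properties and renders them as `spark-submit --conf` arguments. The Graphite host in the test is a placeholder, and the port, period, and prefix values are example settings rather than recommendations; confirm the property names against the monitoring docs for your Spark version.

```python
from typing import Dict, List


def graphite_sink_conf(host: str, port: int = 2003, period_s: int = 10,
                       prefix: str = "") -> Dict[str, str]:
    """Spark properties that enable the Graphite sink for all metric instances (*)."""
    base = "spark.metrics.conf.*.sink.graphite"
    conf = {
        f"{base}.class": "org.apache.spark.metrics.sink.GraphiteSink",
        f"{base}.host": host,
        f"{base}.port": str(port),
        f"{base}.period": str(period_s),
        f"{base}.unit": "seconds",
    }
    if prefix:
        # Optional path prefix prepended to every metric name sent to Graphite.
        conf[f"{base}.prefix"] = prefix
    return conf


def as_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as a flat list of spark-submit --conf arguments."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args
```

If you paste the resulting arguments into a shell, quote them so the `*` is not glob-expanded; passing the list straight to `subprocess.run` avoids the issue.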

Spark Performance Monitoring Tools Conclusion

Hopefully, this list of Spark performance monitoring tools gives you some options to explore. Let me know if I missed any other options or if you have opinions on the options above. Thank you and good night.

Check the Spark Monitoring section for more tutorials on Spark performance and debugging.

Featured image: https://flic.kr/p/e4rCVb

About Todd M

Todd has held multiple software roles over his 20-year career. For the last 5 years, he has focused on helping organizations move from batch to data streaming. In addition to the free tutorials, he provides consulting and coaching for Data Engineers, Data Scientists, and Data Architects. Feel free to reach out directly or to connect on LinkedIn.
