Which Spark performance monitoring tools are available to monitor the performance of your Spark cluster? In this tutorial, we'll find out. Before we address this question, I assume you already know Spark includes monitoring through the Spark UI, and that Spark supports monitoring and performance debugging through the Spark History Server as well as through the Java Metrics library.
But are there other Spark performance monitoring tools available? In this short post, let's list a few more options to consider.
Sparklint
https://github.com/groupon/sparklint
Developed at Groupon, Sparklint uses Spark metrics and a custom Spark event listener, and it is easily attached to any Spark job. It can also run standalone against historical event logs or be configured to use an existing Spark History Server. It presents good-looking charts through a web UI for analysis and provides a resource-focused view of the application runtime.
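As a quick sketch, attaching Sparklint to a live job goes through Spark's standard `spark.extraListeners` setting. The listener class name follows the Sparklint README, but the `--packages` coordinates depend on your Spark and Scala versions, and the application class and jar below are placeholders, so verify everything against the version you install:

```shell
# Attach the Sparklint event listener to a live job (hypothetical app jar).
# Check the artifact coordinates against your Spark/Scala versions.
spark-submit \
  --packages com.groupon.sparklint:sparklint-spark212_2.11:1.0.12 \
  --conf spark.extraListeners=com.groupon.sparklint.SparklintListener \
  --class com.example.MyApp my-app.jar
```

Once the listener is registered, Sparklint serves its charts from an embedded web UI while the job runs.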
Presentation: Spark Summit 2017 Presentation on Sparklint
Dr. Elephant
https://github.com/linkedin/dr-elephant
From LinkedIn, Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark. Dr. Elephant gathers metrics, runs analysis on these metrics, and presents the results in a simple way for easy consumption. The goal is to improve developer productivity and increase cluster efficiency by making it easier to tune jobs.
“It analyzes the Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed, and then uses the results to make suggestions about how to tune the job to make it perform more efficiently.”
Presentation: Spark Summit 2017 Presentation on Dr. Elephant
SparkOscope
https://github.com/ibm-research-ireland/sparkoscope
Born from IBM Research in Dublin, SparkOscope was developed to better understand Spark resource utilization. One of its goals is to "address the inability to derive temporal associations between system-level metrics (e.g. CPU utilization) and job-level metrics (e.g. stage ID)". For example, the authors were not able to trace a peak in HDFS reads or CPU usage back to the Spark application code that caused it. SparkOscope was developed to overcome these limitations.
SparkOscope extends (augments) the Spark UI and History server.
SparkOscope's dependencies include the Hyperic Sigar library and HDFS.
Presentation: Spark Summit 2017 Presentation on SparkOscope
History Server
Don’t forget about the Spark History Server. As mentioned above, I wrote up a tutorial on Spark History Server recently.
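In case a reminder helps, the History Server only works if your applications write event logs to a location it can read. A minimal configuration sketch follows; the HDFS path is just an example for your environment:

```
# spark-defaults.conf -- have applications write event logs
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://namenode:8021/spark-logs

# Point the History Server at the same directory, e.g. in conf/spark-env.sh:
#   SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://namenode:8021/spark-logs"
# Then start it:
#   sbin/start-history-server.sh
```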
Metrics
Spark's support for the Dropwizard Metrics Java library (http://metrics.dropwizard.io/) is what facilitates many of the Spark performance monitoring options above. It also provides a way to integrate with external monitoring tools such as Ganglia and Graphite. There is a short tutorial on integrating Spark with Graphite on this site.
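As a sketch, wiring Spark's metrics to Graphite is a matter of editing `conf/metrics.properties`. The sink class and property keys come from Spark's metrics system; the host and port below are assumptions for your environment:

```
# conf/metrics.properties -- send metrics from all instances to Graphite
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds

# Optionally enable the JVM source for driver/executor memory metrics
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```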
Spark Performance Monitoring Tools Conclusion
Hopefully, this list of Spark Performance monitoring tools presents you with some options to explore. Let me know if I missed any other options or if you have any opinions on the options above. Thank you and good night.
Check Spark Monitoring section for more tutorials around Spark Performance and debugging.
Featured image https://flic.kr/p/e4rCVb