After you have a Spark cluster running, how do you deploy Python programs to it?
In this post, we’ll deploy a couple of examples of Spark Python programs. We’ll start with a simple example and then progress to more complicated examples which include utilizing spark-packages and Spark SQL.
Ok, now that we’ve deployed a few examples as shown in the above screencast, let’s review a Python program which utilizes code we’ve already seen in the Spark with Python tutorials on this site. It’s a Python program which analyzes New York City Uber data using Spark SQL. The video will show the program in the Sublime Text editor, but you can use any editor you wish.
When deploying our driver program, we need to do a few things differently than when working in the pyspark shell. For example, we need to obtain a SparkContext and SQLContext ourselves, and we need to specify our Python imports explicitly.
bin/spark-submit --master spark://todd-mcgraths-macbook-pro.local:7077 --packages com.databricks:spark-csv_2.10:1.3.0 uberstats.py Uber-Jan-Feb-FOIL.csv
Let’s return to the Spark UI now that we have an available worker in the cluster and we have deployed some Python programs.
The Spark UI is the primary tool for Spark cluster diagnostics, so we’ll review its key attributes.
If you find these videos of deploying Python programs to an Apache Spark cluster interesting, you will find the entire Apache Spark with Python Course valuable. Make sure to check it out.
Additional Spark Python Resources
Featured Image credit https://flic.kr/p/bpd8Ht