
In this tutorial, we will cover using Spark SQL with a mySQL database.
Overview
Let’s show examples of using Spark SQL mySQL. We’re going to use mySQL with Spark in this tutorial, but you can apply the concepts presented here to any relational database which has a JDBC driver.
By the way, If you are not familiar with Spark SQL, there are a few Spark SQL tutorials on this site.
Requirements
1. MySQL instance
2. MySQL JDBC driver (download available https://dev.mysql.com/downloads/connector/j/)
3. Previously used baby_names.csv file as source data.
4. Apache Spark
Quick Setup
The Spark SQL with MySQL JDBC example assumes a mysql db named “sparksql” with table called “baby_names”. The “baby_names” table has been populated with the baby_names.csv data used in previous Spark tutorials.
Here’s a screencast on YouTube of how I set up my environment:
The SQL to create the baby_names table:
DROP TABLE IF EXISTS `baby_names`;
CREATE TABLE `baby_names` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`year` int(11) DEFAULT NULL,
`first_name` varchar(100) DEFAULT NULL,
`county` varchar(100) DEFAULT NULL,
`sex` varchar(5) DEFAULT NULL,
`count` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
If you have any questions about the environment setup, leave comments on this post.
Methodology
We need to pass in the mySQL JDBC driver jar when we start up the Spark Shell. (In a Spark application, any third party libs such as a JDBC driver would be included in package.)
From Spark shell we’re going to establish a connection to the mySQL db and then run some queries via Spark SQL. This is a getting started with Spark mySQL example. To build and deploy and Spark application with mySQL JDBC driver you may wish to check out the Spark cluster deploy with extra jars tutorial.
Spark SQL with MySQL (JDBC) Example Tutorial
1. Start the spark shell with –jars argument
$SPARK_HOME/bin/spark–shell –jars mysql-connector-java-5.1.26.jar
This example assumes the mySQL connector JDBC jar file is located in the same directory as where you are calling spark-shell. If it is not, you can specify the path location such as:
$SPARK_HOME/bin/spark–shell –jars /home/example/jars/mysql-connector-java-5.1.26.jar
2. Once the Spark shell is running, let’s establish a connection to mySQL db and read the baby_names table:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.5.2
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
...
SQL context available as sqlContext.
scala> val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/sparksql").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "baby_names").option("user", "root").option("password", "root").load()
Change the mySQL url and user/password values in the above code appropriate to your environment.
3. Let’s confirm the dataframe by show the schema of the table
scala> dataframe_mysql.show
3. Register the data as a temp table for future SQL queries
scala> dataframe_mysql.registerTempTable("names")
4. We are now in a position to run some SQL such as
scala> dataframe_mysql.sqlContext.sql("select * from names").collect.foreach(println)
Conclusion Spark SQL with MySQL (JDBC)
This example was designed to get you up and running with Spark SQL and mySQL or any JDBC compliant database quickly. What other examples would you like to see with Spark SQL and JDBC? Please leave ideas or questions in comments below.
For more Spark tutorials, check out Spark SQL with Scala
And to keep up with changes in Spark SQL, especially DataFrame vs DataSet, check the Apache Spark SQL documentation from time-to-time.
Featured Image credit: https://flic.kr/p/f8KB7L
its showing me error that value not found sqlContext
what version of spark? do you see the sqlContext when you first startup the shell?
For me also it is giving the same error!
2.0.2 veriosn. I am using
In Spark 2, you need to `SparkSession`. See https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html for more info. Hope this helps.
How could we store the queried data to a MYSQL DB?
Maybe I’m not understanding your question entirely, but you could write results back to mySQL outside of data frames of Spark SQL.
Can you please show how to write data (e.g. insert into statement) into mysql?
There is a problem when i try to do this on huge table.
although I use the right statistic for num of partitions and lower and upper bounds I am dying because
This connector generate 1 statement of ` select * from table ` instead of splitting it like it should. any idea?
I am trying to make my password secure using –password-file option but didn’t get success .can I do the same in spark for a secure access to database?
Supporting the comment above. Could you please show how to write data? Thanks in advance.
hi
Im trying to execute spark program with my sql . Its running fine withing Eclipse Ide but not able to submit the jar file . It is saying java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.load()Lorg/apache/spark/sql/DataFrame;
I even tried changing pom.xml with different versions but of no use
Could you help me out in resolving this
I am getting the error “java.lang.ClassNotFoundException: com.mysql.jdbc.Driver”
Although I have downloaded required jar and I am running spark shell as below.
./bin/spark-shell –jars mysql-connector-java-5.1.39-bin.jar
Can you please help ?
Is `mysql-connector-java-5.1.39-bin.jar` in the same directory as where you are running `./bin/spark-shell`?
I want Kafka fetching data from any RDBMS i.e. MySQL and storing in hive/hbase
Brother You Rock!!!!! Truly.. You saved my time and you made it look so simple to implement. . 🙂
Just 1 minor update is required in the article..
spark-shell –jars –> This has double dash. In your article, it looks like single dash..
But Thanks again. Keep posting 🙂
scala> val dataframe_mysql = sqlContext.read.format(“jdbc”).option(“url”, “jdbc:mysql://localhost/sparksql”).option(“driver”, “com.mysql.jdbc.Driver”).option(“dbtable”, “baby_names”).option(“user”, “root”).option(“password”, “root”).load()
executing above statement throws an error => :1: error: unclosed string literal
It seems to me your double quotes are not standard 1s.. Check the image. https://uploads.disquscdn.com/images/3f7b65174c9a8227a6e6fc91bada50effc5945afb0d9c09b635071dbbc408abb.png
🙂 Got it Thank You
Hi Todd, do you have an example for the reverse scenario? From Spark to Mysql Database?
Error:
Caused by: java.net.ConnectException: Call From host to localhost:9000 failed on connection exception:
I do not understand why it is pointing to hdfs location here ?