Spark SQL MySQL Example with JDBC


In this tutorial, we will cover using Spark SQL with a MySQL database.

Overview

Let’s walk through examples of using Spark SQL with MySQL.  We’re going to use MySQL with Spark in this tutorial, but you can apply the concepts presented here to any relational database that has a JDBC driver.

By the way, if you are not familiar with Spark SQL, there are a few Spark SQL tutorials on this site.

Requirements

1. MySQL instance

2. MySQL JDBC driver (download available at https://dev.mysql.com/downloads/connector/j/)

3. The baby_names.csv file used in previous tutorials as source data.

4. Apache Spark

Quick Setup

The Spark SQL with MySQL JDBC example assumes a MySQL database named “sparksql” with a table called “baby_names”.  The “baby_names” table has been populated with the baby_names.csv data used in previous Spark tutorials.

Here’s a screencast on YouTube of how I set up my environment:

MySQL setup for Spark SQL with MySQL (JDBC) examples

The SQL to create the baby_names table:

DROP TABLE IF EXISTS `baby_names`;

CREATE TABLE `baby_names` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `year` int(11) DEFAULT NULL,
  `first_name` varchar(100) DEFAULT NULL,
  `county` varchar(100) DEFAULT NULL,
  `sex` varchar(5) DEFAULT NULL,
  `count` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

If you have any questions about the environment setup, leave comments on this post.

Methodology

We need to pass in the MySQL JDBC driver jar when we start up the Spark shell.  (In a Spark application, any third-party libraries such as a JDBC driver would be included in the application package.)

From the Spark shell we’re going to establish a connection to the MySQL database and then run some queries via Spark SQL.  This is a getting-started example for Spark with MySQL.  To build and deploy a Spark application with the MySQL JDBC driver, you may wish to check out the Spark cluster deploy with extra jars tutorial.

Spark SQL with MySQL (JDBC) Example Tutorial

1. Start the Spark shell with the --jars argument

$SPARK_HOME/bin/spark-shell --jars mysql-connector-java-5.1.26.jar

This example assumes the MySQL connector JDBC jar file is located in the same directory from which you are calling spark-shell.  If it is not, you can specify the path location, such as:

$SPARK_HOME/bin/spark-shell --jars /home/example/jars/mysql-connector-java-5.1.26.jar

2. Once the Spark shell is running, let’s establish a connection to the MySQL database and read the baby_names table:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
...
SQL context available as sqlContext.

scala> val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/sparksql").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "baby_names").option("user", "root").option("password", "root").load()

Change the MySQL URL and user/password values in the above code as appropriate for your environment.
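For larger tables, a single JDBC connection reading the whole table can become a bottleneck, because Spark otherwise issues one `select * from table` statement. The JDBC source can split the read across partitions when you supply a numeric partition column and bounds. Here is a minimal sketch, assuming the same `sparksql` database and that `id` values run roughly from 1 to 1,000,000 (replace the bounds with the actual min/max of your data; they control how the ranges are split, not which rows are read):

```scala
// Partitioned JDBC read: Spark issues one query per partition,
// each with a WHERE clause range on the partition column.
val partitioned_df = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://localhost/sparksql")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "baby_names")
  .option("user", "root")
  .option("password", "root")
  .option("partitionColumn", "id")  // must be a numeric column
  .option("lowerBound", "1")        // assumed min of id
  .option("upperBound", "1000000")  // assumed max of id
  .option("numPartitions", "4")     // number of parallel reads
  .load()
```

All four partitioning options must be supplied together; rows outside the bounds still end up in the first or last partition.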

3. Let’s confirm the DataFrame by displaying the first rows of the table (you can also inspect the schema with printSchema):

scala> dataframe_mysql.show

4. Register the data as a temp table for future SQL queries:

scala> dataframe_mysql.registerTempTable("names")

5. We are now in a position to run some SQL, such as:

scala> dataframe_mysql.sqlContext.sql("select * from names").collect.foreach(println)
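You can also go the other direction and save query results back to MySQL. A minimal sketch, assuming the same connection settings and a hypothetical destination table name `popular_names`; the aggregation query and table name are illustrative, not from the original tutorial:

```scala
import java.util.Properties

// Aggregate with Spark SQL against the registered temp table...
val popular = dataframe_mysql.sqlContext.sql(
  "select first_name, sum(count) as total from names group by first_name")

// ...then persist the result back to MySQL over JDBC.
val props = new Properties()
props.put("user", "root")
props.put("password", "root")
props.put("driver", "com.mysql.jdbc.Driver")

// Creates the popular_names table if it does not already exist;
// with the default save mode this fails if the table is present.
popular.write.jdbc("jdbc:mysql://localhost/sparksql", "popular_names", props)
```

This uses DataFrameWriter.jdbc, which is available in the Spark 1.4+ API shown in this tutorial.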

Conclusion: Spark SQL with MySQL (JDBC)

This example was designed to get you up and running quickly with Spark SQL and MySQL, or any JDBC-compliant database.  What other examples would you like to see with Spark SQL and JDBC?  Please leave ideas or questions in the comments below.

For more Spark tutorials, check out Spark SQL with Scala.

And to keep up with changes in Spark SQL, especially DataFrame vs. Dataset, check the Apache Spark SQL documentation from time to time.

Featured Image credit: https://flic.kr/p/f8KB7L

20 thoughts on “Spark SQL MySQL Example with JDBC”

    1. Maybe I’m not understanding your question entirely, but you could write results back to MySQL outside of the DataFrames of Spark SQL.

  1. There is a problem when I try to do this on a huge table.
    Although I use the right statistics for the number of partitions and the lower and upper bounds, I am dying because
    this connector generates one statement of ` select * from table ` instead of splitting it like it should. Any idea?

  2. I am trying to make my password secure using the --password-file option but didn’t get success. Can I do the same in Spark for secure access to the database?

  3. Hi,

    I’m trying to execute a Spark program with MySQL. It runs fine within the Eclipse IDE, but I am not able to submit the jar file. It says: java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.load()Lorg/apache/spark/sql/DataFrame;
    I even tried changing pom.xml with different versions, but to no avail.
    Could you help me out in resolving this?

  4. I am getting the error “java.lang.ClassNotFoundException: com.mysql.jdbc.Driver”
    although I have downloaded the required jar and I am running the Spark shell as below:
    ./bin/spark-shell --jars mysql-connector-java-5.1.39-bin.jar

    Can you please help?

  5. Brother, you rock!!!!! Truly. You saved my time and you made it look so simple to implement. 🙂

    Just one minor update is required in the article:

    spark-shell --jars → this has a double dash. In your article, it looks like a single dash.

    But thanks again. Keep posting 🙂

  6. scala> val dataframe_mysql = sqlContext.read.format(“jdbc”).option(“url”, “jdbc:mysql://localhost/sparksql”).option(“driver”, “com.mysql.jdbc.Driver”).option(“dbtable”, “baby_names”).option(“user”, “root”).option(“password”, “root”).load()
    Executing the above statement throws an error => :1: error: unclosed string literal

  7. Error:
    Caused by: java.net.ConnectException: Call From host to localhost:9000 failed on connection exception:
    I do not understand why it is pointing to an HDFS location here?
