PySpark Join Examples with DataFrame join function


PySpark joins combine data from two or more DataFrames based on a common field between them.  There are many different types of joins.  The join type to use usually depends on the business use case as well as on what performs best.  Joins can be expensive operations in distributed systems like Spark because they often lead to network shuffling.

Join functionality predates Spark.  It is most commonly found in the SQL language, so it makes sense to start our exploration with SQL.  The common types of SQL joins include INNER, LEFT, RIGHT, and FULL.  These types of joins can be achieved in PySpark in two primary ways.  The first way, which will be covered in this tutorial, is through the DataFrame join function.  This post is a deep dive into all the different types of PySpark joins with examples using the DataFrame join function.  The other approach is to use SQL within PySpark when constructing the JOIN type.  Using SQL for joins will be covered in a separate tutorial.

Note: this tutorial covers the DataFrame join function rather than PySpark SQL join examples.

PySpark join Function Overview

Before we begin all the examples, let’s confirm your understanding of a few key points.

First, the type of join is set by sending a string value to the join function.  The available options of join type string values include inner, cross, outer, full, fullouter, full_outer, left, leftouter, left_outer, right, rightouter, right_outer, semi, leftsemi, left_semi, anti, leftanti and left_anti.  The default join type is inner.

No other string value may be used.  All these join types will produce different results as we will see below.

Second, there is some overlap in join type names for convenience; for example, either leftouter or left_outer may be used and will produce the exact same outcome.  Again, this is simply for convenience.

Third, the following table matches each available join type string (left column) to its SQL equivalent (right column).  The SQL equivalent may be helpful to those new to joins because there are so many freely available resources covering them.

[Table: PySpark join type strings and their SQL equivalents]

Again, for convenience, any of the options listed in a particular box above may be used and produce the same result.

PySpark join Function Deep Dive

The signature of the join function is:

def join(
    self,
    other: "DataFrame",
    on: Optional[Union[str, List[str], Column, List[Column]]] = None,
    how: Optional[str] = None,
) -> "DataFrame":

The parameter arguments include:

  • parameter other: the other DataFrame to join
  • parameter on: the join key, given as a column name string, a list of column names, a join expression (Column), or a list of Columns
  • parameter how: the previously described type of join such as inner, full, left, etc.; defaults to inner

This will become much clearer with the examples below.

PySpark Join Examples Initial Setup

We will use the PySpark shell and will explore join examples using two small, manually constructed DataFrames.  This will keep things simple and concise, and keep the focus on the different outcomes of each join type.  The following DataFrames will be required to complete all the tutorial examples.

products = [
  (1,"Syrup - Golden, Lyles","41-889-0877",4,30.95), \
  (2,"Huck White Towels","10-857-2683",21,11.13), \
  (3,"Pasta - Lasagna Noodle, Frozen","08-151-1046",2,22.72), \
  (4,"Fiddlehead - Frozen","15-125-2352",1,22.66), \
  (5,"Juice - Clamato, 341 Ml","40-753-5219",2,37.72), \
  (6,"Lamb - Racks, Frenched","52-656-0114",2,32.78), \
  (7,"Beer - Alexander Kieths, Pale Ale","79-864-2525",18,20.73), \
  (8,"Oil - Avocado","41-264-0597",4,11.71), \
  (9,"Juice - V8, Tomato","47-401-5889",8,13.05), \
  (10,"Lotus Rootlets - Canned","12-923-5239",5,39.76), \
  (11,"Oats Large Flake","70-628-9900",2,12.57), \
  (12,"Cheese - Brie,danish","33-116-0464",1,5.00), \
  (13,"Bread - Pullman, Sliced","67-046-6746",1,7.52), \
  (14,"Lettuce - Green Leaf","77-181-3088",3,19.16), \
  (15,"Creamers - 10%","81-764-7420",7,36.32) \
]
productColumns = ["id","name","sku","category_id", "current_price" ]

productCategories = [
  (1,"Good Times?"), \
  (2,"Let's Eat!"), \
  (3,"Big Time TV Show Snack"), \
  (4,"Not Vegan"), \
  (5,"More carnivore") \
]
productCategoryColumns = ["id","name" ]

cat = spark.createDataFrame(data=productCategories,schema=productCategoryColumns)

prod = spark.createDataFrame(data=products, schema=productColumns)

cat.show()
prod.show()

This is an example of what the above code looks like in the pyspark shell:

$ pyspark

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/

Using Python version 3.9.12 (main, Mar 26 2022 15:51:15)
Spark context Web UI available at http://192.168.1.15:4040
Spark context available as 'sc' (master = local[*], app id = local-1666813863341).
SparkSession available as 'spark'.
>>> products = [
...   (1,"Syrup - Golden, Lyles","41-889-0877",4,30.95), \
...   (2,"Huck White Towels","10-857-2683",21,11.13), \
...   (3,"Pasta - Lasagna Noodle, Frozen","08-151-1046",2,22.72), \
...   (4,"Fiddlehead - Frozen","15-125-2352",1,22.66), \
...   (5,"Juice - Clamato, 341 Ml","40-753-5219",2,37.72), \
...   (6,"Lamb - Racks, Frenched","52-656-0114",2,32.78), \
...   (7,"Beer - Alexander Kieths, Pale Ale","79-864-2525",18,20.73), \
...   (8,"Oil - Avocado","41-264-0597",4,11.71), \
...   (9,"Juice - V8, Tomato","47-401-5889",8,13.05), \
...   (10,"Lotus Rootlets - Canned","12-923-5239",5,39.76), \
...   (11,"Oats Large Flake","70-628-9900",2,12.57), \
...   (12,"Cheese - Brie,danish","33-116-0464",1,5.00), \
...   (13,"Bread - Pullman, Sliced","67-046-6746",1,7.52), \
...   (14,"Lettuce - Green Leaf","77-181-3088",3,19.16), \
...   (15,"Creamers - 10%","81-764-7420",7,36.32) \
... ]
>>> productColumns = ["id","name","sku","category_id", "current_price" ]
>>>
>>> productCategories = [
...   (1,"Good Times?"), \
...   (2,"Let's Eat!"), \
...   (3,"Big Time TV Show Snack"), \
...   (4,"Not Vegan"), \
...   (5,"More carnivore") \
... ]
>>> productCategoryColumns = ["id","name" ]
>>> cat = spark.createDataFrame(data=productCategories,schema=productCategoryColumns)
>>> prod = spark.createDataFrame(data=products, schema=productColumns)
>>>
>>> cat.show()
+---+--------------------+
| id|                name|
+---+--------------------+
|  1|         Good Times?|
|  2|          Let's Eat!|
|  3|Big Time TV Show ...|
|  4|           Not Vegan|
|  5|      More carnivore|
+---+--------------------+

>>> prod.show()
+---+--------------------+-----------+-----------+-------------+
| id|                name|        sku|category_id|current_price|
+---+--------------------+-----------+-----------+-------------+
|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|
|  2|   Huck White Towels|10-857-2683|         21|        11.13|
|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|
|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|
|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|
|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|
|  7|Beer - Alexander ...|79-864-2525|         18|        20.73|
|  8|       Oil - Avocado|41-264-0597|          4|        11.71|
|  9|  Juice - V8, Tomato|47-401-5889|          8|        13.05|
| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|
| 11|    Oats Large Flake|70-628-9900|          2|        12.57|
| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|
| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|
| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|
| 15|      Creamers - 10%|81-764-7420|          7|        36.32|
+---+--------------------+-----------+-----------+-------------+

Ok, some of you might notice in the sample data the products are food items and the categories are not really food categories. It’s true. I made up the categories to have some fun. I can do that because I’m the big time boss man around here.

PySpark Join Type Examples

Inner Join

An inner join selects rows having matching values in both relations.

>>> prod.join(cat, prod.category_id == cat.id, 'inner').show()
+---+--------------------+-----------+-----------+-------------+---+--------------------+
| id|                name|        sku|category_id|current_price| id|                name|
+---+--------------------+-----------+-----------+-------------+---+--------------------+
|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|  1|         Good Times?|
| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|  1|         Good Times?|
| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|  1|         Good Times?|
|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|  2|          Let's Eat!|
|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|  2|          Let's Eat!|
|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|  2|          Let's Eat!|
| 11|    Oats Large Flake|70-628-9900|          2|        12.57|  2|          Let's Eat!|
| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|  3|Big Time TV Show ...|
|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|  4|           Not Vegan|
|  8|       Oil - Avocado|41-264-0597|          4|        11.71|  4|           Not Vegan|
| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|  5|      More carnivore|
+---+--------------------+-----------+-----------+-------------+---+--------------------+

Or we can order the rows in the result set, here by product id, as shown in the following:

>>> prod.join(cat, prod.category_id == cat.id, 'inner').orderBy(prod.id).show()
+---+--------------------+-----------+-----------+-------------+---+--------------------+
| id|                name|        sku|category_id|current_price| id|                name|
+---+--------------------+-----------+-----------+-------------+---+--------------------+
|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|  4|           Not Vegan|
|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|  2|          Let's Eat!|
|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|  1|         Good Times?|
|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|  2|          Let's Eat!|
|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|  2|          Let's Eat!|
|  8|       Oil - Avocado|41-264-0597|          4|        11.71|  4|           Not Vegan|
| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|  5|      More carnivore|
| 11|    Oats Large Flake|70-628-9900|          2|        12.57|  2|          Let's Eat!|
| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|  1|         Good Times?|
| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|  1|         Good Times?|
| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|  3|Big Time TV Show ...|
+---+--------------------+-----------+-----------+-------------+---+--------------------+

See how there are no rows for products with IDs of 2, 7, 9 and 15.  That’s because there are no matching categories for their category_id values.

Full Outer Join

A full outer join (or any of the following: outer, full, fullouter, full_outer; see table above) returns all values from both relations, appending NULL values on the side that does not have a match.

>>> prod.join(cat, prod.category_id == cat.id, 'outer').orderBy(prod.id).show()
+---+--------------------+-----------+-----------+-------------+----+--------------------+
| id|                name|        sku|category_id|current_price|  id|                name|
+---+--------------------+-----------+-----------+-------------+----+--------------------+
|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|   4|           Not Vegan|
|  2|   Huck White Towels|10-857-2683|         21|        11.13|null|                null|
|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|   2|          Let's Eat!|
|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|   1|         Good Times?|
|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|   2|          Let's Eat!|
|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|   2|          Let's Eat!|
|  7|Beer - Alexander ...|79-864-2525|         18|        20.73|null|                null|
|  8|       Oil - Avocado|41-264-0597|          4|        11.71|   4|           Not Vegan|
|  9|  Juice - V8, Tomato|47-401-5889|          8|        13.05|null|                null|
| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|   5|      More carnivore|
| 11|    Oats Large Flake|70-628-9900|          2|        12.57|   2|          Let's Eat!|
| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|   1|         Good Times?|
| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|   1|         Good Times?|
| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|   3|Big Time TV Show ...|
| 15|      Creamers - 10%|81-764-7420|          7|        36.32|null|                null|
+---+--------------------+-----------+-----------+-------------+----+--------------------+

Notice the “null” values in the category columns for products whose category does not exist.  Compare that to the previous inner join, which dropped those rows.

Left Outer Join | Left Join

A left outer join (or any of the following: left, leftouter, left_outer; again, see table above) returns all values from the left relation and the matched values from the right relation, or appends NULL if there is no match.

>>> prod.join(cat, prod.category_id == cat.id, 'left_outer').orderBy(prod.id).show()
+---+--------------------+-----------+-----------+-------------+----+--------------------+
| id|                name|        sku|category_id|current_price|  id|                name|
+---+--------------------+-----------+-----------+-------------+----+--------------------+
|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|   4|           Not Vegan|
|  2|   Huck White Towels|10-857-2683|         21|        11.13|null|                null|
|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|   2|          Let's Eat!|
|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|   1|         Good Times?|
|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|   2|          Let's Eat!|
|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|   2|          Let's Eat!|
|  7|Beer - Alexander ...|79-864-2525|         18|        20.73|null|                null|
|  8|       Oil - Avocado|41-264-0597|          4|        11.71|   4|           Not Vegan|
|  9|  Juice - V8, Tomato|47-401-5889|          8|        13.05|null|                null|
| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|   5|      More carnivore|
| 11|    Oats Large Flake|70-628-9900|          2|        12.57|   2|          Let's Eat!|
| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|   1|         Good Times?|
| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|   1|         Good Times?|
| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|   3|Big Time TV Show ...|
| 15|      Creamers - 10%|81-764-7420|          7|        36.32|null|                null|
+---+--------------------+-----------+-----------+-------------+----+--------------------+

Right Outer Join | Right Join

A right outer join (or any of the following: right, rightouter, right_outer) returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match.

>>> prod.join(cat, prod.category_id == cat.id, 'right_outer').orderBy(prod.id).show()
+---+--------------------+-----------+-----------+-------------+---+--------------------+
| id|                name|        sku|category_id|current_price| id|                name|
+---+--------------------+-----------+-----------+-------------+---+--------------------+
|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|  4|           Not Vegan|
|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|  2|          Let's Eat!|
|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|  1|         Good Times?|
|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|  2|          Let's Eat!|
|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|  2|          Let's Eat!|
|  8|       Oil - Avocado|41-264-0597|          4|        11.71|  4|           Not Vegan|
| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|  5|      More carnivore|
| 11|    Oats Large Flake|70-628-9900|          2|        12.57|  2|          Let's Eat!|
| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|  1|         Good Times?|
| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|  1|         Good Times?|
| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|  3|Big Time TV Show ...|
+---+--------------------+-----------+-----------+-------------+---+--------------------+

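Every category matches at least one product, so this result contains the same rows as the inner join.  But compare it with the following, where we switch the right and left DataFrames in the join.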

>>> cat.join(prod, cat.id == prod.category_id, 'right_outer').orderBy(prod.id).show()
+----+--------------------+---+--------------------+-----------+-----------+-------------+
|  id|                name| id|                name|        sku|category_id|current_price|
+----+--------------------+---+--------------------+-----------+-----------+-------------+
|   4|           Not Vegan|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|
|null|                null|  2|   Huck White Towels|10-857-2683|         21|        11.13|
|   2|          Let's Eat!|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|
|   1|         Good Times?|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|
|   2|          Let's Eat!|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|
|   2|          Let's Eat!|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|
|null|                null|  7|Beer - Alexander ...|79-864-2525|         18|        20.73|
|   4|           Not Vegan|  8|       Oil - Avocado|41-264-0597|          4|        11.71|
|null|                null|  9|  Juice - V8, Tomato|47-401-5889|          8|        13.05|
|   5|      More carnivore| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|
|   2|          Let's Eat!| 11|    Oats Large Flake|70-628-9900|          2|        12.57|
|   1|         Good Times?| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|
|   1|         Good Times?| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|
|   3|Big Time TV Show ...| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|
|null|                null| 15|      Creamers - 10%|81-764-7420|          7|        36.32|
+----+--------------------+---+--------------------+-----------+-----------+-------------+

Cross Join

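A cross join returns the Cartesian product of two relations being compared.  Because the examples below also pass a join condition, only the matching pairs are kept, which is why the output has the same 11 rows as the inner join rather than all 15 x 5 = 75 combinations.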

>>> prod.join(cat, prod.category_id == cat.id, 'cross').orderBy(prod.id).show()
+---+--------------------+-----------+-----------+-------------+---+--------------------+
| id|                name|        sku|category_id|current_price| id|                name|
+---+--------------------+-----------+-----------+-------------+---+--------------------+
|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|  4|           Not Vegan|
|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|  2|          Let's Eat!|
|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|  1|         Good Times?|
|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|  2|          Let's Eat!|
|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|  2|          Let's Eat!|
|  8|       Oil - Avocado|41-264-0597|          4|        11.71|  4|           Not Vegan|
| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|  5|      More carnivore|
| 11|    Oats Large Flake|70-628-9900|          2|        12.57|  2|          Let's Eat!|
| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|  1|         Good Times?|
| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|  1|         Good Times?|
| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|  3|Big Time TV Show ...|
+---+--------------------+-----------+-----------+-------------+---+--------------------+

>>> cat.join(prod, cat.id == prod.category_id, 'cross').orderBy(prod.id).show()
+---+--------------------+---+--------------------+-----------+-----------+-------------+
| id|                name| id|                name|        sku|category_id|current_price|
+---+--------------------+---+--------------------+-----------+-----------+-------------+
|  4|           Not Vegan|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|
|  2|          Let's Eat!|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|
|  1|         Good Times?|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|
|  2|          Let's Eat!|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|
|  2|          Let's Eat!|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|
|  4|           Not Vegan|  8|       Oil - Avocado|41-264-0597|          4|        11.71|
|  5|      More carnivore| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|
|  2|          Let's Eat!| 11|    Oats Large Flake|70-628-9900|          2|        12.57|
|  1|         Good Times?| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|
|  1|         Good Times?| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|
|  3|Big Time TV Show ...| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|
+---+--------------------+---+--------------------+-----------+-----------+-------------+

Anti Join

An anti join (or any of the following from the table above: anti, leftanti, left_anti) returns the rows from the left relation that have no match in the right. It is also commonly referred to as a “left anti join”.

>>> prod.join(cat, prod.category_id == cat.id, 'anti').orderBy(prod.id).show()
+---+--------------------+-----------+-----------+-------------+
| id|                name|        sku|category_id|current_price|
+---+--------------------+-----------+-----------+-------------+
|  2|   Huck White Towels|10-857-2683|         21|        11.13|
|  7|Beer - Alexander ...|79-864-2525|         18|        20.73|
|  9|  Juice - V8, Tomato|47-401-5889|          8|        13.05|
| 15|      Creamers - 10%|81-764-7420|          7|        36.32|
+---+--------------------+-----------+-----------+-------------+

It’s worth emphasizing: see how there are no fields from the category DataFrame returned in the results above? An anti join returns columns from the left DataFrame only.

Semi Join

A semi join (or any of the following from the table above: semi, leftsemi, left_semi) returns the rows from the left relation that have a match in the right. It is also referred to as a “left semi join”.

>>> prod.join(cat, prod.category_id == cat.id, 'semi').orderBy(prod.id).show()
+---+--------------------+-----------+-----------+-------------+
| id|                name|        sku|category_id|current_price|
+---+--------------------+-----------+-----------+-------------+
|  1|Syrup - Golden, L...|41-889-0877|          4|        30.95|
|  3|Pasta - Lasagna N...|08-151-1046|          2|        22.72|
|  4| Fiddlehead - Frozen|15-125-2352|          1|        22.66|
|  5|Juice - Clamato, ...|40-753-5219|          2|        37.72|
|  6|Lamb - Racks, Fre...|52-656-0114|          2|        32.78|
|  8|       Oil - Avocado|41-264-0597|          4|        11.71|
| 10|Lotus Rootlets - ...|12-923-5239|          5|        39.76|
| 11|    Oats Large Flake|70-628-9900|          2|        12.57|
| 12|Cheese - Brie,danish|33-116-0464|          1|          5.0|
| 13|Bread - Pullman, ...|67-046-6746|          1|         7.52|
| 14|Lettuce - Green Leaf|77-181-3088|          3|        19.16|
+---+--------------------+-----------+-----------+-------------+

 

Did you find these examples helpful?  If so, please consider sharing this page on your social networks such as LinkedIn, Reddit, etc. to help spread the word.

Also, let me know if there are any examples you would like to see.

About Todd M

Todd has held multiple software roles over his 20-year career. For the last 5 years, he has focused on helping organizations move from batch to data streaming. In addition to the free tutorials, he provides consulting and coaching for Data Engineers, Data Scientists, and Data Architects. Feel free to reach out directly or to connect on LinkedIn.
