Join Methods

In addition to the code snippets included below, all of the Groovy and Python code for the recipes below can be downloaded as premade Deephaven Notebooks. To learn more about using Deephaven Notebooks, please refer to the Notebook section of the User Manual.

Download the Query Cookbook Notebooks

Exact Join And Natural Join

This recipe demonstrates how to join data from different tables using the exactJoin and naturalJoin methods.  The following methods are demonstrated in this recipe:

  • exactJoin
  • tail
  • naturalJoin

Step 1:  Access the namespace and table, pull trade activity data from a certain date, and remove some columns and rows.

trades = db.t("LearnDeephaven", "StockTrades")
     .where("Date=`2017-08-25`")
     .view("Sym", "Last", "Size")
     .headBy(3, "Sym")

Step 2: Create a new table named summary, which will hold the average price and total shares traded for each symbol.

summary = trades.by(AggCombo(AggAvg("AvgPrice=Last"), AggSum("TotalShares=Size")), "Sym")

Step 3: Start the process by using the exactJoin method to join columns from the summary table with the trades table.

ej = trades.exactJoin(summary, "Sym", "AvgPrice, TotalShares")

Every row in the new ej table now has the average price and total shares for their corresponding symbol. Let's dig a bit deeper.

Step 4: For a variation, drop the first symbol from summary and try the exactJoin again.

summaryTrim = summary.tail(9)

ej = trades.exactJoin(summaryTrim, "Sym", "AvgPrice, TotalShares")

The summaryTrim table will be generated, but an error will result when the query containing the exactJoin method runs. This is because the exactJoin method requires that each row in the left table have exactly one match in the right table.  If you desire this type of matching when joining tables, the exactJoin method will ensure you get an error instead of incorrect results.

If you don't mind getting empty cells when there is no match between the left and right table, the naturalJoin method can be used.  See Step 5, which follows.

Step 5: Substitute naturalJoin for exactJoin in the previous query.

nj = trades.naturalJoin(summaryTrim, "Sym", "AvgPrice, TotalShares")

As you can see, when a match is not found in the right table, an empty cell is added to the joined column in the resulting table rather than throwing an error.

Complete Code Blocks for the Exact Join and Natural Join Recipe

trades = db.t("LearnDeephaven", "StockTrades")
     .where("Date=`2017-08-25`")
     .headBy(3, "Sym")
     .view("Sym", "Last", "Size")

summary = trades.by(AggCombo(AggAvg("AvgPrice=Last"), AggSum("TotalShares=Size")), "Sym")
					
ej = trades.exactJoin(summary, "Sym", "AvgPrice, TotalShares")

summaryTrim = summary.tail(9)

nj = trades.naturalJoin(summaryTrim, "Sym", "AvgPrice, TotalShares")
        
# Initial import

from deephaven import *

print("Provides:\n"
      "\tdeephaven.Calendars as cals\n"
      "\tdeephaven.ComboAggregateFactory as caf\n"
      "\tdeephaven.DBTimeUtils as dbtu\n"
      "\tdeephaven.Plot as plt\n"
      "\tdeephaven.Plot.figure_wrapper as figw\n"
      "\tdeephaven.QueryScope as qs\n"
      "\tdeephaven.TableManagementTools as tmt\n"
      "\tdeephaven.TableTools as ttools\n"
      "See print(sorted(dir())) for the full namespace contents.")

# Exact Join and Natural Join Recipe

trades = db.t("LearnDeephaven", "StockTrades")\
    .where("Date=`2017-08-25`")\
    .headBy(3, "Sym")\
    .view("Sym", "Last", "Size")

summary = trades.by(caf.AggCombo(caf.AggAvg("AvgPrice=Last"), caf.AggSum("TotalShares=Size")), "Sym")

ej = trades.exactJoin(summary, "Sym", "AvgPrice, TotalShares")

summaryTrim = summary.tail(9)

nj = trades.naturalJoin(summaryTrim, "Sym", "AvgPrice, TotalShares")

Learn more about exactJoin and naturalJoin in the Deephaven documentation.

Join

This recipe demonstrates how to join data from different tables using the join method.  The following methods are demonstrated in this recipe:

  • join
  • merge

Step 1: Use the same tables as the previous recipe.

trades = db.t("LearnDeephaven", "StockTrades")
    .where("Date=`2017-08-25`")
     .headBy(3, "Sym")
     .view("Sym", "Last", "Size")

summary = trades.by(AggCombo(AggAvg("AvgPrice=Last"), AggSum("TotalShares=Size")), "Sym")

summaryTrim = summary.tail(9)

Step 2: Apply the join method.

j1 = trades.join(summaryTrim, "Sym", "AvgPrice, TotalShares")

In the previous recipe, using exactJoin in this query resulted in an error because left table rows with a MSFT symbol couldn't find a match. And, using the naturalJoin method resulted in left table rows with a MSFT symbol adding empty cells rather than throwing an error. However, when we use the join method, we see that there is no error, but there are also no MSFT rows in the resulting table.  In a join operation, when a left row does not find a match, it is simply not included in the resulting table.

Step 3: Instead of dropping a symbol from summary, we're going to add a duplicate.

duplicate = emptyTable(1)
     .updateView("Sym=`MSFT`", "AvgPrice=72.9", "TotalShares=(long)23")

summaryDup = merge(summary, duplicate)

Step 4: Before we demonstrate the join method with the summaryDup table, let's try the exactJoin and naturalJoin methods first.

ej = trades.exactJoin(summaryDup, "Sym", "AvgPrice, TotalShares")

nj = trades.naturalJoin(summaryDup, "Sym", "AvgPrice, TotalShares")

In both of these cases, the queries fail with errors. This is because rows cannot have multiple matches in the right table when using the  exactJoin and naturalJoin methods. However, the join method can.

Step 5: Use the join method

j2 = trades.join(summaryDup, "Sym", "AvgPrice, TotalShares")

As you can see, MSFT has six rows in the resulting table. Every MSFT row in the left table, trades, matched two times in the right table, summaryDup. Using the join method enables you to drop rows in the left table that don't find a match and/or allow left rows to match multiple times.

Complete Code Blocks for the Join Method Recipe

trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
     .headBy(3, "Sym")
     .view("Sym", "Last", "Size")

summary = trades.by(AggCombo(AggAvg("AvgPrice=Last"), AggSum("TotalShares=Size")), "Sym")

summaryTrim = summary.tail(9)

j1 = trades.join(summaryTrim, "Sym", "AvgPrice, TotalShares")

duplicate = emptyTable(1)
     .updateView("Sym=`MSFT`", "AvgPrice=72.9", "TotalShares=(long)23")

summaryDup = merge(summary, duplicate)

j2 = trades.join(summaryDup, "Sym", "AvgPrice, TotalShares")
        
# Initial import

from deephaven import *

# Join

trades = db.t("LearnDeephaven", "StockTrades")\
    .where("Date=`2017-08-25`")\
    .headBy(3, "Sym")\
    .view("Sym", "Last", "Size")

summary = trades.by(caf.AggCombo(caf.AggAvg("AvgPrice=Last"), caf.AggSum("TotalShares=Size")), "Sym")

summaryTrim = summary.tail(9)

j1 = trades.join(summaryTrim, "Sym", "AvgPrice, TotalShares")

duplicate = ttools.emptyTable(1)\
    .updateView("Sym=`MSFT`", "AvgPrice=72.9", "TotalShares=23")

summaryDup = ttools.merge(summary, duplicate)

j2 = trades.join(summaryDup, "Sym", "AvgPrice, TotalShares")      

Learn more about the join method in the Deephaven documentation.

Left Join

This recipe demonstrates how to join data from different tables using the leftJoin method.  The following methods are demonstrated in this recipe:

  • leftJoin

Step 1: Pull trades from a certain date and reduce some columns and rows.

trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
     .view("Sym", "Last")
     .headBy(3, "Sym")

Step 2: Get the average price for each symbol.

summary = trades.view("Sym", "AvgPrice=Last").avgBy("Sym")

Step 3: Unlike the previous join recipes, use summary as the left table and trades as the right table.

lj = summary.leftJoin(trades, "Sym", "Last")

As you can see, every row in the summary table now has the average price as well as the group of prices used to formulate the average price. You can think of leftJoin as the "grouping" join. For more information, refer to the Grouping content in the Deephaven documentation.

Complete Code Blocks for the Left Join Method Recipe

trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
     .headBy(3, "Sym")
     .view("Sym", "Last")

summary = trades.view("Sym", "AvgPrice=Last").avgBy("Sym")

lj = summary.leftJoin(trades, "Sym", "Last")
        
# Initial import

from deephaven import *

# Left Join

trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")\
    .headBy(3, "Sym")\
    .view("Sym", "Last")

summary = trades.view("Sym", "AvgPrice=Last").avgBy("Sym")

lj = summary.leftJoin(trades, "Sym", "Last")

Learn more about the leftJoin method in the Deephaven documentation.

As-of Join and Reverse As-of Join

This recipe demonstrates how to join data from different tables using the as-of join (or aj) method and the reverse as-of join (raj) method. While most join methods require an exact match in the key column, the as-of join method enables you to join data from one table to another when the value in the right table is closest without going over the value in the matching column of the left table.  (Think of it as the "Price is Right" join method. )  The following methods are demonstrated in this recipe:

  • aj
  • raj

Step 1: Pull trades from a certain date, remove some columns, and rename the columns used for time values.

trades = db.t("LearnDeephaven", "StockTrades")
     .where("Date=`2017-08-25`")
     .view("Sym", "TradeTime=ExchangeTimestamp", "Last")

quotes = db.t("LearnDeephaven", "StockQuotes")
     .where("Date=`2017-08-25`")
     .view("Sym", "QuoteTime=ExchangeTimestamp", "Bid", "Ask")

Step 2: Calculate the Mid price in quotes and then sort by symbol.

quotesMid = quotes.updateView("Mid=(Bid+Ask)/2").sort("Sym")

Step 3: Use aj to join the Mid and QuoteTime columns from the quotesMid table to the trades table.

aj = trades.aj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")

When you look at the TradeTime and QuoteTime columns in the resulting table, you'll see the values that have been joined. However, unlike other join methods, the data in the columns being matched (TradeTime and QuoteTime) do not match.  Rather, the data in the rows from the left table are matched to the values in the right table only when the value in the right table is the closest value without going over the corresponding value in the left table row. Essentially, in this example, we are finding the most recent Mid price from the quotes table at the time of each trade as shown in the trades table.

This is why the as-of join is so unique; it has a more accepting matching policy. If there is no value in the right table that is less than or equal to the corresponding value in the left table, then an empty cell will be added.

Step 4: Rerun the previous query, but use the reverse as-of join method instead.

raj = trades.raj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")

This time, each row in the left table will be matched to the right table when the right table value is equal  or the closest without going under the corresponding value in the left table. In this example, we're finding the Mid price at the exact time of each trade or the closest value immediately following each trade. If there is no value greater than or equal to the value in the left table, then an empty cell will be added.

Complete Code Block for the As-of Join and the Reverse As-of Join Methods Recipes

trades = db.t("LearnDeephaven", "StockTrades")
     .where("Date=`2017-08-25`")
     .view("Sym", "TradeTime=ExchangeTimestamp", "Last")

quotes = db.t("LearnDeephaven", "StockQuotes")
     .where("Date=`2017-08-25`")
     .view("Sym", "QuoteTime=ExchangeTimestamp", "Bid", "Ask")

quotesMid = quotes.updateView("Mid=(Bid+Ask)/2").sort("Sym")

aj = trades.aj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")

raj = trades.raj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")
        
# Initial import

from deephaven import *

# As-of Join and Reverse As-of Join

trades = db.t("LearnDeephaven", "StockTrades")\
    .where("Date=`2017-08-25`")\
    .view("Sym", "TradeTime=ExchangeTimestamp", "Last")

quotes = db.t("LearnDeephaven", "StockQuotes")\
    .where("Date=`2017-08-25`")\
    .view("Sym", "QuoteTime=ExchangeTimestamp", "Bid", "Ask")

quotesMid = quotes.updateView("Mid=(Bid+Ask)/2").sort("Sym")

aj = trades.aj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")

raj = trades.raj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")

Learn more about the aj and raj methods in the Deephaven documentation.


Last Updated: 28 February 2020 12:20 -05:00 UTC    Deephaven v.1.20200121  (See other versions)

Deephaven Documentation     Copyright 2016-2020  Deephaven Data Labs, LLC     All Rights Reserved