Join Methods
In addition to the code snippets included below, all of the Groovy and Python code for the recipes below can be downloaded as premade Deephaven Notebooks. To learn more about using Deephaven Notebooks, please refer to the Notebook section of the User Manual.
Download the Query Cookbook Notebooks
Exact Join And Natural Join
This recipe demonstrates how to join data from different tables using the exactJoin
and naturalJoin
methods. The following methods are demonstrated in this recipe:
exactJoin
tail
naturalJoin
Step 1: Access the namespace and table, pull trade activity data from a certain date, and remove some columns and rows.
trades = db.t("LearnDeephaven", "StockTrades")
.where("Date=`2017-08-25`")
.view("Sym", "Last", "Size")
.headBy(3, "Sym")
Step 2: Create a new table named summary
, which will hold the average price and total shares traded for each symbol.
summary = trades.by(AggCombo(AggAvg("AvgPrice=Last"), AggSum("TotalShares=Size")), "Sym")
Step 3: Start the process by using the exactJoin
method to join columns from the summary
table with the trades
table.
ej = trades.exactJoin(summary, "Sym", "AvgPrice, TotalShares")
Every row in the new ej
table now has the average price and total shares for their corresponding symbol. Let's dig a bit deeper.
Step 4: For a variation, drop the first symbol from summary
and try the exactJoin
again.
summaryTrim = summary.tail(9)
ej = trades.exactJoin(summaryTrim, "Sym", "AvgPrice, TotalShares")
The summaryTrim
table will be generated, but an error will result when the query containing the exactJoin
method runs. This is because the exactJoin
method requires that each row in the left table have exactly one match in the right table. If you desire this type of matching when joining tables, the exactJoin
method will ensure you get an error instead of incorrect results.
If you don't mind getting empty cells when there is no match between the left and right table, the naturalJoin
method can be used. See Step 5, which follows.
Step 5: Substitute naturalJoin
for exactJoin
in the previous query.
nj = trades.naturalJoin(summaryTrim, "Sym", "AvgPrice, TotalShares")
As you can see, when a match is not found in the right table, an empty cell is added to the joined column in the resulting table rather than throwing an error.
Complete Code Blocks for the Exact Join and Natural Join Recipe
trades = db.t("LearnDeephaven", "StockTrades")
.where("Date=`2017-08-25`")
.headBy(3, "Sym")
.view("Sym", "Last", "Size")
summary = trades.by(AggCombo(AggAvg("AvgPrice=Last"), AggSum("TotalShares=Size")), "Sym")
ej = trades.exactJoin(summary, "Sym", "AvgPrice, TotalShares")
summaryTrim = summary.tail(9)
nj = trades.naturalJoin(summaryTrim, "Sym", "AvgPrice, TotalShares")
# Initial import
from deephaven import *
print("Provides:\n"
"\tdeephaven.Calendars as cals\n"
"\tdeephaven.ComboAggregateFactory as caf\n"
"\tdeephaven.DBTimeUtils as dbtu\n"
"\tdeephaven.Plot as plt\n"
"\tdeephaven.Plot.figure_wrapper as figw\n"
"\tdeephaven.QueryScope as qs\n"
"\tdeephaven.TableManagementTools as tmt\n"
"\tdeephaven.TableTools as ttools\n"
"See print(sorted(dir())) for the full namespace contents.")
# Exact Join and Natural Join Recipe
trades = db.t("LearnDeephaven", "StockTrades")\
.where("Date=`2017-08-25`")\
.headBy(3, "Sym")\
.view("Sym", "Last", "Size")
summary = trades.by(caf.AggCombo(caf.AggAvg("AvgPrice=Last"), caf.AggSum("TotalShares=Size")), "Sym")
ej = trades.exactJoin(summary, "Sym", "AvgPrice, TotalShares")
summaryTrim = summary.tail(9)
nj = trades.naturalJoin(summaryTrim, "Sym", "AvgPrice, TotalShares")
Learn more about exactJoin
and naturalJoin
in the Deephaven documentation.
Join
This recipe demonstrates how to join data from different tables using the join
method. The following methods are demonstrated in this recipe:
join
merge
Step 1: Use the same tables as the previous recipe.
trades = db.t("LearnDeephaven", "StockTrades")
.where("Date=`2017-08-25`")
.headBy(3, "Sym")
.view("Sym", "Last", "Size")
summary = trades.by(AggCombo(AggAvg("AvgPrice=Last"), AggSum("TotalShares=Size")), "Sym")
summaryTrim = summary.tail(9)
Step 2: Apply the join
method.
j1 = trades.join(summaryTrim, "Sym", "AvgPrice, TotalShares")
In the previous recipe, using exactJoin
in this query resulted in an error because left table rows with a MSFT symbol couldn't find a match. And, using the naturalJoin
method resulted in left table rows with a MSFT symbol adding empty cells rather than throwing an error. However, when we use the join
method, we see that there is no error, but there are also no MSFT rows in the resulting table. In a join
operation, when a left row does not find a match, it is simply not included in the resulting table.
Step 3: Instead of dropping a symbol from summary
, we're going to add a duplicate.
duplicate = emptyTable(1)
.updateView("Sym=`MSFT`", "AvgPrice=72.9", "TotalShares=(long)23")
summaryDup = merge(summary, duplicate)
Step 4: Before we demonstrate the join
method with the summaryDup
table, let's try the exactJoin
and naturalJoin
methods first.
ej = trades.exactJoin(summaryDup, "Sym", "AvgPrice, TotalShares")
nj = trades.naturalJoin(summaryDup, "Sym", "AvgPrice, TotalShares")
In both of these cases, the queries fail with errors. This is because rows cannot have multiple matches in the right table when using the exactJoin
and naturalJoin
methods. However, the join
method can.
Step 5: Use the join
method
j2 = trades.join(summaryDup, "Sym", "AvgPrice, TotalShares")
As you can see, MSFT
has six rows in the resulting table. Every MSFT
row in the left table, trades
, matched two times in the right table, summaryDup
. Using the join
method enables you to drop rows in the left table that don't find a match and/or allow left rows to match multiple times.
Complete Code Blocks for the Join Method Recipe
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
.headBy(3, "Sym")
.view("Sym", "Last", "Size")
summary = trades.by(AggCombo(AggAvg("AvgPrice=Last"), AggSum("TotalShares=Size")), "Sym")
summaryTrim = summary.tail(9)
j1 = trades.join(summaryTrim, "Sym", "AvgPrice, TotalShares")
duplicate = emptyTable(1)
.updateView("Sym=`MSFT`", "AvgPrice=72.9", "TotalShares=(long)23")
summaryDup = merge(summary, duplicate)
j2 = trades.join(summaryDup, "Sym", "AvgPrice, TotalShares")
# Initial import
from deephaven import *
# Join
trades = db.t("LearnDeephaven", "StockTrades")\
.where("Date=`2017-08-25`")\
.headBy(3, "Sym")\
.view("Sym", "Last", "Size")
summary = trades.by(caf.AggCombo(caf.AggAvg("AvgPrice=Last"), caf.AggSum("TotalShares=Size")), "Sym")
summaryTrim = summary.tail(9)
j1 = trades.join(summaryTrim, "Sym", "AvgPrice, TotalShares")
duplicate = ttools.emptyTable(1)\
.updateView("Sym=`MSFT`", "AvgPrice=72.9", "TotalShares=23")
summaryDup = ttools.merge(summary, duplicate)
j2 = trades.join(summaryDup, "Sym", "AvgPrice, TotalShares")
Learn more about the join
method in the Deephaven documentation.
Left Join
This recipe demonstrates how to join data from different tables using the leftJoin
method. The following methods are demonstrated in this recipe:
leftJoin
Step 1: Pull trades from a certain date and reduce some columns and rows.
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
.view("Sym", "Last")
.headBy(3, "Sym")
Step 2: Get the average price for each symbol.
summary = trades.view("Sym", "AvgPrice=Last").avgBy("Sym")
Step 3: Unlike the previous join recipes, use summary
as the left table and trades
as the right table.
lj = summary.leftJoin(trades, "Sym", "Last")
As you can see, every row in the summary
table now has the average price as well as the group of prices used to formulate the average price. You can think of leftJoin
as the "grouping" join. For more information, refer to the Grouping content in the Deephaven documentation.
Complete Code Blocks for the Left Join Method Recipe
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")
.headBy(3, "Sym")
.view("Sym", "Last")
summary = trades.view("Sym", "AvgPrice=Last").avgBy("Sym")
lj = summary.leftJoin(trades, "Sym", "Last")
# Initial import
from deephaven import *
# Left Join
trades = db.t("LearnDeephaven", "StockTrades").where("Date=`2017-08-25`")\
.headBy(3, "Sym")\
.view("Sym", "Last")
summary = trades.view("Sym", "AvgPrice=Last").avgBy("Sym")
lj = summary.leftJoin(trades, "Sym", "Last")
Learn more about the
in the Deephaven documentation.leftJoin
method
As-of Join and Reverse As-of Join
This recipe demonstrates how to join data from different tables using the as-of join (or aj
) method and the reverse as-of join (raj
) method. While most join methods require an exact match in the key column, the as-of join method enables you to join data from one table to another when the value in the right table is closest without going over the value in the matching column of the left table. (Think of it as the "Price is Right" join method. ) The following methods are demonstrated in this recipe:
aj
raj
Step 1: Pull trades from a certain date, remove some columns, and rename the columns used for time values.
trades = db.t("LearnDeephaven", "StockTrades")
.where("Date=`2017-08-25`")
.view("Sym", "TradeTime=ExchangeTimestamp", "Last")
quotes = db.t("LearnDeephaven", "StockQuotes")
.where("Date=`2017-08-25`")
.view("Sym", "QuoteTime=ExchangeTimestamp", "Bid", "Ask")
Step 2: Calculate the Mid
price in quotes
and then sort by symbol.
quotesMid = quotes.updateView("Mid=(Bid+Ask)/2").sort("Sym")
Step 3: Use aj
to join the Mid
and QuoteTime
columns from the quotesMid
table to the trades
table.
aj = trades.aj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")
When you look at the TradeTime
and QuoteTime
columns in the resulting table, you'll see the values that have been joined. However, unlike other join methods, the data in the columns being matched (TradeTime
and QuoteTime
) do not match. Rather, the data in the rows from the left table are matched to the values in the right table only when the value in the right table is the closest value without going over the corresponding value in the left table row. Essentially, in this example, we are finding the most recent Mid
price from the quotes
table at the time of each trade as shown in the trades
table.
This is why the as-of join is so unique; it has a more accepting matching policy. If there is no value in the right table that is less than or equal to the corresponding value in the left table, then an empty cell will be added.
Step 4: Rerun the previous query, but use the reverse as-of join method instead.
raj = trades.raj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")
This time, each row in the left table will be matched to the right table when the right table value is equal or the closest without going under the corresponding value in the left table. In this example, we're finding the Mid
price at the exact time of each trade or the closest value immediately following each trade. If there is no value greater than or equal to the value in the left table, then an empty cell will be added.
Complete Code Block for the As-of Join and the Reverse As-of Join Methods Recipes
trades = db.t("LearnDeephaven", "StockTrades")
.where("Date=`2017-08-25`")
.view("Sym", "TradeTime=ExchangeTimestamp", "Last")
quotes = db.t("LearnDeephaven", "StockQuotes")
.where("Date=`2017-08-25`")
.view("Sym", "QuoteTime=ExchangeTimestamp", "Bid", "Ask")
quotesMid = quotes.updateView("Mid=(Bid+Ask)/2").sort("Sym")
aj = trades.aj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")
raj = trades.raj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")
# Initial import
from deephaven import *
# As-of Join and Reverse As-of Join
trades = db.t("LearnDeephaven", "StockTrades")\
.where("Date=`2017-08-25`")\
.view("Sym", "TradeTime=ExchangeTimestamp", "Last")
quotes = db.t("LearnDeephaven", "StockQuotes")\
.where("Date=`2017-08-25`")\
.view("Sym", "QuoteTime=ExchangeTimestamp", "Bid", "Ask")
quotesMid = quotes.updateView("Mid=(Bid+Ask)/2").sort("Sym")
aj = trades.aj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")
raj = trades.raj(quotesMid, "TradeTime=QuoteTime", "Mid, QuoteTime")
Learn more about the aj and raj methods in the Deephaven documentation.
Last Updated: 28 February 2020 12:20 -05:00 UTC Deephaven v.1.20200121 (See other versions)
Deephaven Documentation Copyright 2016-2020 Deephaven Data Labs, LLC All Rights Reserved