Join tables
Joins -- bringing data from one table to another -- is vital in any data system. Deephaven is no exception.
Because this is Deephaven, joins update in real time as new (and relevant) data becomes available.
Deephaven supports a few different joins:
- These get used the most:
naturalJoin()
aj()
- Other options:
join()
leftJoin()
exactJoin()
At this time, Deephaven does not support an elegant outerJoin()
, though your support persons can provide a workaround.
#
Syntax relevant for all Join methodsThe syntax of all Joins follows this pattern (where the joinMethod
appears in the list above):
leftTable.joinMethod(rightTable, "Key(s)", "columnsToJoinIn")
#
Left and right tablesThe mental map of laying out two tables and then joining them is often helpful.
The leftTable
is the starting base table to which data is joined, and the rightTable
is the table from which data comes.
important
âJoin data from the right to the leftâ is a concept that will help your debugging and communication with colleagues.
#
Using quotes- Use double quotes around
"KeyOnly"
or around the set of"Key1, Key2, Key3"
. - Use double quotes around
"SingleColToJoin"
or around the set of"Col1, Col2, Col3"
. - Again, sets have quotes around the whole set, not each element.
#
Join KeysThe Join key or keys establishes the identifier in the leftTable
that will be used to find data in the rightTable
. Any data type can be chosen as a key, and there can be multiple keys.
Think of keys as âcolumns to matchâ in all Deephaven Joins.
Keys are chosen based on use cases: âI want to bring data to the table todayTrades, from the table yestTrades, based on the USym of each row (in todayTrades).â
   leftTable = todayTrades
   rightTable = yestTrades
   key = USym
âI want to bring data to the table risk, from the table factorData, based on the Sector and FactorName of each row (in risk).â
   leftTable = risk
   rightTable = factorData
   keys = USym, FactorName
Keys can have different column names in the two tables. The syntax below addresses this:
leftTable.joinMethod(rightTable, "KeyLeftName = KeyRightName")
OR
leftTable.joinMethod(rightTable, "Key1, Key2LeftName=Key2RightName")
OR
left.joinMethod(right, "KeyLeft1=KeyRight1, KeyLeft2=KeyRight2")
warning
Queries will error out if the identified Key columns do not exist in the respective tables.
#
Columns to Join (onto the left from the right)You can choose what columns to bring from the rightTable
to the leftTable
by populating the aforementioned columnsToJoinIn
.
Various alternatives:
- Bring all the non-Key columns:
  Â
leftTable.joinMethod(rightTable, "Key")
- Bring only the ones I select:
  Â
leftTable.joinMethod(rightTable, "Key", "ColOne")
- Bring several that I select:
  Â
leftTable.joinMethod(rightTable, "Key", "ColOne, ColTwo, ColZZZ")
- Bring and rename the ones I select:
  Â
leftTable.joinMethod(rightTable, "Key", "New1=Col1, New2=Col2")
warning
Queries will error out if either:
- One of the
columnsToJoinIn
from therightTable
already exists in theleftTable
(i.e., same name). - The identified column from the
rightTable
does not exist.
#
Examples using the syntax aboveBelow are a series of Join queries using different Join methods and the various combinations of the degrees of freedom documented above:
- One or multiple Join keys,
- Differently named columns serving as keys between the left and right tables,
- Inheriting all, some, or renamed columns from the right as part of the Join process.
Setting up the examples with some source tables:
Examples of syntax -- donât worry yet about what each one does. These all work:
#
Natural JoinThe naturalJoin
is somewhat like an Excel vlookup or SQL leftJoin.
- The
leftTable
always retains the same number of rows with which it started. - If there are no matching values for a row, the appended cell(s) from the right table will contain NULL values.
- The
rightTable
can only have one row for each key of the join.
naturalJoin()
#
Examples of Example of a one-key naturalJoin()
:
Example of multi-key naturalJoin()
with renaming of join columns:
#
As-of Join- As-of joins [
aj()
] and reverse-as-of-joins [raj()
] are generally considered time series joins. - Technically, any sorted column can be used as the (last) join key, but in most cases some version of timestamp will be employed.
- As-of joins follow the same syntax as other joins, but the last join key is not used to exactly match the two tables.
- Instead, for that key (i.e., âtimeâ usually), Deephaven finds the value in the
rightTable
that is closest to the corresponding value in theleftTable
without going over (i.e., âsame or just-earlierâ). - Then, as with any other join, the
ColumnsToJoinIn
are added to theleftTable
from therightTable
.
note
raj()
is nearly identical to aj()
, except the engine finds the value that is the same or just after the corresponding value (i.e., âsame or just-laterâ).
aj
and raj
#
Examples of Example of one key aj()
:
Example of multi key aj()
with columnToJoinIn
naming:
Example of raj()
a table on itself:
#
Other JoinsAdditional methods are available:
See our Writing Queries guide for full documentation.
All follow the syntax described at the top of this document:
leftTable.joinMethod(rightTable, "Key(s)", "ColumnsToJoinIn")
where each is simply placed in the place of joinMethod
.
Read on for quick, conceptual summaries for each.
join()
#
Two scenarios that differ from naturalJoin()
:
- There is no match for a key from the
leftTable
in therightTable
.- For
naturalJoin()
, the joined columns are null. - For
join()
, the row [from theleftTable
] is removed from the result.
- For
- There are multiple matches for a key between the two tables.
- For
naturalJoin()
, the query fails. - For
join()
, multiple rows are produced -- i.e., theleftTable
row is repeated for each of therightTable
matches.
- For
It is possible, therefore, for the results of a join()
operation to have less rows or more rows than are present in the leftTable
.
exactJoin()
#
- Requires exactly one match from the
rightTable
for each of the rows in theleftTable
. - If there are zero or multiple matches, the operation will fail.
leftJoin()
#
- This returns the exact rows of the
leftTable
. - The data joined in from the
rightTable
are grouped into arrays of data.
So, for example, a leftTable
of end-of-day trade-count could be leftJoin-ed by USym with a rightTable
of trades where the Price column would be an array.