Grouping
Deephaven offers three methods for grouping (or ungrouping) data in tables.
by
ungroup
byExternal
See also Aggregation.
by
Syntax
by(String… groupColumns)
The by
method groups column content into arrays. If no arguments are defined in the method, the content of each column is grouped into its own array.
The method is best explained by operating in three stages:
- Evaluate the columns defined by
groupColumns
against the source table (similar to how a select statement would) - Create a table containing all the distinct rows from the previous step.
- For each column from the source table not listed in the
by
method, add a column with the same name, containing an array with one element for each row that maps to the given distinct grouping row.
See also Expand Array Data for Row
Example 1
The goal for this example is to create a new table with the same number of columns as the source table, but instead of having five rows of data for each column, the content in the rows for each column will be grouped into an array using the by
method with no arguments. The query follows:
t=source.by()
When this query runs, the t
table is generated. There are no arguments to the by
method, so the content in the rows for each column is grouped into an array.
|
|
Example 2
The goal for this example is to group the content based on the distinct values in column C
. The query follows:
t2=source.by(C)
When this query runs, the following t2
table is generated. There are four distinct values in column C
, so the final table has four rows. In the source table, there are two rows in column C
that have the same value (AAPL), so there are two values in each array for that row in the t2
table. Each of the other rows in column C
of the source table was unique, so the respective arrays for columns A
and B
in the t2
table each have only one value.
|
|
Example 3
The goal for this example is to group the content based on whether the values in column A
are even or odd. The query follows:
t3=source.by("Parity=A%2")
When this query runs, the following t3
table is generated. A new column titled Parity
is created in the t3
table to hold the modulus (remainder) values of 1 or 0. The values in column A
are then evaluated for those moduli and then grouped into the corresponding arrays for each column. In the source table, there are three rows with a modulus value of 1 in column A
, and two rows with a modulus value of 0 in column A
. Therefore, the t3
table has only two rows of data - one for odd values and one for even values.
|
|
ungroup
Syntax
ungroup()
ungroup
is the inverse of the by
method. The ungroup
method unwraps content from an array and builds a brand new data structure to hold the unwrapped content. There are two forms of ungroup
:
- No argument: If no argument is presented, all array columns from the source table will be unwrapped into separate rows. Consequently, all array columns in each row must have the same number of values.
- Explicit array column list: Only columns specified will be unwrapped.
Example 4
The goal for this example is to unwrap all of the arrays for each column. Because we want to unwrap them all, no argument is needed. The query follows:
t4=source.ungroup()
When this query runs, the following t4
table is generated. In the first row of the source
table, columns A
and B
both have three values in their respective arrays. Therefore, Deephaven unwraps those arrays into three rows in the t4
table - one row for each corresponding value in the arrays. In the second row of the source table, columns A
and B
both have two values in their respective arrays. Therefore, Deephaven unwraps those arrays into two additional rows in the t4
table, resulting in five total rows for the table.
|
|
Note: The same result could have been accomplished by including columns B
and C
in the ungroup
argument, i.e., t4=source.ungroup("B","C")
Example 5
The goal for this example is to unwrap one array column while leaving the other array column intact. The query follows:
t5=source.ungroup("B")
When this query runs, the following t5
table is generated. We are only unwrapping column B
, so we need to add new rows for each value in each array. Column B
has three values in the array for the first row in the source
table. So, Deephaven unwraps that array into three separate rows in the t5
table - one row for each corresponding value in the array. However, we did not tell Deephaven to ungroup the arrays in column C
, so column C
maintains its respective arrays for the corresponding values in column B
. In the second row of the source
table, column B
has two values in its array. So, Deephaven unwraps that array into two additional rows in the t5
table, while maintaining the array in column C
for those respective rows.
|
|
byExternal
Syntax
byExternal(String… columnNames)
byExternal
divides a single table into multiple tables, which are defined by unique key-value pairs in a TableMap
. The keys are determined by the columnNames
used in the argument.
For example, assume you have a table with five columns and 100 rows. One of the columns is named USym
, and it contains five distinct values (AAPL, GOOG, MSFT, BABA and TSLA). For each distinct value in the USym
column, there are 20 rows of data.
The byExternal
method can be used to create five different tables based on the five unique key-value pairs defined in the TableMap
. The query follows:
tableByUSym = table.byExternal("USym")
One table would be created for each of the key-value pairs, (i.e., USym-AAPL
, USym-GOOG
, USym-MSFT
, USym-BABA
, and USym-TSLA
). Each distinct USym has 20 rows of data, so each table would have 20 rows.
In essence, this would be equivalent to performing individual where
filters for each unique value in the USym
column, and then saving the results of each to its own variable. However, using the byExternal
method is far more efficient and faster because it eliminates the need to repeatedly pass over the data for each of the values of interest.
To retrieve the contents of a given table, you would use the get
method as shown in the following:
t1=tableByUsym.get("AAPL")
t2=tableByUsym.get("GOOG")
t3=tableByUsym.get("MSFT")
...
To see all of the keys created in the TableMap
, you would use the TableMap getKeySet()
method as shown below:
tKeys=tableByUSym.getKeySet()
Note: this method returns an array of keys.
Using byExternal
on multiple columns
You can also create a TableMap
using more than one column in the table. In this case, the number of tables created would be based on the number of all possible combinations of key-pairs in the TableMap
. For example, if one of the columns in the previous example was titled Date
and there were three different dates in that column, there would be three more additional key-pairs. Each key-pair for the Date
column would be paired with each key-pair for the USym
column, so there would be 15 different tables created in this example.
Using our previous example, the following code would generate tables for each unique combination of values in both the Date
and USym
columns, e.g., ["2017-04-09", "TSLA"
].
tableByUSym = table.byExternal("Date", "USym")
To open a specific resulting table in the multi-column TableMap
, you would then use the SmartKey
object to define a specific combination, and then use the .get
method as shown in the examples below:
from deephaven import *
key1 = SmartKey("2017-04-08", "GOOG")
key2 = SmartKey("2017-04-09", "TSLA")
GOOG08 = curveLogByUsym.get(key1)
Spy09 = curveLogByUsym.get(key2)
import com.fishlib.datastructures.util.SmartKey;
key1 = new SmartKey("2017-04-08", "GOOG")
key2 = new SmartKey("2017-04-09", "TSLA")
GOOG08 = curveLogByUsym.get(key1)
Spy09 = curveLogByUsym.get(key2)
Last Updated: 16 February 2021 15:52 -05:00 UTC Deephaven v.1.20200331 (See other versions)
Deephaven Documentation Copyright 2016-2020 Deephaven Data Labs, LLC All Rights Reserved