Selection 

Deephaven offers several methods for selecting (or deselecting) columns of data in a query. These methods also enable users to eliminate or modify existing columns of data, or create new columns of data. The first five methods are select, view, update, updateView and lazyUpdate

Before deciding on the selection method to employ in your query, you should consider two questions:

  1. Do you need to use only some of the columns or all of the columns from the source table?
  2. Will the analyses run faster by storing all of the information in memory, or will it be more efficient to dynamically compute cell values as they are needed?

An example for the first question follows:

If your source table has 50 columns of data, but you only need a subset of those columns for your analysis, using the select or view methods will allow you to specifically name the columns you want to keep in the resulting table. The select and view methods also allow you to modify or add new columns to the resulting table.

Once you have your initial column set pared down, you may want to modify a column or add more columns to the resulting table. This is when the update, updateView, and lazyUpdatemethods come into play. When using one of these three selection methods, all of the columns from the source table are automatically included in the resulting table. 

 

select

 

view

 

 

Use when you want to keep only some of the columns from the source table in the resulting table.

 

update

 

updateView

 

 

lazyUpdate

 

Use when you want to keep all columns from the source table in the resulting table.

We now have two methods that enable us to keep only some the columns in a table, and three methods that enable us to include all columns of data in a table.  Now we need to consider the second question noted above: 

Will the analyses run faster by storing all of the information in memory, or will it be more efficient to dynamically compute cell values as they are needed?

When using data from large datasets, the method in which you access and store that data can have a big impact on efficiency and speed.

When select and update are used in a query, Deephaven will access and evaluate the data requested and then store it in memory.  Storing the data is more efficient when the content is expensive to evaluate and/or when the data is accessed multiple times. However, keep in mind that storing large datasets requires a large amount of memory.

Note: When using select,  the entire dataset requested is stored in memory.  When using update, only the "updated" portion of the dataset is stored in memory.

With view and updateView, the data being requested is not stored in memory.  Rather, Deephaven stores a formula that recalculates each value as it is needed. These values are calculated as needed and never saved. Therefore, when the formula is fast to compute or if you are only accessing a small portion of the data, storing the results as a formula is often the best option. Memory is not allocated to store new columns, so less memory is needed.

The lazyUpdate method computes column formulas on demand and will defer computation until it is required. Because it caches the results for the set of input values, the same value will never be computed twice. For small sets of unique values, this uses less memory than an update, and requires less computation than an updateView. However, if you have many unique values, an update will be more efficient because the lazyUpdate stores the formula inputs and result in a map, whereas the update stores the values more compactly in an array.

 

select

 

view

 

 

 

Use when you want to keep only some of the columns from the source table in the resulting table.

 

update

 

updateView

 

 

lazyUpdate

 

Use when you want to keep all columns from the source table in the resulting table.

The content requested is evaluated once and saved into memory. 

The content exists only as formulas that refer to existing data. Values are calculated on the fly and never saved.

The content is evaluated on demand and when you access a cell, the formula inputs are used to store the values in the cache so the same value is not computed twice.

 

A large amount of memory is required for storing large datasets.

Memory is not allocated to store new columns, so less memory is needed.

Because the results are saved in a cache, this method uses less memory than an update when you have a smaller set of unique values, and less computation is required than with updateView.

 

If the content is expensive to evaluate and accessed many times, storing the results in memory is often the best option.

When the formula is fast to compute or if you are only accessing a small portion of the data, storing the results as a formula is often the best option.

This method defers computation of formulas until they are required. Values are never removed from the lazyUpdate cache, and this method is best when you have a small number of unique values.

 

Choosing one pair of methods over the other is not a trivial decision. Depending on the nature of the analysis and the amount of memory you have at your disposal, one method is usually more efficient than the other.  As a general rule of thumb, you should start with view and updateView, and then switch to select, update, and lazyUpdate to see if it is more efficient.

The following sections detail how to use each of these methods in a query.

select and view

The select and view methods create one table with the same number of rows as the source table, and one column for each argument used in the query. Each argument can be the name of an existing column, or a formula to define an existing column or create a new column. If you do not specify a column from the source table in the argument, it will not be copied to the resulting table.

There is only one difference between the select and view methods. Data obtained by using the select method is saved into memory; data obtained by using the view method is referenced via a formula and is not saved in memory.

Example

Start with the table shown below named source; we will write a query to create another table with only certain columns from the original table. 

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

The query follows.  Note:  The select method is used in this example, but the syntax and the resulting data is identical to that when using the view method.

t=source.select ("A", "B=B+2", "E=C+A")

t is the name of the variable that will store the new table,  source is the table name, select is the method. Three arguments describe the columns to include in the result table.

The first argument, "A" indicates that the resulting table's first column should be column A from the source table, with no changes.  The second argument, "B=B+2", indicates that the resulting table's second column should be column B from the source table, plus two. The third argument, "E=C+A", tells us to create a new virtual column in the resulting table, column E, by combining columns C and A from the original table.  Note:  Column C contains strings (as compared to numeric values), so Deephaven will concatenate the two columns and present them in the new E column.

When the query runs, the new t table is created.

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t=source.select("A", "B=B+2", "E=C+A")

A

B

E

1

12

AAPL1

2

22

IBM2

3

32

C3

As is demonstrated in the resulting table above, the select and the view methods enable you to:

  • Choose the specific columns you want to show in the new table
  • Manipulate the data you are showing in the new table, or
  • Omit columns that you do not need for your analysis  
  • Create new virtual columns in the resulting table

update, updateView, and lazyUpdate

The update, updateView, and lazyUpdate methods create one table with the same number of rows as the source table.  All of the columns in the source table are included in the new table. However, the contents of those columns in the resulting table can be manipulated through arguments in the query.  Additional columns can be added if desired.

The primary difference among these methods is how data is stored. Data obtained by using the update method is saved into memory; data obtained by using the updateView method is referenced via a formula and is not saved in memory; when you access a cell, data computed by using the lazyUpdate method is stored in a cache.

Note:  The update method is being used in the following example, but the syntax and the resulting data is identical to that when using the updateView or lazyUpdate method.

Example

Starting with the table shown below named source, we will write a query to create another table that will "update" certain data in the source table for further analysis. 

source

A

B

C

1

10

AAPL

2

20

IBM

3

30

C

The query follows.

t2=source.update("B=B+2","D=C+A")

t2 is the name of the variable that will store the new table, source is the table name, and update is the method. 

The first argument,"B=B+2", indicates column B in the resulting table should be increased by two. The second argument,"D=C+A", indicates a new virtual column, column D, should be created in the resulting table by combining the values of column C  and column A from the source table. Note: Because column C contains strings, Deephaven will concatenate the values in the two columns and present them in the new D column.

When the query runs, the new t2 table is created.

source

A

B

C

1

10

AAPL

2

20

IBM

3

30

C

 

t2=source.update("B=B+2","D=C+A")

A

B

C

D

1

12

AAPL

AAPL1

2

22

IBM

IBM2

3

32

C

C3

As demonstrated in the resulting table above, the update and updateView methods enable you to:

  • Include all of the columns from the source table in the resulting table without having to include arguments for each.  
  • Manipulate column values in the resulting table.
  • Create new virtual columns in the resulting table.

selectDistinct

The selectDistinct method creates a new table with only the columns from the source table that are listed in the argument. The number of rows in the new table's column is determined by the number of distinct values contained in these columns. 

Example

Starting with the table shown below, we will then write a query to select the distinct values in a column. 

source

A

B

1

apple

1

apple

2

orange

2

orange

3

plum

3

grape

The query follows:

t=source.selectDistinct("B")

t is the name of the variable that will store the new table, source is the table name, and selectDistinct is the method.  The argument, "B", indicates the method will use the values from column B of the source table.

When the query runs, the new t table is created.

source

A

B

1

apple

1

apple

2

orange

2

orange

3

plum

3

grape

 

t=source.selectDistinct("B")

B

apple

orange

plum

grape

Column B in the source table has only four distinct values (apple, orange, plum and grape), so the new t table has only four rows - one for each distinct value in column B.

This example showed the distinct values in only one column.  However, the selectDistinct method can also be used to determine distinct value sets across multiple columns.

Using the same table as before, we will adjust the query to include two columns in the argument:

t2=source.selectDistinct("A","B")

When this new query runs, the new t2 table is created.

source

A

B

1

apple

1

apple

2

orange

2

orange

3

plum

3

grape

 

t2=source.selectDistinct("A","B")

A

B

1

apple

2

orange

3

plum

3

grape

When the selectDistinct method is used on multiple columns, it looks for distinct sets of values in the columns selected.  In this example, the first two rows have the same set of values, so only one row is created in the new table to hold this distinct value set. The third and fourth rows also have an identical set of values, so only one row is added in the t2 table with this set of values.  Each of the last two rows have their own distinct value sets, so two rows are included in the t2 table.

dropColumns

The dropColumns method creates a table with the same number of rows as the source table, but omits any columns included in the dropColumns argument.  This method is useful when you only want to eliminate a small number of columns from the source table, or, in cases where you need to add a column for some operation, but then no longer need it after the operation is complete.

Example

Starting with the table shown below named source, we will then write a query to create another table that will drop (omit) one or more columns. 

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

The query follows:

t=source.dropColumns("B","D")

t is the name of the variable that will store the new table, source is the table name, and dropColumn is the method.  The argument ("B","D") indicates columns B and D should be dropped (not included) in the resulting table.

When the query runs, the new t table is created.

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t=source.dropColumns("B","D")

A

C

1

AAPL

2

IBM

3

C

moveColumns

The moveColumns method creates a table with the same number of rows and columns as the source table. However, a column (or a set of columns) can be moved to a different location by assigning it to a specific column index value in the moveColumns argument. 

Example

Starting with the table shown below named source, we will then write a query to create another table with a column in a different order. 

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

The query follows:

t=source.moveColumns(3,"B")

t is the name of the variable that will store the new table, source is the table name, and moveColumn is the method.  The argument (3,"B") indicates column B should be moved to column index number three in the resulting table.  Deephaven uses a zero-based index model, so column index number 3 would be the fourth column. 

When the query runs, the new t table is created with column B now in the fourth position (3rd index).

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t=source.moveColumns(3,"B")

A

C

D

B

1

AAPL

Lucy 10

2

IBM

Ricky 20

3

C

Fred 30

Column sets can also be moved using the moveColumns method.  For example, the following query moves the set of columns B and C to column index number 2.

t2=source.moveColumns(2,"B","C")

When this query runs, the new t2 table is created with the set of columns B and C now in the third and fourth position (column index number 2 and 3 ) respectively.

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t2=source.moveColumns(2,"B","C")

A

D

B

C

1

Lucy

10

AAPL

2

Ricky

20

IBM

3

Fred

30

C

moveUpColumns

The moveUpColumns method creates a table with the same number of rows and columns as the source table. However, a column (or a set of columns) can be moved to the first column index position in the resulting table by including the column name (or set of column names) in the moveUpColumns argument. 

Unlike the moveColumns method, the argument for the moveUpColumns method does not require the column index number.

Example

Starting with the table shown below named source, we will then write a query to create another table with a different column as the left column. 

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

The query follows:

t=source.moveUpColumns("D")

t is the name of the variable that will store the new table, source is the table name, and moveUpColumn is the method.  The argument "D" indicates column D should be placed first in order (column index 0) in the resulting table. Deephaven uses a zero-based index model, so column index 0 would be the first column. 

When the query runs, the new t table is created with column D now in the first position (column index number 0).

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t=source.moveUpColumns("D")

D

A

B

C

Lucy

1

10

AAPL

Ricky

2

20

IBM

Fred

3

30

C

A set of columns can also be moved using the moveUpColumns method.  For example, the following query moves the set of columns B and C to the first column index position.

t2=source.moveUpColumns("B","C")

When this query runs, the new t2 table is created with columns B and C now in the first and second position (column index 0 and 1) respectively.

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t2=source.moveUpColumns("B","C")

B

C

A

D

10

AAPL

1

Lucy

20

IBM

2

Ricky

30

C

3

Fred

moveDownColumns

The moveDownColumns() method creates a table with the same number of rows and columns as the source table. However, a column (or a set of columns) can be moved to the right side in the resulting table by including the column name (or set of column names) in the moveDownColumns() argument.

Unlike the moveColumns() method, the argument for the moveDownColumns() method does not require the column index number.

Example

Starting with the table shown below named source, we will then write a query to create another table with a different column as the right column.

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

The query follows:

t=source.moveDownColumns("C")

t is the name of the variable that will store the new table, source is the table name, and moveDownColumns is the method.  The argument "C" indicates column D should be placed last in order in the resulting table.

When the query runs, the new t table is created with column C now in the last position.

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t=source.moveDownColumns("C")

A

B

D

C

1

10

Lucy

AAPL

2

20

Ricky

IBM

3

30

Fred

C

renameColumns

The renameColumns method creates a table with the same number of rows and columns as the source table. However, column names in the resulting table can be changed by including them in the  renameColumns argument. 

Example

Starting with the table shown below named source, we will then write a query to change a column name in the resulting table. 

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

The query follows:

t=source.renameColumns("Beta=B")

"t" is the name of the variable that will store the new table, source is the table name, and renameColumn is the method.  The argument,"Beta=B" indicates the name of column B from the source table should be renamed to column Beta in the resulting table.

When the query runs, the new t table is created.

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t=source.renameColumns("Beta=B")

A

Beta

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

Multiple columns can be renamed using the renameColumn method.  For example, the following query will rename columns A and C in the resulting table:

t2=source.renameColumns("Alpha=A","Charlie=C")

When this query runs, the new t2 table is created.

source

A

B

C

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred

 

t2=source.renameColumns("Alpha=A","Charlie=C")

Alpha

B

Charlie

D

1

10

AAPL

Lucy

2

20

IBM

Ricky

3

30

C

Fred


Last Updated: 16 February 2021 18:07 -04:00 UTC    Deephaven v.1.20200928  (See other versions)

Deephaven Documentation     Copyright 2016-2020  Deephaven Data Labs, LLC     All Rights Reserved