Advanced Table Methods

TableTools

Class com.illumon.iris.db.tables.utils.TableTools contains many methods for printing, loading, saving and modifying tables. All methods of TableTools are statically imported into each IrisDbGroovySession by default. Some of those functions follow:

Viewing data

void show(Table source, ...)

Prints the first few rows of a table to standard output.

void showWithIndex(Table source, ...)

Prints the first few rows of a table to standard output, and also shows the details of the index used to retrieve the data.

String string(Table t, ...)

Returns the first few rows of a table as a pipe-delimited string.

String html(Table source)

Returns an entire table formatted as HTML. Note that this should only be used on small tables, as the whole string is built in memory.

CSV files

Table readCsv(...)

Loads a CSV file as an in-memory DB table. Column types are inferred from the data.

void writeCsv(Table t, ...)

Writes a DB table out as a CSV file.

Binary log files

Table readBin(...)

Loads a binary log file as an in-memory DB table. Requires access to the listener class that matches the file to be loaded.

Manually creating tables and columns

Table emptyTable(int size)

Creates an empty table with size rows.

Table emptyTable(int size, TableDefinition tableDefinition)

Creates an empty table with size rows, and column names and types from tableDefinition.

Table newTable(ColumnHolder... columns)

Creates a new table with the specified columns.

ColumnHolder<T> col(String name, T... values)

Creates a new data column.

Manually creating tables and columns

Users may want to create columns for a new table out of arrays. The syntax is as follows:

from deephaven import *  # deephaven.TableTools module imported as as ttools

myTable = ttools.newTable(
    ttools.col("StringColumn", "Str1", "Str2", "Str3", "Str4"),
    ttools.intCol("IntegerColumn", 1, 2, 4, 8),
    ttools.doubleCol("Decimals", 1.0, 2.0, 4.0, 8.0),
    ttools.charCol("Characters", ["A", "B", "C", "D"]))
myStrings = ["Str1", "Str2", "Str3", "Str4"] as String[]
myInts = [1, 2, 4, 8] as int[]
myDoubles = [1.0, 2.0, 4.0, 8.0] as double[]
myCharacters = ['A', 'B', 'C', 'D'] as char[]

myTable = newTable(
        col("StringColumn", myStrings),
        intCol("IntegerColumn", myInts),
        doubleCol("Decimals", myDoubles),
        charCol("Characters" , myCharacters)
)

Each line of the query specifies the data type for the column, its name, and the data for that column in the form of an array. In the Groovy script, each array is first assigned to a variable, which is then included in the query that generates the new table.

More

Table merge(Table... tables)

Combines the tables into an aggregate table. The resulting table comprises the rows of each of the Input Tables, in the given order of the inputs. Null inputs are skipped.

String diff(Table actual, Table expected, int maxDiffLines)

Computes the difference of two tables for use in verification.

TableManagementTools

Class com.illumon.iris.db.tables.utils.TableManagementTools contains many methods for managing tables. The TableManagementTools class is not imported by default, and would need to be manually imported into the IrisDbGroovySession to access its methods. Some of those functions follow:

Simple binary files

Table readTable(Table t, ...)

Loads a DB binary table.

void writeTable(Table t, ...)

Saves a table to disk in DB binary format.

void deleteTable(Table t, ...)

Deletes a DB binary format table from disk.

Directory

List<File> getAllDbDirs(String tableName, File rootDir, int levelsDeep)

Gets all of the directories for an on-disk database.

Add column(s)

TableDefinition addColumns(...)

Adds a new column to a table definition.

Tables Derived from Functions

There are two factories that allow you to create a table from the result of a function:

  • The HashSetBackedTableFactory creates a table with one row for each SmartKey in a hash set (order is not preserved).
  • The FunctionGeneratedTableFactory allows you to assemble a table in your function, and the contents of the output table are replaced with the new values.

HashSetBackedTableFactory

An abstract table that represents a HashSet of SmartKeys. Since we are representing a set, we are not defining an order to our output. Whatever order the table happens to end up in, is fine. The table will refresh by regenerating the full HashSet (using the setGenerator function passed in); and then comparing that to the existing HashSet.

/**
* Create a ticking table based on a setGenerator.
*
* @param setGenerator a function that returns a HashSet of SmartKeys; each SmartKey is a row in the output.
* @param refreshIntervalMs how often to refresh the table, if <= 0 the table does not tick.
* @param colNames the column names for the output table, must match the number of elements in each SmartKey.
* @return a table representing the Set returned by the setGenerator
*/
com.illumon.iris.db.v2.utils.HashSetBackedTableFactory.create(Function.Nullary<HashSet<SmartKey>> setGenerator, int refreshIntervalMs, String... colNames)

FunctionGeneratedTableFactory

This factory allows you to create an arbitrary table using operations like pulling data out of another table and manipulating it using Java code. The table will refresh by regenerating the full values (using the tableGenerator function passed in). The resultant table's values are copied into the result table and appropriate listener notifications are fired.

All of the rows in the output table are modified on every tick, even if no actual changes occurred. The output table also has a contiguous index.

The generator function must produce a table, and the table definition must not change between invocations.

If you are transforming a table, you should generally use the regular table operations as opposed to this factory because table operations are capable of performing some operations incrementally. However, for small tables, this might prove to require less development effort.

There are two creation methods, and each has a different refresh policy. One is time-based and the other is based on a set of ticking tables:

Time-based:

public static Table create(Function.Nullary<Table> tableGenerator, int refreshIntervalMs);

Ticking tables-based:

public static Table create(Function.Nullary<Table> tableGenerator, DynamicTable ...sourceTables);

The query engine does not know the details of your function inputs. If you are dependent on a ticking table in your tableGenerator function, you can add it to the sourceTables list so the function will be recomputed on each tick. Note: If you have an expensive computation, keep in mind it will be recomputed -- from scratch -- each time any of the tables listed in sourceTables tick, even if your result would not change.

An appropriate use of these methods would be to reformat data that you've used other table operations to create, without the need to cast things like transposition into a table operation.

Array Access

The following methods provide array access to columns in a table, and create a row index to locate specific values.

A column name followed by an underscore represents a column as an array of all values within the column. For example, Sym_ creates an array of all the column's Symbols:

t.update("SA=Sym_")

However, you may want to narrow your results and ask for the element in a certain row. In that case, you should create a row index for your table:

t.update("Index=i")// a positional row index, as an Integer
t.update("Index=ii") // a positional row index, as a Long

The variables i and ii both represent row numbers. However, Integers are limited to values up to 231-1, while Longs can represent values up to 263-1. In other words, to avoid precision problems you should use the ii variable; unless you specifically need an Integer to pass to another function.  Using the i variable in a table with more than 2 billion rows will result in an error.

The row index allows users to pinpoint certain values within an array. Within a query, square bracket operators get an element from the array based on the row index. For example, Sym_ creates an array, and Sym_[ii-1] uses the index to go one row back, or get the Symbol value in the previous row.  Sym_[ii+3] creates an array, and then gets the Symbol value three rows ahead of the current row:

t.update("X=Sym_[ii-1]")
t.update("Y=Sym_[ii+3]")

Note: these methods are unreliable within a ticking table. When new rows are inserted within the table (as a result of multiple internal partitions), previously created row indexes do not automatically update, which will cause inconsistent results.

A third variable, k, provides a unique reference for a particular row in the table:

t.update("Index=k")// the internal index within the table

This method is not used for array access, and should only be used in limited circumstances such as debugging or advanced query operations. To use this method properly, you must know the index key of a row. The internal index is not stable between operations and values are not necessarily contiguous. As with the i and ii row indexes, the k index is unreliable for live or intraday data due to query engine operations.

Let's take a look at an example query that combines these methods:

t=t.update("Difference = (Sym=Sym_[ii-1]) ? Price-Price_[ii-1] : null")

This asks for the difference in price in Symbol values using a ternary operator, which returns results based upon a true or false comparison. We want to compare the same Symbol, so the first clause in the formula checks if the value of the Sym in the current row equals the value in the previous row. Then, we want to determine the current price minus the price in the previous row. If the Symbol is not the same in that row (a false comparison), we get a "null" value. If the comparison is true, the formula will return the difference in price.


Last Updated: 08 April 2020 10:40 -04:00 UTC    Deephaven v.1.20200121  (See other versions)

Deephaven Documentation     Copyright 2016-2020  Deephaven Data Labs, LLC     All Rights Reserved