Managing Data in User Tables

Overview

When using Deephaven, you may want to store the results of a query for later use. Deephaven supports saving data both within the database and in external formats. Deephaven allows users to store their own tables in the database. This section of the documentation discusses managing data in user tables.

Partitions

Partitions are distinct segments of data that can be managed independently.

A non-partitioned, or splayed table, is not divided into individual sections: the table exists as a whole.
With a partitioned table, one column is designated as the "partitioning column." All rows with the same value in this column are placed in the same partition. Each partition constitutes an independent section of the table. Individual partitions can be added, replaced, or removed without altering data from other partitions. The most common partitioning column is Date. Partitioning tables by date simplifies the management of tables containing data over long time periods.
To add a partitioned user table to the database, you must first add the table definition. The table definition can be taken from another table, such as the table whose data will be added to the new user table.

Non-Partitioned Direct User Tables

Several methods are used to manage content within directly manipulated, non-partitioned user tables.

Adding a Table

The addTable method is used to create a new, non-partitioned direct user table in a namespace, and simultaneously writes the table and its definition. Arguments to the method include the namespace, the new table name and the variable from which the data is copied.

The following example copies data from queryResults and then uses that data to create a new non-partitioned user table called NewTableName in the ExampleUserNamespace:

db.addTable("ExampleUserNamespace", "NewTableName", queryResults)

Note: The operation will fail if the definition already exists in an existing table named NewTableName.

You may also specify a storage format, e.g., DeephavenV1 or Parquet, as follows:

db.addTable("Namespace", "NewTableName", TableName, Database.StorageFormat.DeephavenV1)

db.addTable("Namespace", "NewTableName", TableName, Database.StorageFormat.Parquet)

Appending to a Table

The appendTable method is used to append data onto an existing non-partitioned direct user table. The following example copies data from queryResults and then appends the content to an existing user table called ExampleTable in the ExampleUserNamespace:

db.appendTable("ExampleUserNamespace", "ExampleTable", queryResults)

Replacing a Table

The replaceTable method is used to replace (not append or modify) data in an existing non-partitioned direct user table. Arguments to the method include the namespace, the name of the table containing the data to be replaced, and the variable from which the new data is copied. The following example copies data from queryResults and then replaces the content in an existing user table called ExampleTable in the ExampleUserNamespace:

db.replaceTable("ExampleUserNamespace", "ExampleTable", queryResults)

Note: If no table exists with the namespace and table name provided, replaceTable()behaves identically to addTable().

You may also specify a storage format, e.g., DeephavenV1 or Parquet, as follows:

db.replaceTable("Namespace", "NewTableName", TableName, Database.StorageFormat.DeephavenV1)

db.replaceTable("Namespace", "NewTableName", TableName, Database.StorageFormat.Parquet)

Removing a Table

The removeTable method removes an existing non-partitioned user table and its definition from the specified namespace. The following example removes an existing user table called ExampleTable from the ExampleUserNamespace:

db.removeTable("ExampleUserNamespace", "ExampleTable")

Other Methods

deleteTables will find all table locations and delete them. This may be necessary if any data is corrupt.

deleteMatchingTables will delete all tables with names matching the regular expression. This example regex matches tables beginning with "Test":

db.deleteMatchingTables("ExampleUserNamespace" , "Test.*")

deleteNamespace will delete the entire namespace and all tables and data therein, and should be used with caution. Do not use if there are centrally managed user tables in the namespace.

Partitioned Direct User Tables

To add a directly manipulated, partitioned user table to the database, you must designate a partition and create the table definition. The table definition can be taken from another table, such as the table whose data will be added to the new user table.

The examples below demonstrate creating and modifying a partitioned table based on data stored as the queryResult variable. The table will be partitioned by date.

Adding a Table Definition for a Partitioned Direct User Table

The addPartitionedTableDefinition method is used create a partitioned table definition in the database. The arguments provided to the method are:

namespace: The namespace under which the table is stored.
tableName: The name used to save the table.
partitioningColumnName: The column by which the data will be partitioned.
tableDefinition: A table definition, which specifies the columns to be saved. This can be taken from an existing table with the getDefinition() method.

The code below demonstrates the creation of a table called MyQueryResults in a namespace called ExampleUserNamespace. The table will be partitioned by the Date column, and will use the table definition (thus, the columnset) of the table stored in the queryResults variable:

db.addPartitionedTableDefinition("ExampleUserNamespace", "MyQueryResults", "Date", queryResults.getDefinition())

Using addTablePartitionAndDefinition() is identical to calling addPartitionedTableDefinitionIfAbsent() immediately before calling addTablePartition().

Storing Data in a Partitioned Direct User Table

To store data from queryResults to the May 1, 2017 partition of the MyQueryResults table, use the addTablePartition method:

db.addTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01", queryResults)

The operation will fail if data already exists in the 2017-05-01 partition of the MyQueryResults table. It will also fail if the table definition MyQueryResults table has not been added yet.

You can also specify a file format, e.g., DeephavenV1 or Parquet, as follows:

db.addTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.DeephavenV1)

db.addTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.Parquet)

It is possible to automatically add the table definition when storing the data. This can be accomplished with the addTablePartitionAndDefinition method:

db.addColumnPartitionAndDefinition ("ExampleUserNamespace", "MyQueryResults", "2017-05-01", queryResults)

Replacing Data in a Partitioned Direct User Table

To replace the existing data stored under the May 1, 2017 partition of the MyQueryResults, use the replaceTablePartition method:

db.replaceTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01" queryResults)

If no data exists at the partition name provided, replaceTablePartition() behaves identically to addTablePartition().

You can also specify a file format, e.g., DeephavenV1 or Parquet, as follows:

db.replaceTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.DeephavenV1)

db.replaceTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.Parquet)

The replaceTableDefinition may also be used.

Removing Data from a Partitioned Direct User Table

To remove the May 5, 2017 partition of the MyQueryResults table, use the removeTablePartition method:

db.removeTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01")

The following can be used to remove the entire table:

db.removeTable("ExampleUserNamespace" , "MyQueryResults").

This removes both the table and its definition.

Centrally Managed User Tables

These tables are always partitioned and are written by the Remote Table Appender. Although the methods for specifying definitions are the same as with directly managed user tables, there is no other overlap.

Adding a Table Definition for a Centrally Managed User Table

To create a centrally managed user table, the definition must exist first using the addPartitionedTableDefinition method. The table data is appended to a system-defined internal partition, and the specified column partition value. The arguments provided to the method are:

namespace: The namespace under which the table is stored.
tableName: The name used to save the table.
partitioningColumnName: The column by which the data will be partitioned (e.g., "2019-01-11" for a Date partitioning column).
tableDefinition: A table definition, which specifies the columns to be saved. This can be taken from an existing table with the getDefinition() method.

The code below demonstrates the creation of a table called ExampleQueryResults in a namespace called ExampleUserNamespace. The table will be partitioned by the Date column, and will use the table definition (thus, the columnset) of the table stored in the queryResults2 variable:

db.addPartitionedTableDefinition("ExampleUserNamespace", "ExampleQueryResults", "Date", queryResults2.getDefinition())

Storing Data in a Centrally Managed User Table

To add rows to a centrally managed user table from a table object (i.e., a query table or a memory table), use the appendCentral method.

The method takes the following arguments:

namespace: The namespace under which the table is stored.
tableName: The name used to save the table.
columnPartitionValue: The column partition value (e.g., "2019-01-11" for a Date partitioning column).
newData: The table containing the new data to be appended.
flush (boolean): true to flush the table to disk; false to wait for flushing (if the implementation supports waiting).

The code below will append rows onto to the partitioned user table, ExampleQueryResults, created in the previous example:

from deephaven import *

db.appendCentral("ExampleUserNamespace","ExampleQueryResults", "2018-08-27", queryResults3, True)

db.appendCentral("ExampleUserNamespace","ExampleQueryResults", "2018-08-27", queryResults3, true)

In this example, rows from a different table, queryResults3, from the specified date will be added to ExampleQueryResults. The final argument, true, flushes data to the disk immediately. The appendCentral method is useful for persisting data out to disk that is generated in a user script.

The iris.defaultColumnPartitionNamingFunction property determines the default partitioning column for all user tables.

To use the default partitioning column, the appendCentral method takes the following arguments:

namespace: The namespace under which the table is stored.
tableName: The name used to save the table.
newData: The table containing the new data to be appended.
flush (boolean): true to flush the table to disk; false to wait for flushing.

Replacing Data in a Centrally Managed User Table

Unlike directly manipulated user tables, there are no distinct methods to replace data. You can delete the data (see below) and append new data to the table. There is no database call to delete a single centrally managed partition. You can create a Remote Table Appender (RTA) and use it directly.

Removing Data from a Centrally Managed User Table

To remove the all of the data from the ExampleQueryResults table, use the closeAndDeleteCentral method:

db.closeAndDeleteCentral("ExampleUserNamespace", "ExampleQueryResults")

Note that this removes direct partitions in addition to centrally created partitions.

To delete the table definition itself, use the removeTableDefiniton method:

db.removeTableDefinition("ExampleUserNamespace", "ExampleQueryResults")

Other Methods

The logPartitionTableSnapshot method snapshots a table at a regular interval and writes the snapshot to a partitioned table. The arguments to the method follow:

namespace: The namespace under which the table is stored.
tableName: The name used to save the table.
partitioningColumnName: The column by which the data will be partitioned.
columnPartitionValue: The column partition value (e.g., "2019-02-11" for a Date partitioning column).
table: The table to snapshot and log (does not contain the partition column).
logInterval: The interval, in milliseconds, to log snapshots at.
replayer: The data replayer.
append (boolean): true to append to an existing partition; false to replace the existing partition.

The logPartitionIncremental method writes all changes in an input table to a linked partitioned table. The arguments to the method follow:

namespace: The namespace under which the table is stored.
tableName: The name used to save the table.
partitioningColumnName: The column by which the data will be partitioned.
columnPartitionValue: The column partition value (e.g., "2019-02-11" for a Date partitioning column).
table: The table to log (does not contain the partition column).
append (boolean): true to append to an existing partition; false to replace the existing partition.

Naming Conventions

Please refer to the following rules for naming columns, variables, tables, namespaces and partitions:

Column Names

Column names cannot be reserved words. These include Java-reserved words, such as "double", "null", "int", and others (see full list), as well as the following Deephaven -reserved words: "in", "not", "i" and "k".

Column names must begin with a letter (a-z, A-Z), or the underscore character "_"; and subsequent characters may be letters, digits or underscore characters. Spaces, dollar signs and other special characters are not allowed.

Query Variable Names

Query variable names follow the same rules as column names. See above.

Table and Namespace Names

Table and namespace names must start with a letter (a-z, A-Z) or an underscore "_". Subsequent characters can be letters, numbers, underscores, hyphens "-", the plus sign "+" and the at sign "@".

Partition Names

Partition names follow the same rules as table and namespace names (see above). However, partition names may also start with a number.

Last Updated: 16 February 2021 18:07 -04:00 UTC Deephaven v.1.20200928 (See other versions)