Managing Data in User Tables
Overview
When using Deephaven, you may want to store the results of a query for later use. Deephaven supports saving data both within the database and in external formats. Deephaven allows users to store their own tables in the database. This section of the documentation discusses managing data in user tables.
See also: Table Storage
User tables can be stored in any namespace that does not contain system tables. Namespaces do not need to be configured in advance; they are created automatically when a query creates a user table. Once data is stored in a user table, all users with the appropriate permissions can access it.
There are two types of user tables:
- Directly manipulated user tables, stored on a locally accessible filesystem. These may be non-partitioned or partitioned (see below).
- Centrally managed user tables, written by the Remote Table Appender, are always partitioned.
User tables can be added, replaced, or deleted at any time.
User namespaces and tables definitions are stored without regard for how the data is managed. However, each section of this documentation (Non-Partitioned Direct User Tables, Partitioned Direct User Tables, and Centrally Managed User Tables represent distinct APIs and these methods should not be mixed. Note that when writing a ticking table to a Database
, you must hold the LTM lock. See: Live Table Monitor Lock.
Partitions
Partitions are distinct segments of data that can be managed independently.
- A non-partitioned, or splayed table, is not divided into individual sections: the table exists as a whole.
- With a partitioned table, one column is designated as the "partitioning column." All rows with the same value in this column are placed in the same partition. Each partition constitutes an independent section of the table. Individual partitions can be added, replaced, or removed without altering data from other partitions. The most common partitioning column is Date. Partitioning tables by date simplifies the management of tables containing data over long time periods.
- To add a partitioned user table to the database, you must first add the table definition. The table definition can be taken from another table, such as the table whose data will be added to the new user table.
Non-Partitioned Direct User Tables
Several methods are used to manage content within directly manipulated, non-partitioned user tables.
Adding a Table
The addTable
method is used to create a new, non-partitioned direct user table in a namespace, and simultaneously writes the table and its definition. Arguments to the method include the namespace, the new table name and the variable from which the data is copied.
The following example copies data from queryResults
and then uses that data to create a new non-partitioned user table called NewTableName
in the ExampleUserNamespace
:
db.addTable("ExampleUserNamespace", "NewTableName", queryResults)
Note: The operation will fail if the definition already exists in an existing table named NewTableName
.
You may also specify a storage format, e.g., DeephavenV1 or Parquet, as follows:
db.addTable("Namespace", "NewTableName", TableName, Database.StorageFormat.DeephavenV1)
or
db.addTable("Namespace", "NewTableName", TableName, Database.StorageFormat.Parquet)
Appending to a Table
The appendTable
method is used to append data onto an existing non-partitioned direct user table. The following example copies data from queryResults
and then appends the content to an existing user table called ExampleTable
in the ExampleUserNamespace
:
db.appendTable("ExampleUserNamespace", "ExampleTable", queryResults)
Replacing a Table
The replaceTable
method is used to replace (not append or modify) data in an existing non-partitioned direct user table. Arguments to the method include the namespace, the name of the table containing the data to be replaced, and the variable from which the new data is copied. The following example copies data from queryResults
and then replaces the content in an existing user table called ExampleTable
in the ExampleUserNamespace
:
db.replaceTable("ExampleUserNamespace", "ExampleTable", queryResults)
Note: If no table exists with the namespace and table name provided, replaceTable()
behaves identically to addTable()
.
You may also specify a storage format, e.g., DeephavenV1 or Parquet, as follows:
db.replaceTable("Namespace", "NewTableName", TableName, Database.StorageFormat.DeephavenV1)
or
db.replaceTable("Namespace", "NewTableName", TableName, Database.StorageFormat.Parquet)
Removing a Table
The removeTable
method removes an existing non-partitioned user table and its definition from the specified namespace. The following example removes an existing user table called ExampleTable
from the ExampleUserNamespace
:
db.removeTable("ExampleUserNamespace", "ExampleTable")
Other Methods
deleteTables
will find all table locations and delete them. This may be necessary if any data is corrupt.
deleteMatchingTables
will delete all tables with names matching the regular expression. This example regex matches tables beginning with "Test":
db.deleteMatchingTables("ExampleUserNamespace" , "Test.*"
)
deleteNamespace
will delete the entire namespace and all tables and data therein, and should be used with caution. Do not use if there are centrally managed user tables in the namespace.
Partitioned Direct User Tables
To add a directly manipulated, partitioned user table to the database, you must designate a partition and create the table definition. The table definition can be taken from another table, such as the table whose data will be added to the new user table.
The examples below demonstrate creating and modifying a partitioned table based on data stored as the queryResult
variable. The table will be partitioned by date.
Adding a Table Definition for a Partitioned Direct User Table
The addPartitionedTableDefinition
method is used create a partitioned table definition in the database. The arguments provided to the method are:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.partitioningColumnName
: The column by which the data will be partitioned.tableDefinition
: A table definition, which specifies the columns to be saved. This can be taken from an existing table with thegetDefinition()
method.
The code below demonstrates the creation of a table called MyQueryResults
in a namespace called ExampleUserNamespace
. The table will be partitioned by the Date
column, and will use the table definition (thus, the columnset) of the table stored in the queryResults
variable:
db.addPartitionedTableDefinition("ExampleUserNamespace", "MyQueryResults", "Date", queryResults.getDefinition())
Using addTablePartitionAndDefinition()
is identical to calling addPartitionedTableDefinitionIfAbsent()
immediately before calling addTablePartition()
.
Storing Data in a Partitioned Direct User Table
To store data from queryResults
to the May 1, 2017 partition of the MyQueryResults
table, use the addTablePartition
method:
db.addTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01", queryResults)
The operation will fail if data already exists in the 2017-05-01
partition of the MyQueryResults
table. It will also fail if the table definition MyQueryResults
table has not been added yet.
You can also specify a file format, e.g., DeephavenV1 or Parquet, as follows:
db.addTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.DeephavenV1)
or
db.addTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.Parquet)
It is possible to automatically add the table definition when storing the data. This can be accomplished with the addTablePartitionAndDefinition
method:
db.addColumnPartitionAndDefinition ("ExampleUserNamespace", "MyQueryResults", "2017-05-01", queryResults)
Replacing Data in a Partitioned Direct User Table
To replace the existing data stored under the May 1, 2017 partition of the MyQueryResults
, use the replaceTablePartition
method:
db.replaceTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01" queryResults)
If no data exists at the partition name provided, replaceTablePartition()
behaves identically to addTablePartition()
.
You can also specify a file format, e.g., DeephavenV1 or Parquet, as follows:
db.replaceTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.DeephavenV1)
or
db.replaceTablePartition("UserNamespace", "TableName", "columnPartitionValue", data, Database.StorageFormat.Parquet)
The replaceTableDefinition
may also be used.
Removing Data from a Partitioned Direct User Table
To remove the May 5, 2017 partition of the MyQueryResults
table, use the removeTablePartition
method:
db.removeTablePartition("ExampleUserNamespace", "MyQueryResults", "2017-05-01")
The following can be used to remove the entire table:
db.removeTable("ExampleUserNamespace" , "MyQueryResults").
This removes both the table and its definition.
Centrally Managed User Tables
These tables are always partitioned and are written by the Remote Table Appender. Although the methods for specifying definitions are the same as with directly managed user tables, there is no other overlap.
Adding a Table Definition for a Centrally Managed User Table
To create a centrally managed user table, the definition must exist first using the addPartitionedTableDefinition
method. The table data is appended to a system-defined internal partition, and the specified column partition value. The arguments provided to the method are:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.partitioningColumnName
: The column by which the data will be partitioned (e.g., "2019-01-11" for a Date partitioning column).tableDefinition
: A table definition, which specifies the columns to be saved. This can be taken from an existing table with thegetDefinition()
method.
The code below demonstrates the creation of a table called ExampleQueryResults
in a namespace called ExampleUserNamespace
. The table will be partitioned by the Date
column, and will use the table definition (thus, the columnset) of the table stored in the queryResults2
variable:
db.addPartitionedTableDefinition("ExampleUserNamespace", "ExampleQueryResults", "Date", queryResults2.getDefinition())
Storing Data in a Centrally Managed User Table
To add rows to a centrally managed user table from a table object (i.e., a query table or a memory table), use the appendCentral
method.
The method takes the following arguments:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.-
columnPartitionValue
: The column partition value (e.g., "2019-01-11" for a Date partitioning column). newData
: The table containing the new data to be appended.flush
(boolean):true
to flush the table to disk;false
to wait for flushing (if the implementation supports waiting).
The code below will append rows onto to the partitioned user table, ExampleQueryResults
, created in the previous example:
from deephaven import *
db.appendCentral("ExampleUserNamespace","ExampleQueryResults", "2018-08-27", queryResults3, True)
db.appendCentral("ExampleUserNamespace","ExampleQueryResults", "2018-08-27", queryResults3, true)
In this example, rows from a different table, queryResults3
, from the specified date will be added to ExampleQueryResults
. The final argument, true
, flushes data to the disk immediately. The appendCentral
method is useful for persisting data out to disk that is generated in a user script.
The iris.defaultColumnPartitionNamingFunction
property determines the default partitioning column for all user tables.
To use the default partitioning column, the appendCentral
method takes the following arguments:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.newData
: The table containing the new data to be appended.flush
(boolean):true
to flush the table to disk;false
to wait for flushing.
Replacing Data in a Centrally Managed User Table
Unlike directly manipulated user tables, there are no distinct methods to replace data. You can delete the data (see below) and append new data to the table. There is no database call to delete a single centrally managed partition. You can create a Remote Table Appender (RTA) and use it directly.
Removing Data from a Centrally Managed User Table
To remove the all of the data from the ExampleQueryResults
table, use the closeAndDeleteCentral
method:
db.closeAndDeleteCentral("ExampleUserNamespace", "ExampleQueryResults")
Note that this removes direct partitions in addition to centrally created partitions.
To delete the table definition itself, use the removeTableDefiniton
method:
db.removeTableDefinition("ExampleUserNamespace", "ExampleQueryResults")
Other Methods
The logPartitionTableSnapshot
method snapshots a table at a regular interval and writes the snapshot to a partitioned table. The arguments to the method follow:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.partitioningColumnName
: The column by which the data will be partitioned.columnPartitionValue
: The column partition value (e.g., "2019-02-11" for a Date partitioning column).table
: The table to snapshot and log (does not contain the partition column).logInterval
: The interval, in milliseconds, to log snapshots at.replayer
: The data replayer.append
(boolean):true
to append to an existing partition;false
to replace the existing partition.
The logPartitionIncremental
method writes all changes in an input table to a linked partitioned table. The arguments to the method follow:
namespace
: The namespace under which the table is stored.tableName
: The name used to save the table.partitioningColumnName
: The column by which the data will be partitioned.columnPartitionValue
: The column partition value (e.g., "2019-02-11" for a Date partitioning column).table
: The table to log (does not contain the partition column).append
(boolean):true
to append to an existing partition;false
to replace the existing partition.
Naming Conventions
Please refer to the following rules for naming columns, variables, tables, namespaces and partitions:
Column Names
Column names cannot be reserved words. These include Java-reserved words, such as "double", "null", "int", and others (see full list), as well as the following Deephaven -reserved words: "in", "not", "i" and "k".
Column names must begin with a letter (a-z, A-Z), or the underscore character "_"; and subsequent characters may be letters, digits or underscore characters. Spaces, dollar signs and other special characters are not allowed.
Query Variable Names
Query variable names follow the same rules as column names. See above.
Table and Namespace Names
Table and namespace names must start with a letter (a-z, A-Z) or an underscore "_". Subsequent characters can be letters, numbers, underscores, hyphens "-", the plus sign "+" and the at sign "@".
Partition Names
Partition names follow the same rules as table and namespace names (see above). However, partition names may also start with a number.
Last Updated: 16 February 2021 18:07 -04:00 UTC Deephaven v.1.20200928 (See other versions)
Deephaven Documentation Copyright 2016-2020 Deephaven Data Labs, LLC All Rights Reserved