pydeephaven

Deephaven Python Client (pydeephaven) is a Python API built on top of Deephaven’s highly efficient Open API which is based on gRPC and Apache Arrow. It allows Python applications to remotely connect to Deephaven data servers, export/import data with the server, run Python scripts on the server, and execute powerful queries on data tables.

Because Deephaven data servers and Deephaven clients including pydeephaven exchange data in the Apache Arrow format, pydeephaven is able to leverage ‘pyarrow’ - the Python bindings of Arrow (https://arrow.apache.org/docs/python/) for data representation and integration with other data analytic tools such as NumPy, Pandas, etc.

Examples

>>> from pydeephaven import Session
>>> from pyarrow import csv
>>> session = Session() # assuming Deephaven Community Edition is running locally with the default configuration
>>> table1 = session.import_table(csv.read_csv("data1.csv"))
>>> table2 = session.import_table(csv.read_csv("data2.csv"))
>>> joined_table = table1.join(table2, on=["key_col_1", "key_col_2"], joins=["data_col1"])
>>> df = joined_table.to_arrow().to_pandas()
>>> print(df)
>>> session.close()
exception DHError(message='')[source]

Bases: Exception

A custom exception class used by pydeephaven.

add_note()

Exception.add_note(note) – add a note to the exception

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class NaturalJoinType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

An Enum defining ways to handle duplicate right hand table values during natural join operations

ERROR_ON_DUPLICATE = 1

Throw an error if a duplicate right hand table row is found. This is the default behavior if not specified

EXACTLY_ONE_MATCH = 4

Match exactly one right hand table row; throw an error if there are zero or more than one matches

FIRST_MATCH = 2

Match the first right hand table row and ignore later duplicates

LAST_MATCH = 3

Match the last right hand table row and ignore earlier duplicates

class Query(session, table)[source]

Bases: TableInterface[Query]

A Query object is used to define and execute a sequence of Deephaven table operations on the server.

When the query is executed, the table operations specified for the Query object are batched together and sent to the server in a single request, thus avoiding multiple round trips between the client and the server. The result of executing the query is a new Deephaven table.

Note, an application should always use the factory method on the Session object to create a Query instance as the constructor is subject to future changes to support more advanced features already planned.

agg_all_by(agg, by)[source]

Adds an AggregateAll operation to the query.

Parameters:
  • agg (Aggregation) – the aggregation to be applied

  • by (Union[str, list[str]]) – the group-by column name(s)

Return type:

Query

Returns:

self

agg_by(aggs, by)[source]

Adds an Aggregate operation to the query.

Parameters:
  • by (Union[str, list[str]]) – the group-by column name(s)

  • aggs (Union[Aggregation, list[Aggregation]]) – the aggregation(s) to be applied

Return type:

Query

Returns:

self

aj(table, on, joins=None)[source]

Adds a as-of join operation to the query.

Parameters:
  • table (Table) – the right-table of the join

  • on (Union[str, list[str]]) – the column(s) to match, can be a common name or a match condition of two columns, e.g. ‘col_a = col_b’. The first ‘N-1’ matches are exact matches. The final match is an inexact match. The inexact match can use either ‘>’ or ‘>=’. If a common name is used for the inexact match, ‘>=’ is used for the comparison.

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None

Return type:

Query

Returns:

self

avg_by(by=None)[source]

Adds an avg-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

count_by(col, by=None)[source]

Adds a count-by aggregation to the query.

Parameters:
  • col (str) – the name of the column to store the counts

  • by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

drop_columns(cols)[source]

Adds a drop-columns operation to the query.

Parameters:

cols (Union[str, list[str]]) – the column name(s)

Return type:

Query

Returns:

self

exact_join(table, on, joins=None)[source]

Adds an exact-join operation to the query.

Parameters:
  • table (Table) – the right-table of the join

  • on (Union[str, list[str]]) – the column(s) to match, can be a common name or an equal expression, i.e. “col_a = col_b” for different column names

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None

Return type:

Query

Returns:

self

exec()[source]

Executes the query on the server and return the result table.

Return type:

Table

Returns:

a Table object

Raises:

DHError

first_by(by=None)[source]

Adds a first-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

group_by(by=None)[source]

Adds a group-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s); default is None

Return type:

Query

Returns:

self

head(num_rows)[source]

Adds a head operation to the query.

Parameters:

num_rows (int) – the number of rows at the head of table

Return type:

Query

Returns:

self

head_by(num_rows, by)[source]

Adds a head-by operation to the query.

Parameters:
  • num_rows (int) – the number of rows at the beginning of each group

  • by (Union[str, list[str]]) – the group-by column name(s)

Return type:

Query

Returns:

self

join(table, on=None, joins=None, reserve_bits=10)[source]

Adds a cross-join operation to the query.

Parameters:
  • table (Table) – the right-table of the join

  • on (Optional[Union[str, list[str]]]) – the column(s) to match, can be a common name or an equal expression, i.e. “col_a = col_b” for different column names

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None

  • reserve_bits (int) – the number of bits of key-space to initially reserve per group; default is 10

Return type:

Query

Returns:

self

last_by(by=None)[source]

Adds a last-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

lazy_update(formulas)[source]

Adds a lazy-update operation to the query.

Parameters:

formulas (Union[str, list[str]]) – the column formula(s)

Return type:

Query

Returns:

self

max_by(by=None)[source]

Adds a max-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

median_by(by=None)[source]

Adds a median-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

min_by(by=None)[source]

Adds a min-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

natural_join(table, on, joins=None, type=NaturalJoinType.ERROR_ON_DUPLICATE)[source]

Adds a natural-join operation to the query.

Parameters:
  • table (Table) – the right-table of the join

  • on (Union[str, list[str]]) – the column(s) to match, can be a common name or an equal expression, i.e. “col_a = col_b” for different column names

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None

  • type (NaturalJoinType) – the action to be taken when duplicate right hand rows are encountered; default is ERROR_ON_DUPLICATE

Return type:

Query

Returns:

self

raj(table, on, joins=None)[source]

Adds a reverse as-of join operation to the query.

Parameters:
  • table (Table) – the right-table of the join

  • on (Union[str, list[str]]) – the column(s) to match, can be a common name or a match condition of two columns, e.g. ‘col_a = col_b’. The first ‘N-1’ matches are exact matches. The final match is an inexact match. The inexact match can use either ‘<’ or ‘<=’. If a common name is used for the inexact match, ‘<=’ is used for the comparison.

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None

Return type:

Query

Returns:

self

select(formulas=None)[source]

Adds a select operation to the query.

Parameters:

formulas (Optional[Union[str, list[str]]]) – the column formula(s), default is None

Return type:

Query

Returns:

self

select_distinct(cols=None)[source]

Adds a select-distinct operation to the query.

Parameters:

cols (Optional[Union[str, list[str]]]) – the column name(s), default is None

Return type:

Query

Returns:

self

slice(start, stop)

Extracts a subset of a table by row positions into a new Table.

If both the start and the stop are positive, then both are counted from the beginning of the table. The start is inclusive, and the stop is exclusive. slice(0, N) is equivalent to head() (N) The start must be less than or equal to the stop.

If the start is positive and the stop is negative, then the start is counted from the beginning of the table, inclusively. The stop is counted from the end of the table. For example, slice(1, -1) includes all rows but the first and last. If the stop is before the start, the result is an empty table.

If the start is negative, and the stop is zero, then the start is counted from the end of the table, and the end of the slice is the size of the table. slice(-N, 0) is equivalent to tail() (N).

If the start is negative and the stop is negative, they are both counted from the end of the table. For example, slice(-2, -1) returns the second to last row of the table.

Parameters:
  • start (int) – the first row position to include in the result

  • stop (int) – the last row position to include in the result

Return type:

TypeVar(T)

Returns:

a new Table

Raises:

DHError

Examples

>>> table.slice(0, 5)    # first 5 rows
>>> table.slice(-5, 0)   # last 5 rows
>>> table.slice(2, 6)    # rows from index 2 to 5
>>> table.slice(6, 2)    # ERROR: cannot slice start after end
>>> table.slice(-6, -2)  # rows from 6th last to 2nd last (exclusive)
>>> table.slice(-2, -6)  # ERROR: cannot slice start after end
>>> table.slice(2, -3)   # all rows except the first 2 and the last 3
>>> table.slice(-6, 8)   # rows from 6th last to index 8 (exclusive)
snapshot()[source]

Adds a snapshot operation to the query.

Return type:

Query

Returns:

self

snapshot_when(trigger_table, stamp_cols=None, initial=False, incremental=False, history=False)[source]

Adds a snapshot_when operation to the query.

Parameters:
  • trigger_table (Table) – the trigger table

  • stamp_cols (Optional[Union[str, list[str]]]) – The column(s) from trigger_table that form the “stamp key”, may be renames, default is None, meaning that all columns from trigger_table form the “stamp key”.

  • initial (bool) – Whether to take an initial snapshot upon construction, default is False. When False, the resulting table will remain empty until trigger_table first updates.

  • incremental (bool) – Whether the resulting table should be incremental, default is False. When False, all rows of this table will have the latest “stamp key”. When True, only the rows of this table that have been added or updated will have the latest “stamp key”.

  • history (bool) – Whether the resulting table should keep history, default is False. A history table appends a full snapshot of this table and the “stamp key” as opposed to updating existing rows. The history flag is currently incompatible with initial and incremental: when history is True, incremental and initial must be False.

Return type:

Query

Returns:

self

sort(order_by, order=None)[source]

Adds sort operation to the query.

Parameters:
  • order_by (Union[str, list[str]]) – the names of the columns to be sorted on

  • order (Optional[Union[SortDirection, list[SortDirection]]]) – the corresponding sort direction(s) for each sort column, default is None. In the absence of explicit sort directions, data will be sorted in the ascending order.

Return type:

Query

Returns:

self

std_by(by=None)[source]

Adds a std-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

sum_by(by=None)[source]

Adds a sum-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

tail(num_rows)[source]

Adds a tail operation to the query.

Parameters:

num_rows (int) – the number of rows at the end of table

Return type:

Query

Returns:

self

tail_by(num_rows, by)[source]

Adds a tail-by operation to the query.

Parameters:
  • num_rows (int) – the number of rows at the end of each group

  • by (Union[str, list[str]]) – the group-by column name(s)

Return type:

Query

Returns:

self

ungroup(cols=None, null_fill=True)[source]

Adds an ungroup operation to the query.

Parameters:
  • cols (Optional[Union[str, list[str]]]) – the array column(s), default is None, meaning all array columns will be ungrouped

  • null_fill (bool) – indicates whether null should be used to fill missing cells, default is True

Return type:

Query

Returns:

self

update(formulas)[source]

Adds an update operation to the query.

Parameters:

formulas (Union[str, list[str]]) – the column formula(s)

Return type:

Query

Returns:

self

update_by(ops, by=None)[source]

Adds an update-by operation to the query.

Parameters:
  • ops (Union[UpdateByOperation, list[UpdateByOperation]]) – the UpdateByOperation(s) to be applied

  • by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

update_view(formulas)[source]

Adds an update-view operation to the query.

Parameters:

formulas (Union[str, list[str]]) – the column formula(s)

Return type:

Query

Returns:

self

var_by(by=None)[source]

Adds a var-by aggregation to the query.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None

Return type:

Query

Returns:

self

view(formulas)[source]

Adds a view operation to the query.

Parameters:

formulas (Union[str, list[str]]) – the column formula(s)

Return type:

Query

Returns:

self

where(filters)[source]

Adds a filter operation to the query.

Parameters:

filters (Union[str, list[str], Filter, list[Filter]]) – the filter condition expression(s)

Return type:

Query

Returns:

self

where_in(filter_table, cols)[source]

Adds a where_in operation to the query.

Parameters:
  • filter_table (Table) – the table containing the set of values to filter on

  • cols (Union[str, list[str]]) – the column name(s)

Return type:

Query

Returns:

self

where_not_in(filter_table, cols)[source]

Adds a where_not_in operation to the query.

Parameters:
  • filter_table (Table) – the table containing the set of values to filter on

  • cols (Union[str, list[str]]) – the column name(s)

Return type:

Query

Returns:

self

class Session(host=None, port=None, auth_type='Anonymous', auth_token='', never_timeout=True, session_type='python', use_tls=False, tls_root_certs=None, client_cert_chain=None, client_private_key=None, client_opts=None, extra_headers=None)[source]

Bases: object

A Session represents a connection to the Deephaven data server. It contains a number of convenience methods for asking the server to create tables, import Arrow data into tables, merge tables, run Python scripts, and execute queries.

Session objects can be used in Python with statement so that whatever happens in the with statement block, they are guaranteed to be closed upon exit.

tables

names of the global tables available in the server after running scripts

Type:

list[str]

is_alive

check if the session is still alive (may refresh the session)

Type:

bool

Initializes a Session object that connects to the Deephaven server

Parameters:
  • host (Optional[str]) – the host name or IP address of the remote machine, if None, ‘localhost’ is used.

  • port (Optional[int]) – the port number that Deephaven server is listening on, if None, 10000 is used.

  • auth_type (str) – the authentication type string, can be “Anonymous’, ‘Basic”, or any custom-built authenticator in the server, such as “io.deephaven.authentication.psk.PskAuthenticationHandler”, default is ‘Anonymous’.

  • auth_token (str) – the authentication token string. When auth_type is ‘Basic’, it must be “user:password”; when auth_type is “Anonymous’, it will be ignored; when auth_type is a custom-built authenticator, it must conform to the specific requirement of the authenticator

  • never_timeout (bool) – never allow the session to timeout, default is True

  • session_type (str) – the Deephaven session type. Defaults to ‘python’

  • use_tls (bool) – if True, use a TLS connection. Defaults to False

  • tls_root_certs (Optional[bytes]) – PEM encoded root certificates to use for TLS connection, or None to use system defaults. If not None implies use a TLS connection and the use_tls argument should have been passed as True. Defaults to None

  • client_cert_chain (Optional[bytes]) – PEM encoded client certificate if using mutual TLS. Defaults to None, which implies not using mutual TLS.

  • client_private_key (Optional[bytes]) – PEM encoded client private key for client_cert_chain if using mutual TLS. Defaults to None, which implies not using mutual TLS.

  • client_opts (Optional[list[Tuple[str,Union[int,str]]]) –

    list of tuples for name and value of options to the underlying grpc channel creation. Defaults to None, which implies not using any channel options. See https://grpc.github.io/grpc/cpp/group__grpc__arg__keys.html for a list of valid options. Example options:

    [ (‘grpc.target_name_override’, ‘idonthaveadnsforthishost’),

    (‘grpc.min_reconnect_backoff_ms’, 2000) ]

  • extra_headers (Optional[dict[bytes, bytes]]) – additional headers (and values) to add to server requests. Defaults to None, which implies not using any extra headers.

Raises:

DHError

bind_table(name, table)[source]

Binds a table to the given name on the server so that it can be referenced by that name.

Parameters:
  • name (str) – name for the table

  • table (Table) – a Table object

Raises:

DHError

Return type:

None

close()[source]

Closes the Session object if it hasn’t timed out already.

Raises:

DHError

Return type:

None

empty_table(size)[source]

Creates an empty table on the server.

Parameters:

size (int) – the size of the empty table in number of rows

Return type:

Table

Returns:

a Table object

Raises:

DHError

fetch(ticket)[source]

Fetches a server object by ticket.

This is low-level method that can be used to fetch non-Table server objects. The ticket represents a fetchable server object, e.g PluginClient, Fetchable. This method is used together with the publish() method to share server objects between sessions.

Parameters:

ticket (Ticket) – a ticket

Return type:

ExportTicket

Returns:

an ExportTicket object

Raises:

DHError

fetch_table(ticket)[source]

Fetches a table by ticket.

Parameters:

ticket (SharedTicket) – a ticket

Return type:

Table

Returns:

a Table object

Raises:

DHError

property host

The host name of the server.

import_table(data)[source]

Imports the pyarrow table as a new Deephaven table on the server.

Deephaven supports most of the Arrow data types. However, if the pyarrow table contains any field with a data type not supported by Deephaven, the import operation will fail.

Parameters:

data (pa.Table) – a pyarrow Table object

Return type:

Table

Returns:

a Table object

Raises:

DHError

input_table(schema=None, init_table=None, key_cols=None, blink_table=False)[source]

Creates an InputTable from either Arrow schema or initial table. When blink_table is True, the InputTable will be a blink table. When blink_table is False (default), the InputTable will be keyed if key columns are provided, otherwise it will be append-only.

Parameters:
  • schema (Optional[pa.Schema]) – the schema for the InputTable

  • init_table (Optional[Table]) – the initial table

  • key_cols (Optional[Union[str, list[str]]]) – the name(s) of the key column(s)

  • blink_table (bool) – whether the InputTable should be a blink table, default is False

Return type:

InputTable

Returns:

an InputTable

Raises:

DHError, ValueError

property is_alive

Whether the session is alive.

merge_tables(tables, order_by='')[source]

Merges several tables into one table on the server.

Parameters:
  • tables (list[Table]) – the list of Table objects to merge

  • order_by (str) – the order by column name, if specified, the input tables must already be sorted on this column, and the result table will be sorted on it as well, default is “”

Return type:

Table

Returns:

a Table object

Raises:

DHError

open_table(name)[source]

Opens a table in the global scope with the given name on the server.

Parameters:

name (str) – the name of the table

Return type:

Table

Returns:

a Table object

Raises:

DHError

plugin_client(server_obj)[source]

Wraps a ticket as a PluginClient. Capabilities here vary based on the server implementation of the ObjectType, but most will at least send a response payload to the client, possibly including references to other objects. In some cases, depending on the server implementation, the client will also be able to send the same sort of messages back to the server.

Part of the experimental plugin API.

Return type:

PluginClient

property port

The port number of the server.

publish(source_ticket, result_ticket)[source]

Publishes a source ticket to the result ticket.

This is low-level method that can be used to publish non-Table server objects that are previously fetched from the server. The source ticket represents the previously fetched server object to be published, and the result ticket, which should normally be a SharedTicket, is the ticket to publish to. The result ticket can then be fetched by other sessions to access the object as long as the object is not released. This method is used together with the fetch() method to share server objects between sessions.

Parameters:
  • source_ticket (Ticket) – The source ticket to publish from.

  • result_ticket (Ticket) – The result ticket to publish to.

Raises:

DHError – If the operation fails.

Return type:

None

publish_table(ticket, table)[source]

Publishes a table to the given shared ticket. The ticket can then be used by another session to fetch the table.

Note that, the shared ticket can be fetched by other sessions to access the table as long as the table is not released. When the table is released either through an explicit call of the close method on it, or implicitly through garbage collection, or through the closing of the publishing session, the shared ticket will no longer be valid.

Parameters:
Raises:

DHError

Return type:

None

query(table)[source]

Creates a Query object to define a sequence of operations on a Deephaven table.

Parameters:

table (Table) – a Table object

Return type:

Query

Returns:

a Query object

Raises:

DHError

release(ticket)[source]

Releases a ticket.

Parameters:

ticket (Ticket) – the ticket to release

Return type:

None

run_script(script, systemic=None)[source]

Runs the supplied Python script on the server.

Parameters:
  • script (str) – the Python script code

  • systemic (Optional[bool]) – Whether to treat the code as systemically important. Defaults to None which uses the default system behavior

Raises:

DHError

Return type:

None

time_table(period, start_time=None, blink_table=False)[source]

Creates a time table on the server.

Parameters:
  • period (Union[int, str]) – the interval at which the time table ticks (adds a row); units are nanoseconds or a time interval string, e.g. “PT00:00:.001” or “PT1S”

  • start_time (Optional[Union[int, str]]) – the start time for the time table in nanoseconds or as a date time formatted string; default is None (meaning now)

  • blink_table (bool) – if the time table should be a blink table, defaults to False

Return type:

Table

Returns:

a Table object

Raises:

DHError

class SortDirection(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

An enum defining the sort ordering.

ASCENDING = 1

Ascending sort direction

DESCENDING = -1

Descending sort direction

class Table(session, ticket, schema_header=b'', size=None, is_static=None, schema=None)[source]

Bases: TableInterface[Table], ServerObject

A Table object represents a reference to a table on the server. It is the core data structure of Deephaven and supports a rich set of operations such as filtering, sorting, aggregating, joining, snapshotting etc.

Note, an application should never instantiate a Table object directly. Table objects are always provided through factory methods such as Session.empty_table(), or import/export methods such as Session.import_table(), open_table(), or any of the Table operations.

is_closed

check if the table has been closed on the server

Type:

bool

agg_all_by(agg, by)[source]

The agg_all_by method creates a new table containing grouping columns and grouped data. The resulting grouped data is defined by the aggregation specified.

Note, because agg_all_by applies the aggregation to all the columns of the table, it will ignore any column names specified for the aggregation.

Parameters:
  • agg (Aggregation) – the aggregation to be applied

  • by (Union[str, list[str]]) – the group-by column name(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

agg_by(aggs, by)[source]

The agg_by method creates a new table containing grouping columns and grouped data. The resulting grouped data is defined by the aggregation(s) specified.

Parameters:
  • aggs (Union[Aggregation, list[Aggregation]]) – the aggregation(s) to be applied

  • by (Union[str, list[str]]) – the group-by column name(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

aj(table, on, joins=None)[source]

The aj (as-of join) method creates a new table containing all the rows and columns of the left table, plus additional columns containing data from the right table. For columns appended to the left table (joins), row values equal the row values from the right table where the keys from the left table most closely match the keys from the right table without going over. If there is no matching key in the right table, appended row values are NULL.

Parameters:
  • table (Table) – the right-table of the join

  • on (Union[str, list[str]]) – the column(s) to match, can be a common name or a match condition of two columns, e.g. ‘col_a = col_b’. The first ‘N-1’ matches are exact matches. The final match is an inexact match. The inexact match can use either ‘>’ or ‘>=’. If a common name is used for the inexact match, ‘>=’ is used for the comparison.

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None, which means all the columns from the right table, excluding those specified in ‘on’

Return type:

Table

Returns:

a Table object

Raises:

DHError

avg_by(by=None)[source]

The avg_by method creates a new table containing the average for each group. Columns not used in the grouping must be of numeric types.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

close()[source]

Close the table reference on the server.

Raises:

DHError

Return type:

None

count_by(col, by=None)[source]

The count_by method creates a new table containing the number of rows for each group. The count of each group is stored in a new column named after the ‘col’ parameter.

Parameters:
  • col (str) – the name of the column to store the counts

  • by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

drop_columns(cols)[source]

The drop_column method creates a new table with the same size as this table but omits any of the specified columns.

Parameters:

cols (Union[str, list[str]]) – the column name(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

exact_join(table, on, joins=None)[source]

The exact_join method creates a new table containing all the rows and columns of this table plus additional columns containing data from the right table. For columns appended to the left table (joins), row values equal the row values from the right table where the key values in the left and right tables are equal.

Parameters:
  • table (Table) – the right-table of the join

  • on (Union[str, list[str]]) – the column(s) to match, can be a common name or an equal expression, i.e. “col_a = col_b” for different column names

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None, which means all the columns from the right table, excluding those specified in ‘on’

Return type:

Table

Returns:

a Table object

Raises:

DHError

first_by(by=None)[source]

The first_by method creates a new table which contains the first row of each distinct group.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

group_by(by=None)[source]

The group_by method creates a new table containing grouping columns and grouped data, column content is grouped into arrays.

If no group-by column is given, the content of each column is grouped into its own array.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

head(num_rows)[source]

The head method creates a new table with a specific number of rows from the beginning of the table.

Parameters:

num_rows (int) – the number of rows at the head of table

Return type:

Table

Returns:

a Table object

Raises:

DHError

head_by(num_rows, by)[source]

The head_by method creates a new table containing the first number of rows for each group.

Parameters:
  • num_rows (int) – the number of rows at the beginning of each group

  • by (Union[str, list[str]]) – the group-by column name(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

property is_closed

Whether this table is closed.

property is_refreshing

Whether this table is refreshing.

join(table, on=None, joins=None, reserve_bits=10)[source]

The join method creates a new table containing rows that have matching values in both tables. Rows that do not have matching criteria will not be included in the result. If there are multiple matches between a row from the left table and rows from the right table, all matching combinations will be included. If no columns to match (on) are specified, every combination of left and right table rows is included.

Parameters:
  • table (Table) – the right-table of the join

  • on (Optional[Union[str, list[str]]]) – the column(s) to match, can be a common name or an equal expression, i.e. “col_a = col_b” for different column names, default is None

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None, which means all the columns from the right table, excluding those specified in ‘on’

  • reserve_bits (int) – the number of bits of key-space to initially reserve per group; default is 10

Return type:

Table

Returns:

a Table object

Raises:

DHError

last_by(by=None)[source]

The last_by method creates a new table which contains the last row of each distinct group.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

lazy_update(formulas)[source]

The lazy_update method creates a new table containing a new, cached, formula column for each formula.

Parameters:

formulas (Union[str, list[str]]) – the column formula(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

max_by(by=None)[source]

The max_by method creates a new table containing the maximum value for each group. Columns not used in the grouping must be of numeric types.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

median_by(by=None)[source]

The median_by method creates a new table containing the median for each group. Columns not used in the grouping must be of numeric types.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

property meta_table

The column definitions of the table in a Table form.

min_by(by=None)[source]

The min_by method creates a new table containing the minimum value for each group. Columns not used in the grouping must be of numeric types.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

natural_join(table, on, joins=None, type=NaturalJoinType.ERROR_ON_DUPLICATE)[source]

The natural_join method creates a new table containing all the rows and columns of this table, plus additional columns containing data from the right table. For columns appended to the left table (joins), row values equal the row values from the right table where the key values in the left and right tables are equal. If there is no matching key in the right table, appended row values are NULL.

Parameters:
  • table (Table) – the right-table of the join

  • on (Union[str, list[str]]) – the column(s) to match, can be a common name or an equal expression, i.e. “col_a = col_b” for different column names

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None, which means all the columns from the right table, excluding those specified in ‘on’

  • type (NaturalJoinType) – the action to be taken when duplicate right hand rows are encountered; default is ERROR_ON_DUPLICATE

Return type:

Table

Returns:

a Table object

Raises:

DHError

property pb_ticket

Returns the ticket as a gRPC protobuf ticket object.

property pb_typed_ticket

Returns a protobuf typed ticket, suitable for use in communicating with an ObjectType plugin on the server.

raj(table, on, joins=None)[source]

The raj (reverse as-of join) method creates a new table containing all the rows and columns of the left table, plus additional columns containing data from the right table. For columns appended to the left table ( joins), row values equal the row values from the right table where the keys from the left table most closely match the keys from the right table without going under. If there is no matching key in the right table, appended row values are NULL.

Parameters:
  • table (Table) – the right-table of the join

  • on (Union[str, list[str]]) – the column(s) to match, can be a common name or a match condition of two columns, e.g. ‘col_a = col_b’. The first ‘N-1’ matches are exact matches. The final match is an inexact match. The inexact match can use either ‘<’ or ‘<=’. If a common name is used for the inexact match, ‘<=’ is used for the comparison.

  • joins (Optional[Union[str, list[str]]]) – the column(s) to be added from the right table to the result table, can be renaming expressions, i.e. “new_col = col”; default is None, which means all the columns from the right table, excluding those specified in ‘on’

Return type:

Table

Returns:

a Table object

Raises:

DHError

select(formulas=None)[source]

The select method creates a new in-memory table that includes one column for each formula. If no formula is specified, all columns will be included.

Parameters:

formulas (Optional[Union[str, list[str]]]) – the column formula(s), default is None

Return type:

Table

Returns:

a Table object

Raises:

DHError

select_distinct(cols=None)[source]

The select_distinct method creates a new table containing all the unique values for a set of key columns. When the selectDistinct method is used on multiple columns, it looks for distinct sets of values in the selected columns.

Parameters:

cols (Optional[Union[str, list[str]]]) – the column name(s), default is None

Return type:

Table

Returns:

a Table object

Raises:

DHError

slice(start, stop)[source]

Extracts a subset of a table by row positions into a new Table.

If both the start and the stop are positive, then both are counted from the beginning of the table. The start is inclusive, and the stop is exclusive. slice(0, N) is equivalent to head() (N) The start must be less than or equal to the stop.

If the start is positive and the stop is negative, then the start is counted from the beginning of the table, inclusively. The stop is counted from the end of the table. For example, slice(1, -1) includes all rows but the first and last. If the stop is before the start, the result is an empty table.

If the start is negative, and the stop is zero, then the start is counted from the end of the table, and the end of the slice is the size of the table. slice(-N, 0) is equivalent to tail() (N).

If the start is negative and the stop is negative, they are both counted from the end of the table. For example, slice(-2, -1) returns the second to last row of the table.

Parameters:
  • start (int) – the first row position to include in the result

  • stop (int) – the last row position to include in the result

Return type:

Table

Returns:

a new Table

Raises:

DHError

Examples

>>> table.slice(0, 5)    # first 5 rows
>>> table.slice(-5, 0)   # last 5 rows
>>> table.slice(2, 6)    # rows from index 2 to 5
>>> table.slice(6, 2)    # ERROR: cannot slice start after end
>>> table.slice(-6, -2)  # rows from 6th last to 2nd last (exclusive)
>>> table.slice(-2, -6)  # ERROR: cannot slice start after end
>>> table.slice(2, -3)   # all rows except the first 2 and the last 3
>>> table.slice(-6, 8)   # rows from 6th last to index 8 (exclusive)
snapshot()[source]

The snapshot method creates a static snapshot table.

Return type:

Table

Returns:

a Table object

Raises:

DHError

snapshot_when(trigger_table, stamp_cols=None, initial=False, incremental=False, history=False)[source]

The snapshot_when creates a table that captures a snapshot of this table whenever trigger_table updates.

When trigger_table updates, a snapshot of this table and the “stamp key” from trigger_table form the resulting table. The “stamp key” is the last row of the trigger_table, limited by the stamp_cols. If trigger_table is empty, the “stamp key” will be represented by NULL values.

Parameters:
  • trigger_table (Table) – the trigger table

  • stamp_cols (Optional[Union[str, list[str]]]) – The column(s) from trigger_table that form the “stamp key”, may be renames, default is None, meaning that all columns from trigger_table form the “stamp key”.

  • initial (bool) – Whether to take an initial snapshot upon construction, default is False. When False, the resulting table will remain empty until trigger_table first updates.

  • incremental (bool) – Whether the resulting table should be incremental, default is False. When False, all rows of this table will have the latest “stamp key”. When True, only the rows of this table that have been added or updated will have the latest “stamp key”.

  • history (bool) – Whether the resulting table should keep history, default is False. A history table appends a full snapshot of this table and the “stamp key” as opposed to updating existing rows. The history flag is currently incompatible with initial and incremental: when history is True, incremental and initial must be False.

Return type:

Table

Returns:

a Table object

Raises:

DHError

sort(order_by, order=None)[source]

The sort method creates a new table where the rows are ordered based on values in the specified set of columns.

Parameters:
  • order_by (Union[str, list[str]]) – the column(s) to be sorted on

  • order (Optional[Union[SortDirection, list[SortDirection]]]) – the corresponding sort direction(s) for each sort column, default is None. In the absence of explicit sort directions, data will be sorted in the ascending order.

Return type:

Table

Returns:

a Table object

Raises:

DHError

sort_descending(order_by)[source]

The sort_descending method creates a new table where rows in a table are sorted in descending order based on the order_by column(s).

Parameters:

order_by (Union[str, list[str]]) – the column name(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

std_by(by=None)[source]

The std_by method creates a new table containing the sample standard deviation for each group. Columns not used in the grouping must be of numeric types.

Sample standard deviation is computed using Bessel’s correction, which ensures that the sample variance will be an unbiased estimator of population variance.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column names(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

sum_by(by=None)[source]

The sum_by method creates a new table containing the sum for each group. Columns not used in the grouping must be of numeric types.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

tail(num_rows)[source]

The tail method creates a new table with a specific number of rows from the end of the table.

Parameters:

num_rows (int) – the number of rows at the end of table

Return type:

Table

Returns:

a Table object

Raises:

DHError

tail_by(num_rows, by)[source]

The tail_by method creates a new table containing the last number of rows for each group.

Parameters:
  • num_rows (int) – the number of rows at the end of each group

  • by (Union[str, list[str]]) – the group-by column name(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

ticket

The ticket that points to the object on the server.

to_arrow()[source]

Takes a snapshot of the table and returns a pyarrow Table.

Return type:

Table

Returns:

a pyarrow.Table

Raises:

DHError

type

The type of the object. May be empty, indicating that the instance cannot be connected to or otherwise directly used from the client.

ungroup(cols=None, null_fill=True)[source]

The ungroup method creates a new table in which array columns from the source table are unwrapped into separate rows. The ungroup columns should be of array types.

Parameters:
  • cols (Optional[Union[str, list[str]]]) – the array column(s), default is None, meaning all array columns will be ungrouped, default is None, meaning all array columns will be ungrouped

  • null_fill (bool) – indicates whether null should be used to fill missing cells, default is True

Return type:

Table

Returns:

a Table object

Raises:

DHError

update(formulas)[source]

The update method creates a new table containing a new, in-memory column for each formula.

Parameters:

formulas (Union[str, list[str]]) – the column formula(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

update_by(ops, by=None)[source]

The update_by method creates a table with additional columns calculated from window-based aggregations of columns in this table. The aggregations are defined by the provided operations, which support incremental aggregations over the corresponding rows in the table. The aggregations will apply position or time-based windowing and compute the results over the entire table or each row group as identified by the provided key columns.

Parameters:
  • ops (Union[UpdateByOperation, list[UpdateByOperation]]) – the UpdateByOperation(s) to be applied

  • by (Optional[Union[str, list[str]]]) – the group-by column name(s), defaults to None, meaning all calculations are performed over the entire table

Return type:

Table

Returns:

a Table object

Raises:

DHError

update_view(formulas)[source]

The update_view method creates a new table containing a new, formula column for each formula.

Parameters:

formulas (Union[str, list[str]]) – the column formula(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

var_by(by=None)[source]

The var_by method creates a new table containing the sample variance for each group. Columns not used in the grouping must be of numeric types.

Sample variance is computed using Bessel’s correction, which ensures that the sample variance will be an unbiased estimator of population variance.

Parameters:

by (Optional[Union[str, list[str]]]) – the group-by column name(s), default is None, meaning grouping all the rows into one group

Return type:

Table

Returns:

a Table object

Raises:

DHError

view(formulas)[source]

The view method creates a new formula table that includes one column for each formula.

Parameters:

formulas (Union[str, list[str]]) – the column formula(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

where(filters)[source]

The where method creates a new table with only the rows meeting the filter criteria in the column(s) of the table.

Parameters:

filters (Union[str, list[str], Filter, list[Filter]]) – the filter condition expression(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

where_in(filter_table, cols)[source]

The where_in method creates a new table containing rows from the source table, where the rows match values in the filter table.

Parameters:
  • filter_table (Table) – the table containing the set of values to filter on

  • cols (Union[str, list[str]]) – the column name(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError

where_not_in(filter_table, cols)[source]

The where_not_in method creates a new table containing rows from the source table, where the rows do not match values in the filter table.

Parameters:
  • filter_table (Table) – the table containing the set of values to filter on

  • cols (Union[str, list[str]]) – the column name(s)

Return type:

Table

Returns:

a Table object

Raises:

DHError