deephaven_enterprise.iceberg

This module is used to define Iceberg endpoints and to discover and deploy Deephaven schemas for Iceberg tables.

class DiscoveryResult(j_result)[source]

Bases: JObjectWrapper

DiscoveryResult represents the result of an Iceberg discovery. It can be used to deploy the discovered schema to Deephaven. It is returned by the discover() function.

deploy_embedded()[source]

Deploys the discovered Iceberg schema to Deephaven, embedding the endpoint within the schema.

Raises:

DHError

Return type:

None

deploy_named()[source]

Deploys the discovered Iceberg schema to Deephaven, linking the endpoint by its name.

Raises:

DHError

Return type:

None

j_object_type

alias of DiscoveryConfig

class IcebergEndpoint(j_endpoint)[source]

Bases: JObjectWrapper

An Iceberg Endpoint defines the locations of both the Iceberg catalog and the data warehouse. It also provides the information required to create the catalog and retrieve data from the store as a Deephaven table. Optionally, users may provide a set of properties, secrets, and Hadoop settings to be passed down to the Iceberg APIs.

Secrets passed to IcebergEndpoint instances are simply named references, never actual secret values. For example, S3 credentials can be supplied as a secret reference:

make_endpoint("rest", "http://catalog/", "s3://warehouse", data_instructions, secrets={"s3.access.key": "s3.dev_access_key"})

Deephaven will locate the secret within the set of SecretsProviders to discover the actual value.

Note that IcebergEndpoint should not be instantiated directly; create instances with the make_endpoint() function instead.

deploy(overwrite_existing=False)[source]

Deploys this endpoint to Deephaven. This will fail unless the endpoint was created with a name.

Parameters:

overwrite_existing (bool) – whether to overwrite an existing endpoint with the same name, defaults to False.

Raises:

DHError

Return type:

None

j_object_type

alias of IcebergEndpoint

to_json()[source]

Returns a JSON representation of the endpoint.

Return type:

str
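As an illustration of the two methods above, the following is a minimal sketch rather than a definitive recipe. The helper name deploy_endpoint is hypothetical; it assumes it receives an IcebergEndpoint that was created with a name, and it uses only the deploy() and to_json() members documented in this section.

```python
def deploy_endpoint(endpoint, overwrite=False):
    """Deploy a named IcebergEndpoint and return its JSON form.

    Hypothetical helper: `endpoint` is assumed to be an IcebergEndpoint
    created with make_endpoint(..., endpoint_name=...).
    """
    # deploy() fails unless the endpoint was created with a name.
    endpoint.deploy(overwrite_existing=overwrite)
    # to_json() returns a str representation of the endpoint.
    return endpoint.to_json()
```

Passing overwrite=True replaces an existing endpoint of the same name; the default leaves it untouched and lets deploy() raise instead.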

discover(table_id, endpoint, snapshot_id=None, namespace=None, table_name=None, reference_definition=None)[source]

Discovers an Iceberg table.

Parameters:
  • table_id (str) – the Iceberg table identifier

  • endpoint (IcebergEndpoint) – the endpoint to use for discovery

  • snapshot_id (Optional[str]) – the Iceberg snapshot ID to use for discovery, defaults to None

  • namespace (Optional[str]) – A user specified namespace. If not set, the namespace is derived from the table_id, defaults to None

  • table_name (Optional[str]) – A user specified table name. If not set, the table name is derived from the table_id, defaults to None

  • reference_definition (Optional[TableDefinition]) – A user specified reference TableDefinition. The discovery process guarantees that the result is compatible with this definition, defaults to None

Return type:

DiscoveryResult

Returns:

a DiscoveryResult that can be used to deploy Deephaven schemas

Raises:

DHError
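The discovery and deployment flow described above could be sketched as follows. This is an illustrative helper, not the module's prescribed usage: the name discover_and_deploy and its embed_endpoint flag are hypothetical, and the import is deferred because deephaven_enterprise.iceberg is assumed to be available only inside a Deephaven Enterprise Python worker.

```python
def discover_and_deploy(table_id, endpoint, embed_endpoint=False):
    """Discover an Iceberg table and deploy its schema (hypothetical helper)."""
    # Imported lazily: assumed to be importable only in a Deephaven Enterprise worker.
    from deephaven_enterprise import iceberg

    # discover() returns a DiscoveryResult for the given table and endpoint.
    result = iceberg.discover(table_id, endpoint)

    if embed_endpoint:
        # Embed the endpoint definition within the deployed schema.
        result.deploy_embedded()
    else:
        # Link the endpoint to the schema by its name.
        result.deploy_named()
    return result
```

Both deployment methods and discover() may raise DHError, so callers that need to recover from a failed discovery would wrap the call accordingly.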

get_named_endpoint(endpoint_name)[source]

Gets an IcebergEndpoint from Deephaven by name.

Parameters:

endpoint_name (str) – the name of the endpoint

Return type:

IcebergEndpoint

Returns:

an IcebergEndpoint object

Raises:

DHError
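A previously deployed endpoint can be fetched by name and reused for discovery. The sketch below assumes an endpoint with the given name has already been deployed; the helper name discover_with_named_endpoint is hypothetical, and only get_named_endpoint() and discover() come from this module.

```python
def discover_with_named_endpoint(table_id, endpoint_name, snapshot_id=None):
    """Look up a deployed endpoint by name and run discovery with it (hypothetical helper)."""
    # Imported lazily: assumed to be importable only in a Deephaven Enterprise worker.
    from deephaven_enterprise import iceberg

    # Raises DHError if the lookup fails.
    endpoint = iceberg.get_named_endpoint(endpoint_name)

    # snapshot_id is optional; None leaves the choice of snapshot to discover().
    return iceberg.discover(table_id, endpoint, snapshot_id=snapshot_id)
```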

make_endpoint(catalog_type, catalog_uri, warehouse_uri, data_instructions, endpoint_name=None, properties=None, secrets=None, hadoop_opt=None)[source]

Creates a new IcebergEndpoint. If the endpoint_name is set, the resulting endpoint can be deployed to Deephaven for reuse with the IcebergEndpoint.deploy() method.

Any provided properties and secrets will be merged into a single collection of properties to be passed on to the Iceberg APIs.

Parameters:
  • catalog_type (str) – the type of catalog. Possible values include “rest”, “glue”, “hive”, “nessie”, “hadoop”, and “jdbc”

  • catalog_uri (str) – the URI of the catalog

  • warehouse_uri (str) – the URI of the data warehouse

  • data_instructions (Any) – the data instructions for Deephaven to use when fetching table data, for example a deephaven.experimental.s3.S3Instructions object

  • endpoint_name (Optional[str]) – a name for this endpoint. This must be set in order to deploy this endpoint, defaults to None

  • properties (Optional[dict[str, str]]) – a map of properties to be passed to the Iceberg API, defaults to None

  • secrets (Optional[dict[str, str]]) – a map of secrets to be resolved and passed to the Iceberg API, defaults to None

  • hadoop_opt (Optional[dict[str, str]]) – a map of hadoop specific options to be passed to the Iceberg API, defaults to None

Return type:

IcebergEndpoint

Returns:

an IcebergEndpoint object

Raises:

DHError
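A minimal sketch of creating a named endpoint with a secret reference follows. The catalog URI, warehouse URI, and secret names are the placeholder values used in the IcebergEndpoint description; the helper name make_dev_endpoint and the endpoint name "dev_rest" are hypothetical. The caller is assumed to supply data_instructions, for example a deephaven.experimental.s3.S3Instructions object.

```python
def make_dev_endpoint(data_instructions):
    """Create a named REST-catalog endpoint (hypothetical helper)."""
    # Imported lazily: assumed to be importable only in a Deephaven Enterprise worker.
    from deephaven_enterprise import iceberg

    return iceberg.make_endpoint(
        "rest",                    # catalog_type
        "http://catalog/",         # catalog_uri (placeholder)
        "s3://warehouse",          # warehouse_uri (placeholder)
        data_instructions,
        endpoint_name="dev_rest",  # hypothetical name; required before deploy()
        # Secrets are named references, never the actual secret values.
        secrets={"s3.access.key": "s3.dev_access_key"},
    )
```

Because endpoint_name is set, the returned endpoint can later be deployed with IcebergEndpoint.deploy() and retrieved with get_named_endpoint().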