Class HiveLocationsTableKeyFinder

java.lang.Object
io.deephaven.enterprise.locations.hive.HiveLocationsTableKeyFinder
All Implemented Interfaces:
TableLocationKeyFinder<EnterpriseTableLocationKey>

public class HiveLocationsTableKeyFinder extends Object implements TableLocationKeyFinder<EnterpriseTableLocationKey>
A TableLocationKeyFinder that uses a table of file names, sizes, and partitioning columns to provide table locations. This is intended for use with coreplus:hive formatted tables.

The table must have the following columns and exist in the ".locations_table" subdirectory:

Locations Table Columns
Name               Type    Description
Filename           String  The file name, relative to the root of this table.
Size               long    The size of the partition in rows.
Format             String  The format as a String value of TableLocation.Format (either "DEEPHAVEN" or "PARQUET").
LastModifiedNanos  long    The epoch timestamp of the time the partition was last modified.
ColumnVersion      int     Integer version of column files for Deephaven partitions.

The writeLocationsTable(File, Schema) method scans all available locations using the Core KeyValuePartitionLayout key finder, then writes a new locations table for the discovered partitions.

When a locations table is available, the system does not use the underlying data discovery mechanism. You must keep the locations table in sync with the actual table locations. If the locations table does not match the underlying data, you will see null rows (when the locations table includes more rows than the underlying data) or rows will be missing from the table (when the locations table does not represent the underlying data).
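One way to catch a stale locations table before it causes null or missing rows is to compare the file names it lists against the files actually present under the table root. The following is a minimal sketch using only the Java standard library. The class LocationsSyncSketch and its methods are hypothetical helpers, not part of the Deephaven API, and the sketch assumes you have already collected both sets of relative file names yourself. It compares names only, not row counts.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the Deephaven API. */
public final class LocationsSyncSketch {
    private LocationsSyncSketch() {}

    /**
     * File names listed in the locations table with no matching file on disk.
     * Entries of this kind correspond to the "locations table includes more
     * than the underlying data" case described above.
     */
    public static Set<String> listedButMissing(Set<String> listed, Set<String> onDisk) {
        Set<String> result = new TreeSet<>(listed);
        result.removeAll(onDisk);
        return result;
    }

    /**
     * File names present on disk but absent from the locations table.
     * Data in these files would not be visible through the locations table.
     */
    public static Set<String> presentButUnlisted(Set<String> listed, Set<String> onDisk) {
        Set<String> result = new TreeSet<>(onDisk);
        result.removeAll(listed);
        return result;
    }
}
```

If either method returns a non-empty set, regenerating the locations table with writeLocationsTable(File, Schema) is the likely remedy.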

  • Field Details

    • FILENAME_COLUMN_NAME

      public static final String FILENAME_COLUMN_NAME
      The column of file names, relative to the root of this table.
    • SIZE_COLUMN_NAME

      public static final String SIZE_COLUMN_NAME
      The column of partition sizes, in rows.
    • FORMAT_COLUMN_NAME

      public static final String FORMAT_COLUMN_NAME
      The column name containing the format as a String value of TableLocation.Format (either "DEEPHAVEN" or "PARQUET").
    • LAST_MODIFIED_NANOS_COLUMN_NAME

      public static final String LAST_MODIFIED_NANOS_COLUMN_NAME
      The column name containing the epoch timestamp of the time the partition was last modified.
    • COLUMN_VERSION_COLUMN_NAME

      public static final String COLUMN_VERSION_COLUMN_NAME
      The column name containing the integer version of column files for Deephaven format.
    • ROW_GROUP_SIZES_COLUMN_NAME

      public static final String ROW_GROUP_SIZES_COLUMN_NAME
      The column name containing the sizes of each row group for Parquet format.
    • LOCATIONS_SUBDIRECTORY

      public static final String LOCATIONS_SUBDIRECTORY
      The name of the subdirectory containing the locations table, in Deephaven format.

      The Community key finder ignores names that start with a dot. For any other file name that does not match key=value, the key finder will throw an exception.
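
The naming rule above can be illustrated with a small standalone sketch. This is not the Community key finder's actual implementation. PartitionNameSketch and parse are hypothetical names, and the exact matching rules of the real key finder (for example, whether an empty value is accepted) are assumptions here.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration of the naming rule; not Deephaven code. */
public final class PartitionNameSketch {
    private PartitionNameSketch() {}

    /**
     * Returns empty for dot-prefixed names (they are skipped), a key/value
     * pair for names of the form key=value, and throws for anything else.
     */
    public static Optional<Map.Entry<String, String>> parse(String name) {
        if (name.startsWith(".")) {
            // Skipped entirely, which is why ".locations_table" is safe to add.
            return Optional.empty();
        }
        int eq = name.indexOf('=');
        if (eq <= 0) {
            // No '=' at all, or an empty key: not a key=value name.
            throw new IllegalArgumentException("Not a key=value name: " + name);
        }
        return Optional.of(Map.entry(name.substring(0, eq), name.substring(eq + 1)));
    }
}
```

Under these assumptions, a subdirectory whose name starts with a dot is ignored during discovery, while a stray file such as a README in a partition directory would cause a failure.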

  • Constructor Details

    • HiveLocationsTableKeyFinder

      public HiveLocationsTableKeyFinder(@NotNull File tableRoot, @NotNull List<ColumnDefinition<?>> columnPartitionKeys)
      Construct a HiveLocationsTableKeyFinder using the tableRoot for resolving full paths and the specified locationsTable for locating keys.
      Parameters:
      tableRoot - the root directory that this locations table references.
      columnPartitionKeys - the definitions of the partitioning columns
  • Method Details

    • findKeys

      public void findKeys(@NotNull Consumer<EnterpriseTableLocationKey> locationKeyObserver)
      Specified by:
      findKeys in interface TableLocationKeyFinder<EnterpriseTableLocationKey>
    • writeLocationsTable

      public static void writeLocationsTable(@NotNull File tableRoot, @NotNull io.deephaven.shadow.enterprise.com.illumon.iris.db.schema.Schema schema) throws IOException
      Write a locations table to disk for the given directory and schema.
      Parameters:
      tableRoot - the directory to scan and update with a ".locations_table" that the Core+ database object can then use to accelerate scanning Hive layouts.
      schema - the schema of the table
      Throws:
      IOException - if an error occurs while writing the locations table
    • generateLocationsTable

      public static Table generateLocationsTable(File tableRoot, io.deephaven.shadow.enterprise.com.illumon.iris.db.schema.Schema schema)
      Produce an in-memory table containing the files and relevant metadata for a given directory.
      Parameters:
      tableRoot - the directory to scan
      schema - the schema of the table
      Returns:
      an in-memory table suitable for use with the HiveLocationsTableKeyFinder