Class ParquetTools

java.lang.Object
com.illumon.iris.db.v2.locations.parquet.ParquetTools

public final class ParquetTools extends Object
Tools for managing and manipulating tables on disk in parquet format.
  • Method Details

    • readTable

      public static Table readTable(@NotNull String sourceFilePath)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFilePath - The file or directory to examine
      Returns:
      table
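      The three kinds of source accepted above can be told apart before calling readTable. The following is only a minimal sketch: the class SourcePathKindSketch, the Kind enum, and the classify method are hypothetical names, and the rules (a ".parquet" suffix, and the conventional Parquet summary file names "_metadata" and "_common_metadata") are assumptions about what a caller might check, not a description of what ParquetTools does internally.

```java
import java.io.File;

final class SourcePathKindSketch {
    /** Hypothetical categories mirroring the sources listed in the readTable description. */
    enum Kind { PARQUET_FILE, METADATA_FILE, DIRECTORY, UNRECOGNIZED }

    /**
     * Classifies a source path by inspecting the file system and the file name.
     * These rules are assumptions for illustration only.
     */
    static Kind classify(File source) {
        if (source.isDirectory()) {
            return Kind.DIRECTORY;
        }
        String name = source.getName();
        if (name.endsWith(".parquet")) {
            return Kind.PARQUET_FILE;
        }
        // Conventional names of Parquet summary (metadata) files.
        if (name.equals("_metadata") || name.equals("_common_metadata")) {
            return Kind.METADATA_FILE;
        }
        return Kind.UNRECOGNIZED;
    }

    public static void main(String[] args) {
        // Prints the assumed kind for each path given on the command line.
        for (String arg : args) {
            System.out.println(arg + " -> " + classify(new File(arg)));
        }
    }
}
```

      A caller could use such a check to fail early with a clear message instead of passing an unrecognized path to readTable.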
    • readTable

      public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFilePath - The file or directory to examine
      readInstructions - Instructions for customizations while reading
      Returns:
      table
    • readTable

      public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions, @NotNull SourceTableInstructions sourceTableInstructions)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFilePath - The file or directory to examine
      readInstructions - Instructions for customizations while reading
      sourceTableInstructions - Instructions to control the underlying behaviors of source tables
      Returns:
      table
    • readTable

      public static Table readTable(@NotNull File sourceFile)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFile - The file or directory to examine
      Returns:
      table
    • readTable

      public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFile - The file or directory to examine
      readInstructions - Instructions for customizations while reading
      Returns:
      table
    • readTable

      public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions, @NotNull SourceTableInstructions sourceTableInstructions)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFile - The file or directory to examine
      readInstructions - Instructions for customizations while reading
      sourceTableInstructions - Instructions to control the underlying behaviors of source tables
      Returns:
      table
    • writeTable

      public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destPath - destination file path; the file name should end with the ".parquet" extension. If the path includes non-existing directories, they are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
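      The create-then-remove behavior described for the destination path can be sketched with java.nio. This is an illustration under stated assumptions, not the ParquetTools implementation: DirectoryRollbackSketch, ParquetWriteStep, and writeWithRollback are hypothetical names, and the actual parquet writing is replaced by a caller-supplied callback.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

final class DirectoryRollbackSketch {
    /** Hypothetical stand-in for the step that actually writes the parquet file. */
    interface ParquetWriteStep {
        void writeTo(Path destination) throws IOException;
    }

    static void writeWithRollback(Path destination, ParquetWriteStep step) throws IOException {
        Path name = destination.getFileName();
        if (name == null || !name.toString().endsWith(".parquet")) {
            throw new IllegalArgumentException("Destination must end in .parquet: " + destination);
        }
        Path parent = destination.toAbsolutePath().getParent();

        // Remember which ancestor directories do not exist yet, deepest first.
        Deque<Path> missing = new ArrayDeque<>();
        for (Path dir = parent; dir != null && !Files.exists(dir); dir = dir.getParent()) {
            missing.addLast(dir);
        }
        try {
            if (parent != null) {
                Files.createDirectories(parent);
            }
            step.writeTo(destination);
        } catch (IOException | RuntimeException e) {
            // Roll back: remove the partial file, then the directories recorded above.
            try {
                Files.deleteIfExists(destination);
                for (Path dir : missing) {
                    Files.deleteIfExists(dir);
                }
            } catch (IOException cleanupFailure) {
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }
}
```

      The sketch also shows where the concurrency hazard comes from: between recording the missing directories and rolling back, another writer may start using those same directories, so the cleanup can either fail on a non-empty directory or remove a directory the other writer was about to use.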
    • writeTable

      public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destFile - destination file; the file name should end with the ".parquet" extension. If the path includes non-existing directories, they are created.
    • writeTable

      public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
      definition - table definition to use (instead of the one implied by the table itself)
    • writeTable

      public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull ParquetInstructions writeInstructions)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
      writeInstructions - instructions for customizations while writing
    • writeTable

      public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destPath - destination path; it must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
      definition - table definition to use (instead of the one implied by the table itself)
      writeInstructions - instructions for customizations while writing
    • writeTable

      public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
      definition - table definition to use (instead of the one implied by the table itself)
      writeInstructions - instructions for customizations while writing
    • writeParquetTables

      public static void writeParquetTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull ParquetInstructions writeInstructions, @NotNull File[] destinations, @NotNull String[] groupingColumns)
      Writes tables to disk in parquet format to a supplied set of destinations. If you specify grouping columns, there must already be grouping information for those columns in the sources. This can be accomplished with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>).
      Parameters:
      sources - The tables to write
      tableDefinition - The common schema for all the tables to write
      writeInstructions - Write instructions for customizations while writing
      destinations - The destination paths. Any non-existing directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
      groupingColumns - List of columns the tables are grouped by (the write operation will store the grouping info)
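      One way to picture the grouping requirement is that, after a groupBy followed by ungroup or after a sort, all rows sharing a grouping key sit next to each other. The sketch below checks that property on a plain list of key values. It rests on the assumption that "grouped" can be read as "rows with the same key are contiguous"; ContiguousGroupsSketch and isGrouped are hypothetical names and no ParquetTools or Table API is used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class ContiguousGroupsSketch {
    /**
     * Returns true if every distinct key occupies a single contiguous run of rows.
     * The list stands in for the values of one grouping column, in row order.
     */
    static <T> boolean isGrouped(List<T> keys) {
        Set<T> finished = new HashSet<>(); // keys whose run has already ended
        T current = null;
        boolean first = true;
        for (T key : keys) {
            if (first) {
                current = key;
                first = false;
                continue;
            }
            if (!Objects.equals(key, current)) {
                finished.add(current);
                if (finished.contains(key)) {
                    return false; // this key already had a run earlier
                }
                current = key;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isGrouped(Arrays.asList("AAPL", "AAPL", "MSFT")));
        System.out.println(isGrouped(Arrays.asList("AAPL", "MSFT", "AAPL")));
    }
}
```

      A column that fails such a check would first need one of the operations named in the method description before it could be passed as a grouping column.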
    • writeTables

      public static void writeTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull File[] destinations)
      Write out tables to disk.
      Parameters:
      sources - source tables
      tableDefinition - table definition
      destinations - destinations
    • deleteTable

      @VisibleForTesting public static void deleteTable(@NotNull File path)
      Deletes a table on disk.
      Parameters:
      path - path to delete
    • getParquetFileReader

      public static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader(@NotNull File parquetFile)
      Make a ParquetFileReader for the supplied File.
      Parameters:
      parquetFile - The File to read
      Returns:
      The new ParquetFileReader
    • convertSchema

      public static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions> convertSchema(@NotNull org.apache.parquet.schema.MessageType schema, @NotNull Map<String,String> keyValueMetadata, @NotNull ParquetInstructions readInstructionsIn)
      Convert schema information from a ParquetMetadata into ColumnDefinitions.
      Parameters:
      schema - Parquet schema. DO NOT RELY ON ParquetMetadataConverter FOR THIS! USE ParquetFileReader!
      keyValueMetadata - Parquet key-value metadata map
      readInstructionsIn - Input conversion ParquetInstructions
      Returns:
      A Pair with ColumnDefinitions and adjusted ParquetInstructions
    • readDataIndexTable

      @Nullable public static Table readDataIndexTable(@NotNull File tableFile, @Nullable TableInfo info, @NotNull String... keyColumnNames)
      Read a Data Index (or grouping) table from disk. If a TableInfo is provided, it is used to aid in locating the table.
      Parameters:
      tableFile - The path to the base table
      info - An optional TableInfo object to assist in locating files
      keyColumnNames - the names of key columns
      Returns:
      the data index table for the specified key columns or null if none was found.
    • writeDataIndexTable

      public static String writeDataIndexTable(@NotNull String parentFile, @NotNull Table indexTable, @NotNull String indexColumnName, @Nullable org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodec, @NotNull String... keyColumnNames) throws SchemaMappingException, IOException
      Write out the Data Index table for the specified columns. This will place the Data Index in a table adjacent to the data table in a directory titled "Index-<Column names>".
      Parameters:
      parentFile - the full path of the parent file
      indexTable - the table containing the index
      indexColumnName - the name of the Index column
      compressionCodec - optional CompressionCodecName
      keyColumnNames - the ordered names of key columns
      Returns:
      path to the Data Index table that was written
      Throws:
      SchemaMappingException - Error creating a parquet table schema for the given table (likely due to unsupported types)
      IOException - For file writing related errors
    • writeGroupingTable

      @NotNull public static <T> String writeGroupingTable(@NotNull ParquetInstructions instructions, @NotNull com.illumon.dataobjects.ColumnDefinition<T> groupingColumnDef, @NotNull String fullOutputFilePath, @NotNull Map<T,ReadOnlyIndex> columnGrouping) throws SchemaMappingException, IOException
      Throws:
      SchemaMappingException
      IOException
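      The columnGrouping parameter in the signature above has the shape of a map from each distinct value of the grouping column to the rows holding that value. The sketch below builds a map of that shape from plain Java collections. It is an illustration only: GroupingMapSketch and groupRows are hypothetical names, and ReadOnlyIndex is replaced by a List of row positions, which is an assumption about its role rather than its actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupingMapSketch {
    /**
     * Maps each distinct key to the positions of the rows that hold it.
     * Keys appear in the map in order of first occurrence.
     */
    static <T> Map<T, List<Integer>> groupRows(List<T> keys) {
        Map<T, List<Integer>> groups = new LinkedHashMap<>();
        for (int row = 0; row < keys.size(); row++) {
            groups.computeIfAbsent(keys.get(row), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Stand-in for the values of a grouping column, in row order.
        List<String> symbols = Arrays.asList("AAPL", "AAPL", "MSFT", "AAPL");
        System.out.println(groupRows(symbols));
    }
}
```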
    • setDefaultCompressionCodecName

      public static void setDefaultCompressionCodecName(String compressionCodecName)