Class ParquetTools

java.lang.Object
com.illumon.iris.db.v2.locations.parquet.ParquetTools

public class ParquetTools
extends Object
Tools for managing and manipulating tables on disk in parquet format.
  • Method Details

    • readTable

      public static Table readTable​(@NotNull String sourceFilePath)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFilePath - The file or directory to examine
      Returns:
      table
    • readTable

      public static Table readTable​(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFilePath - The file or directory to examine
      readInstructions - Instructions for customizations while reading
      Returns:
      table
    • readTable

      public static Table readTable​(@NotNull File sourceFile)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFile - The file or directory to examine
      Returns:
      table
    • readTable

      public static Table readTable​(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions)
      Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
      Parameters:
      sourceFile - The file or directory to examine
      readInstructions - Instructions for customizations while reading
      Returns:
      table
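      The readTable overloads accept three kinds of source: a single parquet file, a metadata file, or a directory. The sketch below shows one way such a path could be told apart before reading. It is an illustration only, not the logic ParquetTools uses: the class SourceKindSketch, the SourceKind enum, and the classify method are hypothetical, and the file names "_metadata" and "_common_metadata" are assumed from the common Parquet convention rather than taken from this page. Which directory layouts count as recognized is not specified here, so the sketch does not inspect directory contents.

```java
import java.io.File;

final class SourceKindSketch {

    /** Hypothetical categories mirroring the three source kinds named in the readTable description. */
    enum SourceKind { SINGLE_PARQUET_FILE, METADATA_FILE, DIRECTORY, UNRECOGNIZED }

    /**
     * Classifies a source by its type on disk and its name only.
     * Assumption: metadata files follow the conventional Parquet names.
     */
    static SourceKind classify(File source) {
        if (source.isDirectory()) {
            // Whether the layout inside is "recognized" is left to the library.
            return SourceKind.DIRECTORY;
        }
        String name = source.getName();
        if (name.equals("_metadata") || name.equals("_common_metadata")) {
            return SourceKind.METADATA_FILE;
        }
        if (name.endsWith(".parquet")) {
            return SourceKind.SINGLE_PARQUET_FILE;
        }
        return SourceKind.UNRECOGNIZED;
    }
}
```

      A caller could use a check like this to fail early with a clear message before handing the path to readTable.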
    • writeTable

      public static void writeTable​(@NotNull Table sourceTable, @NotNull String destPath)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destPath - destination file path; the file name should end with the ".parquet" extension. If the path includes non-existent directories, they are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
    • writeTable

      public static void writeTable​(@NotNull Table sourceTable, @NotNull File destFile)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destFile - destination file; the file name should end with the ".parquet" extension. If the path includes non-existent directories, they are created.
    • writeTable

      public static void writeTable​(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destFile - destination file; its path must end in ".parquet". Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
      definition - table definition to use (instead of the one implied by the table itself)
    • writeTable

      public static void writeTable​(@NotNull Table sourceTable, @NotNull File destFile, @NotNull ParquetInstructions writeInstructions)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destFile - destination file; its path must end in ".parquet". Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
      writeInstructions - instructions for customizations while writing
    • writeTable

      public static void writeTable​(@NotNull Table sourceTable, @NotNull String destPath, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destPath - destination path; it must end in ".parquet". Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
      definition - table definition to use (instead of the one implied by the table itself)
      writeInstructions - instructions for customizations while writing
    • writeTable

      public static void writeTable​(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
      Write a table to a file.
      Parameters:
      sourceTable - source table
      destFile - destination file; its path must end in ".parquet". Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
      definition - table definition to use (instead of the one implied by the table itself)
      writeInstructions - instructions for customizations while writing
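      Several writeTable overloads state that missing directories are created and that, on error, the directories created along the way are removed. The sketch below reproduces that pattern with the JDK only, to make the concurrency warning concrete. It is a minimal illustration under stated assumptions, not the ParquetTools implementation: the class DestinationDirectorySketch, the FileWriteStep interface, and writeWithCleanup are hypothetical names, and the real write step is replaced by a callback.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

final class DestinationDirectorySketch {

    /** Hypothetical stand-in for the step that actually writes the parquet file. */
    interface FileWriteStep {
        void write(Path destination) throws IOException;
    }

    /**
     * Creates missing parent directories, runs the write step, and on failure
     * removes the partial file plus only the directories this call created.
     */
    static void writeWithCleanup(Path destination, FileWriteStep step) throws IOException {
        Path target = destination.toAbsolutePath();
        if (!target.getFileName().toString().endsWith(".parquet")) {
            throw new IllegalArgumentException("Destination must end in .parquet: " + target);
        }
        // Record the ancestors that do not exist yet, deepest first.
        Deque<Path> created = new ArrayDeque<>();
        Path parent = target.getParent();
        for (Path dir = parent; dir != null && !Files.exists(dir); dir = dir.getParent()) {
            created.addLast(dir);
        }
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try {
            step.write(target);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(target);
            for (Path dir : created) {
                try {
                    Files.deleteIfExists(dir);
                } catch (DirectoryNotEmptyException inUse) {
                    // Another writer already placed something here; stop cleaning up.
                    break;
                }
            }
            throw e;
        }
    }
}
```

      The hazard is visible in the cleanup loop: a second writer that found these directories already present, and has not yet created its file, can have them deleted underneath it when the first writer fails. This sketch stops at non-empty directories; a cleanup that deletes more aggressively would widen the window further.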
    • writeParquetTables

      public static void writeParquetTables​(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull ParquetInstructions writeInstructions, @NotNull File[] destinations, @NotNull String[] groupingColumns)
      Writes tables to disk in parquet format to a supplied set of destinations. If you specify grouping columns, there must already be grouping information for those columns in the sources. This can be accomplished with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>).
      Parameters:
      sources - The tables to write
      tableDefinition - The common schema for all the tables to write
      writeInstructions - Write instructions for customizations while writing
      destinations - The destination paths. Any non-existent directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
      groupingColumns - List of columns the tables are grouped by (the write operation will store the grouping info)
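      The description of writeParquetTables requires that grouping information already exist for the grouping columns, and notes that sorting or groupBy followed by ungroup provides it. One way to picture why: once rows are ordered so that equal keys sit together, each key maps to a single contiguous row range. The sketch below computes such ranges for a plain String column. It is a conceptual illustration only; the class GroupingSketch and method contiguousRanges are hypothetical, and nothing here reflects the grouping format ParquetTools actually stores.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class GroupingSketch {

    /**
     * Maps each key to {firstRow, lastRowExclusive}.
     * Throws if a key appears in more than one run, i.e. the column is not grouped.
     */
    static Map<String, long[]> contiguousRanges(String[] column) {
        Map<String, long[]> ranges = new LinkedHashMap<>();
        int start = 0;
        for (int row = 1; row <= column.length; row++) {
            // Close the current run at the end of the column or when the key changes.
            if (row == column.length || !Objects.equals(column[row], column[start])) {
                if (ranges.containsKey(column[start])) {
                    throw new IllegalArgumentException(
                            "Key is not contiguous, column is not grouped: " + column[start]);
                }
                ranges.put(column[start], new long[] {start, row});
                start = row;
            }
        }
        return ranges;
    }
}
```

      For a column holding A, A, B, C, C this yields the ranges [0, 2), [2, 3), and [3, 5); a column holding A, B, A is rejected because A occurs in two separate runs.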
    • writeTables

      public static void writeTables​(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull File[] destinations)
      Write out tables to disk.
      Parameters:
      sources - source tables
      tableDefinition - table definition
      destinations - destinations
    • deleteTable

      @VisibleForTesting public static void deleteTable​(File path)
      Deletes a table on disk.
      Parameters:
      path - path to delete
    • readSingleFileTable

      public static Table readSingleFileTable​(@NotNull ReadOnlyParquetTableLocation tableLocation, @NotNull TableDefinition tableDefinition)
      Reads in a table from a single parquet file using the provided table definition.
      Parameters:
      tableLocation - The table location of the single parquet file to read
      tableDefinition - The table's definition
      Returns:
      The table
    • getParquetFileReader

      public static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader​(@NotNull File parquetFile)
      Make a ParquetFileReader for the supplied File.
      Parameters:
      parquetFile - The File to read
      Returns:
      The new ParquetFileReader
    • convertSchema

      public static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,​ParquetInstructions> convertSchema​(@NotNull org.apache.parquet.schema.MessageType schema, @NotNull Map<String,​String> keyValueMetadata, @NotNull ParquetInstructions readInstructionsIn)
      Convert schema information from a ParquetMetadata into ColumnDefinitions.
      Parameters:
      schema - Parquet schema. DO NOT RELY ON ParquetMetadataConverter FOR THIS! USE ParquetFileReader!
      keyValueMetadata - Parquet key-value metadata map
      readInstructionsIn - Input conversion ParquetInstructions
      Returns:
      A Pair with ColumnDefinitions and adjusted ParquetInstructions
    • setDefaultCompressionCodecName

      public static void setDefaultCompressionCodecName​(String compressionCodecName)