Class ParquetTools
java.lang.Object
com.illumon.iris.db.v2.locations.parquet.ParquetTools
Tools for managing and manipulating tables on disk in parquet format.
-
Field Summary
Fields
Modifier and Type                   Field
static final String                 BEGIN_POS
static final String                 COMMON_METADATA_FILE_NAME
static final String                 DEFAULT_PARQUET_FILE_NAME
static final String                 END_POS
static final String                 GROUPING_KEY
static final ParquetInstructions    GZIP
static final ParquetInstructions    LEGACY
static final ParquetInstructions    LZ4
static final ParquetInstructions    LZO
static final String                 METADATA_FILE_NAME
static final ParquetInstructions    ZSTD
Method Summary
Modifier and Type    Method and Description
static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions>
    convertSchema(org.apache.parquet.schema.MessageType schema, Map<String,String> keyValueMetadata, ParquetInstructions readInstructionsIn)
    Convert schema information from a ParquetMetadata into ColumnDefinitions.
static void
    deleteTable(File path)
    Deletes a table on disk.
static io.deephaven.parquet.base.ParquetFileReader
    getParquetFileReader(File parquetFile)
    Make a ParquetFileReader for the supplied File.
static Table
    readDataIndexTable(File tableFile, TableInfo info, String... keyColumnNames)
    Read a Data Index (or grouping) table from disk.
static Table
    readTable(File sourceFile)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table
    readTable(File sourceFile, ParquetInstructions readInstructions)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table
    readTable(File sourceFile, ParquetInstructions readInstructions, SourceTableInstructions sourceTableInstructions)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table
    readTable(String sourceFilePath)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table
    readTable(String sourceFilePath, ParquetInstructions readInstructions)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table
    readTable(String sourceFilePath, ParquetInstructions readInstructions, SourceTableInstructions sourceTableInstructions)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static void
    setDefaultCompressionCodecName(String compressionCodecName)
static String
    writeDataIndexTable(String parentFile, Table indexTable, String indexColumnName, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodec, String... keyColumnNames)
    Write out the Data Index table for the specified columns.
static <T> String
    writeGroupingTable(ParquetInstructions instructions, com.illumon.dataobjects.ColumnDefinition<T> groupingColumnDef, String fullOutputFilePath, Map<T,ReadOnlyIndex> columnGrouping)
static void
    writeParquetTables(Table[] sources, TableDefinition tableDefinition, ParquetInstructions writeInstructions, File[] destinations, String[] groupingColumns)
    Writes tables to disk in parquet format to a supplied set of destinations.
static void
    writeTable(Table sourceTable, File destFile)
    Write a table to a file.
static void
    writeTable(Table sourceTable, File destFile, TableDefinition definition)
    Write a table to a file.
static void
    writeTable(Table sourceTable, File destFile, TableDefinition definition, ParquetInstructions writeInstructions)
    Write a table to a file.
static void
    writeTable(Table sourceTable, File destFile, ParquetInstructions writeInstructions)
    Write a table to a file.
static void
    writeTable(Table sourceTable, String destPath)
    Write a table to a file.
static void
    writeTable(Table sourceTable, String destPath, TableDefinition definition, ParquetInstructions writeInstructions)
    Write a table to a file.
static void
    writeTables(Table[] sources, TableDefinition tableDefinition, File[] destinations)
    Write out tables to disk.
-
Field Details
-
METADATA_FILE_NAME
public static final String METADATA_FILE_NAME
-
COMMON_METADATA_FILE_NAME
public static final String COMMON_METADATA_FILE_NAME
-
DEFAULT_PARQUET_FILE_NAME
public static final String DEFAULT_PARQUET_FILE_NAME
-
BEGIN_POS
public static final String BEGIN_POS
-
END_POS
public static final String END_POS
-
GROUPING_KEY
public static final String GROUPING_KEY
-
LZ4
public static final ParquetInstructions LZ4
-
LZO
public static final ParquetInstructions LZO
-
GZIP
public static final ParquetInstructions GZIP
-
ZSTD
public static final ParquetInstructions ZSTD
-
LEGACY
public static final ParquetInstructions LEGACY
-
Method Details
-
readTable
public static Table readTable(@NotNull String sourceFilePath)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFilePath - The file or directory to examine
Returns:
    table
-
readTable
public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFilePath - The file or directory to examine
    readInstructions - Instructions for customizations while reading
Returns:
    table
-
readTable
public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions, @NotNull SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFilePath - The file or directory to examine
    readInstructions - Instructions for customizations while reading
    sourceTableInstructions - Instructions to control the underlying behaviors of source tables
Returns:
    table
-
readTable
public static Table readTable(@NotNull File sourceFile)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFile - The file or directory to examine
Returns:
    table
-
readTable
public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFile - The file or directory to examine
    readInstructions - Instructions for customizations while reading
Returns:
    table
-
readTable
public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions, @NotNull SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFile - The file or directory to examine
    readInstructions - Instructions for customizations while reading
    sourceTableInstructions - Instructions to control the underlying behaviors of source tables
Returns:
    table
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath)
Write a table to a file.
Parameters:
    sourceTable - source table
    destPath - destination file path; the file name should end in the ".parquet" extension. Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile)
Write a table to a file.
Parameters:
    sourceTable - source table
    destFile - destination file; the file name should end in the ".parquet" extension. Any non-existing directories in the path are created
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition)
Write a table to a file.
Parameters:
    sourceTable - source table
    destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use
    definition - table definition to use (instead of the one implied by the table itself)
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
Parameters:
    sourceTable - source table
    destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use
    writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
Parameters:
    sourceTable - source table
    destPath - destination path; it must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use
    definition - table definition to use (instead of the one implied by the table itself)
    writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
Parameters:
    sourceTable - source table
    destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use
    definition - table definition to use (instead of the one implied by the table itself)
    writeInstructions - instructions for customizations while writing
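The writeTable overloads can be used as sketched below. Paths are hypothetical, and the use of ParquetTools.GZIP assumes the ParquetInstructions presets listed in the field summary are valid write instructions:

```java
import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class WriteTableExample {
    public static void write(Table source) {
        // The destination must end in ".parquet"; missing parent directories are created.
        ParquetTools.writeTable(source, new File("/data/out/trades.parquet"));

        // Same write, but with one of the compression presets defined on this class.
        ParquetTools.writeTable(source, new File("/data/out/trades_gzip.parquet"), ParquetTools.GZIP);
    }
}
```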
-
writeParquetTables
public static void writeParquetTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull ParquetInstructions writeInstructions, @NotNull File[] destinations, @NotNull String[] groupingColumns)
Writes tables to disk in parquet format to a supplied set of destinations. If you specify grouping columns, there must already be grouping information for those columns in the sources. This can be accomplished with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>).
Parameters:
    sources - The tables to write
    tableDefinition - The common schema for all the tables to write
    writeInstructions - Write instructions for customizations while writing
    destinations - The destination paths. Any non-existing directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use
    groupingColumns - List of columns the tables are grouped by (the write operation will store the grouping info)
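A sketch of the batch write with grouping columns. Per the note above, each source must already carry grouping information for the named columns, which the sort(...) calls below are intended to establish. The table names, paths, and the "Sym" column are hypothetical, and ParquetTools.ZSTD is assumed to be a usable ParquetInstructions preset:

```java
import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.tables.TableDefinition;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class BatchWriteExample {
    public static void write(Table a, Table b, TableDefinition def) {
        // Sorting by the grouping column produces the grouping info the write will store.
        Table[] sources = {a.sort("Sym"), b.sort("Sym")};
        File[] destinations = {new File("/out/a.parquet"), new File("/out/b.parquet")};

        ParquetTools.writeParquetTables(
                sources, def, ParquetTools.ZSTD, destinations, new String[] {"Sym"});
    }
}
```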
-
writeTables
public static void writeTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull File[] destinations)
Write out tables to disk.
Parameters:
    sources - source tables
    tableDefinition - table definition
    destinations - destinations
-
deleteTable
public static void deleteTable(File path)
Deletes a table on disk.
Parameters:
    path - path to delete
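For instance (with a hypothetical path):

```java
import java.io.File;

import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class DeleteExample {
    public static void main(String[] args) {
        // Removes the table previously written at this (hypothetical) location.
        ParquetTools.deleteTable(new File("/data/out/trades.parquet"));
    }
}
```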
-
getParquetFileReader
public static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader(@NotNull File parquetFile)
Make a ParquetFileReader for the supplied File.
Parameters:
    parquetFile - The File to read
Returns:
    The new ParquetFileReader
-
convertSchema
public static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions> convertSchema(@NotNull org.apache.parquet.schema.MessageType schema, @NotNull Map<String,String> keyValueMetadata, @NotNull ParquetInstructions readInstructionsIn)
Convert schema information from a ParquetMetadata into ColumnDefinitions.
Parameters:
    schema - Parquet schema. DO NOT RELY ON ParquetMetadataConverter FOR THIS! USE ParquetFileReader!
    keyValueMetadata - Parquet key-value metadata map
    readInstructionsIn - Input conversion ParquetInstructions
Returns:
    A Pair with ColumnDefinitions and adjusted ParquetInstructions
-
readDataIndexTable
@Nullable public static Table readDataIndexTable(@NotNull File tableFile, @Nullable TableInfo info, @NotNull String... keyColumnNames)
Read a Data Index (or grouping) table from disk. If a TableInfo is provided, it will be used to aid in locating the table.
Parameters:
    tableFile - The path to the base table
    info - An optional TableInfo object to assist in locating files
    keyColumnNames - the names of key columns
Returns:
    the data index table for the specified key columns, or null if none was found
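A sketch of looking up a Data Index for a single key column. The base-table path and the "Sym" column are hypothetical, and null is passed for the optional TableInfo:

```java
import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class DataIndexExample {
    public static void main(String[] args) {
        Table index = ParquetTools.readDataIndexTable(
                new File("/data/trades.parquet"), null, "Sym");
        if (index == null) {
            // No Data Index (or grouping) exists for these key columns.
            System.out.println("No Data Index found for Sym");
        }
    }
}
```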
-
writeDataIndexTable
public static String writeDataIndexTable(@NotNull String parentFile, @NotNull Table indexTable, @NotNull String indexColumnName, @Nullable org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodec, @NotNull String... keyColumnNames) throws SchemaMappingException, IOException
Write out the Data Index table for the specified columns. This will place the Data Index in a table adjacent to the data table, in a directory titled "Index-<Column names>".
Parameters:
    parentFile - the full path of the parent file
    indexTable - the table containing the index
    indexColumnName - the name of the Index column
    compressionCodec - optional CompressionCodecName
    keyColumnNames - the ordered names of key columns
Returns:
    path to the Data Index table that was written
Throws:
    SchemaMappingException - Error creating a parquet table schema for the given table (likely due to unsupported types)
    IOException - For file writing related errors
-
writeGroupingTable
@NotNull public static <T> String writeGroupingTable(@NotNull ParquetInstructions instructions, @NotNull com.illumon.dataobjects.ColumnDefinition<T> groupingColumnDef, @NotNull String fullOutputFilePath, @NotNull Map<T,ReadOnlyIndex> columnGrouping) throws SchemaMappingException, IOException
Throws:
    SchemaMappingException
    IOException
-
setDefaultCompressionCodecName
public static void setDefaultCompressionCodecName(String compressionCodecName)
-