Class ParquetTools
java.lang.Object
com.illumon.iris.db.v2.locations.parquet.ParquetTools
public class ParquetTools extends Object
Tools for managing and manipulating tables on disk in parquet format.
-
Field Summary
Fields
Modifier and Type    Field    Description
static String    COMMON_METADATA_FILE_NAME
static String    DEFAULT_PARQUET_FILE_NAME
static ParquetInstructions    GZIP
static ParquetInstructions    LEGACY
static ParquetInstructions    LZ4
static ParquetInstructions    LZO
static String    METADATA_FILE_NAME
static ParquetInstructions    ZSTD
-
Method Summary
Modifier and Type    Method    Description
static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions> convertSchema(org.apache.parquet.schema.MessageType schema, Map<String,String> keyValueMetadata, ParquetInstructions readInstructionsIn)
    Convert schema information from a ParquetMetadata into ColumnDefinitions.
static void deleteTable(File path)
    Deletes a table on disk.
static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader(File parquetFile)
    Make a ParquetFileReader for the supplied File.
static Table readSingleFileTable(ReadOnlyParquetTableLocation tableLocation, TableDefinition tableDefinition)
    Reads in a table from a single parquet file using the provided table definition.
static Table readTable(File sourceFile)
    Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
static Table readTable(File sourceFile, ParquetInstructions readInstructions)
    Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
static Table readTable(String sourceFilePath)
    Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
static Table readTable(String sourceFilePath, ParquetInstructions readInstructions)
    Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
static void setDefaultCompressionCodecName(String compressionCodecName)
static void writeParquetTables(Table[] sources, TableDefinition tableDefinition, ParquetInstructions writeInstructions, File[] destinations, String[] groupingColumns)
    Writes tables to disk in parquet format to a supplied set of destinations.
static void writeTable(Table sourceTable, File destFile)
    Write a table to a file.
static void writeTable(Table sourceTable, File destFile, TableDefinition definition)
    Write a table to a file.
static void writeTable(Table sourceTable, File destFile, TableDefinition definition, ParquetInstructions writeInstructions)
    Write a table to a file.
static void writeTable(Table sourceTable, File destFile, ParquetInstructions writeInstructions)
    Write a table to a file.
static void writeTable(Table sourceTable, String destPath)
    Write a table to a file.
static void writeTable(Table sourceTable, String destPath, TableDefinition definition, ParquetInstructions writeInstructions)
    Write a table to a file.
static void writeTables(Table[] sources, TableDefinition tableDefinition, File[] destinations)
    Write out tables to disk.
-
Field Details
-
METADATA_FILE_NAME
- See Also:
- Constant Field Values
-
COMMON_METADATA_FILE_NAME
- See Also:
- Constant Field Values
-
DEFAULT_PARQUET_FILE_NAME
- See Also:
- Constant Field Values
-
LZ4
-
LZO
-
GZIP
-
ZSTD
-
LEGACY
-
-
Method Details
-
readTable
Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
- Parameters:
sourceFilePath - The file or directory to examine
- Returns:
- table
-
readTable
public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
- Parameters:
sourceFilePath - The file or directory to examine
readInstructions - Instructions for customizations while reading
- Returns:
- table
-
readTable
Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
- Parameters:
sourceFile - The file or directory to examine
- Returns:
- table
-
readTable
public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet file, a metadata file, or a directory with a recognized layout.
- Parameters:
sourceFile - The file or directory to examine
readInstructions - Instructions for customizations while reading
- Returns:
- table
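The three kinds of source that readTable accepts can be pictured with a small, self-contained sketch. It does not call ParquetTools and is not its implementation. SourceKind, Kind, and classify are hypothetical names, and the two file-name values are assumptions based on the conventional Parquet sidecar file names; this page does not list the actual values of METADATA_FILE_NAME and COMMON_METADATA_FILE_NAME.

```java
import java.io.File;

/** Hypothetical illustration only; not part of ParquetTools. */
final class SourceKind {
    enum Kind { SINGLE_PARQUET_FILE, METADATA_FILE, DIRECTORY, UNRECOGNIZED }

    // Assumed values (conventional Parquet names); the real constants may differ.
    static final String METADATA_FILE_NAME = "_metadata";
    static final String COMMON_METADATA_FILE_NAME = "_common_metadata";

    /** Classifies a source path by directory status and file name only. */
    static Kind classify(File source) {
        if (source.isDirectory()) {
            return Kind.DIRECTORY;
        }
        String name = source.getName();
        if (name.equals(METADATA_FILE_NAME) || name.equals(COMMON_METADATA_FILE_NAME)) {
            return Kind.METADATA_FILE;
        }
        if (name.endsWith(".parquet")) {
            return Kind.SINGLE_PARQUET_FILE;
        }
        return Kind.UNRECOGNIZED;
    }
}
```

The sketch inspects names only; whether a directory actually has a "recognized layout" is something only the library can decide.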
-
writeTable
Write a table to a file.
- Parameters:
sourceTable - source table
destPath - destination file path; the file name should end with the ".parquet" extension. If the path includes non-existent directories, they are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
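The directory handling described for the destination can be sketched in plain Java. This is a stand-in for the documented behavior, not the library's code: DestinationDirs, createMissingParents, and removeCreated are hypothetical names, and how ParquetTools actually tracks and removes directories is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical illustration only; not part of ParquetTools. */
final class DestinationDirs {

    /**
     * Creates the missing parent directories of destFile and returns the
     * ones this call created, ordered deepest first.
     */
    static Deque<File> createMissingParents(File destFile) throws IOException {
        Deque<File> missing = new ArrayDeque<>();
        for (File dir = destFile.getAbsoluteFile().getParentFile();
             dir != null && !dir.exists();
             dir = dir.getParentFile()) {
            missing.addLast(dir); // walking upward, so deepest comes first
        }
        // Create from the shallowest missing directory down to the deepest.
        Iterator<File> shallowestFirst = missing.descendingIterator();
        while (shallowestFirst.hasNext()) {
            Files.createDirectory(shallowestFirst.next().toPath());
        }
        return missing;
    }

    /** Rolls back after a failed write: removes the created directories, deepest first. */
    static void removeCreated(Deque<File> created) {
        for (File dir : created) {
            dir.delete(); // returns false (and leaves the directory) if it is not empty
        }
    }
}
```

The sketch also shows where the concurrency hazard comes from: a second writer that saw an intermediate directory as already existing will not create it, and may then find it removed when the first writer rolls back.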
-
writeTable
Write a table to a file.
- Parameters:
sourceTable - source table
destFile - destination file; the file name should end with the ".parquet" extension. If the path includes non-existent directories, they are created.
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition)
Write a table to a file.
- Parameters:
sourceTable - source table
destFile - destination file; its path must end in ".parquet". Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
definition - table definition to use (instead of the one implied by the table itself)
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
sourceTable - source table
destFile - destination file; its path must end in ".parquet". Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
sourceTable - source table
destPath - destination path; it must end in ".parquet". Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
definition - table definition to use (instead of the one implied by the table itself)
writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
sourceTable - source table
destFile - destination file; its path must end in ".parquet". Any non-existent directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
definition - table definition to use (instead of the one implied by the table itself)
writeInstructions - instructions for customizations while writing
-
writeParquetTables
public static void writeParquetTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull ParquetInstructions writeInstructions, @NotNull File[] destinations, @NotNull String[] groupingColumns)
Writes tables to disk in parquet format to a supplied set of destinations. If you specify grouping columns, there must already be grouping information for those columns in the sources. This can be accomplished with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>).
- Parameters:
sources - The tables to write
tableDefinition - The common schema for all the tables to write
writeInstructions - Write instructions for customizations while writing
destinations - The destination paths. Any non-existent directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note that this makes the method unsafe for concurrent use.
groupingColumns - List of columns the tables are grouped by (the write operation will store the grouping info)
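The grouping requirement can be pictured with a small, self-contained sketch. It does not call ParquetTools, and GroupingSketch and rangesOfSorted are hypothetical names. The key-to-row-range map is only a conceptual picture of "grouping information", not the format that writeParquetTables stores. It illustrates one way to see why a sorted column lends itself to grouping: once equal keys are adjacent, each key corresponds to a single contiguous range of rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of ParquetTools. */
final class GroupingSketch {

    /**
     * For a column whose values are already sorted, returns a map from each
     * key to its row range as {firstRow, lastRowExclusive}.
     */
    static Map<String, int[]> rangesOfSorted(String[] sortedKeys) {
        Map<String, int[]> ranges = new LinkedHashMap<>();
        int start = 0;
        for (int row = 1; row <= sortedKeys.length; row++) {
            // Close the current range at the end of the column or when the key changes.
            if (row == sortedKeys.length || !sortedKeys[row].equals(sortedKeys[start])) {
                ranges.put(sortedKeys[start], new int[] {start, row});
                start = row;
            }
        }
        return ranges;
    }
}
```

For example, a sorted column holding AAPL, AAPL, IBM, MSFT, MSFT, MSFT yields three ranges: rows 0 to 2 for AAPL, 2 to 3 for IBM, and 3 to 6 for MSFT (end exclusive).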
-
writeTables
public static void writeTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull File[] destinations)
Write out tables to disk.
- Parameters:
sources - source tables
tableDefinition - table definition
destinations - destinations
-
deleteTable
Deletes a table on disk.
- Parameters:
path - path to delete
-
readSingleFileTable
public static Table readSingleFileTable(@NotNull ReadOnlyParquetTableLocation tableLocation, @NotNull TableDefinition tableDefinition)
Reads in a table from a single parquet file using the provided table definition.
- Parameters:
tableDefinition - The table's definition
- Returns:
- The table
-
getParquetFileReader
public static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader(@NotNull File parquetFile)
Make a ParquetFileReader for the supplied File.
- Parameters:
parquetFile - The File to read
- Returns:
- The new ParquetFileReader
-
convertSchema
public static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions> convertSchema(@NotNull org.apache.parquet.schema.MessageType schema, @NotNull Map<String,String> keyValueMetadata, @NotNull ParquetInstructions readInstructionsIn)
Convert schema information from a ParquetMetadata into ColumnDefinitions.
- Parameters:
schema - Parquet schema. DO NOT RELY ON ParquetMetadataConverter FOR THIS! USE ParquetFileReader!
keyValueMetadata - Parquet key-value metadata map
readInstructionsIn - Input conversion ParquetInstructions
- Returns:
- A Pair with ColumnDefinitions and adjusted ParquetInstructions
-
setDefaultCompressionCodecName
-