Class ParquetTools
java.lang.Object
com.illumon.iris.db.v2.locations.parquet.ParquetTools
public class ParquetTools extends Object
Tools for managing and manipulating tables on disk in parquet format.
-
Field Summary
Modifier and Type            Field
static String                COMMON_METADATA_FILE_NAME
static String                DEFAULT_PARQUET_FILE_NAME
static ParquetInstructions   GZIP
static ParquetInstructions   LEGACY
static ParquetInstructions   LZ4
static ParquetInstructions   LZO
static String                METADATA_FILE_NAME
static ParquetInstructions   ZSTD
-
Method Summary
Modifier and Type Method Description static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions>
convertSchema(org.apache.parquet.schema.MessageType schema, Map<String,String> keyValueMetadata, ParquetInstructions readInstructionsIn)
Convert schema information from aParquetMetadata
intoColumnDefinitions
.static void
deleteTable(File path)
Deletes a table on disk.static io.deephaven.parquet.base.ParquetFileReader
getParquetFileReader(File parquetFile)
Make aParquetFileReader
for the suppliedFile
.static Table
readSingleFileTable(ReadOnlyParquetTableLocation tableLocation, TableDefinition tableDefinition)
Reads in a table from a single parquet file using the provided table definition.static Table
readTable(File sourceFile)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.static Table
readTable(File sourceFile, ParquetInstructions readInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.static Table
readTable(String sourceFilePath)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.static Table
readTable(String sourceFilePath, ParquetInstructions readInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.static void
setDefaultCompressionCodecName(String compressionCodecName)
static void
writeParquetTables(Table[] sources, TableDefinition tableDefinition, ParquetInstructions writeInstructions, File[] destinations, String[] groupingColumns)
Writes tables to disk in parquet format to a supplied set of destinations.static void
writeTable(Table sourceTable, File destFile)
Write a table to a file.static void
writeTable(Table sourceTable, File destFile, TableDefinition definition)
Write a table to a file.static void
writeTable(Table sourceTable, File destFile, TableDefinition definition, ParquetInstructions writeInstructions)
Write a table to a file.static void
writeTable(Table sourceTable, File destFile, ParquetInstructions writeInstructions)
Write a table to a file.static void
writeTable(Table sourceTable, String destPath)
Write a table to a file.static void
writeTable(Table sourceTable, String destPath, TableDefinition definition, ParquetInstructions writeInstructions)
Write a table to a file.static void
writeTables(Table[] sources, TableDefinition tableDefinition, File[] destinations)
Write out tables to disk.
-
Field Details
-
METADATA_FILE_NAME
- See Also:
- Constant Field Values
-
COMMON_METADATA_FILE_NAME
- See Also:
- Constant Field Values
-
DEFAULT_PARQUET_FILE_NAME
- See Also:
- Constant Field Values
-
LZ4
-
LZO
-
GZIP
-
ZSTD
-
LEGACY
-
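The ParquetInstructions constants above are preset instruction objects. Judging by their names, LZ4, LZO, GZIP, and ZSTD select the corresponding compression codec for writes, and LEGACY targets legacy-format files; treat that reading as an assumption rather than a guarantee. A minimal, console-style sketch of writing with one of the codec presets (the TableTools usage and the import paths are illustrative assumptions):

    import com.illumon.iris.db.tables.Table;
    import com.illumon.iris.db.tables.utils.TableTools;
    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
    import java.io.File;

    // Build a small in-memory table to have something to write.
    Table t = TableTools.emptyTable(1000).update("Id = i", "Value = i * 0.5");

    // Pass one of the preset ParquetInstructions constants to choose the codec.
    ParquetTools.writeTable(t, new File("/tmp/example_zstd.parquet"), ParquetTools.ZSTD);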
-
Method Details
-
readTable
public static Table readTable(@NotNull String sourceFilePath)
Reads in a table from a single parquet file, metadata file, or directory with recognized layout.
- Parameters:
  sourceFilePath - The file or directory to examine
- Returns:
  table
-
readTable
public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet file, metadata file, or directory with recognized layout.
- Parameters:
  sourceFilePath - The file or directory to examine
  readInstructions - Instructions for customizations while reading
- Returns:
  table
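A minimal, console-style sketch of the two String-based overloads above; the paths, the import statements, and the choice of ParquetTools.LEGACY as the read instructions are illustrative assumptions:

    import com.illumon.iris.db.tables.Table;
    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

    // Simple form: infer everything from the file (or directory) itself.
    Table fromPath = ParquetTools.readTable("/data/example.parquet");

    // With instructions: ParquetTools.LEGACY is assumed here to configure
    // reading of legacy-format files.
    Table legacyRead = ParquetTools.readTable("/data/legacy_example.parquet", ParquetTools.LEGACY);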
-
readTable
public static Table readTable(@NotNull File sourceFile)
Reads in a table from a single parquet file, metadata file, or directory with recognized layout.
- Parameters:
  sourceFile - The file or directory to examine
- Returns:
  table
-
readTable
public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet file, metadata file, or directory with recognized layout.
- Parameters:
  sourceFile - The file or directory to examine
  readInstructions - Instructions for customizations while reading
- Returns:
  table
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destPath - destination file path; the file name should end in the ".parquet" extension. If the path includes non-existing directories, they are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
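A console-style sketch of this overload; the table construction, import paths, and destination path are illustrative assumptions:

    import com.illumon.iris.db.tables.Table;
    import com.illumon.iris.db.tables.utils.TableTools;
    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

    Table prices = TableTools.newTable(
            TableTools.stringCol("Sym", "AAPL", "MSFT"),
            TableTools.doubleCol("Price", 180.0, 410.0));

    // Parent directories that do not exist yet are created by the write.
    ParquetTools.writeTable(prices, "/tmp/prices/2024-01-02.parquet");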
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destFile - destination file; the file name should end in the ".parquet" extension. If the path includes non-existing directories, they are created.
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  definition - table definition to use (instead of the one implied by the table itself)
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destPath - destination path; it must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  definition - table definition to use (instead of the one implied by the table itself)
  writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
  sourceTable - source table
  definition - table definition to use (instead of the one implied by the table itself)
  writeInstructions - instructions for customizations while writing
  destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
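A console-style sketch combining an explicit TableDefinition with write instructions; the import paths, the reuse of the source table's own definition, and the GZIP preset are illustrative assumptions:

    import com.illumon.iris.db.tables.Table;
    import com.illumon.iris.db.tables.TableDefinition;
    import com.illumon.iris.db.tables.utils.TableTools;
    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
    import java.io.File;

    Table t = TableTools.emptyTable(10).update("Id = i");

    // Normally the definition passed here would differ from the table's implied one
    // (for example, a wider schema shared by several tables).
    TableDefinition definition = t.getDefinition();

    ParquetTools.writeTable(t, new File("/tmp/with_definition.parquet"), definition, ParquetTools.GZIP);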
-
writeParquetTables
public static void writeParquetTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull ParquetInstructions writeInstructions, @NotNull File[] destinations, @NotNull String[] groupingColumns)
Writes tables to disk in parquet format to a supplied set of destinations. If you specify grouping columns, there must already be grouping information for those columns in the sources. This can be accomplished with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>); see the sketch after the parameter list.
- Parameters:
  sources - The tables to write
  tableDefinition - The common schema for all the tables to write
  writeInstructions - Write instructions for customizations while writing
  destinations - The destination paths. Any non-existing directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  groupingColumns - List of columns the tables are grouped by (the write operation will store the grouping info)
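A console-style sketch of the grouping requirement described above: each source is sorted on the grouping column before the write so that grouping information exists for it. The import paths, table construction, symbols, and destination paths are illustrative assumptions:

    import com.illumon.iris.db.tables.Table;
    import com.illumon.iris.db.tables.TableDefinition;
    import com.illumon.iris.db.tables.utils.TableTools;
    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
    import java.io.File;

    Table day1 = TableTools.newTable(
            TableTools.stringCol("Sym", "B", "A", "A"),
            TableTools.intCol("Size", 10, 20, 30));
    Table day2 = TableTools.newTable(
            TableTools.stringCol("Sym", "C", "B", "A"),
            TableTools.intCol("Size", 5, 15, 25));

    // Sorting on the grouping column ensures grouping info is available for "Sym".
    Table[] sources = new Table[] { day1.sort("Sym"), day2.sort("Sym") };
    TableDefinition commonDefinition = sources[0].getDefinition();

    File[] destinations = new File[] {
            new File("/tmp/grouped/day1.parquet"),
            new File("/tmp/grouped/day2.parquet") };

    ParquetTools.writeParquetTables(
            sources, commonDefinition, ParquetTools.ZSTD, destinations, new String[] { "Sym" });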
-
writeTables
public static void writeTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull File[] destinations)
Write out tables to disk.
- Parameters:
  sources - source tables
  tableDefinition - table definition
  destinations - destinations
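A console-style sketch of writing several tables that share one definition; the import paths, table construction, and destination paths are illustrative assumptions:

    import com.illumon.iris.db.tables.Table;
    import com.illumon.iris.db.tables.utils.TableTools;
    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
    import java.io.File;

    Table first = TableTools.emptyTable(5).update("Id = i");
    Table second = TableTools.emptyTable(7).update("Id = i");

    ParquetTools.writeTables(
            new Table[] { first, second },
            first.getDefinition(),
            new File[] { new File("/tmp/batch/first.parquet"), new File("/tmp/batch/second.parquet") });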
-
deleteTable
public static void deleteTable(File path)
Deletes a table on disk.
- Parameters:
  path - path to delete
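A one-line, console-style sketch; the path is an illustrative assumption:

    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
    import java.io.File;

    // Removes the table previously written at this path.
    ParquetTools.deleteTable(new File("/tmp/batch/first.parquet"));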
-
readSingleFileTable
public static Table readSingleFileTable(@NotNull ReadOnlyParquetTableLocation tableLocation, @NotNull TableDefinition tableDefinition)
Reads in a table from a single parquet file using the provided table definition.
- Parameters:
  tableDefinition - The table's definition
- Returns:
  The table
-
getParquetFileReader
public static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader(@NotNull File parquetFile)
Make a ParquetFileReader for the supplied File.
- Parameters:
  parquetFile - The File to read
- Returns:
  The new ParquetFileReader
-
convertSchema
public static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions> convertSchema(@NotNull org.apache.parquet.schema.MessageType schema, @NotNull Map<String,String> keyValueMetadata, @NotNull ParquetInstructions readInstructionsIn)
Convert schema information from a ParquetMetadata into ColumnDefinitions.
- Parameters:
  schema - Parquet schema. DO NOT RELY ON ParquetMetadataConverter FOR THIS! USE ParquetFileReader!
  keyValueMetadata - Parquet key-value metadata map
  readInstructionsIn - Input conversion ParquetInstructions
- Returns:
  A Pair with ColumnDefinitions and adjusted ParquetInstructions
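A console-style sketch tying getParquetFileReader and convertSchema together, as the parameter note suggests. Several details are assumptions, not confirmed by this page: the ParquetInstructions import path, the reader's getSchema() accessor, the ParquetInstructions.EMPTY starting point, and the empty key-value metadata map:

    import com.fishlib.base.Pair;
    import com.illumon.dataobjects.ColumnDefinition;
    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
    import com.illumon.iris.db.v2.parquet.ParquetInstructions; // package path is an assumption
    import io.deephaven.parquet.base.ParquetFileReader;
    import org.apache.parquet.schema.MessageType;
    import java.io.File;
    import java.util.Collections;
    import java.util.List;

    // Obtain the reader for a parquet file on local disk.
    ParquetFileReader reader = ParquetTools.getParquetFileReader(new File("/data/example.parquet"));

    // Assumption: the reader exposes the parquet MessageType via getSchema().
    MessageType schema = reader.getSchema();

    // Convert the schema; an empty key-value metadata map and ParquetInstructions.EMPTY
    // (assumed to be the neutral starting instructions) are used purely for illustration.
    Pair<List<ColumnDefinition>, ParquetInstructions> converted =
            ParquetTools.convertSchema(schema, Collections.emptyMap(), ParquetInstructions.EMPTY);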
-
setDefaultCompressionCodecName
public static void setDefaultCompressionCodecName(String compressionCodecName)
-
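Presumably this sets the codec used when no explicit ParquetInstructions are supplied; the accepted strings are assumed to be the standard parquet codec names. A one-line sketch:

    import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

    // Assumption: standard parquet codec names such as "SNAPPY", "GZIP", or "ZSTD" are accepted.
    ParquetTools.setDefaultCompressionCodecName("ZSTD");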