Class ParquetTools
java.lang.Object
com.illumon.iris.db.v2.locations.parquet.ParquetTools
Tools for managing and manipulating tables on disk in parquet format.
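Example (illustrative sketch): the snippet below shows a minimal round trip through this class, writing an existing in-memory Table to a file and reading it back. The destination path is hypothetical, and the com.illumon.iris.db.tables.Table import assumes the usual legacy package layout.

import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class ParquetRoundTripExample {
    // Sketch only: write a table obtained elsewhere to disk, then read it back.
    public static Table roundTrip(final Table source) {
        final File dest = new File("/tmp/example.parquet"); // hypothetical destination
        ParquetTools.writeTable(source, dest);              // writeTable(Table, File)
        return ParquetTools.readTable(dest);                // readTable(File)
    }
}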
-
Field Summary
static final String BEGIN_POS
static final String COMMON_METADATA_FILE_NAME
static final String DEFAULT_PARQUET_FILE_NAME
static final String END_POS
static final String GROUPING_KEY
static final ParquetInstructions GZIP
static final ParquetInstructions LEGACY
static final ParquetInstructions LZ4
static final ParquetInstructions LZO
static final String METADATA_FILE_NAME
static final ParquetInstructions ZSTD
-
Method Summary
static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>, ParquetInstructions> convertSchema(org.apache.parquet.schema.MessageType schema, Map<String, String> keyValueMetadata, ParquetInstructions readInstructionsIn)
    Convert schema information from a ParquetMetadata into ColumnDefinitions.
static void deleteTable(File path)
    Deletes a table on disk.
static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader(File parquetFile)
    Make a ParquetFileReader for the supplied File.
static Table readDataIndexTable(File tableFile, TableInfo info, String... keyColumnNames)
    Read a Data Index (or grouping) table from disk.
static Table readTable(File sourceFile)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table readTable(File sourceFile, ParquetInstructions readInstructions)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table readTable(File sourceFile, ParquetInstructions readInstructions, SourceTableInstructions sourceTableInstructions)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table readTable(String sourceFilePath)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table readTable(String sourceFilePath, ParquetInstructions readInstructions)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static Table readTable(String sourceFilePath, ParquetInstructions readInstructions, SourceTableInstructions sourceTableInstructions)
    Reads in a table from a single parquet, metadata file, or directory with recognized layout.
static void setDefaultCompressionCodecName(String compressionCodecName)
static String writeDataIndexTable(String parentFile, Table indexTable, String indexColumnName, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodec, String... keyColumnNames)
    Write out the Data Index table for the specified columns.
static <T> String writeGroupingTable(ParquetInstructions instructions, com.illumon.dataobjects.ColumnDefinition<T> groupingColumnDef, String fullOutputFilePath, Map<T, ReadOnlyIndex> columnGrouping)
static void writeParquetTables(Table[] sources, TableDefinition tableDefinition, ParquetInstructions writeInstructions, File[] destinations, String[] groupingColumns)
    Writes tables to disk in parquet format to a supplied set of destinations.
static void writeTable(Table sourceTable, File destFile)
    Write a table to a file.
static void writeTable(Table sourceTable, File destFile, TableDefinition definition)
    Write a table to a file.
static void writeTable(Table sourceTable, File destFile, TableDefinition definition, ParquetInstructions writeInstructions)
    Write a table to a file.
static void writeTable(Table sourceTable, File destFile, ParquetInstructions writeInstructions)
    Write a table to a file.
static void writeTable(Table sourceTable, String destPath)
    Write a table to a file.
static void writeTable(Table sourceTable, String destPath, TableDefinition definition, ParquetInstructions writeInstructions)
    Write a table to a file.
static void writeTables(Table[] sources, TableDefinition tableDefinition, File[] destinations)
    Write out tables to disk.
-
Field Details
METADATA_FILE_NAME
    static final String METADATA_FILE_NAME
    See Also: Constant Field Values
COMMON_METADATA_FILE_NAME
    static final String COMMON_METADATA_FILE_NAME
    See Also: Constant Field Values
DEFAULT_PARQUET_FILE_NAME
    static final String DEFAULT_PARQUET_FILE_NAME
    See Also: Constant Field Values
BEGIN_POS
    static final String BEGIN_POS
    See Also: Constant Field Values
END_POS
    static final String END_POS
    See Also: Constant Field Values
GROUPING_KEY
    static final String GROUPING_KEY
    See Also: Constant Field Values
LZ4
    static final ParquetInstructions LZ4
LZO
    static final ParquetInstructions LZO
GZIP
    static final ParquetInstructions GZIP
ZSTD
    static final ParquetInstructions ZSTD
LEGACY
    static final ParquetInstructions LEGACY
-
Method Details
-
readTable
public static Table readTable(@NotNull String sourceFilePath)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFilePath - The file or directory to examine
Returns:
    table
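Example (illustrative sketch): a minimal call to this overload. The path is hypothetical, and the Table import assumes the usual legacy package.

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class ReadTableByPath {
    public static Table load() {
        // The path may name a single .parquet file, a metadata file,
        // or a directory with a recognized layout.
        return ParquetTools.readTable("/data/exports/trades.parquet");
    }
}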
-
readTable
public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFilePath - The file or directory to examine
    readInstructions - Instructions for customizations while reading
Returns:
    table
-
readTable
public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions, @NotNull SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFilePath - The file or directory to examine
    readInstructions - Instructions for customizations while reading
    sourceTableInstructions - Instructions to control the underlying behaviors of source tables
Returns:
    table
-
readTable
public static Table readTable(@NotNull File sourceFile)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFile - The file or directory to examine
Returns:
    table
-
readTable
public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFile - The file or directory to examine
    readInstructions - Instructions for customizations while reading
Returns:
    table
-
readTable
public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions, @NotNull SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet, metadata file, or directory with recognized layout.
Parameters:
    sourceFile - The file or directory to examine
    readInstructions - Instructions for customizations while reading
    sourceTableInstructions - Instructions to control the underlying behaviors of source tables
Returns:
    table
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath)
Write a table to a file.
Parameters:
    sourceTable - source table
    destPath - destination file path; the file name should end in the ".parquet" extension. If the path includes non-existing directories, they are created. If there is an error, any intermediate directories previously created are removed; note this makes this method unsafe for concurrent use.
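Example (illustrative sketch): the source table and destination path are hypothetical, and the Table import assumes the usual legacy package.

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class WriteTableByPath {
    public static void save(final Table source) {
        // Missing directories in the path are created; the file name should end in ".parquet".
        ParquetTools.writeTable(source, "/data/exports/2023/trades.parquet");
    }
}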
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile)
Write a table to a file.
Parameters:
    sourceTable - source table
    destFile - destination file; the file name should end in the ".parquet" extension. If the path includes non-existing directories, they are created.
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition)
Write a table to a file.
Parameters:
    sourceTable - source table
    destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note this makes this method unsafe for concurrent use.
    definition - table definition to use (instead of the one implied by the table itself)
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
Parameters:
    sourceTable - source table
    destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note this makes this method unsafe for concurrent use.
    writeInstructions - instructions for customizations while writing
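Example (illustrative sketch): one of the predefined codec presets on this class is passed as the write instructions. The source table and destination file are hypothetical, and the Table import assumes the usual legacy package.

import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class WriteTableCompressed {
    public static void saveCompressed(final Table source) {
        // ParquetTools.GZIP is one of the ParquetInstructions constants listed in the field summary.
        ParquetTools.writeTable(source, new File("/data/exports/trades_gzip.parquet"), ParquetTools.GZIP);
    }
}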
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
Parameters:
    sourceTable - source table
    destPath - destination path; it must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note this makes this method unsafe for concurrent use.
    definition - table definition to use (instead of the one implied by the table itself)
    writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
Parameters:
    sourceTable - source table
    definition - table definition to use (instead of the one implied by the table itself)
    writeInstructions - instructions for customizations while writing
    destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note this makes this method unsafe for concurrent use.
-
writeParquetTables
public static void writeParquetTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull ParquetInstructions writeInstructions, @NotNull File[] destinations, @NotNull String[] groupingColumns)
Writes tables to disk in parquet format to a supplied set of destinations. If you specify grouping columns, there must already be grouping information for those columns in the sources. This can be accomplished with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>).
Parameters:
    sources - The tables to write
    tableDefinition - The common schema for all the tables to write
    writeInstructions - Write instructions for customizations while writing
    destinations - The destination paths. Any non-existing directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note this makes this method unsafe for concurrent use.
    groupingColumns - List of columns the tables are grouped by (the write operation will store the grouping info)
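Example (illustrative sketch): a grouped write under stated assumptions. The two source tables, their shared TableDefinition, the "Sym" column, and the destination files are all hypothetical; the .sort call reflects the documented way of establishing grouping information before the write, and the Table/TableDefinition imports assume the usual legacy packages.

import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.tables.TableDefinition;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class GroupedParquetWrite {
    public static void writeGrouped(final Table dayOne, final Table dayTwo, final TableDefinition definition) {
        // Sorting by the grouping column establishes the grouping information the write requires.
        final Table[] sources = {dayOne.sort("Sym"), dayTwo.sort("Sym")};
        final File[] destinations = {
                new File("/data/exports/day1.parquet"),
                new File("/data/exports/day2.parquet")
        };
        ParquetTools.writeParquetTables(
                sources,
                definition,
                ParquetTools.ZSTD,        // predefined write instructions from this class
                destinations,
                new String[] {"Sym"});    // grouping info for "Sym" is stored with the data
    }
}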
-
writeTables
public static void writeTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull File[] destinations) Write out tables to disk.- Parameters:
sources
- source tablestableDefinition
- table definitiondestinations
- destinations
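Example (illustrative sketch): a multi-table write without grouping. The two source tables and their shared TableDefinition come from elsewhere, the destination files are hypothetical, and the imports assume the usual legacy packages.

import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.tables.TableDefinition;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class MultiTableWrite {
    public static void writeAll(final Table first, final Table second, final TableDefinition definition) {
        // One destination per source table, in the same order.
        final Table[] sources = {first, second};
        final File[] destinations = {
                new File("/data/exports/part0.parquet"),
                new File("/data/exports/part1.parquet")
        };
        ParquetTools.writeTables(sources, definition, destinations);
    }
}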
-
deleteTable
public static void deleteTable(File path)
Deletes a table on disk.
Parameters:
    path - path to delete
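Example (illustrative sketch): a one-line call with a hypothetical path.

import java.io.File;

import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class DeleteExample {
    public static void removeOldExport() {
        // Removes the table previously written at this (hypothetical) location.
        ParquetTools.deleteTable(new File("/data/exports/2022/trades.parquet"));
    }
}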
-
getParquetFileReader
public static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader(@NotNull File parquetFile)
Make a ParquetFileReader for the supplied File.
Parameters:
    parquetFile - The File to read
Returns:
    The new ParquetFileReader
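Example (illustrative sketch): opens a low-level reader for a hypothetical file; how you then use the reader depends on the io.deephaven.parquet.base.ParquetFileReader API, which is outside this page.

import java.io.File;

import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
import io.deephaven.parquet.base.ParquetFileReader;

public class LowLevelReaderExample {
    public static ParquetFileReader openReader() {
        // The returned reader exposes the low-level parquet structure of the file.
        return ParquetTools.getParquetFileReader(new File("/data/exports/trades.parquet"));
    }
}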
-
convertSchema
public static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>, ParquetInstructions> convertSchema(@NotNull org.apache.parquet.schema.MessageType schema, @NotNull Map<String, String> keyValueMetadata, @NotNull ParquetInstructions readInstructionsIn)
Convert schema information from a ParquetMetadata into ColumnDefinitions.
Parameters:
    schema - Parquet schema. DO NOT RELY ON ParquetMetadataConverter FOR THIS! USE ParquetFileReader!
    keyValueMetadata - Parquet key-value metadata map
    readInstructionsIn - Input conversion ParquetInstructions
Returns:
    A Pair with ColumnDefinitions and adjusted ParquetInstructions
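Example (call-shape sketch only): it assumes the MessageType schema and key-value metadata were already pulled from the file (the note above says to obtain them via ParquetFileReader, not ParquetMetadataConverter). Passing ParquetTools.LEGACY as the base read instructions and reading the Pair via its first field are both assumptions made for illustration.

import java.util.List;
import java.util.Map;

import com.illumon.dataobjects.ColumnDefinition;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
import org.apache.parquet.schema.MessageType;

public class SchemaConversionExample {
    public static List<ColumnDefinition> toColumnDefinitions(final MessageType schema,
                                                             final Map<String, String> keyValueMetadata) {
        // ParquetTools.LEGACY stands in for whatever base ParquetInstructions you actually start from;
        // the Pair's first element is assumed to hold the derived column definitions.
        return ParquetTools.convertSchema(schema, keyValueMetadata, ParquetTools.LEGACY).first;
    }
}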
-
readDataIndexTable
@Nullable public static Table readDataIndexTable(@NotNull File tableFile, @Nullable TableInfo info, @NotNull String... keyColumnNames)
Read a Data Index (or grouping) table from disk. If a TableInfo is provided, it will be used to aid in locating the table.
Parameters:
    tableFile - The path to the base table
    info - An optional TableInfo object to assist in locating files
    keyColumnNames - the names of key columns
Returns:
    the data index table for the specified key columns, or null if none was found
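Example (illustrative sketch): the base-table file and the "Sym" key column are hypothetical, null is passed for the optional TableInfo (it is @Nullable above), and the Table import assumes the usual legacy package.

import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class DataIndexLookupExample {
    public static Table lookupIndex() {
        // Returns the Data Index table for the "Sym" key column, or null if none exists on disk.
        return ParquetTools.readDataIndexTable(new File("/data/exports/trades.parquet"), null, "Sym");
    }
}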
-
writeDataIndexTable
public static String writeDataIndexTable(@NotNull String parentFile, @NotNull Table indexTable, @NotNull String indexColumnName, @Nullable org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodec, @NotNull String... keyColumnNames) throws SchemaMappingException, IOException
Write out the Data Index table for the specified columns. This will place the Data Index in a table adjacent to the data table, in a directory titled "Index-<Column names>".
Parameters:
    parentFile - the full path of the parent file
    indexTable - the table containing the index
    indexColumnName - the name of the Index column
    compressionCodec - optional CompressionCodecName
    keyColumnNames - the ordered names of key columns
Returns:
    path to the Data Index table that was written
Throws:
    SchemaMappingException - Error creating a parquet table schema for the given table (likely due to unsupported types)
    IOException - For file writing related errors
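Example (call-shape sketch only): the parent file path, index table, "Index" column name, and "Sym" key column are all hypothetical, and CompressionCodecName.SNAPPY simply illustrates the optional codec parameter. The Table import assumes the usual legacy package.

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class DataIndexWriteExample {
    public static String writeIndex(final Table indexTable) throws Exception {
        // Declared broadly here; the method itself throws SchemaMappingException and IOException.
        return ParquetTools.writeDataIndexTable(
                "/data/exports/trades.parquet",  // hypothetical parent data file
                indexTable,                      // table containing the index
                "Index",                         // hypothetical name of the Index column
                CompressionCodecName.SNAPPY,     // optional codec
                "Sym");                          // hypothetical key column name
    }
}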
-
writeGroupingTable
@NotNull public static <T> String writeGroupingTable(@NotNull ParquetInstructions instructions, @NotNull com.illumon.dataobjects.ColumnDefinition<T> groupingColumnDef, @NotNull String fullOutputFilePath, @NotNull Map<T, ReadOnlyIndex> columnGrouping) throws SchemaMappingException, IOException
Throws:
    SchemaMappingException
    IOException
-
setDefaultCompressionCodecName
public static void setDefaultCompressionCodecName(String compressionCodecName)
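Example (illustrative sketch): this page does not say which names are accepted; the assumption here is that they match Parquet's CompressionCodecName constants (e.g. "SNAPPY", "GZIP").

import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class DefaultCodecExample {
    public static void useSnappyByDefault() {
        // Assumed to set the codec used by subsequent writes that do not specify their own instructions.
        ParquetTools.setDefaultCompressionCodecName("SNAPPY");
    }
}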
-