Class ParquetTools
java.lang.Object
com.illumon.iris.db.v2.locations.parquet.ParquetTools
public class ParquetTools extends Object
Tools for managing and manipulating tables on disk in parquet format.
-
Field Summary
Modifier and Type  Field
static String  BEGIN_POS
static String  COMMON_METADATA_FILE_NAME
static String  DEFAULT_PARQUET_FILE_NAME
static String  END_POS
static String  GROUPING_KEY
static ParquetInstructions  GZIP
static ParquetInstructions  LEGACY
static ParquetInstructions  LZ4
static ParquetInstructions  LZO
static String  METADATA_FILE_NAME
static ParquetInstructions  ZSTD
-
Method Summary
Modifier and Type  Method  Description

static String
computeDataIndexTableName(String path, String... columnName)

static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions>
convertSchema(org.apache.parquet.schema.MessageType schema, Map<String,String> keyValueMetadata, ParquetInstructions readInstructionsIn)
Convert schema information from a ParquetMetadata into ColumnDefinitions.

static void
deleteTable(File path)
Deletes a table on disk.

static io.deephaven.parquet.base.ParquetFileReader
getParquetFileReader(File parquetFile)
Make a ParquetFileReader for the supplied File.

static Table
readDataIndexTable(File tableFile, TableInfo info, String... keyColumnNames)
Read a Data Index (or grouping) table from disk.

static Table
readSingleFileTable(ReadOnlyParquetTableLocation tableLocation, TableDefinition tableDefinition, SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet file using the provided table definition.

static Table
readTable(File sourceFile)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.

static Table
readTable(File sourceFile, ParquetInstructions readInstructions)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.

static Table
readTable(File sourceFile, ParquetInstructions readInstructions, SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.

static Table
readTable(String sourceFilePath)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.

static Table
readTable(String sourceFilePath, ParquetInstructions readInstructions)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.

static Table
readTable(String sourceFilePath, ParquetInstructions readInstructions, SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.

static void
setDefaultCompressionCodecName(String compressionCodecName)

static String
writeDataIndexTable(String parentFile, Table indexTable, String indexColumnName, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodec, String... keyColumnNames)
Write out the Data Index table for the specified columns.

static <T> String
writeGroupingTable(ParquetInstructions instructions, com.illumon.dataobjects.ColumnDefinition<T> groupingColumnDef, String fullOutputFilePath, Map<T,ReadOnlyIndex> columnGrouping)

static void
writeParquetTables(Table[] sources, TableDefinition tableDefinition, ParquetInstructions writeInstructions, File[] destinations, String[] groupingColumns)
Writes tables to disk in parquet format to a supplied set of destinations.

static void
writeTable(Table sourceTable, File destFile)
Write a table to a file.

static void
writeTable(Table sourceTable, File destFile, TableDefinition definition)
Write a table to a file.

static void
writeTable(Table sourceTable, File destFile, TableDefinition definition, ParquetInstructions writeInstructions)
Write a table to a file.

static void
writeTable(Table sourceTable, File destFile, ParquetInstructions writeInstructions)
Write a table to a file.

static void
writeTable(Table sourceTable, String destPath)
Write a table to a file.

static void
writeTable(Table sourceTable, String destPath, TableDefinition definition, ParquetInstructions writeInstructions)
Write a table to a file.

static void
writeTables(Table[] sources, TableDefinition tableDefinition, File[] destinations)
Write out tables to disk.
-
Field Details
-
METADATA_FILE_NAME
- See Also:
- Constant Field Values
-
COMMON_METADATA_FILE_NAME
- See Also:
- Constant Field Values
-
DEFAULT_PARQUET_FILE_NAME
- See Also:
- Constant Field Values
-
BEGIN_POS
- See Also:
- Constant Field Values
-
END_POS
- See Also:
- Constant Field Values
-
GROUPING_KEY
- See Also:
- Constant Field Values
-
LZ4
-
LZO
-
GZIP
-
ZSTD
-
LEGACY
-
-
Method Details
-
readTable
public static Table readTable(String sourceFilePath)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
- Parameters:
  sourceFilePath - The file or directory to examine
- Returns:
  table
-
readTable
public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
- Parameters:
  sourceFilePath - The file or directory to examine
  readInstructions - Instructions for customizations while reading
- Returns:
  table
-
readTable
public static Table readTable(@NotNull String sourceFilePath, @NotNull ParquetInstructions readInstructions, @NotNull SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
- Parameters:
  sourceFilePath - The file or directory to examine
  readInstructions - Instructions for customizations while reading
- Returns:
  table
-
readTable
public static Table readTable(File sourceFile)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
- Parameters:
  sourceFile - The file or directory to examine
- Returns:
  table
-
readTable
public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
- Parameters:
  sourceFile - The file or directory to examine
  readInstructions - Instructions for customizations while reading
- Returns:
  table
-
readTable
public static Table readTable(@NotNull File sourceFile, @NotNull ParquetInstructions readInstructions, @NotNull SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet file, metadata file, or directory with a recognized layout.
- Parameters:
  sourceFile - The file or directory to examine
  readInstructions - Instructions for customizations while reading
- Returns:
  table
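A minimal usage sketch. The file path is hypothetical, and the import locations for Table are an assumption based on a typical Deephaven Enterprise classpath; the LEGACY instructions constant comes from this class's field summary.

```java
import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class ReadTableExample {
    public static void main(String[] args) {
        // The layout (single parquet file, metadata file, or directory) is
        // detected from the supplied path.
        Table table = ParquetTools.readTable(new File("/tmp/example.parquet"));
        System.out.println("Rows: " + table.size());

        // The String overload behaves identically; a ParquetInstructions
        // constant from this class customizes the read.
        Table legacy = ParquetTools.readTable("/tmp/example.parquet", ParquetTools.LEGACY);
    }
}
```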
-
writeTable
public static void writeTable(Table sourceTable, String destPath)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destPath - destination file path; the file name should end in the ".parquet" extension. If the path includes non-existing directories, they are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
-
writeTable
public static void writeTable(Table sourceTable, File destFile)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destFile - destination file; the file name should end in the ".parquet" extension. If the path includes non-existing directories, they are created.
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  definition - table definition to use (instead of the one implied by the table itself)
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull String destPath, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destPath - destination path; it must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  definition - table definition to use (instead of the one implied by the table itself)
  writeInstructions - instructions for customizations while writing
-
writeTable
public static void writeTable(@NotNull Table sourceTable, @NotNull File destFile, @NotNull TableDefinition definition, @NotNull ParquetInstructions writeInstructions)
Write a table to a file.
- Parameters:
  sourceTable - source table
  destFile - destination file; its path must end in ".parquet". Any non-existing directories in the path are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  definition - table definition to use (instead of the one implied by the table itself)
  writeInstructions - instructions for customizations while writing
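A sketch of the write path. The TableTools helpers used to build the in-memory source table, and all import locations, are assumptions based on a typical Deephaven Enterprise classpath; the output paths are hypothetical.

```java
import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.tables.utils.TableTools;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class WriteTableExample {
    public static void main(String[] args) {
        // Build a small in-memory table to write out.
        Table source = TableTools.newTable(
                TableTools.intCol("Id", 1, 2, 3),
                TableTools.col("Sym", "A", "B", "C"));

        // Missing directories are created; on error, directories this call
        // created are removed, so avoid concurrent writers under one destination.
        ParquetTools.writeTable(source, new File("/tmp/out/example.parquet"));

        // Choose a compression codec via the ParquetInstructions constants
        // defined on this class (GZIP, LZ4, LZO, ZSTD).
        ParquetTools.writeTable(source, new File("/tmp/out/example-zstd.parquet"), ParquetTools.ZSTD);
    }
}
```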
-
writeParquetTables
public static void writeParquetTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull ParquetInstructions writeInstructions, @NotNull File[] destinations, @NotNull String[] groupingColumns)
Writes tables to disk in parquet format to a supplied set of destinations. If you specify grouping columns, there must already be grouping information for those columns in the sources. This can be accomplished with .groupBy(<grouping columns>).ungroup() or .sort(<grouping column>).
- Parameters:
  sources - The tables to write
  tableDefinition - The common schema for all the tables to write
  writeInstructions - Write instructions for customizations while writing
  destinations - The destination paths. Any non-existing directories in the paths provided are created. If there is an error, any intermediate directories previously created are removed; note that this makes this method unsafe for concurrent use.
  groupingColumns - List of columns the tables are grouped by (the write operation will store the grouping info)
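The grouping requirement above can be sketched as follows. The import locations are assumptions based on a typical Deephaven Enterprise classpath, and the output path is hypothetical; ZSTD is one of the ParquetInstructions constants from this class's field summary.

```java
import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class GroupedWriteExample {
    public static void write(Table source) {
        // Sorting on the grouping column establishes the grouping
        // information that writeParquetTables requires.
        Table sorted = source.sort("Sym");

        ParquetTools.writeParquetTables(
                new Table[] {sorted},
                sorted.getDefinition(),   // common schema for all tables
                ParquetTools.ZSTD,        // write instructions (compression choice)
                new File[] {new File("/tmp/out/grouped.parquet")},
                new String[] {"Sym"});    // grouping info for "Sym" is persisted
    }
}
```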
-
writeTables
public static void writeTables(@NotNull Table[] sources, @NotNull TableDefinition tableDefinition, @NotNull File[] destinations)
Write out tables to disk.
- Parameters:
  sources - source tables
  tableDefinition - table definition
  destinations - destinations
-
deleteTable
public static void deleteTable(File path)
Deletes a table on disk.
- Parameters:
  path - path to delete
-
readSingleFileTable
public static Table readSingleFileTable(@NotNull ReadOnlyParquetTableLocation tableLocation, @NotNull TableDefinition tableDefinition, @NotNull SourceTableInstructions sourceTableInstructions)
Reads in a table from a single parquet file using the provided table definition.
- Parameters:
  tableDefinition - The table's definition
- Returns:
  The table
-
getParquetFileReader
public static io.deephaven.parquet.base.ParquetFileReader getParquetFileReader(@NotNull File parquetFile)
Make a ParquetFileReader for the supplied File.
- Parameters:
  parquetFile - The File to read
- Returns:
  The new ParquetFileReader
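A sketch of obtaining the low-level reader (the path is hypothetical, and the ParquetTools import location is an assumption):

```java
import java.io.File;

import com.illumon.iris.db.v2.locations.parquet.ParquetTools;
import io.deephaven.parquet.base.ParquetFileReader;

public class FooterExample {
    public static void main(String[] args) {
        // Low-level access to a parquet file's footer, bypassing
        // the table-level readTable entry points.
        ParquetFileReader reader =
                ParquetTools.getParquetFileReader(new File("/tmp/example.parquet"));
        // Per the convertSchema note below, this reader (not
        // ParquetMetadataConverter) is the supported way to obtain the
        // schema for ParquetTools.convertSchema(...).
    }
}
```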
-
convertSchema
public static com.fishlib.base.Pair<List<com.illumon.dataobjects.ColumnDefinition>,ParquetInstructions> convertSchema(@NotNull org.apache.parquet.schema.MessageType schema, @NotNull Map<String,String> keyValueMetadata, @NotNull ParquetInstructions readInstructionsIn)
Convert schema information from a ParquetMetadata into ColumnDefinitions.
- Parameters:
  schema - Parquet schema. DO NOT RELY ON ParquetMetadataConverter FOR THIS! USE ParquetFileReader!
  keyValueMetadata - Parquet key-value metadata map
  readInstructionsIn - Input conversion ParquetInstructions
- Returns:
  A Pair with ColumnDefinitions and adjusted ParquetInstructions
-
readDataIndexTable
@Nullable public static Table readDataIndexTable(@NotNull File tableFile, @Nullable TableInfo info, @NotNull String... keyColumnNames)
Read a Data Index (or grouping) table from disk. If a TableInfo is provided, it will be used to aid in locating the table.
- Parameters:
  tableFile - The path to the base table
  info - An optional TableInfo object to assist in locating files
  keyColumnNames - the names of key columns
- Returns:
  the data index table for the specified key columns, or null if none was found
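A sketch of an index lookup. The path and key column name are hypothetical, and the import locations are assumptions based on a typical Deephaven Enterprise classpath.

```java
import java.io.File;

import com.illumon.iris.db.tables.Table;
import com.illumon.iris.db.v2.locations.parquet.ParquetTools;

public class DataIndexExample {
    public static void main(String[] args) {
        // info may be null; the index is then located from the
        // base table's path alone.
        Table indexTable = ParquetTools.readDataIndexTable(
                new File("/tmp/out/grouped.parquet"), null, "Sym");

        if (indexTable == null) {
            System.out.println("No data index found for key column Sym");
        }
    }
}
```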
-
writeDataIndexTable
public static String writeDataIndexTable(@NotNull String parentFile, @NotNull Table indexTable, @NotNull String indexColumnName, @Nullable org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodec, @NotNull String... keyColumnNames) throws SchemaMappingException, IOException
Write out the Data Index table for the specified columns. This will place the Data Index in a table adjacent to the data table, in a directory titled "Index-<Column names>".
- Parameters:
  parentFile - the full path of the parent file
  indexTable - the table containing the index
  indexColumnName - the name of the Index column
  keyColumnNames - the ordered names of key columns
- Throws:
  SchemaMappingException
  IOException
-
writeGroupingTable
@NotNull public static <T> String writeGroupingTable(@NotNull ParquetInstructions instructions, com.illumon.dataobjects.ColumnDefinition<T> groupingColumnDef, String fullOutputFilePath, Map<T,ReadOnlyIndex> columnGrouping) throws SchemaMappingException, IOException
- Throws:
  SchemaMappingException
  IOException
-
computeDataIndexTableName
-
setDefaultCompressionCodecName
-