Package io.deephaven.parquet.table
Class ParquetInstructions
java.lang.Object
io.deephaven.parquet.table.ParquetInstructions
- All Implemented Interfaces:
ColumnToCodecMappings
This class provides instructions for Parquet read and write operations (which accept it as an optional argument), specifying the desired transformations. Examples include mapping column names and using specific codecs during (de)serialization.
-
Nested Class Summary
- static class ParquetInstructions.Builder
- static interface ParquetInstructions.OnWriteCompleted
- static enum ParquetInstructions.ParquetFileLayout
-
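Field Summary
- static final String DEFAULT_COMPRESSION_CODEC_NAME
- static final int DEFAULT_MAXIMUM_DICTIONARY_KEYS
- static final int DEFAULT_MAXIMUM_DICTIONARY_SIZE
- static final int DEFAULT_TARGET_PAGE_SIZE
- static final ParquetInstructions EMPTY
- static final int MIN_TARGET_PAGE_SIZE
-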
Method Summary
- abstract String baseNameForPartitionedParquetData()
- static ParquetInstructions.Builder builder()
- abstract boolean generateMetadataFiles()
- abstract String getCodecArgs(String columnName)
- abstract String getCodecName(String columnName)
- abstract String getColumnNameFromParquetColumnName(String parquetColumnName)
- final String getColumnNameFromParquetColumnNameOrDefault(String parquetColumnName)
- abstract Optional<ParquetColumnResolver.Factory> getColumnResolverFactory()
- abstract String getCompressionCodecName()
- abstract OptionalInt getFieldId(String columnName): The field ID for the given columnName.
- abstract Optional<Collection<List<String>>> getIndexColumns()
- abstract int getMaximumDictionaryKeys()
- abstract int getMaximumDictionarySize()
- abstract String getParquetColumnNameFromColumnNameOrDefault(String columnName)
- abstract RowGroupInfo getRowGroupInfo()
- abstract Optional<SeekableChannelsProvider> getSeekableChannelsProviderForWriting()
- abstract Object getSpecialInstructions()
- abstract Optional<TableDefinition> getTableDefinition()
- abstract int getTargetPageSize()
- abstract boolean isLegacyParquet()
- abstract boolean isRefreshing()
- abstract Optional<ParquetInstructions.OnWriteCompleted> onWriteCompleted()
- static boolean sameColumnNamesAndCodecMappings(ParquetInstructions i1, ParquetInstructions i2)
- abstract boolean useDictionary(String columnName)
- abstract ParquetInstructions withLayout(ParquetInstructions.ParquetFileLayout fileLayout): Creates a new ParquetInstructions object with the same properties as the current object but with the layout set to the provided ParquetInstructions.ParquetFileLayout.
- abstract ParquetInstructions withTableDefinition(TableDefinition tableDefinition): Creates a new ParquetInstructions object with the same properties as the current object but with the definition set to the provided TableDefinition.
- abstract ParquetInstructions withTableDefinitionAndLayout(TableDefinition tableDefinition, ParquetInstructions.ParquetFileLayout fileLayout): Creates a new ParquetInstructions object with the same properties as the current object but with the definition and layout set to the provided values.
-
Field Details
-
DEFAULT_COMPRESSION_CODEC_NAME
-
DEFAULT_MAXIMUM_DICTIONARY_KEYS
public static final int DEFAULT_MAXIMUM_DICTIONARY_KEYS
-
DEFAULT_MAXIMUM_DICTIONARY_SIZE
public static final int DEFAULT_MAXIMUM_DICTIONARY_SIZE
-
MIN_TARGET_PAGE_SIZE
public static final int MIN_TARGET_PAGE_SIZE
-
DEFAULT_TARGET_PAGE_SIZE
public static final int DEFAULT_TARGET_PAGE_SIZE
-
EMPTY
-
-
Method Details
-
getColumnNameFromParquetColumnNameOrDefault
-
getParquetColumnNameFromColumnNameOrDefault
-
getColumnNameFromParquetColumnName
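The "OrDefault" suffix on the name-mapping methods suggests a lookup that falls back to the input name when no mapping is configured. The sketch below models that assumed behavior with plain maps; the NameMapping class and its method names are hypothetical, and the choice to return null from the non-default lookup is an assumption, not documented Deephaven behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a bidirectional column-name mapping (not the Deephaven implementation).
final class NameMapping {
    private final Map<String, String> parquetToDeephaven = new HashMap<>();
    private final Map<String, String> deephavenToParquet = new HashMap<>();

    // Registers a mapping between a Parquet column name and a Deephaven column name.
    NameMapping add(String parquetColumnName, String columnName) {
        parquetToDeephaven.put(parquetColumnName, columnName);
        deephavenToParquet.put(columnName, parquetColumnName);
        return this;
    }

    // Assumed: returns null when no mapping exists for the Parquet name.
    String columnNameFromParquetColumnName(String parquetColumnName) {
        return parquetToDeephaven.get(parquetColumnName);
    }

    // Returns the mapped Deephaven name, or the Parquet name unchanged if unmapped.
    String columnNameFromParquetColumnNameOrDefault(String parquetColumnName) {
        return parquetToDeephaven.getOrDefault(parquetColumnName, parquetColumnName);
    }

    // Returns the mapped Parquet name, or the Deephaven name unchanged if unmapped.
    String parquetColumnNameFromColumnNameOrDefault(String columnName) {
        return deephavenToParquet.getOrDefault(columnName, columnName);
    }
}
```

With a single mapping "user_id" to "UserId", unmapped columns such as "price" pass through both OrDefault lookups unchanged.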
-
getCodecName
- Specified by:
getCodecName in interface ColumnToCodecMappings
-
getCodecArgs
- Specified by:
getCodecArgs in interface ColumnToCodecMappings
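getCodecName(String) and getCodecArgs(String) together describe a per-column codec choice: a codec identifier plus an argument string. The following stdlib-only sketch models that pairing; CodecRegistry is a hypothetical class, the codec name "com.example.PayloadCodec" is invented for illustration, and returning null for unconfigured columns is an assumption rather than documented ColumnToCodecMappings behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-column codec registry mirroring the shape of
// getCodecName(String) / getCodecArgs(String); not the Deephaven interface.
final class CodecRegistry {
    // Immutable pair of a codec name and its argument string.
    private static final class Codec {
        final String name;
        final String args;

        Codec(String name, String args) {
            this.name = name;
            this.args = args;
        }
    }

    private final Map<String, Codec> byColumn = new HashMap<>();

    CodecRegistry put(String columnName, String codecName, String codecArgs) {
        byColumn.put(columnName, new Codec(codecName, codecArgs));
        return this;
    }

    // Assumed: null when the column has no codec configured.
    String getCodecName(String columnName) {
        Codec c = byColumn.get(columnName);
        return c == null ? null : c.name;
    }

    // Assumed: null when the column has no codec configured.
    String getCodecArgs(String columnName) {
        Codec c = byColumn.get(columnName);
        return c == null ? null : c.args;
    }
}
```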
-
useDictionary
- Returns:
- A hint that the writer should use dictionary-based encoding when writing this column; never evaluated for non-String columns. Defaults to false.
-
getFieldId
The field ID for the given columnName.
- Parameters:
columnName - the Deephaven column name
- Returns:
- the field ID
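Because getFieldId returns an OptionalInt, callers need to handle the case where a column has no field ID. The sketch below shows one way to consume such a result; FieldIds and describe are hypothetical names, and the backing map stands in for however ParquetInstructions actually stores field IDs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical lookup shaped like getFieldId(String): empty when no field ID is set.
final class FieldIds {
    private final Map<String, Integer> ids = new HashMap<>();

    FieldIds set(String columnName, int fieldId) {
        ids.put(columnName, fieldId);
        return this;
    }

    OptionalInt getFieldId(String columnName) {
        Integer id = ids.get(columnName);
        return id == null ? OptionalInt.empty() : OptionalInt.of(id);
    }

    // Example consumer: appends the field ID only when one is present.
    String describe(String columnName) {
        OptionalInt id = getFieldId(columnName);
        return id.isPresent() ? columnName + "#" + id.getAsInt() : columnName;
    }
}
```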
-
getSpecialInstructions
-
getCompressionCodecName
-
getMaximumDictionaryKeys
public abstract int getMaximumDictionaryKeys()
- Returns:
- The maximum number of unique keys the writer should add to a dictionary page before switching to non-dictionary encoding; never evaluated for non-String columns, and ignored if useDictionary(String) returns true.
-
getMaximumDictionarySize
public abstract int getMaximumDictionarySize()
- Returns:
- The maximum number of bytes the writer should add to a dictionary before switching to non-dictionary encoding; never evaluated for non-String columns, and ignored if useDictionary(String) returns true.
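The two limits above act as a budget: the writer keeps a String column dictionary-encoded only while the number of unique keys and the dictionary's byte size stay within bounds. The following sketch models that check; DictionaryBudget is hypothetical, and it assumes the byte count is simply the UTF-8 length of each distinct key, whereas a real Parquet writer also accounts for encoding overhead.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the dictionary limits; not Deephaven's writer.
final class DictionaryBudget {
    // Returns true if all distinct values fit within both limits (dictionary
    // encoding could be kept); false means the writer would switch to
    // non-dictionary encoding.
    static boolean fitsInDictionary(List<String> values, int maxKeys, int maxBytes) {
        Set<String> keys = new HashSet<>();
        long bytes = 0;
        for (String v : values) {
            if (keys.add(v)) { // only previously unseen keys grow the dictionary
                bytes += v.getBytes(StandardCharsets.UTF_8).length;
                if (keys.size() > maxKeys || bytes > maxBytes) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Repeated values cost nothing after their first occurrence, which is why low-cardinality String columns tend to stay dictionary-encoded.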
-
isLegacyParquet
public abstract boolean isLegacyParquet()
-
getTargetPageSize
public abstract int getTargetPageSize()
-
isRefreshing
public abstract boolean isRefreshing()
- Returns:
- whether the data source is refreshing
-
generateMetadataFiles
public abstract boolean generateMetadataFiles()
- Returns:
- whether to generate "_metadata" and "_common_metadata" files while writing Parquet files
-
getFileLayout
-
getTableDefinition
-
getIndexColumns
-
getRowGroupInfo
-
getColumnResolverFactory
-
withTableDefinition
Creates a new ParquetInstructions object with the same properties as the current object but with the definition set to the provided TableDefinition.
-
withLayout
Creates a new ParquetInstructions object with the same properties as the current object but with the layout set to the provided ParquetInstructions.ParquetFileLayout.
-
withTableDefinitionAndLayout
public abstract ParquetInstructions withTableDefinitionAndLayout(TableDefinition tableDefinition, ParquetInstructions.ParquetFileLayout fileLayout)
Creates a new ParquetInstructions object with the same properties as the current object but with the definition and layout set to the provided values.
-
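The with* methods follow the immutable copy-on-write pattern: each returns a new object and leaves the receiver unchanged. The sketch below illustrates that pattern in isolation; Settings is a hypothetical class, and plain Strings stand in for TableDefinition and ParquetInstructions.ParquetFileLayout.

```java
import java.util.Optional;

// Hypothetical immutable settings object illustrating the "with" copy pattern.
// Strings stand in for TableDefinition and ParquetInstructions.ParquetFileLayout.
final class Settings {
    private final String compressionCodecName;
    private final String tableDefinition; // null means "unset"
    private final String fileLayout;      // null means "unset"

    Settings(String compressionCodecName, String tableDefinition, String fileLayout) {
        this.compressionCodecName = compressionCodecName;
        this.tableDefinition = tableDefinition;
        this.fileLayout = fileLayout;
    }

    // Each "with" method copies every other property into a new object.
    Settings withTableDefinition(String newDefinition) {
        return new Settings(compressionCodecName, newDefinition, fileLayout);
    }

    Settings withLayout(String newLayout) {
        return new Settings(compressionCodecName, tableDefinition, newLayout);
    }

    Settings withTableDefinitionAndLayout(String newDefinition, String newLayout) {
        return new Settings(compressionCodecName, newDefinition, newLayout);
    }

    String compressionCodecName() { return compressionCodecName; }
    Optional<String> tableDefinition() { return Optional.ofNullable(tableDefinition); }
    Optional<String> fileLayout() { return Optional.ofNullable(fileLayout); }
}
```

A likely benefit of this design is that one base instructions object can be shared and specialized per read or write without any caller observing another's changes.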
baseNameForPartitionedParquetData
- Returns:
- the base name for partitioned Parquet data. See setBaseNameForPartitionedParquetData for details about the tokens that can be used in the base name.
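A base name containing tokens is expanded into a concrete file name for each file written. The sketch below shows simple token substitution; BaseNameExpander is hypothetical, and the tokens it handles ("{i}" for a per-directory file index, "{uuid}" for a random UUID, "{partitions}" for underscore-joined partition values) are assumptions that should be checked against setBaseNameForPartitionedParquetData. The UUID is passed in so the output is deterministic.

```java
import java.util.List;
import java.util.UUID;

// Hypothetical base-name expander; the supported tokens are assumptions.
final class BaseNameExpander {
    static String expand(String baseName, int fileIndex, UUID uuid, List<String> partitionValues) {
        return baseName
                .replace("{i}", Integer.toString(fileIndex))              // per-directory counter
                .replace("{uuid}", uuid.toString())                        // caller-supplied UUID
                .replace("{partitions}", String.join("_", partitionValues)); // joined partition values
    }
}
```

For example, expanding "table-{i}" with index 0 yields "table-0", and "{partitions}-{i}" with values ["2024", "NY"] and index 2 yields "2024_NY-2".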
-
onWriteCompleted
- Returns:
- A callback executed upon completing each Parquet data file write (excluding index and metadata files). The writing thread invokes this callback sequentially.
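The callback contract above has two parts: it fires only for data files, and it runs on the writing thread, one call after another. The sketch below models that contract; WriteLoop and writeFile are hypothetical, and a Consumer<String> of file paths is a simplification that does not reflect the actual shape of ParquetInstructions.OnWriteCompleted.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical writer loop showing when an on-write-completed callback would fire.
final class WriteLoop {
    static void writeAll(List<String> dataFiles, List<String> indexAndMetadataFiles,
                         Consumer<String> onWriteCompleted) {
        for (String dataFile : dataFiles) {
            writeFile(dataFile);
            // Invoked on the calling (writing) thread, once per data file, in order.
            onWriteCompleted.accept(dataFile);
        }
        for (String auxFile : indexAndMetadataFiles) {
            writeFile(auxFile); // index and metadata files do not trigger the callback
        }
    }

    // Placeholder for the actual file write.
    private static void writeFile(String path) {
    }
}
```

Because calls are sequential, a callback that only appends to a local list needs no synchronization in this model.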
-
getSeekableChannelsProviderForWriting
-
sameColumnNamesAndCodecMappings
@VisibleForTesting public static boolean sameColumnNamesAndCodecMappings(ParquetInstructions i1, ParquetInstructions i2)
-
builder
-