Package io.deephaven.parquet.base
Class ParquetFileReader
java.lang.Object
io.deephaven.parquet.base.ParquetFileReader
Top-level accessor for a parquet file, which can read from either a file path string or a CLI-style file URI,
e.g. "s3://bucket/key".
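The difference between a plain path string and a URI string can be illustrated with a small stand-alone sketch. `SourceToUri` and `convert` are hypothetical names, not part of this package, and the rule used here (a multi-character scheme means "already a URI", anything else is treated as a local path) is an assumption for illustration, not necessarily how ParquetFileReader resolves its source.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of io.deephaven.parquet.base.
final class SourceToUri {
    private SourceToUri() {}

    /** Returns the input as a URI if it carries a scheme, otherwise a file URI for the path. */
    static URI convert(String source) {
        try {
            URI uri = new URI(source);
            String scheme = uri.getScheme();
            // The length check avoids mistaking a Windows drive letter such as "C:" for a scheme.
            if (scheme != null && scheme.length() > 1) {
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a valid URI (for example, a path containing spaces); treat it as a path below.
        }
        return new File(source).getAbsoluteFile().toURI();
    }

    public static void main(String[] args) {
        System.out.println(convert("s3://bucket/key"));
        System.out.println(convert("/tmp/data.parquet"));
    }
}
```

Under these assumptions, a string like "s3://bucket/key" passes through unchanged, while a bare path is converted to a URI with the "file" scheme.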
-
Field Summary
Fields
Modifier and Type, Field, Description
static final String FILE_URI_SCHEME
final org.apache.parquet.format.FileMetaData fileMetaData
-
Constructor Summary
Constructors
Constructor, Description
ParquetFileReader(String source, SeekableChannelsProvider channelsProvider)
    Create a new ParquetFileReader for the provided source.
ParquetFileReader(URI parquetFileURI, SeekableChannelsProvider channelsProvider)
    Create a new ParquetFileReader for the provided source.
-
Method Summary
Modifier and Type, Method, Description
static ParquetFileReader create(@NotNull File parquetFile, @Nullable Object specialInstructions)
    Make a ParquetFileReader for the supplied File.
static ParquetFileReader create(@NotNull URI parquetFileURI, @Nullable Object specialInstructions)
    Make a ParquetFileReader for the supplied URI.
static ParquetFileReader createChecked(@NotNull File parquetFile, @Nullable Object specialInstructions)
    Make a ParquetFileReader for the supplied File.
static ParquetFileReader createChecked(@NotNull URI parquetFileURI, @Nullable Object specialInstructions)
    Make a ParquetFileReader for the supplied URI.
getChannelsProvider
getColumnsWithDictionaryUsedOnEveryDataPage
    Get the names of all columns that we know for certain (a) have a dictionary, and (b) use the dictionary on all data pages.
getRowGroup(int groupNumber, String version)
    Create a RowGroupReader object for the provided row group number.
org.apache.parquet.schema.MessageType getSchema()
int rowGroupCount()
-
Field Details
-
FILE_URI_SCHEME
-
fileMetaData
public final org.apache.parquet.format.FileMetaData fileMetaData
-
-
Constructor Details
-
ParquetFileReader
public ParquetFileReader(String source, SeekableChannelsProvider channelsProvider) throws IOException
Create a new ParquetFileReader for the provided source.
- Parameters:
source - The source path or URI for the parquet file or the parquet metadata file
channelsProvider - The SeekableChannelsProvider to use for reading the file
- Throws:
IOException
-
ParquetFileReader
public ParquetFileReader(URI parquetFileURI, SeekableChannelsProvider channelsProvider) throws IOException
Create a new ParquetFileReader for the provided source.
- Parameters:
parquetFileURI - The URI for the parquet file or the parquet metadata file
channelsProvider - The SeekableChannelsProvider to use for reading the file
- Throws:
IOException
-
-
Method Details
-
create
public static ParquetFileReader create(@NotNull File parquetFile, @Nullable Object specialInstructions)
Make a ParquetFileReader for the supplied File.
- Parameters:
parquetFile - The parquet file or the parquet metadata file
specialInstructions - Optional read instructions to pass to SeekableChannelsProvider while creating channels
- Returns:
The new ParquetFileReader
-
create
public static ParquetFileReader create(@NotNull URI parquetFileURI, @Nullable Object specialInstructions)
Make a ParquetFileReader for the supplied URI.
- Parameters:
parquetFileURI - The URI for the parquet file or the parquet metadata file
specialInstructions - Optional read instructions to pass to SeekableChannelsProvider while creating channels
- Returns:
The new ParquetFileReader
-
createChecked
public static ParquetFileReader createChecked(@NotNull File parquetFile, @Nullable Object specialInstructions) throws IOException
Make a ParquetFileReader for the supplied File.
- Parameters:
parquetFile - The parquet file or the parquet metadata file
specialInstructions - Optional read instructions to pass to SeekableChannelsProvider while creating channels
- Returns:
The new ParquetFileReader
- Throws:
IOException - if an IO exception occurs
-
createChecked
public static ParquetFileReader createChecked(@NotNull URI parquetFileURI, @Nullable Object specialInstructions) throws IOException
Make a ParquetFileReader for the supplied URI.
- Parameters:
parquetFileURI - The URI for the parquet file or the parquet metadata file
specialInstructions - Optional read instructions to pass to SeekableChannelsProvider while creating channels
- Returns:
The new ParquetFileReader
- Throws:
IOException - if an IO exception occurs
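The signatures above show only that create declares no checked exception while createChecked declares IOException. One common way to structure such a pair, assumed here rather than confirmed by this page, is for the unchecked variant to delegate to the checked one and rethrow as UncheckedIOException. The class and method names below are hypothetical stand-ins that return a String instead of a reader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for illustration; not the ParquetFileReader implementation.
final class CheckedPairSketch {
    private CheckedPairSketch() {}

    /** Checked variant: callers must handle or declare IOException. */
    static String openChecked(String source) throws IOException {
        if (source == null || source.isEmpty()) {
            throw new IOException("No source supplied");
        }
        return "opened:" + source;
    }

    /** Unchecked variant: delegates to openChecked and wraps any IOException. */
    static String open(String source) {
        try {
            return openChecked(source);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to open " + source, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(open("data.parquet"));
        try {
            open("");
        } catch (UncheckedIOException e) {
            // The original IOException stays available as the cause.
            System.out.println("unchecked: " + e.getCause().getMessage());
        }
    }
}
```

In a design like this, the checked form suits callers that want to handle IO failures explicitly, while the unchecked form is convenient in lambdas and streams, where checked exceptions cannot propagate.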
-
getChannelsProvider
- Returns:
The SeekableChannelsProvider used for this reader, appropriate to use for related file access
-
getColumnsWithDictionaryUsedOnEveryDataPage
Get the names of all columns that we know for certain (a) have a dictionary, and (b) use the dictionary on all data pages.
- Returns:
A set of parquet column names that satisfy both conditions.
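The two-part condition can be made concrete with a toy model. The `ColumnInfo` type and `select` method below are hypothetical and do not correspond to Parquet or Deephaven classes. The sketch assumes that per-page dictionary usage is available as a list of flags, and it treats an empty list as "unknown", which excludes the column because the condition must hold for certain.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model for illustration; these are not Parquet or Deephaven types.
final class DictionaryColumnsSketch {
    private DictionaryColumnsSketch() {}

    /** Hypothetical per-column summary of what the file metadata reveals. */
    static final class ColumnInfo {
        final String name;
        final boolean hasDictionaryPage;
        // One flag per data page; an empty list means the page encodings are unknown.
        final List<Boolean> dataPageUsesDictionary;

        ColumnInfo(String name, boolean hasDictionaryPage, List<Boolean> dataPageUsesDictionary) {
            this.name = name;
            this.hasDictionaryPage = hasDictionaryPage;
            this.dataPageUsesDictionary = dataPageUsesDictionary;
        }
    }

    /** Keeps only columns that (a) have a dictionary and (b) use it on every known data page. */
    static Set<String> select(List<ColumnInfo> columns) {
        Set<String> result = new LinkedHashSet<>();
        for (ColumnInfo column : columns) {
            boolean everyPage = !column.dataPageUsesDictionary.isEmpty()
                    && column.dataPageUsesDictionary.stream().allMatch(Boolean::booleanValue);
            if (column.hasDictionaryPage && everyPage) {
                result.add(column.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ColumnInfo> columns = Arrays.asList(
                new ColumnInfo("sym", true, Arrays.asList(true, true)),
                new ColumnInfo("price", false, Arrays.asList(false, false)),
                new ColumnInfo("venue", true, Arrays.asList(true, false)));
        System.out.println(select(columns)); // prints [sym]
    }
}
```

Here "venue" is excluded even though it has a dictionary, because one of its data pages does not use it.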
-
getRowGroup
Create a RowGroupReader object for the provided row group number.
- Parameters:
version - The "version" string from Deephaven-specific parquet metadata, or null if it's not present.
-
getSchema
public org.apache.parquet.schema.MessageType getSchema() -
rowGroupCount
public int rowGroupCount()
-