Class BaseAppendableColumnSink<DATA_TYPE,TARRAY>
- Type Parameters:
DATA_TYPE - The column data type
TARRAY - The stored values array data type (for example, an Integer DATA_TYPE column would have an int[] array data type)
- All Implemented Interfaces:
com.fishlib.base.log.LogOutputAppendable, PartitionUpdatesObserver, RowUpdateObservable, AppendableColumnSink<DATA_TYPE,TARRAY>, AppendableSink<DATA_TYPE,TARRAY>, io.deephaven.csv.sinks.Sink<TARRAY>, io.deephaven.csv.sinks.Source<TARRAY>
- Direct Known Subclasses:
AppendableColumnSinkHolder
Implements AppendableColumnSink and holds common logic across all implementing sinks, including the handling of processing updates based on partition column update processing.
Field Summary
Fields (Modifier and Type / Field)
protected final BaseCsvFieldWriter baseFieldWriter
protected final ImportColumnDataTransformer columnDataTransformer
protected final Map<String,LocalAppendableColumn<DATA_TYPE>> columnMap
protected final String columnName
protected final DATA_TYPE constantValue
protected final boolean customSetterCol
static final String DEFAULT_PARTITION_COL
protected final boolean isColumnInSchema
protected final boolean isColumnInSource
protected final boolean isFromSplitFile
protected final boolean isPartitionCol
protected final boolean schemaHasPartitionCol
protected final CustomSetterSinkDataProcessor sinkDataProcessor
protected RowUpdateObserver updateObserver
static final BaseAppendableColumnSink[] ZERO_LENGTH_WRAPPER_ARRAY -
Constructor Summary
Constructors
protected BaseAppendableColumnSink(com.fishlib.io.logger.Logger log, String columnName, ImporterColumnDefinition icdColDef, ImportColumnDataTransformer columnDataTransformer, BaseCsvFieldWriter baseFieldWriter, boolean isPartitionCol, boolean isColumnInSchema, boolean isColumnInSource, boolean schemaHasPartitionCol, boolean fromSplitFile, boolean customSetterCol, CustomSetterSinkDataProcessor sinkDataProcessor) -
Method Summary
Modifier and Type / Method / Description
void addAppendableColumn(String partition, LocalAppendableColumn<DATA_TYPE> appendableColumn) - Provides the partition and its associated LocalAppendableColumn. This information should be cached in the column sink implementation and used when updating the column values based on the PartitionParserUpdate that will be pushed to the column when the table has a partitioning column.
protected abstract void addToAppendableColumn(LocalAppendableColumn<DATA_TYPE> appendableColumn, TARRAY values, int chunkStartIndex, int chunkSize, boolean isSingleValue) - Writes to disk the contents of the values array, from chunkStartIndex for chunkSize elements, in the appendable column of a particular partition.
com.fishlib.base.log.LogOutput append(com.fishlib.base.log.LogOutput logOutput)
void awaitAdvance() - Invoked by the Row Update Publisher, which is the source column sink that has the RowUpdateObserver set in it.
void evict(String partition) - Evicts the provided partition from the current partition column cache.
void flush() - Flushes all associated LocalAppendableColumns of the column (different partitions will have different LocalAppendableColumns).
ImportColumnDataTransformer getColumnDataTransformer() - Returns an optional data transformer instance if data transformation is defined for the column in the schema.
String getColumnName() - Returns the column name retrieved from the TableDefinition of the column for this sink.
DATA_TYPE getConstantValue() - Returns the defined constant value in the case of a constant column value.
String getCsvSourceColumnName() - Returns the source name mapping defined for the column in the schema.
CustomSetterSinkDataProcessor getCustomSinkDataProcessor() - Returns the CustomSetterSinkDataProcessor initialized in the constructor.
protected StringBuilder getStringBuilderIncludingBasicSinkAttributes()
boolean isColumnInSchema() - Returns true if the column is defined in the schema.
boolean isColumnInSource() - Returns true when the source name attribute in ImporterColumnDefinition is not null and the column's AppendableColumnSink.isConstantColumn() attribute is not true.
boolean isConstantColumn() - Returns false if the column type is not set to ImporterColumnDefinition.IrisImportConstant in the schema.
boolean isCustomSetterColumn() - Returns true if the schema import column section of this column defines a class to be used as a CustomSetter.
boolean isNotConsideredPartSourceFileMapping() - Returns true when the column is one of multiple columns mapped to a single source file column in the schema.
boolean isPartitionCol() - Returns true when the ColumnDefinition.isPartitioning() attribute for the column is set to true.
void onPartitionParserUpdate(PartitionParserUpdate parserUpdate) - Receives updates published by the source (CsvPartitionColumnParser or AppendableTableWrapper).
protected void processingComplete(PartitionParserUpdate parserUpdate) - Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance.
void processingFailed() - Invoked when there are processing errors and normal completion is not invoked.
void publishRowUpdate(int size, long end) - Publishes row updates to interested sinks.
void registerRowUpdateObserver(RowUpdateObserver rowUpdateObserver) - Used to set the RowUpdateObserver on the sink.
void setColumnIsInSinkHolder() - Marks the column sink as included in a ColumnSinkHolder.
void setRowUpdatePhaser(Phaser phaser) - Sets the phaser instance used to track non-source column processing status by the row update publisher when the schema has no partition column.
boolean supportsTransformations() - Data transformation is considered supported when at least one of the transform attribute or the formula attribute is defined for the column in the schema.
String toString()
protected void updateLocalAppendableColumn(TARRAY values, long destEnd, boolean isSingleValue) - Blocks until the partition column is parsed for the same chunk.
void validateForProcessingErrors() - Verifies whether the column processing encountered errors during the processing of the current chunk.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface com.illumon.iris.importers.csv.sink.AppendableColumnSink
add, addBooleans, addBytes, addChars, addDoubles, addFloats, addInts, addLongs, addShorts, isColumnOnlyInSchema
Methods inherited from interface com.illumon.iris.importers.csv.sink.AppendableSink
getUnderlying, nullFlagsToValues, publishToCustomSetter, updateCustomSetterData, updateNonSourceColRowChunk, updateRowChunk, write, writeToLocal
-
Field Details
-
ZERO_LENGTH_WRAPPER_ARRAY
static final BaseAppendableColumnSink[] ZERO_LENGTH_WRAPPER_ARRAY
-
DEFAULT_PARTITION_COL
static final String DEFAULT_PARTITION_COL
- See Also:
-
columnDataTransformer
protected final ImportColumnDataTransformer columnDataTransformer
-
constantValue
protected final DATA_TYPE constantValue
-
columnName
protected final String columnName
-
isPartitionCol
protected final boolean isPartitionCol -
isColumnInSchema
protected final boolean isColumnInSchema -
isColumnInSource
protected final boolean isColumnInSource -
schemaHasPartitionCol
protected final boolean schemaHasPartitionCol -
customSetterCol
protected final boolean customSetterCol -
isFromSplitFile
protected final boolean isFromSplitFile -
columnMap
protected final Map<String,LocalAppendableColumn<DATA_TYPE>> columnMap
-
evictedPartitions
-
updateObserver
protected RowUpdateObserver updateObserver
-
sinkDataProcessor
protected final CustomSetterSinkDataProcessor sinkDataProcessor
-
baseFieldWriter
protected final BaseCsvFieldWriter baseFieldWriter
-
-
Constructor Details
-
BaseAppendableColumnSink
protected BaseAppendableColumnSink(@NotNull com.fishlib.io.logger.Logger log, @NotNull String columnName, @Nullable ImporterColumnDefinition icdColDef, @Nullable ImportColumnDataTransformer columnDataTransformer, @Nullable BaseCsvFieldWriter baseFieldWriter, boolean isPartitionCol, boolean isColumnInSchema, boolean isColumnInSource, boolean schemaHasPartitionCol, boolean fromSplitFile, boolean customSetterCol, CustomSetterSinkDataProcessor sinkDataProcessor)
-
-
Method Details
-
getConstantValue
Description copied from interface: AppendableSink. Returns the defined constant value in the case of a constant column value. The constant value definition is present in the importColumn xml element. The value will be accessible through the ImportDataTransformer. - Specified by:
getConstantValue in interface AppendableSink<DATA_TYPE,TARRAY>
-
getColumnName
Description copied from interface: AppendableColumnSink. Returns the column name retrieved from the TableDefinition of the column for this sink. This may or may not match the column name in the source csv file. See AppendableColumnSink.getCsvSourceColumnName(). - Specified by:
getColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- the associated TableDefinition column name.
-
getCsvSourceColumnName
Description copied from interface: AppendableColumnSink. Returns the source name mapping defined for the column in the schema. When the source name is not explicitly defined, it defaults to the TableDefinition column name. When the AppendableColumnSink.isColumnInSource() attribute is true, this should match a column header in the source file. - Specified by:
getCsvSourceColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- The csv source file column header mapping of this column sink.
-
isColumnInSchema
public boolean isColumnInSchema() Description copied from interface: AppendableColumnSink. Returns true if the column is defined in the schema. A source csv may have columns that are not defined in the schema; those columns need to be identified so they can be handled appropriately to satisfy dhc plumbing requirements. In addition, if such a column is designated to be the RowUpdateObserver, then row updates need to be published. - Specified by:
isColumnInSchema in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- true if the column is defined in schema.
-
isNotConsideredPartSourceFileMapping
public boolean isNotConsideredPartSourceFileMapping() Description copied from interface: AppendableColumnSink. Returns true when the column is one of multiple columns mapped to a single source file column in the schema. In these instances, a ColumnSink defined as a ColumnSinkHolder will be the column sink passed to the DHC parser infrastructure as the sink mapped to the source csv column. This column sink instance should be part of the collection of sinks managed by the ColumnSinkHolder. - Specified by:
isNotConsideredPartSourceFileMapping in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- true if the column is part of a many-to-one mapping with respect to the source column, as defined in the schema
-
isColumnInSource
public boolean isColumnInSource() Description copied from interface: AppendableColumnSink. Returns true when the source name attribute in ImporterColumnDefinition is not null and the column's AppendableColumnSink.isConstantColumn() attribute is not true. - Specified by:
isColumnInSource in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- true if the column name or source column name in schema is present in the source csv file.
-
isCustomSetterColumn
public boolean isCustomSetterColumn() Description copied from interface: AppendableSink. This is true if the schema import column section of this column defines a class to be used as a CustomSetter. In addition, the property AppendableSink.getCustomSinkDataProcessor() should return a non-null value for all AppendableColumnSinks of the table. During processing, the flag is consulted while updates are written into the sink for source columns. See AppendableSink.write(Object, boolean[], long, long, boolean) and AppendableSink.updateNonSourceColRowChunk(int, long) for usage. - Specified by:
isCustomSetterColumn in interface AppendableSink<DATA_TYPE,TARRAY> - Returns:
- true if the Column uses a CustomSetter
-
setColumnIsInSinkHolder
public void setColumnIsInSinkHolder() When invoked, marks the column sink as included in a ColumnSinkHolder. -
isPartitionCol
public boolean isPartitionCol() Description copied from interface: AppendableColumnSink. Returns true when the ColumnDefinition.isPartitioning() attribute for the column is set to true. - Specified by:
isPartitionCol in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- true if the column is defined as a partition column in the schema
-
supportsTransformations
public boolean supportsTransformations() Description copied from interface: AppendableColumnSink. Data transformation is considered to be supported when at least one of the transform attribute or the formula attribute is defined for the column in the schema. When data transformation is supported, the corresponding parser should use the generated transformer to transform the source file values. - Specified by:
supportsTransformations in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- true when at least one of formula or transform attribute is defined in schema for the column
-
getCustomSinkDataProcessor
Description copied from interface: AppendableSink. Returns the CustomSetterSinkDataProcessor initialized in the constructor. The presence of a non-null value indicates the presence of a custom setter in the schema, which means all non-custom-setter columns will publish data to the Custom Setter Processor.
- Specified by:
getCustomSinkDataProcessor in interface AppendableSink<DATA_TYPE,TARRAY> - Returns:
- The CustomSetterSinkDataProcessor initialized in the constructor
-
getColumnDataTransformer
Description copied from interface: AppendableColumnSink. Returns an optional data transformer instance if data transformation is defined for the column in the schema. See AppendableColumnSink.supportsTransformations(). - Specified by:
getColumnDataTransformer in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- an instance of ImportColumnDataTransformer when data transformation is defined for the column, or null
-
isConstantColumn
public boolean isConstantColumn() Description copied from interface: AppendableColumnSink. Returns false if the column type is not set to ImporterColumnDefinition.IrisImportConstant in the schema. - Specified by:
isConstantColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Returns:
- true if the ColumnDataTransformer has the hasConstant attribute as true
-
toString
-
getStringBuilderIncludingBasicSinkAttributes
-
registerRowUpdateObserver
Used to set the RowUpdateObserver on the sink. This is invoked when the AppendableColumnSink.isColumnOnlyInSchema() attribute is true for any column in the schema; one of the source csv columns will then be designated as the RowUpdateObserver. - Specified by:
registerRowUpdateObserver in interface RowUpdateObservable - Parameters:
rowUpdateObserver - A RowUpdateObserver to which the row chunk attributes (size and destEnd) will be published
-
setRowUpdatePhaser
Sets the phaser instance used to track non-source column processing status by the row update publisher when the schema has no partition column. - Parameters:
phaser - The phaser object registered with all non-source columns plus the row update publisher as the registered parties
-
awaitAdvance
public void awaitAdvance() This is invoked by the Row Update Publisher, which is the source column sink that has the RowUpdateObserver set in it. This ensures that the source column will not advance until the non-source columns complete their processing. -
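The coordination described above (publisher blocks until non-source columns finish the chunk) rests on java.util.concurrent.Phaser. The sketch below is a minimal, self-contained illustration of that pattern using only the JDK; the class name, thread layout, and party counts are hypothetical stand-ins, not the actual Iris implementation.

```java
import java.util.concurrent.Phaser;

public class RowUpdatePhaserSketch {
    public static void main(String[] args) {
        final int nonSourceColumns = 2; // hypothetical count
        // Publisher plus each non-source column are the registered parties
        // (in the real sink, the Phaser arrives via setRowUpdatePhaser).
        final Phaser phaser = new Phaser(1 + nonSourceColumns);

        for (int i = 0; i < nonSourceColumns; i++) {
            new Thread(() -> {
                // ... a non-source column would process the current row chunk here ...
                phaser.arriveAndAwaitAdvance(); // signal: chunk processed
            }).start();
        }

        // The publisher blocks here until every non-source column has arrived,
        // mirroring the awaitAdvance() contract described above.
        phaser.arriveAndAwaitAdvance();
        if (phaser.getPhase() != 1) {
            throw new IllegalStateException("expected phase 1 after one chunk");
        }
        System.out.println("chunk complete");
    }
}
```

One Phaser phase corresponds to one row chunk: when all parties (publisher and observers) arrive, the phase advances and everyone proceeds to the next chunk together.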
publishRowUpdate
public void publishRowUpdate(int size, long end) Publishes row updates to interested sinks. - Specified by:
publishRowUpdate in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Parameters:
size - The size of the update
end - The destination end parameter of the update
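The register/publish contract above can be sketched with a hypothetical stand-in observer interface: the source column holds one observer and forwards each row chunk's size and destination end to it. The names `Observer` and `onRowUpdate` are illustrative only, not the actual Iris types.

```java
import java.util.ArrayList;
import java.util.List;

public class RowUpdateSketch {
    // Hypothetical stand-in for RowUpdateObserver: receives the chunk size
    // and the destination end offset of each published row update.
    interface Observer {
        void onRowUpdate(int size, long destEnd);
    }

    private Observer observer;

    // Mirrors registerRowUpdateObserver: the designated source column sink
    // gets one observer to forward row-chunk attributes to.
    void registerRowUpdateObserver(Observer o) {
        this.observer = o;
    }

    // Mirrors publishRowUpdate(int size, long end): forward the chunk
    // attributes to the interested (non-source) sink, if one is registered.
    void publishRowUpdate(int size, long end) {
        if (observer != null) {
            observer.onRowUpdate(size, end);
        }
    }

    public static void main(String[] args) {
        List<long[]> seen = new ArrayList<>();
        RowUpdateSketch publisher = new RowUpdateSketch();
        publisher.registerRowUpdateObserver((size, end) -> seen.add(new long[] {size, end}));
        publisher.publishRowUpdate(4096, 8192L); // one hypothetical row chunk
        if (seen.size() != 1 || seen.get(0)[0] != 4096 || seen.get(0)[1] != 8192L) {
            throw new IllegalStateException("observer did not receive the update");
        }
    }
}
```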
-
addAppendableColumn
public void addAppendableColumn(@Nullable String partition, @Nullable LocalAppendableColumn<DATA_TYPE> appendableColumn) Description copied from interface: AppendableColumnSink. Provides the partition and its associated LocalAppendableColumn. This information should be cached in the column sink implementation and used when updating the column values based on the PartitionParserUpdate that will be pushed to the column when the table has a partitioning column and the values of the column should be persisted in the appropriate partition. - Specified by:
addAppendableColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Parameters:
partition - Partition value as a string
appendableColumn - The LocalAppendableColumn of the column associated with this partition
-
evict
Description copied from interface: AppendableColumnSink. The provided partition should be evicted from the current partition column cache and registered as one of the evicted partitions. - Specified by:
evict in interface AppendableColumnSink<DATA_TYPE,TARRAY> - Parameters:
partition - Partition value, as a string, that should be evicted from the local column cache of partitions
-
addToAppendableColumn
protected abstract void addToAppendableColumn(@NotNull LocalAppendableColumn<DATA_TYPE> appendableColumn, @NotNull TARRAY values, int chunkStartIndex, int chunkSize, boolean isSingleValue) Writes to disk the contents of the values array, from chunkStartIndex for chunkSize elements, in the appendable column of a particular partition. - Parameters:
appendableColumn - The LocalAppendableColumn of the current partition
values - The array whose contents need to be written to disk for this column and partition
chunkStartIndex - The start index in the values array
chunkSize - The length of data, from the start index, that needs to be written to disk
isSingleValue - Indicates that the entire chunk should be updated using a single value, which is the constant value
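A minimal sketch of the chunk-write contract described by these parameters, assuming an int column. `IntColumn` is a hypothetical in-memory stand-in for `LocalAppendableColumn<Integer>` (the real implementation writes to disk); the copy/constant-fill logic follows the parameter documentation above.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkWriteSketch {
    // Hypothetical in-memory stand-in for LocalAppendableColumn<Integer>.
    static final class IntColumn {
        final List<Integer> stored = new ArrayList<>();
        void add(int v) { stored.add(v); }
    }

    // Append chunkSize elements starting at chunkStartIndex, or, when
    // isSingleValue is set, repeat the single value at chunkStartIndex
    // chunkSize times (the constant-value case).
    static void addToAppendableColumn(IntColumn column, int[] values,
            int chunkStartIndex, int chunkSize, boolean isSingleValue) {
        for (int i = 0; i < chunkSize; i++) {
            int src = isSingleValue ? chunkStartIndex : chunkStartIndex + i;
            column.add(values[src]);
        }
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30, 40};

        IntColumn chunk = new IntColumn();
        addToAppendableColumn(chunk, values, 1, 3, false);
        if (!chunk.stored.equals(List.of(20, 30, 40))) {
            throw new IllegalStateException("chunk copy failed");
        }

        IntColumn constant = new IntColumn();
        addToAppendableColumn(constant, values, 0, 3, true);
        if (!constant.stored.equals(List.of(10, 10, 10))) {
            throw new IllegalStateException("constant fill failed");
        }
    }
}
```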
-
updateLocalAppendableColumn
protected void updateLocalAppendableColumn(@NotNull TARRAY values, long destEnd, boolean isSingleValue) Blocks until the partition column is parsed for the same chunk. -
onPartitionParserUpdate
Description copied from interface: PartitionUpdatesObserver. Updates are published by the source (CsvPartitionColumnParser or AppendableTableWrapper) by invoking this method. - Specified by:
onPartitionParserUpdate in interface PartitionUpdatesObserver - Parameters:
parserUpdate - Partition for the current chunk.
-
processingComplete
Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance. - Parameters:
parserUpdate - the PartitionParserUpdate on which to invoke arrive, or arriveAndAwait, on its internal Phaser instance
-
processingFailed
public void processingFailed() Method invoked when there are processing errors and normal completion is not invoked. This lets processing continue without resulting in a timeout, since the actual error has already been handled; the batch instead fails upon completion with appropriate errors. Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance, if present. -
flush
public void flush() Flushes all associated LocalAppendableColumns of the column (different partitions will have different LocalAppendableColumns). -
read
- Specified by:
read in interface io.deephaven.csv.sinks.Source<TARRAY>
-
append
public com.fishlib.base.log.LogOutput append(@NotNull com.fishlib.base.log.LogOutput logOutput) - Specified by:
append in interface com.fishlib.base.log.LogOutputAppendable
-
validateForProcessingErrors
public void validateForProcessingErrors() Description copied from interface: AppendableSink. Verifies whether the column processing encountered errors during the processing of the current chunk. This is relevant when the column is a RowUpdateObservable and non-source columns encountered an error; in that event the import will exit, throwing an appropriate error. - Specified by:
validateForProcessingErrors in interface AppendableSink<DATA_TYPE,TARRAY>
-