Class BaseAppendableColumnSink<DATA_TYPE,TARRAY>
- Type Parameters:
DATA_TYPE - The column data type
TARRAY - The stored values array data type (for example, an Integer DATA_TYPE column would have an int[] array data type)
- All Implemented Interfaces:
com.fishlib.base.log.LogOutputAppendable, PartitionUpdatesObserver, RowUpdateObservable, AppendableColumnSink<DATA_TYPE,TARRAY>, AppendableSink<DATA_TYPE,TARRAY>, io.deephaven.csv.sinks.Sink<TARRAY>, io.deephaven.csv.sinks.Source<TARRAY>
- Direct Known Subclasses:
AppendableColumnSinkHolder
Base implementation of AppendableColumnSink. Holds common logic across all implementing sinks, including the handling of processing updates based on partition column update processing.
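As an illustration only (not Deephaven source): a concrete sink pairs DATA_TYPE with its storage array type, for example an Integer column stored as int[] chunks, and implements the single abstract hook addToAppendableColumn. The class name ExampleIntColumnSink below is hypothetical, and imports from the surrounding importer packages are assumed.

    // Hypothetical subclass, sketched from the signatures on this page.
    public class ExampleIntColumnSink extends BaseAppendableColumnSink<Integer, int[]> {

        protected ExampleIntColumnSink(com.fishlib.io.logger.Logger log, String columnName,
                ImporterColumnDefinition icdColDef, ImportColumnDataTransformer columnDataTransformer,
                BaseCsvFieldWriter baseFieldWriter, boolean isPartitionCol, boolean isColumnInSchema,
                boolean isColumnInSource, boolean schemaHasPartitionCol, boolean fromSplitFile,
                boolean customSetterCol, CustomSetterSinkDataProcessor sinkDataProcessor) {
            // Forward everything to the protected base constructor (see Constructor Summary).
            super(log, columnName, icdColDef, columnDataTransformer, baseFieldWriter, isPartitionCol,
                    isColumnInSchema, isColumnInSource, schemaHasPartitionCol, fromSplitFile,
                    customSetterCol, sinkDataProcessor);
        }

        @Override
        protected void addToAppendableColumn(LocalAppendableColumn<Integer> appendableColumn,
                int[] values, int chunkStartIndex, int chunkSize, boolean isSingleValue) {
            // Chunk-write hook; a fuller sketch appears under addToAppendableColumn
            // in the Method Details below.
            throw new UnsupportedOperationException("illustrative stub");
        }
    }
-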
Field Summary
Fields:
protected final BaseCsvFieldWriter baseFieldWriter
protected final ImportColumnDataTransformer columnDataTransformer
protected final Map<String,LocalAppendableColumn<DATA_TYPE>> columnMap
protected final String columnName
protected final DATA_TYPE constantValue
protected final boolean customSetterCol
static final String DEFAULT_PARTITION_COL
protected final boolean isColumnInSchema
protected final boolean isColumnInSource
protected final boolean isFromSplitFile
protected final boolean isPartitionCol
protected final boolean schemaHasPartitionCol
protected final CustomSetterSinkDataProcessor sinkDataProcessor
protected RowUpdateObserver updateObserver
static final BaseAppendableColumnSink[] ZERO_LENGTH_WRAPPER_ARRAY
-
Constructor Summary
Constructors:
protected BaseAppendableColumnSink(com.fishlib.io.logger.Logger log, String columnName, ImporterColumnDefinition icdColDef, ImportColumnDataTransformer columnDataTransformer, BaseCsvFieldWriter baseFieldWriter, boolean isPartitionCol, boolean isColumnInSchema, boolean isColumnInSource, boolean schemaHasPartitionCol, boolean fromSplitFile, boolean customSetterCol, CustomSetterSinkDataProcessor sinkDataProcessor)
-
Method Summary
Modifier and Type / Method / Description:

void addAppendableColumn(String partition, LocalAppendableColumn<DATA_TYPE> appendableColumn)
Provides the partition and its associated LocalAppendableColumn. This information should be cached in the column sink implementation and used when updating the column values based on the PartitionParserUpdate that is pushed to the column when the table has a partitioning column and the values of the column should be persisted in the appropriate partition.

protected abstract void addToAppendableColumn(LocalAppendableColumn<DATA_TYPE> appendableColumn, TARRAY values, int chunkStartIndex, int chunkSize, boolean isSingleValue)
Writes to disk the contents of the values array, from chunkStartIndex for chunkSize entries, in the appendable column of a particular partition.

com.fishlib.base.log.LogOutput append(com.fishlib.base.log.LogOutput logOutput)

void awaitAdvance()
Invoked by the Row Update Publisher, which is the source column sink that has the RowUpdateObserver set in it.

void evict(String partition)
The provided partition should be evicted from the current partition column cache.

void flush()
Flushes all associated LocalAppendableColumns of the column (different partitions will have different LocalAppendableColumns).

ImportColumnDataTransformer getColumnDataTransformer()
Returns an optional Data Transformer instance if data transformation is defined for the column in schema.

String getColumnName()
Returns the column name retrieved from the TableDefinition of the column for this sink.

DATA_TYPE getConstantValue()
Returns the defined constant value in case of a constant column value.

String getCsvSourceColumnName()
Returns the source name mapping defined for the column in the schema.

CustomSetterSinkDataProcessor getCustomSinkDataProcessor()
Returns the CustomSetterSinkDataProcessor initialized in the constructor.

protected StringBuilder getStringBuilderIncludingBasicSinkAttributes()

boolean isColumnInSchema()
Returns true if the column is defined in schema.

boolean isColumnInSource()
Returns true when the source name attribute in ImporterColumnDefinition is not null and the column's AppendableColumnSink.isConstantColumn() attribute is not true.

boolean isConstantColumn()
Returns false if the column type is not set to ImporterColumnDefinition.IrisImportConstant in schema.

boolean isCustomSetterColumn()
Returns true if the schema import column section of this column defines a class to be used as a CustomSetter.

boolean isNotConsideredPartSourceFileMapping()
Returns true when the column is one of multiple columns mapped to a single source file column in the schema.

boolean isPartitionCol()
Returns true when the ColumnDefinition.isPartitioning() attribute for the column is set to true.

void onPartitionParserUpdate(PartitionParserUpdate parserUpdate)
Updates are published by the source (CsvPartitionColumnParser or AppendableTableWrapper) by invoking this method.

protected void processingComplete(PartitionParserUpdate parserUpdate)
Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance.

void processingFailed()
Method invoked when there are processing errors and normal completion is not invoked.

void publishRowUpdate(int size, long end)
Publishes row updates to interested sinks.

void read(TARRAY dest, boolean[] isNull, long srcBegin, long srcEnd)

void registerRowUpdateObserver(RowUpdateObserver rowUpdateObserver)
Used to set the RowUpdateObserver on the sink.

void setColumnIsInSinkHolder()
When invoked, marks the column sink as included in a ColumnSinkHolder.

void setRowUpdatePhaser(Phaser phaser)
Sets the Phaser instance used to track non-source column processing status by the row update publisher when the schema has no partition column.

boolean supportsTransformations()
Data transformation is considered to be supported when at least one of the transform attribute or formula attribute is defined for the column in schema.

String toString()

protected void updateLocalAppendableColumn(TARRAY values, long destEnd, boolean isSingleValue)
Blocks until the partition column is parsed for the same chunk.

void validateForProcessingErrors()
Verifies whether the column processing encountered errors during the processing of the current chunk.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface com.illumon.iris.importers.csv.sink.AppendableColumnSink
add, addBooleans, addBytes, addChars, addDoubles, addFloats, addInts, addLongs, addShorts, isColumnOnlyInSchema
Methods inherited from interface com.illumon.iris.importers.csv.sink.AppendableSink
getUnderlying, nullFlagsToValues, publishToCustomSetter, updateCustomSetterData, updateNonSourceColRowChunk, updateRowChunk, write, writeToLocal
-
Field Details
-
ZERO_LENGTH_WRAPPER_ARRAY
-
DEFAULT_PARTITION_COL
-
columnDataTransformer
-
constantValue
-
columnName
-
isPartitionCol
protected final boolean isPartitionCol
-
isColumnInSchema
protected final boolean isColumnInSchema
-
isColumnInSource
protected final boolean isColumnInSource
-
schemaHasPartitionCol
protected final boolean schemaHasPartitionCol
-
customSetterCol
protected final boolean customSetterCol
-
isFromSplitFile
protected final boolean isFromSplitFile
-
columnMap
-
evictedPartitions
-
updateObserver
-
sinkDataProcessor
-
baseFieldWriter
-
-
Constructor Details
-
BaseAppendableColumnSink
protected BaseAppendableColumnSink(@NotNull com.fishlib.io.logger.Logger log, @NotNull String columnName, @Nullable ImporterColumnDefinition icdColDef, @Nullable ImportColumnDataTransformer columnDataTransformer, @Nullable BaseCsvFieldWriter baseFieldWriter, boolean isPartitionCol, boolean isColumnInSchema, boolean isColumnInSource, boolean schemaHasPartitionCol, boolean fromSplitFile, boolean customSetterCol, CustomSetterSinkDataProcessor sinkDataProcessor)
-
-
Method Details
-
getConstantValue
Description copied from interface: AppendableSink
Returns the defined constant value in case of a constant column value. The constant value definition is present in the importColumn xml element. The value will be accessible through the ImportDataTransformer.
- Specified by:
getConstantValue in interface AppendableSink<DATA_TYPE,TARRAY>
-
getColumnName
Description copied from interface: AppendableColumnSink
Returns the column name retrieved from the TableDefinition of the column for this sink. This may or may not match the column name in the source csv file. See AppendableColumnSink.getCsvSourceColumnName().
- Specified by:
getColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
the associated TableDefinition column name.
-
getCsvSourceColumnName
Description copied from interface: AppendableColumnSink
Returns the source name mapping defined for the column in the schema. When a source name is not explicitly defined, it defaults to the TableDefinition column name. When the attribute AppendableColumnSink.isColumnInSource() is true, this should match a column header in the source file.
- Specified by:
getCsvSourceColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
The csv source file column header mapping of this column sink.
-
isColumnInSchema
public boolean isColumnInSchema()
Description copied from interface: AppendableColumnSink
Returns true if the column is defined in schema. A source csv may have columns that are not defined in the schema. Those columns need to be identified so that they can be handled appropriately to satisfy DHC plumbing requirements. In addition, if such a column is designated as the RowUpdateObserver, then row updates need to be published.
- Specified by:
isColumnInSchema in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
true if the column is defined in schema.
-
isNotConsideredPartSourceFileMapping
public boolean isNotConsideredPartSourceFileMapping()
Description copied from interface: AppendableColumnSink
Returns true when the column is one of multiple columns mapped to a single source file column in the schema. In these instances, a ColumnSink defined as a ColumnSinkHolder will be the Column Sink passed to the DHC parser infrastructure as the ColumnSink mapped to the source csv column. This column sink instance should be part of the collection of sinks managed by the ColumnSinkHolder.
- Specified by:
isNotConsideredPartSourceFileMapping in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
true if the column is part of a many-to-one mapping, as defined in schema, with respect to the source column
-
isColumnInSource
public boolean isColumnInSource()
Description copied from interface: AppendableColumnSink
Returns true when the source name attribute in ImporterColumnDefinition is not null and the column's AppendableColumnSink.isConstantColumn() attribute is not true.
- Specified by:
isColumnInSource in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
true if the column name or source column name in schema is present in the source csv file.
-
isCustomSetterColumn
public boolean isCustomSetterColumn()
Description copied from interface: AppendableSink
This is true if the schema import column section of this column defines a class to be used as a CustomSetter. In addition, the property AppendableSink.getCustomSinkDataProcessor() should return a non-null value for all AppendableColumnSinks of the table. In terms of processing, the flag is consulted while processing updates as they are written into the sink for source columns. See AppendableSink.write(Object, boolean[], long, long, boolean) and AppendableSink.updateNonSourceColRowChunk(int, long) for usage.
- Specified by:
isCustomSetterColumn in interface AppendableSink<DATA_TYPE,TARRAY>
- Returns:
true if the Column uses a CustomSetter
-
setColumnIsInSinkHolder
public void setColumnIsInSinkHolder()
When invoked, marks the column sink as included in a ColumnSinkHolder.
-
isPartitionCol
public boolean isPartitionCol()
Description copied from interface: AppendableColumnSink
Returns true when the ColumnDefinition.isPartitioning() attribute for the column is set to true.
- Specified by:
isPartitionCol in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
true if the column is defined as a partition column in the schema
-
supportsTransformations
public boolean supportsTransformations()
Description copied from interface: AppendableColumnSink
Data transformation is considered to be supported when at least one of the transform attribute or the formula attribute is defined for the column in schema. When data transformation is supported, the corresponding parser should use the generated transformer to transform the source file values.
- Specified by:
supportsTransformations in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
true when at least one of the formula or transform attributes is defined in schema for the column
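For illustration, a parser-side check might look like the following sketch. The transform(String) call on ImportColumnDataTransformer is an assumption for the example; the transformer's real API is not documented on this page.

    // Hedged sketch: apply the schema-defined transformer only when the
    // column declares a transform or formula attribute in schema.
    static Object parseCell(AppendableColumnSink<?, ?> sink, String raw) {
        if (sink.supportsTransformations()) {
            ImportColumnDataTransformer t = sink.getColumnDataTransformer();
            return t.transform(raw); // hypothetical method name, for illustration
        }
        return raw;
    }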
-
getCustomSinkDataProcessor
Description copied from interface: AppendableSink
Returns the CustomSetterSinkDataProcessor initialized in the constructor. The presence of a non-null value indicates the presence of a custom setter in the schema, which means all non-custom-setter columns will publish data to the Custom Setter Processor.
- Specified by:
getCustomSinkDataProcessor in interface AppendableSink<DATA_TYPE,TARRAY>
- Returns:
The CustomSetterSinkDataProcessor initialized in the constructor
-
getColumnDataTransformer
Description copied from interface: AppendableColumnSink
Returns an optional Data Transformer instance if data transformation is defined for the column in schema. See AppendableColumnSink.supportsTransformations().
- Specified by:
getColumnDataTransformer in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
an instance of ImportColumnDataTransformer when data transformation is defined for the column, or null.
-
isConstantColumn
public boolean isConstantColumn()
Description copied from interface: AppendableColumnSink
Returns false if the column type is not set to ImporterColumnDefinition.IrisImportConstant in schema.
- Specified by:
isConstantColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Returns:
true if the ColumnDataTransformer has the hasConstant attribute set to true
-
toString
-
getStringBuilderIncludingBasicSinkAttributes
-
registerRowUpdateObserver
Used to set the RowUpdateObserver on the sink. This is invoked when the AppendableColumnSink.isColumnOnlyInSchema() attribute is true for any column in schema; one of the source csv columns will then be designated as the RowUpdateObserver.
- Specified by:
registerRowUpdateObserver in interface RowUpdateObservable
- Parameters:
rowUpdateObserver - A RowUpdateObserver to which the row chunk attributes (size and destEnd) will be published
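A hedged sketch of the wiring, using only the calls documented on this page; the method name, variable names, and the assumption that RowUpdateObserver is a functional interface receiving the published (size, destEnd) pair are all illustrative.

    // Assumed wiring, for illustration: designate a source column sink as the
    // row update publisher for columns that exist only in the schema.
    static void wireRowUpdates(RowUpdateObservable sourceColumnSink) {
        // Assumes RowUpdateObserver receives the published (size, destEnd)
        // pair; adjust if its real shape differs.
        RowUpdateObserver schemaOnlyObserver = (size, destEnd) -> {
            // React to each published row chunk here.
        };
        sourceColumnSink.registerRowUpdateObserver(schemaOnlyObserver);
        // As chunks are written, the publisher then invokes
        // publishRowUpdate(chunkSize, destEnd) to notify the observer.
    }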
-
setRowUpdatePhaser
Sets the Phaser instance used to track non-source column processing status by the row update publisher when the schema has no partition column.
- Parameters:
phaser - The Phaser object with which all non-source columns, plus the row update publisher, are registered as parties
-
awaitAdvance
public void awaitAdvance()
This is invoked by the Row Update Publisher, which is the source column sink that has the RowUpdateObserver set in it. It ensures that the source column will not advance until the non-source columns complete their processing.
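The handshake described here is the standard java.util.concurrent.Phaser pattern. The following self-contained sketch shows that pattern in isolation (it is not the importer's code): two worker parties stand in for non-source columns, and the main thread stands in for the row update publisher.

    import java.util.concurrent.Phaser;

    public class PhaserHandshakeSketch {
        public static void main(String[] args) {
            // One party per non-source column plus one for the publisher,
            // mirroring the registration described in setRowUpdatePhaser.
            final Phaser phaser = new Phaser(3);

            Runnable nonSourceColumn = () -> {
                // ... process this column's slice of the row chunk ...
                phaser.arrive(); // signal completion of the current phase
            };
            new Thread(nonSourceColumn).start();
            new Thread(nonSourceColumn).start();

            // Publisher: arrive and block until every registered party has
            // arrived, so the source column cannot run ahead of the others.
            phaser.arriveAndAwaitAdvance();
            System.out.println("all parties finished the chunk");
        }
    }
-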
publishRowUpdate
public void publishRowUpdate(int size, long end)
Publishes row updates to interested sinks.
- Specified by:
publishRowUpdate in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Parameters:
size - The size of the update
end - The destination end parameter of the update
-
addAppendableColumn
public void addAppendableColumn(@Nullable String partition, @Nullable LocalAppendableColumn<DATA_TYPE> appendableColumn)
Description copied from interface: AppendableColumnSink
Provides the partition and its associated LocalAppendableColumn. This information should be cached in the column sink implementation and used when updating the column values based on the PartitionParserUpdate that is pushed to the column when the table has a partitioning column and the values of the column should be persisted in the appropriate partition.
- Specified by:
addAppendableColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Parameters:
partition - Partition value as a string
appendableColumn - The LocalAppendableColumn of the column associated with this partition
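Given the protected columnMap field (a Map<String,LocalAppendableColumn<DATA_TYPE>>) listed in the Field Summary, a plausible implementation is a plain cache insert. This is a sketch under that assumption, not the verified source:

    // Cache the partition -> column mapping so that later PartitionParserUpdate
    // handling can route chunk writes to the correct partition's column.
    @Override
    public void addAppendableColumn(String partition, LocalAppendableColumn<DATA_TYPE> appendableColumn) {
        columnMap.put(partition, appendableColumn);
    }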
-
evict
Description copied from interface: AppendableColumnSink
The provided partition should be evicted from the current partition column cache and registered as one of the evicted partitions.
- Specified by:
evict in interface AppendableColumnSink<DATA_TYPE,TARRAY>
- Parameters:
partition - Partition value, as a string, that should be evicted from the local column cache of partitions
-
addToAppendableColumn
protected abstract void addToAppendableColumn(@NotNull LocalAppendableColumn<DATA_TYPE> appendableColumn, @NotNull TARRAY values, int chunkStartIndex, int chunkSize, boolean isSingleValue)
Writes to disk the contents of the values array, from chunkStartIndex for chunkSize entries, in the appendable column of a particular partition.
- Parameters:
appendableColumn - The LocalAppendableColumn of the current partition
values - The array whose contents need to be written to disk for this column and partition
chunkStartIndex - The start index in the values array
chunkSize - The length of data, from the start index, that needs to be written to disk
isSingleValue - Indicates that the entire chunk needs to be updated using a single value, which is the constant value
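To make the contract concrete (chunkSize counts entries starting at chunkStartIndex, and isSingleValue means one constant value fills the whole chunk), here is a hedged int-specialized override. LocalAppendableColumn's add method is an assumed name, since that class's API is not shown on this page.

    @Override
    protected void addToAppendableColumn(LocalAppendableColumn<Integer> appendableColumn,
            int[] values, int chunkStartIndex, int chunkSize, boolean isSingleValue) {
        if (isSingleValue) {
            // The entire chunk carries the constant value at chunkStartIndex.
            for (int i = 0; i < chunkSize; i++) {
                appendableColumn.add(values[chunkStartIndex]); // assumed API
            }
        } else {
            // Copy the range [chunkStartIndex, chunkStartIndex + chunkSize).
            for (int i = 0; i < chunkSize; i++) {
                appendableColumn.add(values[chunkStartIndex + i]);
            }
        }
    }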
-
updateLocalAppendableColumn
protected void updateLocalAppendableColumn(@NotNull TARRAY values, long destEnd, boolean isSingleValue)
Blocks until the partition column is parsed for the same chunk.
-
onPartitionParserUpdate
Description copied from interface: PartitionUpdatesObserver
Updates are published by the source (CsvPartitionColumnParser or AppendableTableWrapper) by invoking this method.
- Specified by:
onPartitionParserUpdate in interface PartitionUpdatesObserver
- Parameters:
parserUpdate - Partition for the current chunk.
-
processingComplete
Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance.
- Parameters:
parserUpdate - the PartitionParserUpdate on whose internal Phaser instance arrive, or arriveAndAwaitAdvance, is invoked
-
processingFailed
public void processingFailed()
Method invoked when there are processing errors and normal completion is not invoked. This lets processing continue without resulting in a timeout, as the actual error has been handled; the import instead fails upon completion of the batch processing with appropriate errors. Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance, if present.
-
flush
public void flush()
Flushes all associated LocalAppendableColumns of the column (different partitions will have different LocalAppendableColumns).
-
read
- Specified by:
read in interface io.deephaven.csv.sinks.Source<TARRAY>
-
append
public com.fishlib.base.log.LogOutput append(@NotNull com.fishlib.base.log.LogOutput logOutput)
- Specified by:
append in interface com.fishlib.base.log.LogOutputAppendable
-
validateForProcessingErrors
public void validateForProcessingErrors()
Description copied from interface: AppendableSink
Verifies whether the column processing encountered errors during the processing of the current chunk. This is relevant when the column is a RowUpdateObservable and non-source columns encountered an error; in that event the import will exit, throwing an appropriate error.
- Specified by:
validateForProcessingErrors in interface AppendableSink<DATA_TYPE,TARRAY>
-