Class BaseAppendableColumnSink<DATA_TYPE,TARRAY>

java.lang.Object
com.illumon.iris.importers.csv.sink.BaseAppendableColumnSink<DATA_TYPE,TARRAY>
Type Parameters:
DATA_TYPE - The column data type
TARRAY - The stored values array data type (for example, an Integer DATA_TYPE column would have an int[] array type)
All Implemented Interfaces:
com.fishlib.base.log.LogOutputAppendable, PartitionUpdatesObserver, RowUpdateObservable, AppendableColumnSink<DATA_TYPE,TARRAY>, AppendableSink<DATA_TYPE,TARRAY>, io.deephaven.csv.sinks.Sink<TARRAY>, io.deephaven.csv.sinks.Source<TARRAY>
Direct Known Subclasses:
AppendableColumnSinkHolder

public abstract class BaseAppendableColumnSink<DATA_TYPE,TARRAY> extends Object implements AppendableColumnSink<DATA_TYPE,TARRAY>, io.deephaven.csv.sinks.Source<TARRAY>, com.fishlib.base.log.LogOutputAppendable
The base class for all implementations of AppendableColumnSink. Holds common logic shared across all implementing sinks, including the handling of updates driven by partition column update processing.
  • Field Details

    • ZERO_LENGTH_WRAPPER_ARRAY

      public static final BaseAppendableColumnSink[] ZERO_LENGTH_WRAPPER_ARRAY
    • DEFAULT_PARTITION_COL

      public static final String DEFAULT_PARTITION_COL
    • columnDataTransformer

      protected final ImportColumnDataTransformer columnDataTransformer
    • constantValue

      protected final DATA_TYPE constantValue
    • columnName

      protected final String columnName
    • isPartitionCol

      protected final boolean isPartitionCol
    • isColumnInSchema

      protected final boolean isColumnInSchema
    • isColumnInSource

      protected final boolean isColumnInSource
    • schemaHasPartitionCol

      protected final boolean schemaHasPartitionCol
    • customSetterCol

      protected final boolean customSetterCol
    • isFromSplitFile

      protected final boolean isFromSplitFile
    • columnMap

      protected final Map<String,LocalAppendableColumn<DATA_TYPE>> columnMap
    • evictedPartitions

      protected final Set<String> evictedPartitions
    • updateObserver

      protected RowUpdateObserver updateObserver
    • sinkDataProcessor

      protected final CustomSetterSinkDataProcessor sinkDataProcessor
    • baseFieldWriter

      protected final BaseCsvFieldWriter baseFieldWriter
  • Constructor Details

  • Method Details

    • getConstantValue

      @Nullable public DATA_TYPE getConstantValue()
      Description copied from interface: AppendableSink
      Returns the defined constant value in the case of a constant column. The constant value definition is present in the importColumn xml element, and the value is accessible through the ImportDataTransformer.
      Specified by:
      getConstantValue in interface AppendableSink<DATA_TYPE,TARRAY>
    • getColumnName

      public String getColumnName()
      Description copied from interface: AppendableColumnSink
      Returns the column name retrieved from the TableDefinition of the column for this sink. This may or may not match the column name in the source csv file. See AppendableColumnSink.getCsvSourceColumnName()
      Specified by:
      getColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      the associated TableDefinition column name.
    • getCsvSourceColumnName

      public String getCsvSourceColumnName()
      Description copied from interface: AppendableColumnSink
      Returns the source name mapping defined for the column in the schema. When the source name is not explicitly defined, it defaults to the TableDefinition column name. When the attribute AppendableColumnSink.isColumnInSource() is true, this should match a column header in the source file.
      Specified by:
      getCsvSourceColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      The csv source file column header mapping of this column sink.
    • isColumnInSchema

      public boolean isColumnInSchema()
      Description copied from interface: AppendableColumnSink
      Returns true if the column is defined in schema.

      A source csv may have columns that are not defined in the schema. Those columns need to be identified so they can be handled appropriately to satisfy dhc plumbing requirements. In addition, if such a column is designated as the RowUpdateObserver, then row updates need to be published.

      Specified by:
      isColumnInSchema in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is defined in schema.
    • isNotConsideredPartSourceFileMapping

      public boolean isNotConsideredPartSourceFileMapping()
      Description copied from interface: AppendableColumnSink
      Returns true when the column is one of multiple columns mapped to a single source file column in the schema. In these instances, a ColumnSink defined as a ColumnSinkHolder will be passed to the DHC parser infrastructure as the ColumnSink mapped to the source csv column. This column sink instance should be part of the collection of sinks managed by the ColumnSinkHolder.
      Specified by:
      isNotConsideredPartSourceFileMapping in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is part of a many-to-one mapping, as defined in the schema, with respect to the source column
    • isColumnInSource

      public boolean isColumnInSource()
      Description copied from interface: AppendableColumnSink
      Returns true when the source name attribute in ImporterColumnDefinition is not null and the column's AppendableColumnSink.isConstantColumn() attribute is not true.
      Specified by:
      isColumnInSource in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column name or source column name in schema is present in the source csv file.
    • isCustomSetterColumn

      public boolean isCustomSetterColumn()
      Description copied from interface: AppendableSink
      This is true if the schema import column section of this column defines a class to be used as a CustomSetter.

      In addition, the property AppendableSink.getCustomSinkDataProcessor() should return a non-null value for all AppendableColumnSinks of the table.

      During processing, this flag is checked as updates are written into the sink for source columns. See AppendableSink.write(Object, boolean[], long, long, boolean) and AppendableSink.updateNonSourceColRowChunk(int, long) for usage.

      Specified by:
      isCustomSetterColumn in interface AppendableSink<DATA_TYPE,TARRAY>
      Returns:
      true if the Column uses a CustomSetter
    • setColumnIsInSinkHolder

      public void setColumnIsInSinkHolder()
      When invoked, marks the column sink as included in a ColumnSinkHolder.
    • isPartitionCol

      public boolean isPartitionCol()
      Description copied from interface: AppendableColumnSink
      Returns true when the ColumnDefinition.isPartitioning() attribute for the column is set to true.
      Specified by:
      isPartitionCol in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is defined as a partition column in the schema
    • supportsTransformations

      public boolean supportsTransformations()
      Description copied from interface: AppendableColumnSink
      Data transformation is considered to be supported when at least one of the transform or formula attributes is defined for the column in the schema. When data transformation is supported, the corresponding parser should use the generated transformer to transform the source file values.
      Specified by:
      supportsTransformations in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true when at least one of formula or transform attribute is defined in schema for the column
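The rule above can be illustrated with a minimal sketch. The attribute names and the helper method are hypothetical stand-ins, not the library's actual implementation:

```java
// Sketch only: supportsTransformations is true when at least one of the
// schema's transform or formula attributes is defined for the column.
public class TransformSketch {
    static boolean supportsTransformations(String transform, String formula) {
        // Either attribute being present enables transformation support.
        return transform != null || formula != null;
    }

    public static void main(String[] args) {
        System.out.println(supportsTransformations(null, null));    // false
        System.out.println(supportsTransformations(null, "x * 2")); // true
    }
}
```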
    • getCustomSinkDataProcessor

      public CustomSetterSinkDataProcessor getCustomSinkDataProcessor()
      Description copied from interface: AppendableSink
      Returns the CustomSetterSinkDataProcessor initialized in the constructor.

      The presence of a non-null value indicates the presence of a custom setter in the schema. In that case, all non-custom-setter columns publish data to the custom setter processor.

      Specified by:
      getCustomSinkDataProcessor in interface AppendableSink<DATA_TYPE,TARRAY>
      Returns:
      The CustomSetterSinkDataProcessor initialized in constructor
    • getColumnDataTransformer

      @NotNull public ImportColumnDataTransformer getColumnDataTransformer()
      Description copied from interface: AppendableColumnSink
      Returns an optional Data Transformer instance if data transformation is defined for the column in schema. See AppendableColumnSink.supportsTransformations().
      Specified by:
      getColumnDataTransformer in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      an instance of ImportColumnDataTransformer when data transformation is defined for the column, otherwise null.
    • isConstantColumn

      public boolean isConstantColumn()
      Description copied from interface: AppendableColumnSink
      Returns false if column type is not set to ImporterColumnDefinition.IrisImportConstant in schema
      Specified by:
      isConstantColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the ColumnDataTransformer's hasConstant attribute is true
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getStringBuilderIncludingBasicSinkAttributes

      @NotNull protected StringBuilder getStringBuilderIncludingBasicSinkAttributes()
    • registerRowUpdateObserver

      public void registerRowUpdateObserver(@NotNull RowUpdateObserver rowUpdateObserver)
      Used to set the RowUpdateObserver on the sink.

      This is invoked when the AppendableColumnSink.isColumnOnlyInSchema() attribute is true for any column in the schema; in that case, one of the source csv columns will be designated as the RowUpdateObserver.

      Specified by:
      registerRowUpdateObserver in interface RowUpdateObservable
      Parameters:
      rowUpdateObserver - A RowUpdateObserver to which the row chunk attributes (size and destEnd) will be published
    • setRowUpdatePhaser

      public void setRowUpdatePhaser(@NotNull Phaser phaser)
      Sets the phaser instance used to track non-source column processing status by the row update publisher when the schema has no partition column.
      Parameters:
      phaser - The phaser object registered with all non-source columns plus the row update publisher as the registered parties
    • awaitAdvance

      public void awaitAdvance()
      This is invoked by the Row Update Publisher, which is the source column sink that has the RowUpdateObserver set on it. This ensures that the source column does not advance until the non-source columns complete their processing.
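The hand-off described above can be sketched with java.util.concurrent.Phaser. This is a minimal, self-contained sketch under stated assumptions; the party counts and empty worker bodies are illustrative, not the library's actual code:

```java
import java.util.concurrent.Phaser;

// Sketch: one row update publisher plus N non-source columns register as
// parties. The publisher arrives for the current chunk and blocks until
// every non-source column has also arrived (i.e., finished its processing).
public class PhaserSketch {
    public static void main(String[] args) {
        final int nonSourceColumns = 2;
        // Register the publisher (1) plus each non-source column as parties.
        final Phaser phaser = new Phaser(1 + nonSourceColumns);

        for (int i = 0; i < nonSourceColumns; i++) {
            new Thread(() -> {
                // ... process the non-source column's rows for this chunk ...
                phaser.arriveAndDeregister(); // signal completion for the chunk
            }).start();
        }

        // Publisher: arrive and block until all registered parties arrive.
        phaser.arriveAndAwaitAdvance();
        System.out.println("chunk complete");
    }
}
```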
    • publishRowUpdate

      public void publishRowUpdate(int size, long end)
      Publish row updates to interested sinks.
      Specified by:
      publishRowUpdate in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      size - The size of the update
      end - The destination End parameter of the update
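The publish/observe hand-off can be illustrated with a small sketch. RowUpdateObserver is modeled here as a hypothetical functional interface; the real library type may differ:

```java
// Sketch: a source column sink forwards each chunk's size and destination
// end to a registered observer, so non-source columns can track progress.
public class RowUpdateSketch {
    interface RowUpdateObserver { void onRowUpdate(int size, long end); }

    private RowUpdateObserver updateObserver;

    void registerRowUpdateObserver(RowUpdateObserver observer) {
        this.updateObserver = observer;
    }

    void publishRowUpdate(int size, long end) {
        if (updateObserver != null) {
            updateObserver.onRowUpdate(size, end); // forward chunk attributes
        }
    }

    public static void main(String[] args) {
        RowUpdateSketch sink = new RowUpdateSketch();
        sink.registerRowUpdateObserver((size, end) ->
                System.out.println("observed size=" + size + " end=" + end));
        sink.publishRowUpdate(128, 1024L);
    }
}
```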
    • addAppendableColumn

      public void addAppendableColumn(@Nullable String partition, @Nullable LocalAppendableColumn<DATA_TYPE> appendableColumn)
      Description copied from interface: AppendableColumnSink
      The method provides the partition and its associated LocalAppendableColumn. This information should be cached in the column sink implementation and used when updating the column values based on the PartitionParserUpdate that is pushed to the column when the table has a partitioning column and the values of the column should be persisted in the appropriate partition.
      Specified by:
      addAppendableColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      partition - Partition value as a string
      appendableColumn - The LocalAppendableColumn of the column associated for this partition
    • evict

      public void evict(@Nullable String partition)
      Description copied from interface: AppendableColumnSink
      The provided partition should be evicted from the current partition column cache. This should be registered as one of the evicted partitions.
      Specified by:
      evict in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      partition - Partition value as a string, that should be evicted from the local column cache of partitions
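The caching and eviction behavior described by addAppendableColumn and evict can be sketched as follows. All names are hypothetical stand-ins, and the appendable column is stubbed as a plain list:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: partitions map to their appendable columns; evicted partitions
// are remembered so late updates against them can be detected.
public class PartitionCacheSketch {
    private final Map<String, List<String>> columnMap = new HashMap<>();
    private final Set<String> evictedPartitions = new HashSet<>();

    void addAppendableColumn(String partition, List<String> appendableColumn) {
        columnMap.put(partition, appendableColumn); // cache for later updates
    }

    void evict(String partition) {
        columnMap.remove(partition);
        evictedPartitions.add(partition); // record the eviction
    }

    public static void main(String[] args) {
        PartitionCacheSketch sink = new PartitionCacheSketch();
        sink.addAppendableColumn("2024-01-01", new ArrayList<>());
        sink.evict("2024-01-01");
        System.out.println(sink.columnMap.containsKey("2024-01-01"));      // false
        System.out.println(sink.evictedPartitions.contains("2024-01-01")); // true
    }
}
```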
    • addToAppendableColumn

      protected abstract void addToAppendableColumn(@NotNull LocalAppendableColumn<DATA_TYPE> appendableColumn, @NotNull TARRAY values, int chunkStartIndex, int chunkSize, boolean isSingleValue)
      Writes to disk the contents of the values array, starting at chunkStartIndex for chunkSize elements, in the appendable column of a particular partition.
      Parameters:
      appendableColumn - The LocalAppendableColumn of the current partition
      values - The array whose contents need to be written to disk for this column and partition
      chunkStartIndex - The start index in the values array
      chunkSize - The length of data from the start index that needs to be written to disk
      isSingleValue - Indicates that the entire chunk needs to be updated using a single value, which is the constant value
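A simplified version of the contract above can be sketched for an int[] chunk. The appendable column is stubbed as a plain list and the method name is reused for illustration only; this is not the library's implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: either copy a slice of the chunk into the column, or, when
// isSingleValue is set (constant column), repeat a single value chunkSize times.
public class ChunkWriteSketch {
    static void addToAppendableColumn(List<Integer> column, int[] values,
                                      int chunkStartIndex, int chunkSize,
                                      boolean isSingleValue) {
        if (isSingleValue) {
            // Constant column: the whole chunk is one repeated value.
            for (int i = 0; i < chunkSize; i++) {
                column.add(values[0]);
            }
        } else {
            for (int i = chunkStartIndex; i < chunkStartIndex + chunkSize; i++) {
                column.add(values[i]);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> column = new ArrayList<>();
        addToAppendableColumn(column, new int[]{7, 8, 9, 10}, 1, 2, false);
        addToAppendableColumn(column, new int[]{42}, 0, 3, true);
        System.out.println(column); // [8, 9, 42, 42, 42]
    }
}
```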
    • updateLocalAppendableColumn

      protected void updateLocalAppendableColumn(@NotNull TARRAY values, long destEnd, boolean isSingleValue)
      Blocks until the partition column is parsed for the same chunk.
    • onPartitionParserUpdate

      public void onPartitionParserUpdate(@NotNull PartitionParserUpdate parserUpdate)
      Description copied from interface: PartitionUpdatesObserver
      Updates are published by the source (CsvPartitionColumnParser or AppendableTableWrapper) by invoking this method.
      Specified by:
      onPartitionParserUpdate in interface PartitionUpdatesObserver
      Parameters:
      parserUpdate - Partition for current chunk.
    • processingComplete

      protected void processingComplete(@NotNull PartitionParserUpdate parserUpdate)
      Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance.
      Parameters:
      parserUpdate - the PartitionParserUpdate on which to invoke arrive or arriveAndAwaitAdvance on its internal Phaser instance
    • processingFailed

      public void processingFailed()
      Method invoked when there are processing errors and normal completion is not invoked.

      This lets processing continue without resulting in a timeout, since the actual error has been handled; instead, the import fails upon completion of the batch processing with appropriate errors.

      Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance, if present.

    • flush

      public void flush()
      Flushes all associated LocalAppendableColumns of the column (different partitions have different LocalAppendableColumns).
    • read

      public void read(TARRAY dest, boolean[] isNull, long srcBegin, long srcEnd)
      Specified by:
      read in interface io.deephaven.csv.sinks.Source<TARRAY>
    • append

      public com.fishlib.base.log.LogOutput append(@NotNull com.fishlib.base.log.LogOutput logOutput)
      Specified by:
      append in interface com.fishlib.base.log.LogOutputAppendable
    • validateForProcessingErrors

      public void validateForProcessingErrors()
      Description copied from interface: AppendableSink
      Verifies whether column processing encountered errors during the processing of the current chunk.

      This is relevant when the column is a RowUpdateObservable and non-source columns encountered an error. In that event, the import will exit, throwing an appropriate error.

      Specified by:
      validateForProcessingErrors in interface AppendableSink<DATA_TYPE,TARRAY>