Class BaseAppendableColumnSink<DATA_TYPE,TARRAY>

java.lang.Object
com.illumon.iris.importers.csv.sink.BaseAppendableColumnSink<DATA_TYPE,TARRAY>
Type Parameters:
DATA_TYPE - The column data type
TARRAY - The stored values array data type (for example, an Integer DATA_TYPE column would have an int[] array type)
All Implemented Interfaces:
com.fishlib.base.log.LogOutputAppendable, PartitionUpdatesObserver, RowUpdateObservable, AppendableColumnSink<DATA_TYPE,TARRAY>, AppendableSink<DATA_TYPE,TARRAY>, io.deephaven.csv.sinks.Sink<TARRAY>, io.deephaven.csv.sinks.Source<TARRAY>
Direct Known Subclasses:
AppendableColumnSinkHolder

public abstract class BaseAppendableColumnSink<DATA_TYPE,TARRAY> extends Object implements AppendableColumnSink<DATA_TYPE,TARRAY>, io.deephaven.csv.sinks.Source<TARRAY>, com.fishlib.base.log.LogOutputAppendable
The base class for all implementations of AppendableColumnSink. Holds common logic shared across all implementing sinks, including the handling of updates driven by partition column update processing.
  • Field Details

    • ZERO_LENGTH_WRAPPER_ARRAY

      public static final BaseAppendableColumnSink[] ZERO_LENGTH_WRAPPER_ARRAY
    • DEFAULT_PARTITION_COL

      public static final String DEFAULT_PARTITION_COL
    • columnDataTransformer

      protected final ImportColumnDataTransformer columnDataTransformer
    • constantValue

      protected final DATA_TYPE constantValue
    • columnName

      protected final String columnName
    • isPartitionCol

      protected final boolean isPartitionCol
    • isColumnInSchema

      protected final boolean isColumnInSchema
    • isColumnInSource

      protected final boolean isColumnInSource
    • schemaHasPartitionCol

      protected final boolean schemaHasPartitionCol
    • customSetterCol

      protected final boolean customSetterCol
    • isFromSplitFile

      protected final boolean isFromSplitFile
    • columnMap

      protected final Map<String,LocalAppendableColumn<DATA_TYPE>> columnMap
    • evictedPartitions

      protected final Set<String> evictedPartitions
    • updateObserver

      protected RowUpdateObserver updateObserver
    • sinkDataProcessor

      protected final CustomSetterSinkDataProcessor sinkDataProcessor
    • baseFieldWriter

      protected final BaseCsvFieldWriter baseFieldWriter
  • Constructor Details

  • Method Details

    • getConstantValue

      @Nullable public DATA_TYPE getConstantValue()
      Description copied from interface: AppendableSink
      Returns the defined constant value in the case of a constant column. The constant value definition is present in the importColumn xml element, and the value is accessible through the ImportDataTransformer.
      Specified by:
      getConstantValue in interface AppendableSink<DATA_TYPE,TARRAY>
    • getColumnName

      public String getColumnName()
      Description copied from interface: AppendableColumnSink
      Returns the column name retrieved from the TableDefinition of the column for this sink. This may or may not match the column name in the source csv file. See AppendableColumnSink.getCsvSourceColumnName()
      Specified by:
      getColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      the associated TableDefinition column name.
    • getCsvSourceColumnName

      public String getCsvSourceColumnName()
      Description copied from interface: AppendableColumnSink
      Returns the source name mapping defined for the column in the schema. When the source name is not explicitly defined, it defaults to the TableDefinition column name. When the attribute AppendableColumnSink.isColumnInSource() is true, this should match a column header in the source file.
      Specified by:
      getCsvSourceColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      The csv source file column header mapping of this column sink.
    • isColumnInSchema

      public boolean isColumnInSchema()
      Description copied from interface: AppendableColumnSink
      Returns true if the column is defined in schema.

      A source csv may have columns that are not defined in the schema. Those columns need to be identified so they can be handled appropriately to satisfy dhc plumbing requirements. In addition, if such a column is designated as the RowUpdateObserver, then row updates need to be published.

      Specified by:
      isColumnInSchema in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is defined in schema.
    • isNotConsideredPartSourceFileMapping

      public boolean isNotConsideredPartSourceFileMapping()
      Description copied from interface: AppendableColumnSink
      Returns true when the column is one of multiple columns mapped to a single source file column in the schema. In these instances, a ColumnSink defined as a ColumnSinkHolder will be passed to the DHC parser infrastructure as the ColumnSink mapped to the source csv column. This column sink instance should be part of the collection of sinks managed by the ColumnSinkHolder.
      Specified by:
      isNotConsideredPartSourceFileMapping in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is part of a many-to-one mapping, as defined in the schema, with respect to the source column
    • isColumnInSource

      public boolean isColumnInSource()
      Description copied from interface: AppendableColumnSink
      Returns true when the source name attribute in ImporterColumnDefinition is not null and the column's AppendableColumnSink.isConstantColumn() attribute is not true.
      Specified by:
      isColumnInSource in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column name or source column name in schema is present in the source csv file.
    • isCustomSetterColumn

      public boolean isCustomSetterColumn()
      Description copied from interface: AppendableSink
      This is true if the schema import column section of this column defines a class to be used as a CustomSetter.

      In addition, the property AppendableSink.getCustomSinkDataProcessor() should return a non-null value for all AppendableColumnSinks of the table.

      During processing, this flag is checked as updates are written into the sink for source columns. See AppendableSink.write(Object, boolean[], long, long, boolean) and AppendableSink.updateNonSourceColRowChunk(int, long) for usage.

      Specified by:
      isCustomSetterColumn in interface AppendableSink<DATA_TYPE,TARRAY>
      Returns:
      true if the Column uses a CustomSetter
    • setColumnIsInSinkHolder

      public void setColumnIsInSinkHolder()
      When invoked, marks the column sink as included in a ColumnSinkHolder.
    • isPartitionCol

      public boolean isPartitionCol()
      Description copied from interface: AppendableColumnSink
      Returns true when the ColumnDefinition.isPartitioning() attribute for the column is set to true.
      Specified by:
      isPartitionCol in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is defined as a partition column in the schema
    • supportsTransformations

      public boolean supportsTransformations()
      Description copied from interface: AppendableColumnSink
      Data transformation is considered to be supported when at least one of the transform or formula attributes is defined for the column in the schema. When data transformation is supported, the corresponding parser should use the generated transformer to transform the source file values.
      Specified by:
      supportsTransformations in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true when at least one of formula or transform attribute is defined in schema for the column
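The rule above can be illustrated with a minimal sketch. The attribute names and the helper method are hypothetical stand-ins, not the library's actual implementation:

```java
// Sketch only: supportsTransformations is true when at least one of the
// schema's transform or formula attributes is defined for the column.
public class TransformSketch {
    static boolean supportsTransformations(String transform, String formula) {
        // Either attribute being present enables transformation support.
        return transform != null || formula != null;
    }

    public static void main(String[] args) {
        System.out.println(supportsTransformations(null, null));    // false
        System.out.println(supportsTransformations(null, "x * 2")); // true
    }
}
```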
    • getCustomSinkDataProcessor

      public CustomSetterSinkDataProcessor getCustomSinkDataProcessor()
      Description copied from interface: AppendableSink
      Returns the CustomSetterSinkDataProcessor initialized in the constructor.

      The presence of a non-null value indicates the presence of a custom setter in the schema. In that case, all non-custom-setter columns publish data to the custom setter processor.

      Specified by:
      getCustomSinkDataProcessor in interface AppendableSink<DATA_TYPE,TARRAY>
      Returns:
      The CustomSetterSinkDataProcessor initialized in constructor
    • getColumnDataTransformer

      @NotNull public ImportColumnDataTransformer getColumnDataTransformer()
      Description copied from interface: AppendableColumnSink
      Returns an optional Data Transformer instance if data transformation is defined for the column in schema. See AppendableColumnSink.supportsTransformations().
      Specified by:
      getColumnDataTransformer in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      an instance of ImportColumnDataTransformer when data transformation is defined for the column, otherwise null.
    • isConstantColumn

      public boolean isConstantColumn()
      Description copied from interface: AppendableColumnSink
      Returns false if column type is not set to ImporterColumnDefinition.IrisImportConstant in schema
      Specified by:
      isConstantColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the ColumnDataTransformer's hasConstant attribute is true
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getStringBuilderIncludingBasicSinkAttributes

      @NotNull protected StringBuilder getStringBuilderIncludingBasicSinkAttributes()
    • registerRowUpdateObserver

      public void registerRowUpdateObserver(@NotNull RowUpdateObserver rowUpdateObserver)
      Used to set the RowUpdateObserver on the sink.

      This is invoked when the AppendableColumnSink.isColumnOnlyInSchema() attribute is true for any column in the schema; in that case, one of the source csv columns will be designated as the RowUpdateObserver.

      Specified by:
      registerRowUpdateObserver in interface RowUpdateObservable
      Parameters:
      rowUpdateObserver - A RowUpdateObserver to which the row chunk attributes (size and destEnd) will be published
    • setRowUpdatePhaser

      public void setRowUpdatePhaser(@NotNull Phaser phaser)
      Sets the phaser instance used to track non-source column processing status by the row update publisher when the schema has no partition column.
      Parameters:
      phaser - The phaser object registered with all non-source columns plus the row update publisher as the registered parties
    • awaitAdvance

      public void awaitAdvance()
      This is invoked by the Row Update Publisher, which is the source column sink that has the RowUpdateObserver set on it. This ensures that the source column does not advance until the non-source columns complete their processing.
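The hand-off described above can be sketched with java.util.concurrent.Phaser. This is a minimal, self-contained sketch under stated assumptions; the party counts and empty worker bodies are illustrative, not the library's actual code:

```java
import java.util.concurrent.Phaser;

// Sketch: one row update publisher plus N non-source columns register as
// parties. The publisher arrives for the current chunk and blocks until
// every non-source column has also arrived (i.e., finished its processing).
public class PhaserSketch {
    public static void main(String[] args) {
        final int nonSourceColumns = 2;
        // Register the publisher (1) plus each non-source column as parties.
        final Phaser phaser = new Phaser(1 + nonSourceColumns);

        for (int i = 0; i < nonSourceColumns; i++) {
            new Thread(() -> {
                // ... process the non-source column's rows for this chunk ...
                phaser.arriveAndDeregister(); // signal completion for the chunk
            }).start();
        }

        // Publisher: arrive and block until all registered parties arrive.
        phaser.arriveAndAwaitAdvance();
        System.out.println("chunk complete");
    }
}
```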
    • publishRowUpdate

      public void publishRowUpdate(int size, long end)
      Publish row updates to interested sinks.
      Specified by:
      publishRowUpdate in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      size - The size of the update
      end - The destination End parameter of the update
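The publish/observe hand-off can be illustrated with a small sketch. RowUpdateObserver is modeled here as a hypothetical functional interface; the real library type may differ:

```java
// Sketch: a source column sink forwards each chunk's size and destination
// end to a registered observer, so non-source columns can track progress.
public class RowUpdateSketch {
    interface RowUpdateObserver { void onRowUpdate(int size, long end); }

    private RowUpdateObserver updateObserver;

    void registerRowUpdateObserver(RowUpdateObserver observer) {
        this.updateObserver = observer;
    }

    void publishRowUpdate(int size, long end) {
        if (updateObserver != null) {
            updateObserver.onRowUpdate(size, end); // forward chunk attributes
        }
    }

    public static void main(String[] args) {
        RowUpdateSketch sink = new RowUpdateSketch();
        sink.registerRowUpdateObserver((size, end) ->
                System.out.println("observed size=" + size + " end=" + end));
        sink.publishRowUpdate(128, 1024L);
    }
}
```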
    • addAppendableColumn

      public void addAppendableColumn(@Nullable String partition, @Nullable LocalAppendableColumn<DATA_TYPE> appendableColumn)
      Description copied from interface: AppendableColumnSink
      The method provides the partition and its associated LocalAppendableColumn. This information should be cached in the column sink implementation and used when updating the column values based on the PartitionParserUpdate that is pushed to the column when the table has a partitioning column and the values of the column should be persisted in the appropriate partition.
      Specified by:
      addAppendableColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      partition - Partition value as a string
      appendableColumn - The LocalAppendableColumn of the column associated for this partition
    • evict

      public void evict(@Nullable String partition)
      Description copied from interface: AppendableColumnSink
      The provided partition should be evicted from the current partition column cache. This should be registered as one of the evicted partitions.
      Specified by:
      evict in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      partition - Partition value as a string, that should be evicted from the local column cache of partitions
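The caching and eviction behavior described by addAppendableColumn and evict can be sketched as follows. All names are hypothetical stand-ins, and the appendable column is stubbed as a plain list:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: partitions map to their appendable columns; evicted partitions
// are remembered so late updates against them can be detected.
public class PartitionCacheSketch {
    private final Map<String, List<String>> columnMap = new HashMap<>();
    private final Set<String> evictedPartitions = new HashSet<>();

    void addAppendableColumn(String partition, List<String> appendableColumn) {
        columnMap.put(partition, appendableColumn); // cache for later updates
    }

    void evict(String partition) {
        columnMap.remove(partition);
        evictedPartitions.add(partition); // record the eviction
    }

    public static void main(String[] args) {
        PartitionCacheSketch sink = new PartitionCacheSketch();
        sink.addAppendableColumn("2024-01-01", new ArrayList<>());
        sink.evict("2024-01-01");
        System.out.println(sink.columnMap.containsKey("2024-01-01"));      // false
        System.out.println(sink.evictedPartitions.contains("2024-01-01")); // true
    }
}
```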
    • addToAppendableColumn

      protected abstract void addToAppendableColumn(@NotNull LocalAppendableColumn<DATA_TYPE> appendableColumn, @NotNull TARRAY values, int chunkStartIndex, int chunkSize, boolean isSingleValue)
      Writes to disk the contents of the values array, starting at chunkStartIndex for chunkSize elements, in the appendable column of a particular partition.
      Parameters:
      appendableColumn - The LocalAppendableColumn of the current partition
      values - The array whose contents need to be written to disk for this column and partition
      chunkStartIndex - The start index in the values array
      chunkSize - The length of data from the start index that needs to be written to disk
      isSingleValue - Indicates that the entire chunk needs to be updated using a single value, which is the constant value
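A simplified version of the contract above can be sketched for an int[] chunk. The appendable column is stubbed as a plain list and the method name is reused for illustration only; this is not the library's implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: either copy a slice of the chunk into the column, or, when
// isSingleValue is set (constant column), repeat a single value chunkSize times.
public class ChunkWriteSketch {
    static void addToAppendableColumn(List<Integer> column, int[] values,
                                      int chunkStartIndex, int chunkSize,
                                      boolean isSingleValue) {
        if (isSingleValue) {
            // Constant column: the whole chunk is one repeated value.
            for (int i = 0; i < chunkSize; i++) {
                column.add(values[0]);
            }
        } else {
            for (int i = chunkStartIndex; i < chunkStartIndex + chunkSize; i++) {
                column.add(values[i]);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> column = new ArrayList<>();
        addToAppendableColumn(column, new int[]{7, 8, 9, 10}, 1, 2, false);
        addToAppendableColumn(column, new int[]{42}, 0, 3, true);
        System.out.println(column); // [8, 9, 42, 42, 42]
    }
}
```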
    • updateLocalAppendableColumn

      protected void updateLocalAppendableColumn(@NotNull TARRAY values, long destEnd, boolean isSingleValue)
      Blocks until the partition column is parsed for the same chunk.
    • onPartitionParserUpdate

      public void onPartitionParserUpdate(@NotNull PartitionParserUpdate parserUpdate)
      Description copied from interface: PartitionUpdatesObserver
      Updates are published by the source (CsvPartitionColumnParser or AppendableTableWrapper) by invoking this method.
      Specified by:
      onPartitionParserUpdate in interface PartitionUpdatesObserver
      Parameters:
      parserUpdate - Partition for current chunk.
    • processingComplete

      protected void processingComplete(@NotNull PartitionParserUpdate parserUpdate)
      Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance.
      Parameters:
      parserUpdate - the PartitionParserUpdate on which to invoke arrive or arriveAndAwaitAdvance on its internal Phaser instance
    • processingFailed

      public void processingFailed()
      Method invoked when there are processing errors and normal completion is not invoked.

      This lets processing continue without resulting in a timeout, since the actual error has been handled; instead, the import fails upon completion of the batch processing with appropriate errors.

      Flushes all associated LocalAppendableColumns for the current ColumnSink before invoking Phaser.arrive() on the PartitionParserUpdate instance, if present.

    • flush

      public void flush()
      Flushes all associated LocalAppendableColumns of the column (different partitions have different LocalAppendableColumns).
    • read

      public void read(TARRAY dest, boolean[] isNull, long srcBegin, long srcEnd)
      Specified by:
      read in interface io.deephaven.csv.sinks.Source<TARRAY>
    • append

      public com.fishlib.base.log.LogOutput append(@NotNull com.fishlib.base.log.LogOutput logOutput)
      Specified by:
      append in interface com.fishlib.base.log.LogOutputAppendable
    • validateForProcessingErrors

      public void validateForProcessingErrors()
      Description copied from interface: AppendableSink
      Verifies whether column processing encountered errors during the processing of the current chunk.

      This is relevant when the column is a RowUpdateObservable and non-source columns encountered an error. In that event, the import will exit, throwing an appropriate error.

      Specified by:
      validateForProcessingErrors in interface AppendableSink<DATA_TYPE,TARRAY>