Class BaseAppendableColumnSink<DATA_TYPE,TARRAY>

java.lang.Object
com.illumon.iris.importers.csv.sink.BaseAppendableColumnSink<DATA_TYPE,TARRAY>
Type Parameters:
DATA_TYPE - The column data type
TARRAY - The stored values array data type (for example Integer DataType column would have int[] array data type)
All Implemented Interfaces:
com.fishlib.base.log.LogOutputAppendable, PartitionUpdatesObserver, RowUpdateObservable, AppendableColumnSink<DATA_TYPE,TARRAY>, AppendableSink<DATA_TYPE,TARRAY>, io.deephaven.csv.sinks.Sink<TARRAY>, io.deephaven.csv.sinks.Source<TARRAY>
Direct Known Subclasses:
AppendableColumnSinkHolder

public abstract class BaseAppendableColumnSink<DATA_TYPE,TARRAY> extends Object implements AppendableColumnSink<DATA_TYPE,TARRAY>, io.deephaven.csv.sinks.Source<TARRAY>, com.fishlib.base.log.LogOutputAppendable
The Base class for All types of implementations for AppendableColumnSink. holds common logic across all implementing sinks, including the handling of processing updates based on partition column update processing
  • Field Details

    • ZERO_LENGTH_WRAPPER_ARRAY

      public static final BaseAppendableColumnSink[] ZERO_LENGTH_WRAPPER_ARRAY
    • DEFAULT_PARTITION_COL

      public static final String DEFAULT_PARTITION_COL
      See Also:
    • columnDataTransformer

      protected final ImportColumnDataTransformer columnDataTransformer
    • constantValue

      protected final DATA_TYPE constantValue
    • columnName

      protected final String columnName
    • isPartitionCol

      protected final boolean isPartitionCol
    • isColumnInSchema

      protected final boolean isColumnInSchema
    • isColumnInSource

      protected final boolean isColumnInSource
    • schemaHasPartitionCol

      protected final boolean schemaHasPartitionCol
    • isFromSplitFile

      protected final boolean isFromSplitFile
    • columnMap

      protected final Map<String,LocalAppendableColumn<DATA_TYPE>> columnMap
    • evictedPartitions

      protected final Set<String> evictedPartitions
    • updateObserver

      protected RowUpdateObserver updateObserver
  • Constructor Details

    • BaseAppendableColumnSink

      protected BaseAppendableColumnSink(@NotNull com.fishlib.io.logger.Logger log, @NotNull String columnName, @Nullable ImporterColumnDefinition icdColDef, @Nullable ImportColumnDataTransformer columnDataTransformer, boolean isPartitionCol, boolean isColumnInSchema, boolean isColumnInSource, boolean schemaHasPartitionCol, boolean fromSplitFile)
  • Method Details

    • getConstantValue

      @Nullable public DATA_TYPE getConstantValue()
      Description copied from interface: AppendableSink
      Returns the defined constant value in case of a constant column value. Constant value definition is present in the importColumn xml element. The value will be accessible through ImportDataTransformer
      Specified by:
      getConstantValue in interface AppendableSink<DATA_TYPE,TARRAY>
    • getColumnName

      public String getColumnName()
      Description copied from interface: AppendableColumnSink
      Returns the column name retrieved from the TableDefinition of the column for this sink. This may or may not match the column name in the source csv file. See AppendableColumnSink.getCsvSourceColumnName()
      Specified by:
      getColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      the associated TableDefinition column name.
    • getCsvSourceColumnName

      public String getCsvSourceColumnName()
      Description copied from interface: AppendableColumnSink
      Returns the source name mapping defined for the column in the schema. When source name is not explicitly defined it defaults to TableDefinition column name. When the attribute AppendableColumnSink.isColumnInSource() is true, this should match a column header in source file.
      Specified by:
      getCsvSourceColumnName in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      The csv source file column header mapping of this column sink.
    • isColumnInSchema

      public boolean isColumnInSchema()
      Description copied from interface: AppendableColumnSink
      Returns true if the column is defined in schema.

      A source csv may have columns that are not defined in the schema. Those columns need to be identified, so they can be handled appropriately to satisfy dhc plumbing requirements. In addition, if such a column is designated to be RowUpdateObserver then row updates need to be published.

      Specified by:
      isColumnInSchema in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is defined in schema.
    • isNotConsideredPartSourceFileMapping

      public boolean isNotConsideredPartSourceFileMapping()
      Description copied from interface: AppendableColumnSink
      Returns true when the column is one of multiple columns mapped to a single source file column in the schema. In these instances a ColumnSink defined as a ColumnSinkHolder will be the Column Sink that will be passed to DHC parsers infrastructure as the ColumnSink that is mapped to the source csv column. This column sink instance should be part of the collection of sinks that is managed by the ColumnSinkHolder
      Specified by:
      isNotConsideredPartSourceFileMapping in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is part of many-to-one mapping as defined in schema w.r.t to source column
    • isColumnInSource

      public boolean isColumnInSource()
      Description copied from interface: AppendableColumnSink
      Returns true when the source name attribute in ImporterColumnDefinition is not null and the columns AppendableColumnSink.isConstantColumn() attribute is not true.
      Specified by:
      isColumnInSource in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column name or source column name in schema is present in the source csv file.
    • setColumnIsInSinkHolder

      public void setColumnIsInSinkHolder()
      When invoked will mark the column sink as included in a ColumnSinkHolder
    • isPartitionCol

      public boolean isPartitionCol()
      Description copied from interface: AppendableColumnSink
      Returns true when the ColumnDefinition.isPartitioning() attribute for the column is set to true.
      Specified by:
      isPartitionCol in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the column is defined as a partition column in the schema
    • supportsTransformations

      public boolean supportsTransformations()
      Description copied from interface: AppendableColumnSink
      Data transformation is considered to be supported when at least one of transform attribute or formula attribute are defined for the column in schema. When data transformation is supported the corresponding parser should use the generated transformer to transform the source file values.
      Specified by:
      supportsTransformations in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true when at least one of formula or transform attribute is defined in schema for the column
    • getColumnDataTransformer

      @NotNull public ImportColumnDataTransformer getColumnDataTransformer()
      Description copied from interface: AppendableColumnSink
      Returns an optional Data Transformer instance if data transformation is defined for the column in schema. See AppendableColumnSink.supportsTransformations().
      Specified by:
      getColumnDataTransformer in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      an instance of ImportColumnDataTransformer when data transformation is defined for column or null.
    • isConstantColumn

      public boolean isConstantColumn()
      Description copied from interface: AppendableColumnSink
      Returns false if column type is not set to ImporterColumnDefinition.IrisImportConstant in schema
      Specified by:
      isConstantColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Returns:
      true if the ColumnDataTransformer has the hasConstant attribute as true
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getStringBuilderIncludingBasicSinkAttributes

      @NotNull protected StringBuilder getStringBuilderIncludingBasicSinkAttributes()
    • registerRowUpdateObserver

      public void registerRowUpdateObserver(@NotNull RowUpdateObserver rowUpdateObserver)
      Used to set the RowUpdateObserver on the sink.

      This is invoked when the AppendableColumnSink.isColumnOnlyInSchema() attribute is true for any column in schema then one of the source csv columns will be designated as RowUpdateObserver.

      Specified by:
      registerRowUpdateObserver in interface RowUpdateObservable
      Parameters:
      rowUpdateObserver - A RowUpdateObserver to which the row chunk attribute (size and destEnd ) will be published
    • publishRowUpdate

      public void publishRowUpdate(int size, long end)
      Publish row updates to interested sinks.
      Specified by:
      publishRowUpdate in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      size - The size of the update
      end - The destination End parameter of the update
    • addAppendableColumn

      public void addAppendableColumn(@Nullable String partition, @Nullable LocalAppendableColumn<DATA_TYPE> appendableColumn)
      Description copied from interface: AppendableColumnSink
      The method provides the partition and its associated LocalAppendableColumn This information should be cached in the column sink implementation and should be used when updating the column values based on PartitionParserUpdate that will be pushed to the column when the table has a partitioning column and the values of the column should be persisted in the appropriate partition.
      Specified by:
      addAppendableColumn in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      partition - Partition value as a string
      appendableColumn - The LocalAppendableColumn of the column associated for this partition
    • evict

      public void evict(@Nullable String partition)
      Description copied from interface: AppendableColumnSink
      The provided partition should be evicted from the current partition column cache. This should be registered as one of the evicted partitions.
      Specified by:
      evict in interface AppendableColumnSink<DATA_TYPE,TARRAY>
      Parameters:
      partition - Partition value as a string, that should be evicted from the local column cache of partitions
    • addToAppendableColumn

      protected abstract void addToAppendableColumn(@NotNull LocalAppendableColumn<DATA_TYPE> appendableColumn, @NotNull TARRAY values, int chunkStartIndex, int chunkSize, boolean isSingleValue)
      Writes to disk the contents of values array from chunkStartIndex to chunkSize in the appendable column of a particular partition
      Parameters:
      appendableColumn - The LocalAppendableColumn of the current partition
      values - The array whose contents need to be written to disk for this column and partition
      chunkStartIndex - The start index in the values array
      chunkSize - The length of data from the start index that need to be written to disk
      isSingleValue - This indicates that entire chunk needs to be updated using a single value which is the constant value
    • updateLocalAppendableColumn

      protected void updateLocalAppendableColumn(@NotNull TARRAY values, long destEnd, boolean isSingleValue)
      Blocks until the partition column is parsed for the same chunk.
    • onPartitionParserUpdate

      public void onPartitionParserUpdate(@NotNull PartitionParserUpdate parserUpdate)
      Description copied from interface: PartitionUpdatesObserver
      Updates are published by the source (CsvPartitionColumnParser or AppendableTableWrapper) by invoking this method.
      Specified by:
      onPartitionParserUpdate in interface PartitionUpdatesObserver
      Parameters:
      parserUpdate - Partition for current chunk.
    • processingComplete

      protected void processingComplete(@NotNull PartitionParserUpdate parserUpdate)
      Flush all associated LocalAppendableColumn for current ColumnSink before invoking Phaser.arrive() in PartitionParserUpdate instance
      Parameters:
      parserUpdate - the PartitionParserUpdate to invoke arrive, or arriveAndAwait on its internal Phaser instance
    • flush

      public void flush()
      flushes all associated LocalAppendableColumn of the column (different partitions will have different LocalAppendableColumns).
    • read

      public void read(TARRAY dest, boolean[] isNull, long srcBegin, long srcEnd)
      Specified by:
      read in interface io.deephaven.csv.sinks.Source<DATA_TYPE>
    • append

      public com.fishlib.base.log.LogOutput append(@NotNull com.fishlib.base.log.LogOutput logOutput)
      Specified by:
      append in interface com.fishlib.base.log.LogOutputAppendable