Interface CsvImporterHelper

All Superinterfaces:
AutoCloseable, Closeable

public interface CsvImporterHelper extends Closeable
Helper interface for handling different styles of CSV files.
  • Method Details

    • setBufferSize

      void setBufferSize(int bufferSize)
      Sets the buffer size to use for a FooterSkipBufferedReader
      Parameters:
      bufferSize - size of the buffer in characters
    • getBufferSize

      int getBufferSize()
      Returns the buffer size that will be used when creating a FooterSkipBufferedReader
      Returns:
      the size of the buffer in characters
    • getColumnNamesFromStream

      List<String> getColumnNamesFromStream()
      Gets the list of column names from a CSV file. Call this only after the helper has been initialized with a stream.
      Returns:
      the List of column names
    • processImport

      long processImport(@NotNull com.fishlib.io.logger.Logger log, @NotNull ImportTableWriterFactory writerFactory, @NotNull Map<String,ImporterColumnDefinition> icdMap, @NotNull Map<String,String> importProperties, @NotNull String arrayDelimiter, @Nullable String constantColumnValue, @Nullable String currentPartition, @NotNull AtomicInteger errorCount, int maxError, boolean strict, boolean fromSplitFile) throws IOException
      Process the source file or stream and persist to disk as a Table
      Parameters:
      log - The passed-down logger
      writerFactory - The passed-down ImportTableWriterFactory
      icdMap - The column name to ImporterColumnDefinition map
      importProperties - Provides basic import attributes
      arrayDelimiter - Delimiter used to parse array data types
      constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
      currentPartition - The current partition value when invoked using splitFile
      errorCount - Holds a record of parse errors
      maxError - Maximum number of field conversion failures allowed
      strict - Whether to fail if a field fails conversion
      fromSplitFile - True if the stream is sourced from an interim split file that was split using the partition column
      Returns:
      The number of rows processed
      Throws:
      IOException - if an error occurs while reading files
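The errorCount, maxError and strict parameters together describe an error budget for field conversion. The sketch below shows one plausible reading of how such a budget could behave; it is not the library's implementation. The class ErrorBudgetSketch, the method importRows and the choice of IllegalStateException are hypothetical, and the rule "strict fails on the first failure, otherwise fail once the count exceeds maxError" is an assumption.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names belong to CsvImporterHelper.
class ErrorBudgetSketch {

    /**
     * Tries to convert each field to a long and returns how many converted cleanly.
     * Assumed rule: strict mode fails on the first bad field; otherwise the import
     * fails once the shared counter exceeds maxError.
     */
    static long importRows(List<String> fields, AtomicInteger errorCount,
                           int maxError, boolean strict) {
        long rows = 0;
        for (String field : fields) {
            try {
                Long.parseLong(field.trim());
                rows++;
            } catch (NumberFormatException e) {
                // The counter is owned by the caller, so the tally survives this call.
                int errors = errorCount.incrementAndGet();
                if (strict || errors > maxError) {
                    throw new IllegalStateException(
                            "Conversion failed for '" + field + "' after " + errors
                                    + " error(s); limit is " + maxError, e);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        AtomicInteger errors = new AtomicInteger();
        long rows = importRows(List.of("1", "oops", "3"), errors, 1, false);
        System.out.println(rows + " rows converted, " + errors.get() + " error(s)");
    }
}
```

Passing the counter in as an AtomicInteger, rather than returning it, lets the caller read the tally after the call and reuse the same counter across several invocations.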
    • close

      void close() throws IOException
      Close the stream
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException - if an error occurs
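Because the interface extends Closeable, an instance can be managed with try-with-resources, which calls close() even when processing throws. The sketch below demonstrates that pattern with a stand-in class; StubHelper and its process method are hypothetical and model only the close() behaviour, not the real helper.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical illustration only; StubHelper stands in for a real helper instance.
class TryWithResourcesSketch {

    /** Minimal Closeable that records whether close() was called. */
    static final class StubHelper implements Closeable {
        boolean closed = false;

        void process(boolean fail) throws IOException {
            if (fail) {
                throw new IOException("simulated read failure");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Uses the stub inside try-with-resources and reports whether it ended up closed. */
    static boolean closedAfterUse(boolean fail) {
        StubHelper helper = new StubHelper();
        try (StubHelper h = helper) {
            h.process(fail);
        } catch (IOException e) {
            // By the time this handler runs, close() has already been invoked.
        }
        return helper.closed;
    }

    public static void main(String[] args) {
        System.out.println("closed after success: " + closedAfterUse(false));
        System.out.println("closed after failure: " + closedAfterUse(true));
    }
}
```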
    • validateImport

      void validateImport() throws ImportException
      Validate the import.
      Throws:
      ImportException - thrown in case of errors
    • getCsvImporterHelper

      static CsvImporterHelper getCsvImporterHelper(String fileFormat, char delimiter, boolean trim, boolean noHeader, int skipHeaderLines, int skipFooterLines, InputStream inputStream, List<String> columnNames, boolean fromSplitFile) throws IOException
      Get an appropriate CsvImporterHelper instance.
      Parameters:
      fileFormat - The file format
      delimiter - The delimiter
      trim - Whether to trim the lines
      noHeader - Indicates that the CSV does not contain a header row with column names
      skipHeaderLines - How many lines to skip at the beginning of the data
      skipFooterLines - How many lines to skip at the end of the data
      inputStream - The stream to use for the import
      columnNames - List of column names to use as a header for the CSV. null if no column names are being passed in.
      fromSplitFile - Flag indicating if the import source is a splitFile
      Returns:
      the applicable CsvImporterHelper
      Throws:
      IOException - thrown for any underlying checked exception
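The noHeader, skipHeaderLines, skipFooterLines and columnNames parameters can be pictured with a plain-Java sketch. It does not use or reproduce the library's reader; the class HeaderFooterSkipSketch and its members are hypothetical. It assumes that leading lines are skipped before the header row is read, and that supplied column names are used only when noHeader is true. Footer lines are dropped with a small look-ahead buffer, so the input never has to be read twice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names belong to CsvImporterHelper.
class HeaderFooterSkipSketch {

    /** Resolved column names plus the data lines that survive header/footer skipping. */
    static final class Parsed {
        final List<String> columnNames;
        final List<String> dataLines;

        Parsed(List<String> columnNames, List<String> dataLines) {
            this.columnNames = columnNames;
            this.dataLines = dataLines;
        }
    }

    static Parsed read(Reader source, char delimiter, boolean noHeader,
                       int skipHeaderLines, int skipFooterLines, List<String> columnNames) {
        List<String> kept = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            // Assumption: leading lines are skipped before the header row is read.
            for (int i = 0; i < skipHeaderLines && reader.readLine() != null; i++) {
                // the line is discarded
            }
            // Hold back the most recent skipFooterLines lines. Whatever is still
            // held at end of input is the footer and is never emitted.
            Deque<String> held = new ArrayDeque<>();
            String line;
            while ((line = reader.readLine()) != null) {
                held.addLast(line);
                if (held.size() > skipFooterLines) {
                    kept.add(held.removeFirst());
                }
            }
        } catch (IOException e) {
            // Wrapped so the sketch can be called without checked-exception handling.
            throw new UncheckedIOException(e);
        }
        List<String> names = columnNames;
        if (!noHeader && !kept.isEmpty()) {
            // Assumption: with a header row present, names come from that row.
            String header = kept.remove(0);
            names = Arrays.asList(header.split(Pattern.quote(String.valueOf(delimiter)), -1));
        }
        return new Parsed(names, kept);
    }

    public static void main(String[] args) {
        String csv = "exported file\nSym,Price\nAAA,1.5\nBBB,2.5\nrows=2\n";
        Parsed p = read(new StringReader(csv), ',', false, 1, 1, null);
        System.out.println(p.columnNames + " " + p.dataLines);
    }
}
```

The look-ahead buffer holds at most skipFooterLines + 1 lines at a time, which is why footer skipping does not require knowing the file length in advance.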
    • parseNextRecord

      default org.apache.commons.csv.CSVRecord parseNextRecord() throws IOException
      Parse the next CSV record from the stream
      Returns:
      the parsed CSVRecord
      Throws:
      IOException - if an error occurs