Class GeneralCsvImporter

java.lang.Object
com.illumon.iris.importers.GeneralImporter<CsvFieldWriter>
com.illumon.iris.importers.GeneralCsvImporter

public class GeneralCsvImporter
extends GeneralImporter<CsvFieldWriter>
General CSV importer class for standard CSV imports.
  • Constructor Details

    • GeneralCsvImporter

      public GeneralCsvImporter​(@NotNull com.fishlib.io.logger.Logger log, @NotNull ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, List<File> sourceFiles, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, boolean trim) throws IOException
      Constructor used when importing from a set of files. Writer will be closed automatically when the set has been imported. This overload is kept for backwards compatibility with callers that always expect a header row in the CSV and that do not support skipping footer lines.
      Parameters:
      log - Logger created upstream
      importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
      intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
      sourceElement - XML element (org.jdom2.Element) for the import source definition
      sourceFiles - List of File objects from which to read CSV data
      factory - CsvFieldWriter.Factory that will create the field writers (setters) for the table's columns
      delimiter - Single character to be used as a delimiter - normally this is a comma
      strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
      fileFormat - Name of the Apache Commons CSV file format to use
      constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
      skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
      trim - If trim is true, use CSVFormat that trims leading and trailing blanks.
      Throws:
      IOException
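
The delimiter, strict and trim parameters described above can be illustrated with a minimal, standalone sketch. It is not the importer's implementation: the class `FieldOptionsSketch` and its methods are hypothetical, the split ignores CSV quoting, and returning null for a failed conversion in non-strict mode is an assumption, since the documentation above only says what strict mode does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of com.illumon.iris.importers. */
public class FieldOptionsSketch {

    /** Splits one line on the delimiter, optionally trimming each field. */
    public static List<String> splitFields(String line, char delimiter, boolean trim) {
        List<String> fields = new ArrayList<>();
        // Naive split that ignores quoting; the -1 limit keeps trailing empty fields.
        for (String raw : line.split(Pattern.quote(String.valueOf(delimiter)), -1)) {
            fields.add(trim ? raw.trim() : raw);
        }
        return fields;
    }

    /**
     * Converts a field to a Long. In strict mode a failed conversion throws.
     * Assumption: in non-strict mode the value is recorded as null.
     */
    public static Long toLong(String field, boolean strict) {
        try {
            return Long.valueOf(field);
        } catch (NumberFormatException e) {
            if (strict) {
                throw new IllegalArgumentException("Cannot convert '" + field + "' to a number", e);
            }
            return null;
        }
    }
}
```

Note that `Long.valueOf(" 42")` fails because of the leading blank, which is one reason trimming and strict conversion interact: an untrimmed numeric field would be rejected under strict conversion in this sketch.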
    • GeneralCsvImporter

      public GeneralCsvImporter​(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, List<File> sourceFiles, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
      Constructor used when importing from a set of files. Writer will be closed automatically when the set has been imported.
      Parameters:
      log - Logger created upstream
      importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
      intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
      sourceElement - XML element (org.jdom2.Element) for the import source definition
      sourceFiles - List of File objects from which to read CSV data
      factory - CsvFieldWriter.Factory that will create the field writers (setters) for the table's columns
      delimiter - Single character to be used as a delimiter - normally this is a comma
      strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
      fileFormat - Name of the Apache Commons CSV file format to use
      constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
      skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
      skipFooterLines - Number of lines to skip at the end of the file. Use this for files that have trailing "garbage".
      trim - If trim is true, use CSVFormat that trims leading and trailing blanks.
      noHeader - Whether the CSV does not include a header row with column names.
      columnNames - A list of column names to use instead of a header from the CSV.
      Throws:
      IOException
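
How skipHeaderLines, skipFooterLines, noHeader and columnNames relate to one another can be sketched as follows. This is a standalone illustration under stated assumptions, not the importer's code: `CsvLineOptionsSketch` and `Layout` are hypothetical names, the header split ignores quoting, and the sketch assumes columnNames is consulted only when noHeader is true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of com.illumon.iris.importers. */
public class CsvLineOptionsSketch {

    /** Resolved column names plus the remaining raw data lines. */
    public static final class Layout {
        public final List<String> columnNames;
        public final List<String> dataLines;

        Layout(List<String> columnNames, List<String> dataLines) {
            this.columnNames = columnNames;
            this.dataLines = dataLines;
        }
    }

    public static Layout apply(List<String> lines, char delimiter,
                               int skipHeaderLines, int skipFooterLines,
                               boolean noHeader, List<String> columnNames) {
        // Drop leading and trailing "garbage" lines first.
        int start = Math.min(skipHeaderLines, lines.size());
        int end = Math.max(start, lines.size() - skipFooterLines);
        List<String> body = new ArrayList<>(lines.subList(start, end));

        List<String> names;
        if (noHeader) {
            // Assumption: without a header row the caller must supply the names.
            if (columnNames == null || columnNames.isEmpty()) {
                throw new IllegalArgumentException("columnNames is required when noHeader is true");
            }
            names = new ArrayList<>(columnNames);
        } else {
            if (body.isEmpty()) {
                throw new IllegalArgumentException("no header row found");
            }
            // The first remaining line is the header; naive split that ignores quoting.
            String header = body.remove(0);
            names = Arrays.asList(header.split(Pattern.quote(String.valueOf(delimiter)), -1));
        }
        return new Layout(names, body);
    }
}
```

For a file with one comment line at the top and one totals line at the bottom, calling `apply(lines, ',', 1, 1, false, null)` in this sketch leaves the header and the data rows only.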
    • GeneralCsvImporter

      public GeneralCsvImporter​(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim) throws IOException
      Constructor used when importing from an InputStream, e.g. by QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
      Parameters:
      log - Logger created upstream
      importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
      intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
      sourceElement - XML element (org.jdom2.Element) for the import source definition
      sourceStream - InputStream from which to read CSV data
      factory - CsvFieldWriter.Factory that will create the field writers (setters) for the table's columns
      delimiter - Single character to be used as a delimiter - normally this is a comma
      strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
      fileFormat - Name of the Apache Commons CSV file format to use
      constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
      skipLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
      trim - If trim is true, use CSVFormat that trims leading and trailing blanks.
      Throws:
      IOException
    • GeneralCsvImporter

      public GeneralCsvImporter​(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
      Constructor used when importing from an InputStream, e.g. by QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
      Parameters:
      log - Logger created upstream
      importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
      intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
      sourceElement - XML element (org.jdom2.Element) for the import source definition
      sourceStream - InputStream from which to read CSV data
      factory - CsvFieldWriter.Factory that will create the field writers (setters) for the table's columns
      delimiter - Single character to be used as a delimiter - normally this is a comma
      strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
      fileFormat - Name of the Apache Commons CSV file format to use
      constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
      skipLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
      trim - If trim is true, use CSVFormat that trims leading and trailing blanks.
      noHeader - Whether the CSV does not include a header row with column names.
      columnNames - A list of column names to use instead of a header from the CSV.
      Throws:
      IOException
    • GeneralCsvImporter

      public GeneralCsvImporter​(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
      Constructor used when importing from an InputStream, e.g. by QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
      Parameters:
      log - Logger created upstream
      importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
      intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
      sourceElement - XML element (org.jdom2.Element) for the import source definition
      sourceStream - InputStream from which to read CSV data
      factory - CsvFieldWriter.Factory that will create the field writers (setters) for the table's columns
      delimiter - Single character to be used as a delimiter - normally this is a comma
      strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
      fileFormat - Name of the Apache Commons CSV file format to use
      constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
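      skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
      skipFooterLines - Number of lines to skip at the end of the file. Use this for files that have trailing "garbage".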
      trim - If trim is true, use CSVFormat that trims leading and trailing blanks.
      noHeader - Whether the CSV does not include a header row with column names.
      columnNames - A list of column names to use instead of a header from the CSV.
      Throws:
      IOException
  • Method Details

    • getLineCount

      public long getLineCount()
    • handleShutdown

      @InternalUseOnly public void handleShutdown()
      Close and flush all data buffered in all writers.
    • run

      public void run()
      Iterates over the set of source files, or passes the InputStream directly to processFile to be imported.
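
The dispatch that run() describes (one pass per file, or a single pass over a supplied stream) might look roughly like the sketch below. Everything here is hypothetical: `RunDispatchSketch`, its factory methods and its line counter are stand-ins, and the real processFile and getLineCount() behavior is not documented on this page.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical illustration only; not part of com.illumon.iris.importers. */
public class RunDispatchSketch implements Runnable {
    private final List<File> sourceFiles;   // set for file-based imports, otherwise null
    private final InputStream sourceStream; // set for stream-based imports, otherwise null
    private long lineCount;

    private RunDispatchSketch(List<File> sourceFiles, InputStream sourceStream) {
        this.sourceFiles = sourceFiles;
        this.sourceStream = sourceStream;
    }

    public static RunDispatchSketch forFiles(List<File> files) {
        return new RunDispatchSketch(files, null);
    }

    public static RunDispatchSketch forStream(InputStream stream) {
        return new RunDispatchSketch(null, stream);
    }

    @Override
    public void run() {
        try {
            if (sourceFiles != null) {
                // File-based import: open, process and close each file in turn.
                for (File file : sourceFiles) {
                    try (InputStream in = new FileInputStream(file)) {
                        processFile(in);
                    }
                }
            } else {
                // Stream-based import: hand the caller's stream straight to processFile.
                processFile(sourceStream);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for the real processing step: it only counts the lines it reads. */
    private void processFile(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        while (reader.readLine() != null) {
            lineCount++;
        }
    }

    public long getLineCount() {
        return lineCount;
    }
}
```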