Package com.illumon.iris.importers
Class GeneralCsvImporter
java.lang.Object
com.illumon.iris.importers.GeneralImporter<CsvFieldWriter>
com.illumon.iris.importers.GeneralCsvImporter
public class GeneralCsvImporter extends GeneralImporter<CsvFieldWriter>
General CSV importer class to handle standard CSV imports
-
Field Summary
Fields inherited from class com.illumon.iris.importers.GeneralImporter
customImportProperties, importTableWriterFactory, intradayPartitionColumn, log, MAX_OPEN_TABLE_WRITERS, strict
-
Constructor Summary
Constructors
GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim)
- Constructor used when importing from an InputStream, e.g. QuandlImporter.
GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim, boolean noHeader, List<String> columnNames)
- Constructor used when importing from an InputStream, e.g. QuandlImporter.
GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames)
- Constructor used when importing from an InputStream, e.g. QuandlImporter.
GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, List<File> sourceFiles, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, boolean trim)
- Constructor used when importing from a set of files.
GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, List<File> sourceFiles, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames)
- Constructor used when importing from a set of files.
-
Method Summary
Methods
long getLineCount()
void handleShutdown()
- Close and flush all data buffered in all writers.
void run()
- Iterates the file stream set, or passes the InputStream directly to processFile to be imported.
-
Constructor Details
-
GeneralCsvImporter
public GeneralCsvImporter(@NotNull com.fishlib.io.logger.Logger log, @NotNull ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, List<File> sourceFiles, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, boolean trim) throws IOException
Constructor used when importing from a set of files. The writer is closed automatically once the set has been imported. Provided for backwards compatibility with callers that always expect a header row in the CSV and that do not support skipping footer lines.
Parameters:
log - Logger created upstream.
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs. multi partition).
intradayPartitionColumn - Column used to determine the target partition for multi-partition imports (generally Date).
sourceElement - XML element (org.jdom2.Element) for the import source.
sourceFiles - List of File objects from which to read CSV data.
factory - CsvFieldWriter factory that creates the field writers (setters) for the table's columns.
delimiter - Single character used as the field delimiter, normally a comma.
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter.
fileFormat - Apache Commons CSV file format to use.
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
Throws:
IOException
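The strict flag described above can be pictured with a minimal sketch in plain Java. This is not the importer's code: StrictConversionSketch and toLong are hypothetical names, and the lenient behaviour (returning an empty result instead of failing) is an assumption, since this page does not say what the importer does with a bad value when strict is false.

```java
import java.util.OptionalLong;

/** Hypothetical illustration only; not part of com.illumon.iris.importers. */
final class StrictConversionSketch {
    private StrictConversionSketch() {}

    /**
     * Converts one CSV field to a long.
     * With strict set, a malformed value raises an exception.
     * Without it, this sketch returns an empty result (an assumption).
     */
    static OptionalLong toLong(String field, boolean strict) {
        try {
            return OptionalLong.of(Long.parseLong(field.trim()));
        } catch (NumberFormatException e) {
            if (strict) {
                throw new IllegalArgumentException("Cannot convert '" + field + "' to long", e);
            }
            return OptionalLong.empty();
        }
    }
}
```

Calling StrictConversionSketch.toLong("abc", true) throws, while the same call with strict set to false returns an empty OptionalLong.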
-
GeneralCsvImporter
public GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, List<File> sourceFiles, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
Constructor used when importing from a set of files. The writer is closed automatically once the set has been imported.
Parameters:
log - Logger created upstream.
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs. multi partition).
intradayPartitionColumn - Column used to determine the target partition for multi-partition imports (generally Date).
sourceElement - XML element (org.jdom2.Element) for the import source.
sourceFiles - List of File objects from which to read CSV data.
factory - CsvFieldWriter factory that creates the field writers (setters) for the table's columns.
delimiter - Single character used as the field delimiter, normally a comma.
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter.
fileFormat - Apache Commons CSV file format to use.
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
skipFooterLines - Number of lines to skip at the end of the file. Use this for files that have trailing "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
noHeader - Whether the CSV does not include a header row with column names.
columnNames - A list of column names to use instead of a header from the CSV.
Throws:
IOException
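The following plain-Java sketch shows one way skipHeaderLines, skipFooterLines, noHeader, and columnNames could interact. CsvLayoutSketch, Layout, and resolve are hypothetical names, not importer API. The sketch assumes that columnNames is consulted only when noHeader is true, and it splits the header on the delimiter without handling quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of com.illumon.iris.importers. */
final class CsvLayoutSketch {
    private CsvLayoutSketch() {}

    /** Column names plus the data lines that remain after skipping. */
    static final class Layout {
        final List<String> columnNames;
        final List<String> dataLines;

        Layout(List<String> columnNames, List<String> dataLines) {
            this.columnNames = columnNames;
            this.dataLines = dataLines;
        }
    }

    static Layout resolve(List<String> lines, char delimiter, int skipHeaderLines,
                          int skipFooterLines, boolean noHeader, List<String> columnNames) {
        // Clamp both offsets so short files or negative values never break subList.
        int start = Math.max(0, Math.min(skipHeaderLines, lines.size()));
        int end = Math.max(start, lines.size() - Math.max(0, skipFooterLines));
        List<String> body = new ArrayList<>(lines.subList(start, end));

        List<String> names;
        if (noHeader) {
            // No header row in the file: the caller must supply the names.
            if (columnNames == null || columnNames.isEmpty()) {
                throw new IllegalArgumentException("columnNames is required when noHeader is true");
            }
            names = columnNames;
        } else {
            // First remaining line is the header (naive split, no quote handling).
            if (body.isEmpty()) {
                throw new IllegalArgumentException("No header row found");
            }
            String header = body.remove(0);
            names = Arrays.asList(header.split(Pattern.quote(String.valueOf(delimiter)), -1));
        }
        return new Layout(names, body);
    }
}
```

For a five-line file with one leading comment line and one trailing summary line, resolve(lines, ',', 1, 1, false, null) treats the second line as the header and the two middle lines as data.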
-
GeneralCsvImporter
public GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim) throws IOException
Constructor used when importing from an InputStream, e.g. QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
Parameters:
log - Logger created upstream.
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs. multi partition).
intradayPartitionColumn - Column used to determine the target partition for multi-partition imports (generally Date).
sourceElement - XML element (org.jdom2.Element) for the import source.
sourceStream - InputStream from which to read CSV data.
factory - CsvFieldWriter factory that creates the field writers (setters) for the table's columns.
delimiter - Single character used as the field delimiter, normally a comma.
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter.
fileFormat - Apache Commons CSV file format to use.
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
Throws:
IOException
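To make the stream-based parameters concrete, here is a minimal plain-Java sketch of reading delimited rows from an InputStream while honouring delimiter, skipLines, and trim. StreamCsvSketch and readRows are hypothetical names. The sketch assumes UTF-8 input, closes the stream when done, and splits naively on the delimiter. The importer itself is documented as using an Apache Commons CSV format, which this sketch does not reproduce.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of com.illumon.iris.importers. */
final class StreamCsvSketch {
    private StreamCsvSketch() {}

    /**
     * Returns every row after the first skipLines lines.
     * The first returned row is the header row.
     */
    static List<String[]> readRows(InputStream in, char delimiter, int skipLines, boolean trim) {
        // Quote the delimiter so characters such as '|' are treated literally.
        Pattern splitter = Pattern.compile(Pattern.quote(String.valueOf(delimiter)));
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                if (lineNumber++ < skipLines) {
                    continue; // leading lines before the header are discarded
                }
                String[] fields = splitter.split(line, -1);
                if (trim) {
                    for (int i = 0; i < fields.length; i++) {
                        fields[i] = fields[i].trim();
                    }
                }
                rows.add(fields);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

With input consisting of one junk line followed by " a | b " and " 1 | 2 ", calling readRows(stream, '|', 1, true) yields two rows whose fields have surrounding blanks removed.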
-
GeneralCsvImporter
public GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
Constructor used when importing from an InputStream, e.g. QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
Parameters:
log - Logger created upstream.
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs. multi partition).
intradayPartitionColumn - Column used to determine the target partition for multi-partition imports (generally Date).
sourceElement - XML element (org.jdom2.Element) for the import source.
sourceStream - InputStream from which to read CSV data.
factory - CsvFieldWriter factory that creates the field writers (setters) for the table's columns.
delimiter - Single character used as the field delimiter, normally a comma.
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter.
fileFormat - Apache Commons CSV file format to use.
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
noHeader - Whether the CSV does not include a header row with column names.
columnNames - A list of column names to use instead of a header from the CSV.
Throws:
IOException
-
GeneralCsvImporter
public GeneralCsvImporter(com.fishlib.io.logger.Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, org.jdom2.Element sourceElement, InputStream sourceStream, CsvFieldWriter.Factory factory, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
Constructor used when importing from an InputStream, e.g. QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
Parameters:
log - Logger created upstream.
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs. multi partition).
intradayPartitionColumn - Column used to determine the target partition for multi-partition imports (generally Date).
sourceElement - XML element (org.jdom2.Element) for the import source.
sourceStream - InputStream from which to read CSV data.
factory - CsvFieldWriter factory that creates the field writers (setters) for the table's columns.
delimiter - Single character used as the field delimiter, normally a comma.
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter.
fileFormat - Apache Commons CSV file format to use.
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
skipFooterLines - Number of lines to skip at the end of the file. Use this for files that have trailing "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
noHeader - Whether the CSV does not include a header row with column names.
columnNames - A list of column names to use instead of a header from the CSV.
Throws:
IOException
-
-
Method Details
-
getLineCount
public long getLineCount()
-
handleShutdown
public void handleShutdown()
Close and flush all data buffered in all writers.
-
run
public void run()
Iterates the file stream set, or passes the InputStream directly to processFile to be imported.
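The dispatch that run() is described as performing can be sketched in plain Java as follows. This is only an illustration of the control flow stated above, not the importer's source: RunDispatchSketch and StreamProcessor are hypothetical names, the processor stands in for processFile, and the rule that a non-null stream takes precedence over the file list is an assumption.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical illustration only; not part of com.illumon.iris.importers. */
final class RunDispatchSketch {
    private RunDispatchSketch() {}

    /** Stand-in for the importer's processFile step. */
    interface StreamProcessor {
        void process(InputStream in) throws IOException;
    }

    static void run(List<File> sourceFiles, InputStream sourceStream, StreamProcessor processor) {
        try {
            if (sourceStream != null) {
                // Stream-based import: hand the stream straight to the processor.
                processor.process(sourceStream);
                return;
            }
            // File-based import: open each file in turn and close it afterwards.
            for (File file : sourceFiles) {
                try (InputStream in = new FileInputStream(file)) {
                    processor.process(in);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller supplies either a list of files or a single stream, and the processor is invoked once per file or once for the stream.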
-