Package io.deephaven.importers.csv
Class GeneralCsvImporter
java.lang.Object
io.deephaven.importers.GeneralImporter<CsvFieldWriter>
io.deephaven.importers.csv.GeneralCsvImporter
General CSV importer class that handles standard CSV imports.
-
Nested Class Summary
Nested classes/interfaces inherited from class io.deephaven.importers.GeneralImporter
GeneralImporter.CacheEntry, GeneralImporter.IntradayPartitionSupplier<Context>
-
Field Summary
Fields inherited from class io.deephaven.importers.GeneralImporter
customImportProperties, importTableWriterFactory, intradayPartitionColumn, log, strict
-
Constructor Summary
Constructors
GeneralCsvImporter(@NotNull Logger log, @NotNull ImportTableWriterFactory importTableWriterFactory, @Nullable String intradayPartitionColumn, @Nullable io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, char delimiter, boolean strict, String fileFormat, List<File> sourceFiles, InputStream sourceStream, @Nullable String constantColumnValue, Map<String, String> customImportProperties, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, @Nullable List<String> columnNames)
    Constructor used when importing from a set of files.
GeneralCsvImporter(@NotNull Logger log, @NotNull ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, List<File> sourceFiles, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, boolean trim)
    Constructor used when importing from a set of files.
GeneralCsvImporter(Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, InputStream sourceStream, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim)
    Constructor used when importing from an InputStream, e.g. QuandlImporter.
GeneralCsvImporter(Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, InputStream sourceStream, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim, boolean noHeader, List<String> columnNames)
    Constructor used when importing from an InputStream, e.g. QuandlImporter.
GeneralCsvImporter(Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, InputStream sourceStream, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames)
    Constructor used when importing from an InputStream, e.g. QuandlImporter.
GeneralCsvImporter(Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, List<File> sourceFiles, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames)
    Constructor used when importing from a set of files.
-
Method Summary
Modifier and Type, Method, Description
CsvFieldWriter buildEndOfRecordFieldWriter(io.deephaven.shadow.enterprise.com.illumon.iris.binarystore.TableWriter tableWriter)
Function<io.deephaven.shadow.enterprise.com.illumon.iris.binarystore.RowSetter,CsvFieldWriter> buildFieldWriterFactory(ImporterColumnDefinition column, Class setterType, Map<String, String> importProperties, String actualPartition)
getColumnDefinitions
long getLineCount()
void handleShutdown
    Close and flush all data buffered in all writers.
void run()
    Iterates the file stream set, or passes the InputStream directly to processFile to be imported.
Methods inherited from class io.deephaven.importers.GeneralImporter
closeAllWriters, getImportTableWriterFactory, getMaxOpenTableWriters, getTableEntry, setMaxOpenTableWriters
-
Constructor Details
-
GeneralCsvImporter
public GeneralCsvImporter(@NotNull Logger log, @NotNull ImportTableWriterFactory importTableWriterFactory, @Nullable String intradayPartitionColumn, @Nullable io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, char delimiter, boolean strict, String fileFormat, List<File> sourceFiles, InputStream sourceStream, @Nullable String constantColumnValue, Map<String, String> customImportProperties, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, @Nullable List<String> columnNames)
Constructor used when importing from a set of files. Writer will be closed automatically when the set has been imported. Provided for backwards compatibility with callers that expect the CSV to always have a header row and that don't support skipping footer lines.
- Parameters:
log - Logger created upstream
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
sourceElement - InputStream from which to read CSV data
sourceFiles - List of File objects from which to read CSV data
sourceStream - The InputStream from which to read CSV data
delimiter - Single character to be used as a delimiter; normally this is a comma
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
fileFormat - Apache commons CSV file format to use
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant)
skipHeaderLines - Number of lines to skip before the header
trim - If true, use a CSVFormat that trims leading and trailing blanks.
customImportProperties - Custom import properties
skipFooterLines - Number of lines to skip at the end of the file
noHeader - Whether the CSV does not include a header row with column names
columnNames - The column names to import
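As a rough illustration of what intradayPartitionColumn implies for multi partition imports, the sketch below groups already-parsed rows by the value found in one column. It is plain Java and does not use any Deephaven class: PartitionRoutingSketch, route, and the sample rows are hypothetical names chosen for this example, and the importer itself obtains table writers from the ImportTableWriterFactory rather than building in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not the GeneralCsvImporter implementation. */
class PartitionRoutingSketch {

    /**
     * Groups rows by the value in the partition column, keeping first-seen order.
     * Each map entry stands in for one partition's table writer (an assumption of this sketch).
     */
    static Map<String, List<String[]>> route(List<String[]> rows, int partitionColumnIndex) {
        Map<String, List<String[]>> byPartition = new LinkedHashMap<>();
        for (String[] row : rows) {
            String partition = row[partitionColumnIndex];
            // Create the per-partition bucket on demand, the first time the value is seen.
            byPartition.computeIfAbsent(partition, key -> new ArrayList<>()).add(row);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Hypothetical rows with columns Date, Sym, Price; Date (index 0) is the partition column.
        List<String[]> rows = Arrays.asList(
                new String[] {"2024-01-02", "AAPL", "1.5"},
                new String[] {"2024-01-03", "IBM", "2.5"},
                new String[] {"2024-01-02", "MSFT", "3.5"});
        Map<String, List<String[]>> byPartition = route(rows, 0);
        for (Map.Entry<String, List<String[]>> entry : byPartition.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue().size() + " row(s)");
        }
    }
}
```

For a single partition import no such routing is needed, which is presumably why intradayPartitionColumn is nullable.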
-
GeneralCsvImporter
public GeneralCsvImporter(@NotNull Logger log, @NotNull ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, List<File> sourceFiles, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, boolean trim)
Constructor used when importing from a set of files. Writer will be closed automatically when the set has been imported. Provided for backwards compatibility with callers that expect the CSV to always have a header row and that don't support skipping footer lines.
- Parameters:
log - Logger created upstream
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
sourceElement - InputStream from which to read CSV data
sourceFiles - List of File objects from which to read CSV data
delimiter - Single character to be used as a delimiter; normally this is a comma
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
fileFormat - Apache commons CSV file format to use
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
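The strict parameter is described only as whether to fail on a bad numeric conversion. A minimal sketch of that distinction follows. StrictModeSketch and convert are hypothetical names, and returning null in the non-strict case is an assumption made for this example; this page does not say what the importer stores when strict is false.

```java
/** Illustrative only: not the GeneralCsvImporter implementation. */
class StrictModeSketch {

    /**
     * Converts one CSV field to a Double.
     * strict = true: a field that is not numeric raises an exception.
     * strict = false: the sketch returns null instead (assumed behavior, not documented).
     */
    static Double convert(String field, boolean strict) {
        try {
            return Double.valueOf(field.trim());
        } catch (NumberFormatException e) {
            if (strict) {
                throw new IllegalArgumentException("Cannot convert '" + field + "' to a double", e);
            }
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(convert(" 1.5 ", true));
        System.out.println(convert("n/a", false));
        try {
            convert("n/a", true);
        } catch (IllegalArgumentException e) {
            System.out.println("strict failure: " + e.getMessage());
        }
    }
}
```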
-
GeneralCsvImporter
public GeneralCsvImporter(Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, List<File> sourceFiles, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
Constructor used when importing from a set of files. Writer will be closed automatically when the set has been imported.
- Parameters:
log - Logger created upstream
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
sourceElement - InputStream from which to read CSV data
sourceFiles - List of File objects from which to read CSV data
delimiter - Single character to be used as a delimiter; normally this is a comma
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
fileFormat - Apache commons CSV file format to use
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
skipFooterLines - Number of lines to skip at the end of the file. Use this for files that have trailing "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
noHeader - Whether the CSV does not include a header row with column names.
columnNames - A list of column names to use instead of a header from the CSV.
- Throws:
IOException - if an I/O failure occurs while handling the given source files
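To make the interaction of skipHeaderLines, skipFooterLines, noHeader, and columnNames concrete, here is a small sketch that applies the documented meanings to a list of raw lines. It is plain Java written for this example, not the importer's code: SkipLinesSketch, Selection, and select are hypothetical names, the naive split ignores quoting, and requiring columnNames when noHeader is true is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: not the GeneralCsvImporter implementation. */
class SkipLinesSketch {

    /** Column names plus the data lines that remain after skipping. */
    static final class Selection {
        final List<String> columnNames;
        final List<String> dataLines;

        Selection(List<String> columnNames, List<String> dataLines) {
            this.columnNames = columnNames;
            this.dataLines = dataLines;
        }
    }

    static Selection select(List<String> rawLines, char delimiter, int skipHeaderLines,
                            int skipFooterLines, boolean noHeader, List<String> columnNames) {
        // Drop leading and trailing "garbage" lines, clamping to the available range.
        int start = Math.max(0, Math.min(skipHeaderLines, rawLines.size()));
        int end = Math.max(start, Math.min(rawLines.size(), rawLines.size() - skipFooterLines));
        List<String> body = new ArrayList<>(rawLines.subList(start, end));

        List<String> names;
        if (noHeader) {
            // No header row in the data: the caller supplies the names (assumption of this sketch).
            if (columnNames == null) {
                throw new IllegalArgumentException("columnNames is required when noHeader is true");
            }
            names = columnNames;
        } else {
            if (body.isEmpty()) {
                throw new IllegalArgumentException("No header row left after skipping lines");
            }
            // The first remaining line is the header; a naive split that ignores quoting.
            String header = body.remove(0);
            names = Arrays.asList(header.split(Pattern.quote(String.valueOf(delimiter)), -1));
        }
        return new Selection(names, body);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "# exported by some tool", "Sym,Price", "AAPL,1.5", "IBM,2.5", "TOTAL,2");
        Selection selection = select(raw, ',', 1, 1, false, null);
        System.out.println(selection.columnNames);
        System.out.println(selection.dataLines);
    }
}
```

With one leading and one trailing line skipped, the header row supplies the column names and only the two data rows remain. With noHeader set to true, every remaining line is treated as data.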
-
GeneralCsvImporter
public GeneralCsvImporter(Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, InputStream sourceStream, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim) throws IOException
Constructor used when importing from an InputStream, e.g. QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
- Parameters:
log - Logger created upstream
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
sourceElement - InputStream from which to read CSV data
sourceStream - InputStream from which to read CSV data
delimiter - Single character to be used as a delimiter; normally this is a comma
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
fileFormat - Apache commons CSV file format to use
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
- Throws:
IOException - if an I/O failure occurs while handling the given sourceStream
-
GeneralCsvImporter
public GeneralCsvImporter(Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, InputStream sourceStream, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
Constructor used when importing from an InputStream, e.g. QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
- Parameters:
log - Logger created upstream
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
sourceElement - InputStream from which to read CSV data
sourceStream - InputStream from which to read CSV data
delimiter - Single character to be used as a delimiter; normally this is a comma
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
fileFormat - Apache commons CSV file format to use
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
noHeader - Whether the CSV does not include a header row with column names.
columnNames - A list of column names to use instead of a header from the CSV.
- Throws:
IOException - if an I/O failure occurs while handling the given sourceStream
-
GeneralCsvImporter
public GeneralCsvImporter(Logger log, ImportTableWriterFactory importTableWriterFactory, String intradayPartitionColumn, io.deephaven.shadow.enterprise.org.jdom2.Element sourceElement, InputStream sourceStream, char delimiter, boolean strict, String fileFormat, String constantColumnValue, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames) throws IOException
Constructor used when importing from an InputStream, e.g. QuandlImporter. Note that the CSV importer will always close any table writers it uses, so it is up to the factory to provide appending writers if appropriate.
- Parameters:
log - Logger created upstream
importTableWriterFactory - Provides table writers on demand based on the type of import (single vs multi partition)
intradayPartitionColumn - Column to use for determining the target partition for multi partition imports (generally Date)
sourceElement - InputStream from which to read CSV data
sourceStream - InputStream from which to read CSV data
delimiter - Single character to be used as a delimiter; normally this is a comma
strict - Whether to fail if a field fails numeric conversion or a column is missing a setter
fileFormat - Apache commons CSV file format to use
constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
skipHeaderLines - Number of lines to skip before the header. Use this for files that have leading "garbage".
skipFooterLines - Number of lines to skip at the end of the file. Use this for files that have trailing "garbage".
trim - If true, use a CSVFormat that trims leading and trailing blanks.
noHeader - Whether the CSV does not include a header row with column names.
columnNames - A list of column names to use instead of a header from the CSV.
- Throws:
IOException - if an I/O failure occurs while handling the given sourceStream
-
-
Method Details
-
getLineCount
public long getLineCount()
-
handleShutdown
Close and flush all data buffered in all writers.
-
getColumnDefinitions
- Specified by:
getColumnDefinitions in class GeneralImporter<CsvFieldWriter>
-
buildFieldWriterFactory
public Function<io.deephaven.shadow.enterprise.com.illumon.iris.binarystore.RowSetter,CsvFieldWriter> buildFieldWriterFactory(ImporterColumnDefinition column, Class setterType, Map<String, String> importProperties, String actualPartition)
- Specified by:
buildFieldWriterFactory in class GeneralImporter<CsvFieldWriter>
-
buildEndOfRecordFieldWriter
public CsvFieldWriter buildEndOfRecordFieldWriter(io.deephaven.shadow.enterprise.com.illumon.iris.binarystore.TableWriter tableWriter)
- Specified by:
buildEndOfRecordFieldWriter in class GeneralImporter<CsvFieldWriter>
-
run
public void run()
Iterates the file stream set, or passes the InputStream directly to processFile to be imported.
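The dispatch described for run() can be sketched as below: when a list of files was supplied, each file is opened and processed in turn, otherwise the single InputStream is processed directly. This is a stand-alone approximation, not the importer's code. RunDispatchSketch is a hypothetical class, and its processFile merely counts lines where the real method imports rows; its signature is assumed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

/** Illustrative only: not the GeneralCsvImporter implementation. */
class RunDispatchSketch {

    /** Hypothetical stand-in for processFile: counts the lines read from one source. */
    static long processFile(InputStream in) {
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines().count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Processes every file in the list if one was given, otherwise the single stream. */
    static long run(List<File> sourceFiles, InputStream sourceStream) {
        if (sourceFiles != null) {
            long total = 0;
            for (File file : sourceFiles) {
                // Each file is opened, processed, and closed before moving to the next one.
                try (InputStream in = Files.newInputStream(file.toPath())) {
                    total += processFile(in);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return total;
        }
        return processFile(sourceStream);
    }

    public static void main(String[] args) {
        InputStream stream = new ByteArrayInputStream(
                "Sym,Price\nAAPL,1.5\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("lines processed: " + run(null, stream));
    }
}
```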
-