Class CsvTools

java.lang.Object
io.deephaven.importers.csv.CsvTools

public class CsvTools extends Object
Main CSV import tools class.
  • Constructor Details

    • CsvTools

      public CsvTools()
  • Method Details

    • getColumnHeaders

      public static @NotNull List<String> getColumnHeaders(@NotNull InputStream stream, @NotNull CsvSpecs specs) throws IOException
      Returns the column headers as a list, using the values from the first row. To avoid reading the entire file, the CsvSpecs should restrict the number of rows read to 1; this restriction is enforced.
      Parameters:
      stream - An InputStream providing access to the CSV data.
      specs - The CsvSpecs
      Returns:
      The column headers, taken from the values in the first row.
      Throws:
      IOException - if an error occurs while reading the stream
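      A minimal usage sketch. The sample data and the CsvSpecs builder calls (builder(), hasHeaderRow(...), numRows(...)) are assumptions about the underlying Deephaven CSV library, not part of the documentation above; only getColumnHeaders itself is documented here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch only: CsvTools lives in io.deephaven.importers.csv and CsvSpecs in
// the Deephaven CSV library; this will not compile without them on the classpath.
public class HeaderPeekExample {
    public static void main(String[] args) throws IOException {
        InputStream stream = new ByteArrayInputStream(
                "Date,Sym,Price\n2023-01-03,AAPL,125.07\n"
                        .getBytes(StandardCharsets.UTF_8));
        // Restrict the read to a single row so the whole file is not parsed;
        // getColumnHeaders enforces this restriction. numRows(1) is an assumed
        // builder method name — verify against your CsvSpecs version.
        CsvSpecs specs = CsvSpecs.builder()
                .hasHeaderRow(true)
                .numRows(1)
                .build();
        List<String> headers = CsvTools.getColumnHeaders(stream, specs);
        System.out.println(headers); // expected to contain Date, Sym, Price
    }
}
```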
    • getCsvSpecsBuilder

      @NotNull public static CsvSpecs.Builder getCsvSpecsBuilder(@Nullable String format, boolean hasHeader, @Nullable Collection<String> headers, @Nullable List<String> nullLiterals, boolean validateHeaders) throws CsvFormatException
      Returns a CsvSpecs.Builder created with the appropriate properties. Allows the caller to choose whether header validation should conform to Deephaven column header rules, and offers the flexibility to supply a list of null literal values.
      Parameters:
      format - can be null, a delimiter, or one of DEFAULT, TDF, EXCEL, MYSQL, RFC4180, and TRIM
      hasHeader - whether the data includes a header row
      headers - Column names to use as, or instead of, the header row for the CSV.
      nullLiterals - The list of literal values the parser should treat as null. If null is passed, a default list is used, consisting of: (1) the empty string, (2) the string "null", and (3) the string "(null)".
      validateHeaders - true indicates headers will be validated against Deephaven column header rules
      Returns:
      CsvSpecs.Builder
      Throws:
      CsvFormatException - thrown for an unsupported format
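      A hedged usage sketch of this call. Only the getCsvSpecsBuilder signature comes from the documentation above; the format choice and header names are illustrative.

```java
import java.util.List;

// Sketch only: CsvTools, CsvSpecs, and CsvFormatException come from the
// io.deephaven.importers.csv package and the Deephaven CSV library; this
// will not compile without those on the classpath.
public class SpecsBuilderExample {
    public static void main(String[] args) throws CsvFormatException {
        CsvSpecs.Builder builder = CsvTools.getCsvSpecsBuilder(
                "RFC4180",              // format: null, a delimiter, or a named format
                true,                   // hasHeader: data includes a header row
                List.of("Date", "Sym"), // headers (illustrative names)
                null,                   // nullLiterals: null -> default list ("", "null", "(null)")
                true);                  // validateHeaders: apply Deephaven column header rules
        CsvSpecs specs = builder.build();
    }
}
```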
    • getDefaultDelimiter

      public static char getDefaultDelimiter(String format)
      Returns the default delimiter for the specified format. If the format cannot be determined, ',' is returned.
      Parameters:
      format - the format from CsvFormats
      Returns:
      the default delimiter
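      The format-to-delimiter mapping is not spelled out above; the following self-contained sketch illustrates one plausible reading of the contract. The tab mapping for TDF (tab-delimited format) is an assumption; the documentation only guarantees ',' when the format cannot be determined.

```java
import java.util.Map;

public class DefaultDelimiterSketch {
    // Hypothetical mapping: only the ',' fallback is documented above.
    // TDF -> '\t' is an assumption based on the name "tab-delimited format".
    private static final Map<String, Character> DELIMITERS = Map.of(
            "DEFAULT", ',',
            "TDF", '\t',
            "EXCEL", ',',
            "MYSQL", ',',
            "RFC4180", ',',
            "TRIM", ',');

    public static char defaultDelimiter(String format) {
        // Null or unrecognized formats fall back to ',' per the contract above.
        if (format == null) {
            return ',';
        }
        return DELIMITERS.getOrDefault(format.toUpperCase(), ',');
    }

    public static void main(String[] args) {
        System.out.println(defaultDelimiter("TDF") == '\t'); // true
        System.out.println(defaultDelimiter("bogus"));       // ,
    }
}
```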
    • getImportCsvSpecsBuilder

      @NotNull public static CsvSpecs.Builder getImportCsvSpecsBuilder(@Nullable String format, boolean hasHeader, @Nullable Collection<String> headers) throws CsvFormatException
      Returns a CsvSpecs.Builder that does not validate or legalize source-file headers to conform to Deephaven column header rules. This method should therefore not be used where Deephaven column header rules are expected to be applied. The typical use case is CSV imports, where the schema drives the eventual column names. In addition to not validating the headers, the default null literal list is used; for more information see getDefaultFormatBuilder(CsvSpecs.Builder, List). The method is public to allow CSV import related classes to access it.
      Parameters:
      format - can be null, a delimiter, or one of DEFAULT, TDF, EXCEL, MYSQL, RFC4180, and TRIM
      hasHeader - whether the data includes a header row
      headers - Column names to use as, or instead of, the header row for the CSV.
      Returns:
      CsvSpecs.Builder
      Throws:
      CsvFormatException - thrown for an unsupported format
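      A short sketch of the schema-driven import case described above. Whether headers may be null here is an assumption based on its @Nullable annotation; the format value is illustrative.

```java
// Sketch only: CsvTools, CsvSpecs, and CsvFormatException come from the
// io.deephaven.importers.csv package and the Deephaven CSV library.
public class ImportSpecsExample {
    public static void main(String[] args) throws CsvFormatException {
        CsvSpecs.Builder builder = CsvTools.getImportCsvSpecsBuilder(
                "DEFAULT", // format: may also be null or a single-character delimiter
                true,      // source data has a header row
                null);     // headers: null here, letting the schema drive column names
        CsvSpecs specs = builder.build();
    }
}
```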
    • getImportCsvSpecsBuilder

      @NotNull public static CsvSpecs.Builder getImportCsvSpecsBuilder(@NotNull CsvFormats format, boolean hasHeader, @Nullable Collection<String> headers) throws CsvFormatException
      Returns a CsvSpecs.Builder that does not validate or legalize source-file headers to conform to Deephaven column header rules. This method should therefore not be used where Deephaven column header rules are expected to be applied. The typical use case is CSV imports, where the schema drives the eventual column names. In addition to not validating the headers, the default null literal list is used; for more information see getDefaultFormatBuilder(CsvSpecs.Builder, List). The method is public to allow CSV import related classes to access it.
      Parameters:
      format - the format from CsvFormats
      hasHeader - whether the data includes a header row
      headers - Column names to use as, or instead of, the header row for the CSV.
      Returns:
      CsvSpecs.Builder
      Throws:
      CsvFormatException - thrown for an unsupported format
    • importCsv

      public static long importCsv(@NotNull InputStream stream, @NotNull CsvSpecs.Builder specBuilder, @NotNull ImportTableWriterFactory tableWriterFactory, @NotNull Logger log, @NotNull List<String> columnNamesInFile, @NotNull Map<String,ImporterColumnDefinition> icdMap, @NotNull Map<String,String> importProperties, @NotNull AtomicInteger errorCount, @NotNull String arrayDelimiter, @Nullable String constantColumnValue, int maxError, boolean strict) throws IOException
      Imports the CSV data and writes it to disk. From CsvImportTools#importCsv.
      Parameters:
      stream - InputStream from which to read CSV data
      specBuilder - The CsvSpecs.Builder
      tableWriterFactory - The passed down ImportTableWriterFactory
      log - The passed-down logger
      columnNamesInFile - The column headers in source
      icdMap - The column name to ImporterColumnDefinition map
      importProperties - Provides basic import attributes
      errorCount - Holds the current error count across all parsers being used to import csv
      arrayDelimiter - Delimiter used to parse array data types
      constantColumnValue - A String to materialize as the source column when an ImportColumn is defined with a sourceType of CONSTANT (aka ImporterColumnDefinition$IrisImportConstant). Can be null.
      maxError - Maximum number of field conversion failures allowed
      strict - Whether to fail if a field fails conversion
      Returns:
      The number of rows processed
      Throws:
      IOException - thrown when an error is encountered during the import