Class CsvFileSplitter

java.lang.Object
com.illumon.iris.importers.CsvFileSplitter

public class CsvFileSplitter
extends Object
A utility for splitting a large CSV file into multiple temp files by a specified column value.
  • Constructor Details

  • Method Details

    • splitFile

      public static List<File> splitFile(com.fishlib.io.logger.Logger log, org.jdom2.Element sourceElement, TableDefinition tableDefinition, InputStream input, String partitionColumn, int maxOpenWriters, int chunkSize)
      Provided for backward compatibility with the 1.0 version of this method. Splits the given input into a set of temporary output files. The rows in each output file keep the order they had in the original file. The univocity parser is used for its high speed. Because it may not be possible to hold all the target files open at once, up to maxOpenWriters writers are cached, and the oldest ones are closed when that limit is reached. The source is processed in "chunks", each of which is sorted by partition value as it is read, in order to minimize the closing and reopening of destination files.
      Parameters:
      log - Logger created upstream.
      input - Source CSV input stream.
      partitionColumn - Iris table column whose values determine how rows are partitioned (this method maps it to the corresponding source column).
      maxOpenWriters - Maximum number of output file writers to keep open.
      chunkSize - Rows to buffer before writing to output files (in partition-sorted order).
      Returns:
      List of Files to be imported.
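
The chunk-and-cache strategy described for splitFile can be sketched in plain Java as shown here. This is an illustration under stated assumptions, not the Iris implementation: the class ChunkedSplitSketch, its split method, and the helper writerFor are hypothetical names, the rows are passed as an in-memory list rather than parsed with univocity, CSV quoting is ignored, and "oldest" is interpreted as the writer that was opened longest ago.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of chunked, partition-sorted splitting; not CsvFileSplitter itself. */
class ChunkedSplitSketch {

    /**
     * Writes each row to a temp file chosen by row[partitionIndex].
     * Returns partition value -> temp file, in first-seen order.
     */
    static Map<String, Path> split(List<String[]> rows, int partitionIndex,
                                   int maxOpenWriters, int chunkSize) throws IOException {
        if (maxOpenWriters < 1 || chunkSize < 1) {
            throw new IllegalArgumentException("maxOpenWriters and chunkSize must be >= 1");
        }
        Map<String, Path> files = new LinkedHashMap<>();
        // Insertion-ordered map: the first entry is the writer opened longest ago.
        Map<String, BufferedWriter> writers = new LinkedHashMap<>();
        try {
            for (int start = 0; start < rows.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, rows.size());
                List<String[]> chunk = new ArrayList<>(rows.subList(start, end));
                // List.sort is stable, so rows sharing a partition value keep source order.
                chunk.sort(Comparator.comparing((String[] r) -> r[partitionIndex]));
                for (String[] row : chunk) {
                    BufferedWriter w = writerFor(row[partitionIndex], files, writers, maxOpenWriters);
                    w.write(String.join(",", row)); // naive join; no CSV quoting
                    w.newLine();
                }
            }
        } finally {
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
        return files;
    }

    /** Returns an open writer for the partition, evicting the oldest writer if the cache is full. */
    private static BufferedWriter writerFor(String key, Map<String, Path> files,
                                            Map<String, BufferedWriter> writers,
                                            int maxOpenWriters) throws IOException {
        BufferedWriter w = writers.get(key);
        if (w != null) {
            return w;
        }
        if (writers.size() >= maxOpenWriters) {
            Iterator<Map.Entry<String, BufferedWriter>> it = writers.entrySet().iterator();
            it.next().getValue().close(); // close the oldest open writer
            it.remove();
        }
        Path file = files.get(key);
        if (file == null) {
            file = Files.createTempFile("split-", ".csv");
            file.toFile().deleteOnExit();
            files.put(key, file);
        }
        // APPEND so a reopened partition file keeps the rows written before eviction.
        w = Files.newBufferedWriter(file, StandardCharsets.UTF_8, StandardOpenOption.APPEND);
        writers.put(key, w);
        return w;
    }
}
```

Sorting each chunk groups rows with the same partition value together, so a writer is typically opened once per chunk per partition rather than once per row. With a larger chunkSize the grouping improves at the cost of more buffered rows; with a larger maxOpenWriters fewer files need to be closed and reopened.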