Package com.illumon.iris.importers
Class CsvFileSplitter
java.lang.Object
com.illumon.iris.importers.CsvFileSplitter
public class CsvFileSplitter extends Object
A utility for splitting a large CSV file into multiple temporary files based on the value of a specified column.
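As a rough illustration of the basic idea, the following minimal sketch groups CSV rows by the value of one column while preserving their original order. It is not the CsvFileSplitter implementation. It works in memory rather than writing temporary files, and it assumes plain comma-separated fields with no quoting or embedded commas. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of com.illumon.iris.importers.
public class SplitByColumnSketch {

    /**
     * Groups data rows by the value in column {@code columnIndex}.
     * Rows keep their original relative order within each group, and groups
     * appear in the order their partition value was first seen.
     * Assumption: plain comma-separated fields with no quoting or escaped commas.
     */
    static Map<String, List<String>> splitByColumn(List<String> rows, int columnIndex) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            String[] fields = row.split(",", -1); // -1 keeps trailing empty fields
            groups.computeIfAbsent(fields[columnIndex], k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "2024-01-01,AAPL,10",
                "2024-01-01,MSFT,20",
                "2024-01-02,AAPL,30");
        // Partition on the first (date) column.
        System.out.println(splitByColumn(rows, 0));
    }
}
```

A real splitter cannot assume the whole file fits in memory, which is why the method below streams rows out to per-partition files instead of collecting them in a map.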
Constructor Summary
Constructor: CsvFileSplitter()
Method Summary
Modifier and Type: static List<File>
Method: splitFile(com.fishlib.io.logger.Logger log, org.jdom2.Element sourceElement, TableDefinition tableDefinition, InputStream input, String partitionColumn, int maxOpenWriters, int chunkSize)
Description: For backwards compatibility with the 1.0 version of this method.
Constructor Details
CsvFileSplitter
public CsvFileSplitter()
Method Details
splitFile
public static List<File> splitFile(com.fishlib.io.logger.Logger log, org.jdom2.Element sourceElement, TableDefinition tableDefinition, InputStream input, String partitionColumn, int maxOpenWriters, int chunkSize)
For backwards compatibility with the 1.0 version of this method. Splits the given input file into a set of temporary output files. The rows in each output file keep the order they had in the original file. The univocity parser is used for its high speed. Because it may not be possible to hold all target files open at once, up to maxOpenWriters writers are cached, and the oldest ones are closed when the limit is reached. The source file is processed in "chunks" that are sorted by partition value as they are read, which minimizes how often destination files are closed and reopened.
Parameters:
log - Logger created upstream.
input - Source CSV input stream.
partitionColumn - Which Iris table column to partition the sort on (this method handles the mapping to the corresponding source column).
maxOpenWriters - Maximum number of output file writers to keep open.
chunkSize - Number of rows to buffer before writing to output files (in partition-sorted order).
Returns:
List of Files to be imported.
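The chunking and writer-caching strategy described for splitFile can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation. Rows are assumed to arrive already parsed into String[] (for example, by a CSV parser such as univocity), output lines are re-joined with commas without quoting, no header row is written, and "oldest" writer is interpreted as least recently used. WriterOpener and ChunkedSplitSketch are hypothetical names. The opener must reopen a partition's output in append mode, since an evicted writer may be needed again by a later chunk. Because List.sort is stable and chunks are processed in file order, rows within each partition keep their original order.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of com.illumon.iris.importers.
public class ChunkedSplitSketch {

    /** Hypothetical hook: opens (or reopens in append mode) the output for a partition value. */
    public interface WriterOpener {
        Writer open(String partitionValue) throws IOException;
    }

    /** Access-ordered map that closes and evicts the least recently used writer past maxOpen. */
    static final class WriterCache extends LinkedHashMap<String, Writer> {
        private final int maxOpen;

        WriterCache(int maxOpen) {
            super(16, 0.75f, true); // true = iteration order is access order (LRU first)
            this.maxOpen = maxOpen;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
            if (size() <= maxOpen) {
                return false;
            }
            try {
                eldest.getValue().close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return true;
        }
    }

    public static void split(Iterator<String[]> rows, int partitionIndex,
                             int maxOpenWriters, int chunkSize, WriterOpener opener) {
        if (maxOpenWriters < 1 || chunkSize < 1) {
            throw new IllegalArgumentException("maxOpenWriters and chunkSize must be >= 1");
        }
        WriterCache cache = new WriterCache(maxOpenWriters);
        List<String[]> chunk = new ArrayList<>(chunkSize);
        try {
            while (rows.hasNext()) {
                chunk.add(rows.next());
                if (chunk.size() < chunkSize && rows.hasNext()) {
                    continue; // keep buffering until the chunk is full or input ends
                }
                // Stable sort: rows with equal partition values keep their input order.
                chunk.sort(Comparator.comparing((String[] r) -> r[partitionIndex]));
                for (String[] row : chunk) {
                    String key = row[partitionIndex];
                    Writer w = cache.get(key); // get() also marks the entry as recently used
                    if (w == null) {
                        w = opener.open(key);
                        cache.put(key, w); // may close and evict the LRU writer
                    }
                    w.write(String.join(",", row));
                    w.write('\n');
                }
                chunk.clear();
            }
            for (Writer w : cache.values()) {
                w.close();
            }
            cache.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A larger chunkSize groups more rows per partition before the sketch switches writers, trading buffer memory for fewer close and reopen cycles; maxOpenWriters bounds the number of simultaneously open file handles.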