com.illumon.iris.importers.CsvSchemaCreator

public class CsvSchemaCreator extends Object

Reads a CSV file and attempts to infer column data types and create appropriate schema and importer instructions. Also legalizes column names and adds corresponding ImportColumn entries for translation of column names.

Constructor Summary

CsvSchemaCreator(com.fishlib.io.logger.Logger log, StatusCallback progress)

Method Summary

static CsvImporterHelper getInitializedCsvImporterHelper(File sourceFile, String fileFormat, char delimiter, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames, com.fishlib.io.logger.Logger log)
Sets up and returns a CsvImporterHelper to provide column details and record parsing capabilities.

String getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, String fileFormat, char delimiter, int skipHeaderLines, int skipFooterLines, File sourceFile, boolean bestFit, boolean trim, boolean noHeader, List<String> columnNames, boolean logProgress, int maxRows)
Get an XML String of table schema based on a file and user-provided options.

String getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, String fileFormat, char delimiter, int skipHeaderLines, int skipFooterLines, File sourceFile, boolean bestFit, boolean trim, boolean noHeader, List<String> columnNames, boolean logProgress, int maxRows, CasingStyle casingStyle, String replacement)
Get an XML String of table schema based on a file and user-provided options.

static void main(String... args)
Regular main entry point, used when this module is called from a java command line, or from an IntelliJ run configuration.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- CsvSchemaCreator
  
  public CsvSchemaCreator(@NotNull com.fishlib.io.logger.Logger log, StatusCallback progress)
Method Details
- getInitializedCsvImporterHelper
  
  public static CsvImporterHelper getInitializedCsvImporterHelper(@NotNull File sourceFile, String fileFormat, char delimiter, int skipHeaderLines, int skipFooterLines, boolean trim, boolean noHeader, List<String> columnNames, com.fishlib.io.logger.Logger log)
  
  Sets up and returns a CsvImporterHelper to provide column details and record parsing capabilities.
  
  Parameters:
  
  sourceFile - File object pointing to the CSV file to be analyzed.
  
  fileFormat - Apache CSV Parser file format name.
  
  delimiter - Single character delimiter.
  
  skipHeaderLines - Number of lines to skip at the top of the file before trying to read the header row.
  
  skipFooterLines - Number of lines to skip from the end of the file.
  
  trim - Whether to trim data around values between delimiters.
  
  noHeader - Indicates that the source file does not include a header row with column names.
  
  log - An Iris event logger object.
  
  Returns:
  
  A CsvImporterHelper that can process the passed sourceFile.
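
The effect of skipHeaderLines, skipFooterLines, delimiter, and trim can be illustrated with a small standalone sketch. This is not the CsvImporterHelper implementation: the class CsvLineSketch and its methods are hypothetical names introduced here, and the splitting is deliberately naive (no quoting or escaping, which the Apache CSV formats would normally handle).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of com.illumon.iris.importers.
final class CsvLineSketch {

    /** Drops skipHeaderLines lines from the top and skipFooterLines lines from the end. */
    static List<String> selectLines(List<String> lines, int skipHeaderLines, int skipFooterLines) {
        int from = Math.min(Math.max(skipHeaderLines, 0), lines.size());
        int to = Math.max(from, lines.size() - Math.max(skipFooterLines, 0));
        return new ArrayList<>(lines.subList(from, to));
    }

    /** Splits one line on a single-character delimiter, optionally trimming each value. */
    static List<String> splitLine(String line, char delimiter, boolean trim) {
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        String[] parts = line.split(Pattern.quote(String.valueOf(delimiter)), -1);
        List<String> values = new ArrayList<>();
        for (String part : parts) {
            values.add(trim ? part.trim() : part);
        }
        return values;
    }
}
```

With one leading comment line and one trailing totals line, selectLines(lines, 1, 1) would leave the header row first, which matches the description of skipHeaderLines as lines skipped before the header row is read.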
- getTableSchema
  
  public String getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, String fileFormat, char delimiter, int skipHeaderLines, int skipFooterLines, File sourceFile, boolean bestFit, boolean trim, boolean noHeader, List<String> columnNames, boolean logProgress, int maxRows)
  
  Gets an XML String of the table schema derived from a file and user-provided options. This method is public so that other applications, such as the Schema Editor, can use it.
  
  Parameters:
  
  namespace - Namespace to use for the new schema.
  
  table - Table name to use for the new schema.
  
  groupingColumn - Optional single column name to mark as a Grouping column.
  
  partitionColumn - Which column to use as the Partitioning column.
  
  sourceName - Name to use for the CSV InputSource.
  
  sourcePartitionColumn - Column name in the source data to use for multi-partition imports.
  
  fileFormat - Apache CSV Parser file format name.
  
  delimiter - Single character delimiter.
  
  skipHeaderLines - Number of lines to skip at the top of the file before trying to read the header row.
  
  skipFooterLines - Number of lines to skip from the end of the file.
  
  sourceFile - File object pointing to the CSV file to be analyzed.
  
  bestFit - If true, try to use smaller types, such as int and float; if false, use larger types, such as long and double.
  
  trim - Whether to trim data around values between delimiters.
  
  noHeader - Indicates that the CSV does not include a header with column names.
  
  logProgress - Whether to update the log with progress percentages.
  
  maxRows - A maximum number of rows to read, rather than reading the whole file. A value of zero or less means to read the whole file.
  
  Returns:
  
  A String with the XML of derived table schema and CSV import instructions.
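
To make the bestFit option more concrete, here is a minimal sketch of one way a column type could be inferred from sampled values. It is an assumption-laden illustration, not the logic CsvSchemaCreator actually uses: the class BestFitSketch, the method inferType, the float round-trip check, and the fallback to String are all hypothetical.

```java
import java.util.List;

// Hypothetical illustration only; not the inference used by CsvSchemaCreator.
final class BestFitSketch {

    /** Returns "int", "long", "float", "double", or "String" for a column of raw values. */
    static String inferType(List<String> values, boolean bestFit) {
        boolean seen = false;
        boolean allIntegral = true;
        boolean fitsInt = true;
        boolean fitsFloat = true;
        for (String raw : values) {
            String v = raw.trim();
            if (v.isEmpty()) {
                continue; // assumption: empty cells are nulls and do not affect the type
            }
            seen = true;
            double d;
            try {
                d = Double.parseDouble(v);
            } catch (NumberFormatException notNumeric) {
                return "String";
            }
            // Assumption: a value "fits" float if it survives a round trip unchanged.
            if ((double) (float) d != d) {
                fitsFloat = false;
            }
            try {
                long l = Long.parseLong(v);
                if (l < Integer.MIN_VALUE || l > Integer.MAX_VALUE) {
                    fitsInt = false;
                }
            } catch (NumberFormatException notIntegral) {
                allIntegral = false;
            }
        }
        if (!seen) {
            return "String";
        }
        if (allIntegral) {
            return (bestFit && fitsInt) ? "int" : "long";
        }
        return (bestFit && fitsFloat) ? "float" : "double";
    }
}
```

Under these assumptions, a column containing only 1, 2, 3 would be typed int with bestFit set and long without it, while a column containing 3000000000 would be long either way.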
- getTableSchema
  
  public String getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, String fileFormat, char delimiter, int skipHeaderLines, int skipFooterLines, File sourceFile, boolean bestFit, boolean trim, boolean noHeader, List<String> columnNames, boolean logProgress, int maxRows, CasingStyle casingStyle, String replacement)
  
  Gets an XML String of the table schema derived from a file and user-provided options. This method is public so that other applications, such as the Schema Editor, can use it.
  
  Parameters:
  
  namespace - Namespace to use for the new schema.
  
  table - Table name to use for the new schema.
  
  groupingColumn - Optional single column name to mark as a Grouping column.
  
  partitionColumn - Which column to use as the Partitioning column.
  
  sourceName - Name to use for the CSV InputSource.
  
  sourcePartitionColumn - Column name in the source data to use for multi-partition imports.
  
  fileFormat - Apache CSV Parser file format name.
  
  delimiter - Single character delimiter.
  
  skipHeaderLines - Number of lines to skip at the top of the file before trying to read the header row.
  
  skipFooterLines - Number of lines to skip from the end of the file.
  
  sourceFile - File object pointing to the CSV file to be analyzed.
  
  bestFit - If true, try to use smaller types, such as int and float; if false, use larger types, such as long and double.
  
  trim - Whether to trim data around values between delimiters.
  
  noHeader - Indicates that the CSV does not include a header with column names.
  
  logProgress - Whether to update the log with progress percentages.
  
  maxRows - A maximum number of rows to read, rather than reading the whole file. A value of zero or less means to read the whole file.
  
  casingStyle - CasingStyle to apply to column names. None or null leaves the casing unchanged.
  
  replacement - Character, or empty String, used to replace spaces and hyphens in source column names.
  
  Returns:
  
  A String with the XML of derived table schema and CSV import instructions.
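
The replacement parameter can be pictured with a short standalone sketch that substitutes spaces and hyphens in source column names and records which names changed, loosely mirroring the ImportColumn entries the class adds for renamed columns. The class ColumnNameSketch and its methods are hypothetical; the real legalization may apply further rules, and casingStyle handling is omitted because the CasingStyle values are not listed here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;

// Hypothetical illustration only; not the legalization used by CsvSchemaCreator.
final class ColumnNameSketch {

    /** Replaces every space or hyphen in sourceName with the replacement (which may be empty). */
    static String legalize(String sourceName, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement from being read as regex syntax.
        return sourceName.replaceAll("[ -]", Matcher.quoteReplacement(replacement));
    }

    /** Maps each changed (legalized) name back to its source name, in input order. */
    static Map<String, String> renamedColumns(List<String> sourceNames, String replacement) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (String source : sourceNames) {
            String legal = legalize(source, replacement);
            if (!legal.equals(source)) {
                renamed.put(legal, source);
            }
        }
        return renamed;
    }
}
```

For example, with a replacement of "_" the source name "Trade Date" becomes "Trade_Date", and with an empty replacement "bid-size" becomes "bidsize".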
- main
  
  public static void main(String... args)
  
  Regular main entry point, used when this module is called from a java command line, or from an IntelliJ run configuration.
  
  Parameters:
  
  args - Varargs list of arguments in Apache CLI format
