com.illumon.iris.importers.XmlSchemaCreator

public class XmlSchemaCreator
extends Object

Reads an XML file and attempts to infer column data types and create appropriate schema and importer instructions. Also legalizes column names and adds corresponding ImportColumn entries for translation of column names.

Constructor Summary

Constructors

Constructor	Description

XmlSchemaCreator(com.fishlib.io.logger.Logger log, StatusCallback progress)

Method Summary

Modifier and Type	Method	Description
`static List<String>`	`getElementTypeNames(File sourceFile, int startIndex, int startDepth, int maxDepth)`	Returns slash-delimited path-qualified element names based on the starting index, depth, and max-depth in the XML document.
`static com.illumon.iris.importers.CsvImporterHelperXml`	`getInitializedCsvImporterHelperXml(File sourceFile, String elementType, int startIndex, int startDepth, int maxDepth, boolean useElementValues, boolean useAttributeValues, boolean namedValues, int startColumnIndex, int startColumnDepth, String columnNameElement, com.fishlib.io.logger.Logger log)`	Returns an import helper that can provide CSV records to parse, and column details from an XML file.
`String`	`getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, File sourceFile, String elementType, boolean bestFit, int startIndex, int startDepth, int maxDepth, boolean useElementValues, boolean useAttributeValues, boolean namedValues, int startColumnIndex, int startColumnDepth, String columnNameElement, boolean logProgress)`	Get an XML String of table schema based on a file and user-provided options.
`String`	`getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, File sourceFile, String elementType, boolean bestFit, int startIndex, int startDepth, int maxDepth, boolean useElementValues, boolean useAttributeValues, boolean namedValues, int startColumnIndex, int startColumnDepth, String columnNameElement, boolean logProgress, int maxRows)`	Get an XML String of table schema based on a file and user-provided options.
`String`	`getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, File sourceFile, String elementType, boolean bestFit, int startIndex, int startDepth, int maxDepth, boolean useElementValues, boolean useAttributeValues, boolean namedValues, int startColumnIndex, int startColumnDepth, String columnNameElement, boolean logProgress, int maxRows, CasingStyle casingStyle, String replacement)`	Get an XML String of table schema based on a file and user-provided options.
`static void`	`main(String... args)`	Regular main entry point, used when this module is called from a java command line, or from an IntelliJ run configuration.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- XmlSchemaCreator
  
  public XmlSchemaCreator(@NotNull com.fishlib.io.logger.Logger log, StatusCallback progress)
Method Details
- getElementTypeNames
  
  public static List<String> getElementTypeNames(@NotNull File sourceFile, int startIndex, int startDepth, int maxDepth)
  
  Returns slash-delimited path-qualified element names based on the starting index, depth, and max-depth in the XML document.
  
  Parameters:
  
  sourceFile - File pointing to XML document to read.
  
  startIndex - From the root element, how many elements to skip before starting.
  
  startDepth - From the start index element, how many levels to descend before beginning to enumerate.
  
  maxDepth - From the starting depth, how many further levels to recurse while enumerating.
  
  Returns:
  
  A List of Strings of the qualified element paths.
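To illustrate what "slash-delimited path-qualified element names" can look like, here is a minimal sketch that uses only the JDK DOM parser. It is not the implementation of `getElementTypeNames`: the class `ElementPathSketch` and its methods are hypothetical, it starts directly at the children of the root element (ignoring `startIndex` and `startDepth`), and it assumes `maxDepth` counts levels below each starting element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ElementPathSketch {
    /** Returns the direct child elements of a node, skipping text and comments. */
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    /** Adds the path of start, then recurses into its children while depth remains. */
    static void collect(Element start, String prefix, int remainingDepth, Set<String> out) {
        if (remainingDepth < 0) {
            return;
        }
        String path = prefix.isEmpty() ? start.getTagName() : prefix + "/" + start.getTagName();
        out.add(path); // the set keeps each distinct path once, in first-seen order
        for (Element child : childElements(start)) {
            collect(child, path, remainingDepth - 1, out);
        }
    }

    /** Hypothetical helper: lists distinct element paths under the root of an XML string. */
    static List<String> elementPaths(String xml, int maxDepth) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Set<String> paths = new LinkedHashSet<>();
        for (Element child : childElements(root)) {
            collect(child, "", maxDepth, paths);
        }
        return new ArrayList<>(paths);
    }

    public static void main(String[] args) {
        String xml = "<root><trade><price>1.5</price><size>10</size></trade>"
                + "<quote><bid>1.4</bid></quote></root>";
        // prints [trade, trade/price, trade/size, quote, quote/bid]
        System.out.println(elementPaths(xml, 1));
    }
}
```

With `maxDepth` set to 0 in this sketch, only the top-level names `trade` and `quote` are returned.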
- getInitializedCsvImporterHelperXml
  
  public static com.illumon.iris.importers.CsvImporterHelperXml getInitializedCsvImporterHelperXml(@NotNull File sourceFile, String elementType, int startIndex, int startDepth, int maxDepth, boolean useElementValues, boolean useAttributeValues, boolean namedValues, int startColumnIndex, int startColumnDepth, String columnNameElement, com.fishlib.io.logger.Logger log)
  
  Returns an import helper that can provide CSV records to parse, and column details from an XML file.
  
  Parameters:
  
  sourceFile - File object pointing to the XML file to be analyzed.
  
  elementType - A string element name to match when finding elements to import from the XML
  
  startIndex - Number of elements after the root element to start looking for elements to import
  
  startDepth - How far under the element obtained from startIndex to start looking for elements to import
  
  maxDepth - How far to recurse into import elements when searching for values to import
  
  useElementValues - Whether to use values that are stored as the contents of elements
  
  useAttributeValues - Whether to use values that are stored as attributes
  
  namedValues - True to use values by name, false to use values by position
  
  startColumnIndex - Number of elements after the root element to start looking for the element that contains column names
  
  startColumnDepth - How far under the element obtained from startColumnIndex to start looking for the element that contains column names
  
  columnNameElement - The name of the element that contains column names
  
  log - An Iris logger
  
  Returns:
  
  an import helper class
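The difference between element values and attribute values can be sketched with the JDK DOM parser alone. The following is an illustration of the idea behind `useElementValues`, `useAttributeValues`, and named (rather than positional) values, not the behavior of `CsvImporterHelperXml`; the class `NamedValueSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class NamedValueSketch {
    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Hypothetical helper: maps column names to values for one record element.
     * Attribute names and child element names both serve as column names.
     */
    static Map<String, String> namedValues(String recordXml,
                                           boolean useElementValues,
                                           boolean useAttributeValues) {
        Element record = parseRoot(recordXml);
        Map<String, String> values = new TreeMap<>(); // sorted by name for stable output
        if (useAttributeValues) {
            NamedNodeMap attrs = record.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                values.put(attr.getNodeName(), attr.getNodeValue());
            }
        }
        if (useElementValues) {
            for (Node n = record.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    values.put(n.getNodeName(), n.getTextContent().trim());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String record = "<trade sym=\"ABC\"><price>1.5</price><size>10</size></trade>";
        System.out.println(namedValues(record, true, false)); // prints {price=1.5, size=10}
        System.out.println(namedValues(record, true, true));  // prints {price=1.5, size=10, sym=ABC}
    }
}
```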
- getTableSchema
  
  public String getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, File sourceFile, String elementType, boolean bestFit, int startIndex, int startDepth, int maxDepth, boolean useElementValues, boolean useAttributeValues, boolean namedValues, int startColumnIndex, int startColumnDepth, String columnNameElement, boolean logProgress)
  
  Get an XML String of table schema based on a file and user-provided options. This method is public so other applications, like the Schema Editor, can use it.
  
  Parameters:
  
  namespace - Namespace to use for the new schema.
  
  table - Table name to use for the new schema.
  
  groupingColumn - Optional single column name to mark as a Grouping column.
  
  partitionColumn - Which column to use as the Partitioning column.
  
  sourceName - Name to use for the CSV InputSource.
  
  sourcePartitionColumn - Column name in the source data to use for multi-partition imports
  
  sourceFile - File object pointing to the XML file to be analyzed.
  
  bestFit - Whether to try to use smaller types (true), like int and float, or just to use bigger types, like long and double.
  
  elementType - A string element name to match when finding elements to import from the XML
  
  startIndex - Number of elements after the root element to start looking for elements to import
  
  startDepth - How far under the element obtained from startIndex to start looking for elements to import
  
  maxDepth - How far to recurse into import elements when searching for values to import
  
  useElementValues - Whether to use values that are stored as the contents of elements
  
  useAttributeValues - Whether to use values that are stored as attributes
  
  namedValues - True to use values by name, false to use values by position
  
  startColumnIndex - Number of elements after the root element to start looking for the element that contains column names
  
  startColumnDepth - How far under the element obtained from startColumnIndex to start looking for the element that contains column names
  
  columnNameElement - The name of the element that contains column names
  
  logProgress - Whether to update the log with progress percentages.
  
  Returns:
  
  A String with the XML of derived table schema and CSV import instructions.
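The effect of `bestFit` can be pictured with a small type-inference sketch. The rules below are assumptions made for illustration, not the inference logic of `XmlSchemaCreator`: the class `BestFitSketch` is hypothetical, a column counts as `int` when every sample fits the `int` range, and as `float` when every decimal sample prints the same after parsing as `float` and as `double`.

```java
import java.util.Arrays;
import java.util.List;

public class BestFitSketch {
    /** Hypothetical helper: picks a column type name from sample values. */
    static String inferType(List<String> samples, boolean bestFit) {
        boolean allIntegral = true; // every sample parsed as a long
        boolean allFitInt = true;   // every integral sample is within int range
        boolean allNumeric = true;  // every sample parsed as a number
        boolean allFitFloat = true; // every decimal sample survives narrowing to float
        for (String s : samples) {
            try {
                long v = Long.parseLong(s);
                if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
                    allFitInt = false;
                }
                continue; // integral sample handled
            } catch (NumberFormatException notIntegral) {
                allIntegral = false;
            }
            try {
                double d = Double.parseDouble(s);
                // Assumed heuristic: float is enough if it prints the same as the double.
                if (!Float.toString(Float.parseFloat(s)).equals(Double.toString(d))) {
                    allFitFloat = false;
                }
            } catch (NumberFormatException notNumeric) {
                allNumeric = false;
            }
        }
        if (!allNumeric) {
            return "String";
        }
        if (allIntegral) {
            return bestFit && allFitInt ? "int" : "long";
        }
        return bestFit && allFitFloat ? "float" : "double";
    }

    public static void main(String[] args) {
        System.out.println(inferType(Arrays.asList("1", "2", "3"), true));        // prints int
        System.out.println(inferType(Arrays.asList("1", "2", "3"), false));       // prints long
        System.out.println(inferType(Arrays.asList("1.5", "2.25"), true));        // prints float
        System.out.println(inferType(Arrays.asList("3.141592653589793"), true));  // prints double
    }
}
```

In this sketch, setting `bestFit` to false never narrows a column, so integral data is always `long` and decimal data is always `double`.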
- getTableSchema
  
  public String getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, File sourceFile, String elementType, boolean bestFit, int startIndex, int startDepth, int maxDepth, boolean useElementValues, boolean useAttributeValues, boolean namedValues, int startColumnIndex, int startColumnDepth, String columnNameElement, boolean logProgress, int maxRows)
  
  Get an XML String of table schema based on a file and user-provided options. This method is public so other applications, like the Schema Editor, can use it.
  
  Parameters:
  
  namespace - Namespace to use for the new schema.
  
  table - Table name to use for the new schema.
  
  groupingColumn - Optional single column name to mark as a Grouping column.
  
  partitionColumn - Which column to use as the Partitioning column.
  
  sourceName - Name to use for the CSV InputSource.
  
  sourcePartitionColumn - Column name in the source data to use for multi-partition imports
  
  sourceFile - File object pointing to the XML file to be analyzed.
  
  elementType - A string element name to match when finding elements to import from the XML
  
  bestFit - Whether to try to use smaller types (true), like int and float, or just to use bigger types, like long and double.
  
  startIndex - Number of elements after the root element to start looking for elements to import
  
  startDepth - How far under the element obtained from startIndex to start looking for elements to import
  
  maxDepth - How far to recurse into import elements when searching for values to import
  
  useElementValues - Whether to use values that are stored as the contents of elements
  
  useAttributeValues - Whether to use values that are stored as attributes
  
  namedValues - True to use values by name, false to use values by position
  
  startColumnIndex - Number of elements after the root element to start looking for the element that contains column names
  
  startColumnDepth - How far under the element obtained from startColumnIndex to start looking for the element that contains column names
  
  columnNameElement - The name of the element that contains column names
  
  logProgress - Whether to update the log with progress percentages.
  
  maxRows - A maximum number of rows to read, rather than reading the whole file. A value of zero or less means to read the whole file.
  
  Returns:
  
  A String with the XML of derived table schema and CSV import instructions.
- getTableSchema
  
  public String getTableSchema(String namespace, String table, String groupingColumn, String partitionColumn, String sourceName, String sourcePartitionColumn, File sourceFile, String elementType, boolean bestFit, int startIndex, int startDepth, int maxDepth, boolean useElementValues, boolean useAttributeValues, boolean namedValues, int startColumnIndex, int startColumnDepth, String columnNameElement, boolean logProgress, int maxRows, CasingStyle casingStyle, String replacement)
  
  Get an XML String of table schema based on a file and user-provided options. This method is public so other applications, like the Schema Editor, can use it.
  
  Parameters:
  
  namespace - Namespace to use for the new schema.
  
  table - Table name to use for the new schema.
  
  groupingColumn - Optional single column name to mark as a Grouping column.
  
  partitionColumn - Which column to use as the Partitioning column.
  
  sourceName - Name to use for the CSV InputSource.
  
  sourcePartitionColumn - Column name in the source data to use for multi-partition imports
  
  sourceFile - File object pointing to the XML file to be analyzed.
  
  elementType - A string element name to match when finding elements to import from the XML
  
  bestFit - Whether to try to use smaller types (true), like int and float, or just to use bigger types, like long and double.
  
  startIndex - Number of elements after the root element to start looking for elements to import
  
  startDepth - How far under the element obtained from startIndex to start looking for elements to import
  
  maxDepth - How far to recurse into import elements when searching for values to import
  
  useElementValues - Whether to use values that are stored as the contents of elements
  
  useAttributeValues - Whether to use values that are stored as attributes
  
  namedValues - True to use values by name, false to use values by position
  
  startColumnIndex - Number of elements after the root element to start looking for the element that contains column names
  
  startColumnDepth - How far under the element obtained from startColumnIndex to start looking for the element that contains column names
  
  columnNameElement - The name of the element that contains column names
  
  logProgress - Whether to update the log with progress percentages.
  
  maxRows - A maximum number of rows to read, rather than reading the whole file. A value of zero or less means to read the whole file.
  
  casingStyle - CasingStyle to apply to column names. A value of None or null leaves the casing unchanged.
  
  replacement - Character, or empty String, to use as the replacement for spaces and hyphens in source column names
  
  Returns:
  
  A String with the XML of derived table schema and CSV import instructions.
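The role of `replacement` can be sketched as a plain string substitution. This is only an illustration of replacing spaces and hyphens in a source column name; the class `ColumnNameSketch` and its `legalize` method are hypothetical, and the sketch leaves out casing because the available `CasingStyle` values are not listed here.

```java
import java.util.regex.Matcher;

public class ColumnNameSketch {
    /** Hypothetical helper: swaps each space or hyphen for the replacement text. */
    static String legalize(String sourceName, String replacement) {
        // quoteReplacement keeps characters such as '$' in the replacement literal.
        return sourceName.replaceAll("[ -]", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(legalize("Trade Date", "_")); // prints Trade_Date
        System.out.println(legalize("bid-size", ""));    // prints bidsize
    }
}
```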
- main
  
  public static void main(String... args)
  
  Regular main entry point, used when this module is called from a java command line, or from an IntelliJ run configuration.
  
  Parameters:
  
  args - Varargs list of arguments in Apache CLI format
