Class SchemaCreatorUtils

java.lang.Object
com.illumon.iris.utils.SchemaCreatorUtils

public class SchemaCreatorUtils
extends Object
Support classes and methods for use by the CSV and JDBC schema creator utilities.
  • Constructor Details

  • Method Details

    • isNumeric

      public static boolean isNumeric​(String str)
      Check whether a String contains numeric data

      Modified and enhanced version of Apache Commons isNumeric method. Added floating point, signs, and whitespace support. Will return false for numeric Strings with leading or trailing white space. Strings that will be trimmed during import should be trimmed before being passed to this method.

      Parameters:
      str - The String to check for numeric data
      Returns:
      True if the String contains numeric data
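The documented rules (optional leading sign, a single decimal point, and rejection of leading or trailing whitespace) can be sketched as follows. This is an illustrative reimplementation of the described behavior, not the actual SchemaCreatorUtils code:

```java
// Illustrative sketch of the documented isNumeric behavior: accepts an
// optional leading sign and at most one decimal point, and returns false
// for Strings with leading or trailing whitespace. NOT the real
// SchemaCreatorUtils implementation.
class NumericCheckSketch {
    static boolean isNumeric(String str) {
        if (str == null || str.isEmpty()) {
            return false;
        }
        // Leading or trailing whitespace disqualifies the String outright.
        if (!str.equals(str.trim())) {
            return false;
        }
        int start = 0;
        // Allow a single leading sign.
        if (str.charAt(0) == '+' || str.charAt(0) == '-') {
            start = 1;
        }
        boolean sawDigit = false;
        boolean sawDecimal = false;
        for (int i = start; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isDigit(c)) {
                sawDigit = true;
            } else if (c == '.' && !sawDecimal) {
                sawDecimal = true;
            } else {
                return false;
            }
        }
        return sawDigit;
    }
}
```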
    • fixColumnName

      public static String fixColumnName​(String originalColumnName, @NotNull Set<String> usedNames)
Ensures that column names are valid for use in Iris
      Parameters:
      originalColumnName - Column name to be checked for validity and uniqueness
usedNames - Set of names already used in the table
      Returns:
A legalized, uniquified column name
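The legalize-and-uniquify idea can be sketched like this. The specific rules here (replacing non-identifier characters with underscores, prefixing names that start with a digit, appending a numeric suffix for collisions) are assumptions for illustration; the real rules live in SchemaCreatorUtils:

```java
// Hypothetical sketch of legalizing and uniquifying a column name, in the
// spirit of fixColumnName. The exact character rules and reserved words the
// real utility enforces may differ.
import java.util.Set;

class ColumnNameSketch {
    static String fixColumnName(String originalColumnName, Set<String> usedNames) {
        // Replace characters that are not legal in a Java-style identifier.
        String name = originalColumnName.replaceAll("[^A-Za-z0-9_]", "_");
        // Identifiers may not start with a digit.
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            name = "column_" + name;
        }
        // Append a numeric suffix until the name is unique within the table.
        String candidate = name;
        int suffix = 2;
        while (usedNames.contains(candidate)) {
            candidate = name + suffix++;
        }
        usedNames.add(candidate);
        return candidate;
    }
}
```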
    • fixColumnName

      public static String fixColumnName​(String originalColumnName, @NotNull Set<String> usedNames, CasingStyle casing, @NotNull String replacement)
Ensures that column names are valid for use in Iris and applies optional casing rules
      Parameters:
      originalColumnName - Column name to be checked for validity and uniqueness
usedNames - Set of names already used in the table
      casing - Optional CasingStyle to use when processing source names, if null or None the source name's casing is not modified
      replacement - A String to use as a replacement for invalid characters in the source name
      Returns:
A legalized, uniquified column name with the specified Guava casing applied
    • hasValue

      public static boolean hasValue​(String value)
    • checkColumnType

      public static void checkColumnType​(SchemaCreatorColumnDetails currentColumn, String value, boolean bestFit)
Tries to find a data type for a column based on data in a String value. This may be a column for which no type is yet known, or one where a type has been found but still needs to be validated against additional data entries from the data source. For example, a float column might need to be upgraded to double as larger values are found.
      Parameters:
      currentColumn - The properties of the column for which the data type is to be set or validated
      value - A String value to be parsed for data to determine an appropriate data type
      bestFit - Whether (true) or not (false - default) to use the smallest numeric type that will fit the data. When false, floating point values use double, and integer values use long.
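The best-fit widening idea can be sketched as follows, restricted to integer values for brevity: with bestFit the column takes the narrowest type that holds the value and only ever widens, while without it every integer column is simply long. This is a hypothetical illustration; the real checkColumnType also infers floating point and other types, and records its result on a SchemaCreatorColumnDetails object rather than returning it:

```java
// Hypothetical sketch of best-fit integer widening in the spirit of
// checkColumnType. A column may widen as larger values arrive, but never
// narrows back.
class IntWidenSketch {
    // Ordered narrowest to widest.
    private static final String[] ORDER = {"short", "int", "long"};

    static String fitType(String currentType, String value, boolean bestFit) {
        if (!bestFit) {
            // Default behavior: all integer values use long.
            return "long";
        }
        long v = Long.parseLong(value);
        String needed;
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            needed = "short";
        } else if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            needed = "int";
        } else {
            needed = "long";
        }
        // Keep whichever of the current and needed types is wider.
        return rank(needed) > rank(currentType) ? needed : currentType;
    }

    private static int rank(String type) {
        for (int i = 0; i < ORDER.length; i++) {
            if (ORDER[i].equals(type)) {
                return i;
            }
        }
        return -1; // an unknown/unset type never outranks a real one
    }
}
```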
    • createTableImportSchema

      public static String createTableImportSchema​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      Returns:
      a String with the complete table schema XML.
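The shape of the generated XML is roughly as follows. The table, namespace, column names, and types here are invented for illustration, not output captured from the utility:

```xml
<Table name="Trades" namespace="Market">
  <Column name="Date" dataType="String" columnType="Partitioning" />
  <Column name="Sym" dataType="String" columnType="Grouping" />
  <Column name="Price" dataType="double" />
  <ImportSource name="IrisCSV" type="CSV">
    <!-- Maps a renamed source column onto its schema column. -->
    <ImportColumn name="Price" sourceName="price-usd" />
  </ImportSource>
</Table>
```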
    • createTableImportSchema

      public static String createTableImportSchema​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, CasingStyle casingStyle, String replacement)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      casingStyle - if not null, CasingStyle to apply to column names - None or null = no change to casing
replacement - a character, or empty String, used to replace spaces or hyphens in source column names
      Returns:
      a String with the complete table schema XML.
    • createTableImportSchema

      public static String createTableImportSchema​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerInputClassName, boolean useNanos, int maxError, String arrayDelimiter)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      inputModel - style of logger/listener generation (null for none)
      outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
      loggerClass - custom logger class name (optional)
      listenerClass - custom listener class name (optional)
      loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
logFormat - the format to use for generated loggers/listeners
      loggerInputClassName - if not null, the name of a class to be used as the Logger argument
      useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
      maxError - maximum number of errors to tolerate during schema inference
      arrayDelimiter - a string to use when parsing string values as arrays
      Returns:
      a String with the complete table schema XML
    • createTableImportSchema

      public static String createTableImportSchema​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerInputClassName, boolean useNanos, int maxError, String arrayDelimiter, CasingStyle casingStyle, String replacement)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      inputModel - style of logger/listener generation (null for none)
      outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
      loggerClass - custom logger class name (optional)
      listenerClass - custom listener class name (optional)
      loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
logFormat - the format to use for generated loggers/listeners
      loggerInputClassName - if not null, the name of a class to be used as the Logger argument
      useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
      maxError - maximum number of errors to tolerate during schema inference
      arrayDelimiter - a string to use when parsing string values as arrays
      casingStyle - if not null, CasingStyle to apply to column names - None or null = no change to casing
replacement - a character, or empty String, used to replace spaces or hyphens in source column names
      Returns:
      a String with the complete table schema XML
    • createTableImportSchemaDocument

      public static org.jdom2.Document createTableImportSchemaDocument​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerSystemInputClass, boolean useNanos, int maxError, String arrayDelimiter)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      inputModel - style of logger/listener generation (null for none)
      outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
      loggerClass - custom logger class name (optional)
      listenerClass - custom listener class name (optional)
      loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
logFormat - the format to use for generated loggers/listeners
      loggerSystemInputClass - if not null, the name of a class to be used as the Logger argument
      useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
      maxError - maximum number of errors to tolerate during schema inference
      arrayDelimiter - a string to use when parsing string values as arrays
      Returns:
a Document containing the complete table schema XML
    • createTableImportSchemaDocument

      public static org.jdom2.Document createTableImportSchemaDocument​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerSystemInputClass, boolean useNanos, int maxError, String arrayDelimiter, CasingStyle casingStyle, String replacement)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      inputModel - style of logger/listener generation (null for none)
      outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
      loggerClass - custom logger class name (optional)
      listenerClass - custom listener class name (optional)
      loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
logFormat - the format to use for generated loggers/listeners
      loggerSystemInputClass - if not null, the name of a class to be used as the Logger argument
      useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
      maxError - maximum number of errors to tolerate during schema inference
      arrayDelimiter - a string to use when parsing string values as arrays
      casingStyle - if not null, CasingStyle to apply to column names - None or null = no change to casing
replacement - a character, or empty String, used to replace spaces or hyphens in source column names
      Returns:
a Document containing the complete table schema XML
    • resolveRemainingColumns

      public static void resolveRemainingColumns​(List<String> unresolvedColumns, Map<String,​SchemaCreatorColumnDetails> columnProperties, com.fishlib.io.logger.Logger log)
      Sets remaining unresolved columns to String, and warns if a column is being left as String because of ambiguous matches for converters.
      Parameters:
      unresolvedColumns - A list of names of unresolved columns.
      columnProperties - A map of column names and schema creator details for the columns.
      log - The process logger object owned by the caller.
    • checkDestinationFile

      public static String checkDestinationFile​(String schemaPath, @NotNull String namespace, @NotNull String table, boolean replace)
Prepares the output directory for a schema file; fails if a file already exists and the output mode is not set to REPLACE.
      Parameters:
      schemaPath - Explicit path to write the target file to, or null if no path was specified.
      namespace - Namespace to use for the schema file.
      table - Table name to use for the schema file.
      replace - When true, overwrite an existing file; when false, throw a runtime exception if a file already exists.
      Returns:
      The string path and file name for the output file.
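The contract described above can be sketched as follows. The default file-name convention used here (namespace.table.schema under the given path) is an assumption for illustration, not documented behavior, and the null-path case is omitted:

```java
// Hypothetical sketch of the checkDestinationFile contract: resolve the
// output file, create its directory if needed, and refuse to overwrite an
// existing file unless replace is set. The naming convention is assumed.
import java.io.File;
import java.io.IOException;

class DestinationFileSketch {
    static String checkDestinationFile(String schemaPath, String namespace,
                                       String table, boolean replace) throws IOException {
        File target = new File(schemaPath, namespace + "." + table + ".schema");
        File dir = target.getParentFile();
        // Ensure the output directory exists before writing.
        if (dir != null && !dir.exists() && !dir.mkdirs()) {
            throw new IOException("Could not create directory: " + dir);
        }
        // Without replace, an existing file is a hard error.
        if (target.exists() && !replace) {
            throw new RuntimeException("Schema file already exists: " + target);
        }
        return target.getPath();
    }
}
```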