Class SchemaCreatorUtils

java.lang.Object
com.illumon.iris.utils.SchemaCreatorUtils

public class SchemaCreatorUtils
extends Object
Support classes and methods for use by the CSV and JDBC schema creator utilities.
  • Constructor Details

  • Method Details

    • isNumeric

      public static boolean isNumeric​(String str)
      Check whether a String contains numeric data

      Modified and enhanced version of Apache Commons isNumeric method. Added floating point, signs, and whitespace support. Will return false for numeric Strings with leading or trailing white space. Strings that will be trimmed during import should be trimmed before being passed to this method.

      Parameters:
      str - The String to check for numeric data
      Returns:
      True if the String contains numeric data
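The documented rules (optional leading sign, a single decimal point, and rejection of leading or trailing whitespace) can be sketched as follows. This is an illustrative reimplementation of the described behavior, not the actual SchemaCreatorUtils code:

```java
// Illustrative sketch of the documented isNumeric behavior: accepts an
// optional leading sign and at most one decimal point, and returns false
// for Strings with leading or trailing whitespace. NOT the real
// SchemaCreatorUtils implementation.
class NumericCheckSketch {
    static boolean isNumeric(String str) {
        if (str == null || str.isEmpty()) {
            return false;
        }
        // Leading or trailing whitespace disqualifies the String outright.
        if (!str.equals(str.trim())) {
            return false;
        }
        int start = 0;
        // Allow a single leading sign.
        if (str.charAt(0) == '+' || str.charAt(0) == '-') {
            start = 1;
        }
        boolean sawDigit = false;
        boolean sawDecimal = false;
        for (int i = start; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isDigit(c)) {
                sawDigit = true;
            } else if (c == '.' && !sawDecimal) {
                sawDecimal = true;
            } else {
                return false;
            }
        }
        return sawDigit;
    }
}
```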
    • fixColumnName

      public static String fixColumnName​(String originalColumnName, @NotNull Set<String> usedNames)
Ensures that column names are valid for use in Iris
      Parameters:
      originalColumnName - Column name to be checked for validity and uniqueness
usedNames - Set of names already used in the table
      Returns:
A legalized, uniquified column name
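The legalize-and-uniquify idea can be sketched like this. The specific rules here (replacing non-identifier characters with underscores, prefixing names that start with a digit, appending a numeric suffix for collisions) are assumptions for illustration; the real rules live in SchemaCreatorUtils:

```java
// Hypothetical sketch of legalizing and uniquifying a column name, in the
// spirit of fixColumnName. The exact character rules and reserved words the
// real utility enforces may differ.
import java.util.Set;

class ColumnNameSketch {
    static String fixColumnName(String originalColumnName, Set<String> usedNames) {
        // Replace characters that are not legal in a Java-style identifier.
        String name = originalColumnName.replaceAll("[^A-Za-z0-9_]", "_");
        // Identifiers may not start with a digit.
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            name = "column_" + name;
        }
        // Append a numeric suffix until the name is unique within the table.
        String candidate = name;
        int suffix = 2;
        while (usedNames.contains(candidate)) {
            candidate = name + suffix++;
        }
        usedNames.add(candidate);
        return candidate;
    }
}
```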
    • fixColumnName

      public static String fixColumnName​(String originalColumnName, @NotNull Set<String> usedNames, CasingStyle casing, @NotNull String replacement)
Ensures that column names are valid for use in Iris and applies optional casing rules
      Parameters:
      originalColumnName - Column name to be checked for validity and uniqueness
usedNames - Set of names already used in the table
      casing - Optional CasingStyle to use when processing source names, if null or None the source name's casing is not modified
      replacement - A String to use as a replacement for invalid characters in the source name
      Returns:
A legalized, uniquified column name with the specified Guava casing applied
    • hasValue

      public static boolean hasValue​(String value)
    • checkColumnType

      public static void checkColumnType​(SchemaCreatorColumnDetails currentColumn, String value, boolean bestFit)
Tries to find a data type for a column based on data in a String value. This may be a column for which no type is yet known, or one where a type has been found but still needs to be validated against additional data entries from the data source. For example, a float column might need to be upgraded to double as larger values are found.
      Parameters:
      currentColumn - The properties of the column for which the data type is to be set or validated
      value - A String value to be parsed for data to determine an appropriate data type
      bestFit - Whether (true) or not (false - default) to use the smallest numeric type that will fit the data. When false, floating point values use double, and integer values use long.
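The best-fit widening idea can be sketched as follows, restricted to integer values for brevity: with bestFit the column takes the narrowest type that holds the value and only ever widens, while without it every integer column is simply long. This is a hypothetical illustration; the real checkColumnType also infers floating point and other types, and records its result on a SchemaCreatorColumnDetails object rather than returning it:

```java
// Hypothetical sketch of best-fit integer widening in the spirit of
// checkColumnType. A column may widen as larger values arrive, but never
// narrows back.
class IntWidenSketch {
    // Ordered narrowest to widest.
    private static final String[] ORDER = {"short", "int", "long"};

    static String fitType(String currentType, String value, boolean bestFit) {
        if (!bestFit) {
            // Default behavior: all integer values use long.
            return "long";
        }
        long v = Long.parseLong(value);
        String needed;
        if (v >= Short.MIN_VALUE && v <= Short.MAX_VALUE) {
            needed = "short";
        } else if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            needed = "int";
        } else {
            needed = "long";
        }
        // Keep whichever of the current and needed types is wider.
        return rank(needed) > rank(currentType) ? needed : currentType;
    }

    private static int rank(String type) {
        for (int i = 0; i < ORDER.length; i++) {
            if (ORDER[i].equals(type)) {
                return i;
            }
        }
        return -1; // an unknown/unset type never outranks a real one
    }
}
```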
    • createTableImportSchema

      public static String createTableImportSchema​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      Returns:
      a String with the complete table schema XML.
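The shape of the generated XML is roughly as follows. The table, namespace, column names, and types here are invented for illustration, not output captured from the utility:

```xml
<Table name="Trades" namespace="Market">
  <Column name="Date" dataType="String" columnType="Partitioning" />
  <Column name="Sym" dataType="String" columnType="Grouping" />
  <Column name="Price" dataType="double" />
  <ImportSource name="IrisCSV" type="CSV">
    <!-- Maps a renamed source column onto its schema column. -->
    <ImportColumn name="Price" sourceName="price-usd" />
  </ImportSource>
</Table>
```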
    • createTableImportSchema

      public static String createTableImportSchema​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, CasingStyle casingStyle, String replacement)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      casingStyle - if not null, CasingStyle to apply to column names - None or null = no change to casing
replacement - a character, or empty String, used to replace spaces or hyphens in source column names
      Returns:
      a String with the complete table schema XML.
    • createTableImportSchema

      public static String createTableImportSchema​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerInputClassName, boolean useNanos, int maxError, String arrayDelimiter)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      inputModel - style of logger/listener generation (null for none)
      outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
      loggerClass - custom logger class name (optional)
      listenerClass - custom listener class name (optional)
      loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
logFormat - the format to use for generated loggers/listeners
      loggerInputClassName - if not null, the name of a class to be used as the Logger argument
      useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
      maxError - maximum number of errors to tolerate during schema inference
      arrayDelimiter - a string to use when parsing string values as arrays
      Returns:
      a String with the complete table schema XML
    • createTableImportSchema

      public static String createTableImportSchema​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerInputClassName, boolean useNanos, int maxError, String arrayDelimiter, CasingStyle casingStyle, String replacement)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      inputModel - style of logger/listener generation (null for none)
      outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
      loggerClass - custom logger class name (optional)
      listenerClass - custom listener class name (optional)
      loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
logFormat - the format to use for generated loggers/listeners
      loggerInputClassName - if not null, the name of a class to be used as the Logger argument
      useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
      maxError - maximum number of errors to tolerate during schema inference
      arrayDelimiter - a string to use when parsing string values as arrays
      casingStyle - if not null, CasingStyle to apply to column names - None or null = no change to casing
replacement - a character, or empty String, used to replace spaces or hyphens in source column names
      Returns:
      a String with the complete table schema XML
    • createTableImportSchemaDocument

      public static org.jdom2.Document createTableImportSchemaDocument​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerSystemInputClass, boolean useNanos, int maxError, String arrayDelimiter)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      inputModel - style of logger/listener generation (null for none)
      outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
      loggerClass - custom logger class name (optional)
      listenerClass - custom listener class name (optional)
      loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
logFormat - the format to use for generated loggers/listeners
      loggerSystemInputClass - if not null, the name of a class to be used as the Logger argument
      useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
      maxError - maximum number of errors to tolerate during schema inference
      arrayDelimiter - a string to use when parsing string values as arrays
      Returns:
a Document containing the complete table schema XML
    • createTableImportSchemaDocument

      public static org.jdom2.Document createTableImportSchemaDocument​(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,​SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerSystemInputClass, boolean useNanos, int maxError, String arrayDelimiter, CasingStyle casingStyle, String replacement)
      Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.
      Parameters:
      columnNames - the names of the columns to be created in the schema
      namespace - namespace for the table
      tableName - name for the table
      groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
      partitionColumn - the column to mark as columnType="Partitioning" in the schema, defaults to Date
      sourceName - if not null, a specific name to use for the ImportSource block, defaults to Iris+type
      columnProperties - a map of details about the columns to be added, such as data types and formulae
      type - the type for the ImportSource block, typically CSV or JDBC
      sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
      inputModel - style of logger/listener generation (null for none)
      outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
      loggerClass - custom logger class name (optional)
      listenerClass - custom listener class name (optional)
      loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
logFormat - the format to use for generated loggers/listeners
      loggerSystemInputClass - if not null, the name of a class to be used as the Logger argument
      useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
      maxError - maximum number of errors to tolerate during schema inference
      arrayDelimiter - a string to use when parsing string values as arrays
      casingStyle - if not null, CasingStyle to apply to column names - None or null = no change to casing
replacement - a character, or empty String, used to replace spaces or hyphens in source column names
      Returns:
a Document containing the complete table schema XML
    • resolveRemainingColumns

      public static void resolveRemainingColumns​(List<String> unresolvedColumns, Map<String,​SchemaCreatorColumnDetails> columnProperties, com.fishlib.io.logger.Logger log)
      Sets remaining unresolved columns to String, and warns if a column is being left as String because of ambiguous matches for converters.
      Parameters:
      unresolvedColumns - A list of names of unresolved columns.
      columnProperties - A map of column names and schema creator details for the columns.
      log - The process logger object owned by the caller.
    • checkDestinationFile

      public static String checkDestinationFile​(String schemaPath, @NotNull String namespace, @NotNull String table, boolean replace)
Prepares the output directory for a schema file; fails if a file already exists and the output mode is not set to REPLACE.
      Parameters:
      schemaPath - Explicit path to write the target file to, or null if no path was specified.
      namespace - Namespace to use for the schema file.
      table - Table name to use for the schema file.
      replace - When true, overwrite an existing file; when false, throw a runtime exception if a file already exists.
      Returns:
      The string path and file name for the output file.
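The contract described above can be sketched as follows. The default file-name convention used here (namespace.table.schema under the given path) is an assumption for illustration, not documented behavior, and the null-path case is omitted:

```java
// Hypothetical sketch of the checkDestinationFile contract: resolve the
// output file, create its directory if needed, and refuse to overwrite an
// existing file unless replace is set. The naming convention is assumed.
import java.io.File;
import java.io.IOException;

class DestinationFileSketch {
    static String checkDestinationFile(String schemaPath, String namespace,
                                       String table, boolean replace) throws IOException {
        File target = new File(schemaPath, namespace + "." + table + ".schema");
        File dir = target.getParentFile();
        // Ensure the output directory exists before writing.
        if (dir != null && !dir.exists() && !dir.mkdirs()) {
            throw new IOException("Could not create directory: " + dir);
        }
        // Without replace, an existing file is a hard error.
        if (target.exists() && !replace) {
            throw new RuntimeException("Schema file already exists: " + target);
        }
        return target.getPath();
    }
}
```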