Package com.illumon.iris.utils
Class SchemaCreatorUtils
java.lang.Object
com.illumon.iris.utils.SchemaCreatorUtils
public class SchemaCreatorUtils extends Object
Support classes and methods for use by the CSV and JDBC schema creator utilities.
-
Nested Class Summary
Nested Classes:
    static class SchemaCreatorUtils.Converter
        An entry for a match pattern, formula template, and source/target data types to be used when creating ImportSource entries for data import.
-
Constructor Summary
Constructors:
    SchemaCreatorUtils()
-
Method Summary
static void checkColumnType(SchemaCreatorColumnDetails currentColumn, String value, boolean bestFit)
    Tries to find a data type for a column based on data in a String value.

static String checkDestinationFile(String schemaPath, String namespace, String table, boolean replace)
    Prepares the output directory for a schema file; fails if a file already exists and output mode is not set to REPLACE.

static String createTableImportSchema(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn)
    Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

static String createTableImportSchema(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, CasingStyle casingStyle, String replacement)
    Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

static String createTableImportSchema(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerInputClassName, boolean useNanos, int maxError, String arrayDelimiter)
    Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

static String createTableImportSchema(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerInputClassName, boolean useNanos, int maxError, String arrayDelimiter, CasingStyle casingStyle, String replacement)
    Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

static org.jdom2.Document createTableImportSchemaDocument(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerSystemInputClass, boolean useNanos, int maxError, String arrayDelimiter)
    Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

static org.jdom2.Document createTableImportSchemaDocument(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerSystemInputClass, boolean useNanos, int maxError, String arrayDelimiter, CasingStyle casingStyle, String replacement)
    Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

static String fixColumnName(String originalColumnName, Set<String> usedNames)
    Ensures that column names are valid for use in Iris.

static String fixColumnName(String originalColumnName, Set<String> usedNames, CasingStyle casing, String replacement)
    Ensures that column names are valid for use in Iris and applies optional casing rules.

static boolean hasValue(String value)

static boolean isNumeric(String str)
    Checks whether a String contains numeric data.

static void resolveRemainingColumns(List<String> unresolvedColumns, Map<String,SchemaCreatorColumnDetails> columnProperties, com.fishlib.io.logger.Logger log)
    Sets remaining unresolved columns to String, and warns if a column is being left as String because of ambiguous matches for converters.
-
Constructor Details
-
SchemaCreatorUtils
public SchemaCreatorUtils()
-
-
Method Details
-
isNumeric
public static boolean isNumeric(String str)

Checks whether a String contains numeric data. This is a modified and enhanced version of the Apache Commons isNumeric method, with added support for floating point values, signs, and whitespace. It returns false for numeric Strings with leading or trailing white space; Strings that will be trimmed during import should be trimmed before being passed to this method.

Parameters:
    str - The String to check for numeric data
Returns:
    true if the String contains numeric data
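As a rough illustration of the contract described above, the following standalone sketch re-implements the documented behavior: signed integers and floating point values are accepted, while untrimmed input is rejected. The actual SchemaCreatorUtils implementation may differ in details.

```java
public class IsNumericSketch {
    // Accepts an optional sign, digits, and at most one decimal point;
    // rejects leading/trailing whitespace, per the documented behavior.
    static boolean isNumeric(String str) {
        if (str == null || str.isEmpty()) {
            return false;
        }
        if (!str.equals(str.trim())) {
            return false; // untrimmed input should be trimmed before calling
        }
        return str.matches("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)");
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("-3.14")); // true
        System.out.println(isNumeric(" 42 "));  // false: whitespace not trimmed
    }
}
```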
-
fixColumnName
public static String fixColumnName(String originalColumnName, Set<String> usedNames)

Ensures that column names are valid for use in Iris.

Parameters:
    originalColumnName - Column name to be checked for validity and uniqueness
    usedNames - List of names already used in the table
Returns:
    a legalized, uniquified column name
-
fixColumnName
public static String fixColumnName(String originalColumnName, @NotNull Set<String> usedNames, CasingStyle casing, @NotNull String replacement)Ensures that columns names are valid for use in Iris and applies optional casing rules- Parameters:
originalColumnName
- Column name to be checked for validity and uniquenessusedNames
- List of names already used in the tablecasing
- Optional CasingStyle to use when processing source names, if null or None the source name's casing is not modifiedreplacement
- A String to use as a replacement for invalid characters in the source name- Returns:
- legalized, uniqueified, column name, with specified Guava casing applied
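The legalize-and-uniquify behavior might be sketched as follows. This is a hypothetical standalone version: the exact legality rules and suffixing scheme used by SchemaCreatorUtils are not documented here and are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class FixColumnNameSketch {
    static String fixColumnName(String original, Set<String> usedNames, String replacement) {
        // Replace characters that are not legal in a column name (assumed rule).
        String name = original.replaceAll("[^A-Za-z0-9_]", replacement);
        // Column names may not start with a digit (assumed rule).
        if (name.isEmpty() || Character.isDigit(name.charAt(0))) {
            name = "column_" + name;
        }
        // Uniquify against names already used in the table by appending a suffix.
        String candidate = name;
        int suffix = 2;
        while (usedNames.contains(candidate)) {
            candidate = name + suffix++;
        }
        usedNames.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> used = new HashSet<>();
        System.out.println(fixColumnName("order id", used, "_")); // order_id
        System.out.println(fixColumnName("order-id", used, "_")); // order_id2
    }
}
```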
-
hasValue
public static boolean hasValue(String value)
-
checkColumnType
public static void checkColumnType(SchemaCreatorColumnDetails currentColumn, String value, boolean bestFit)

Tries to find a data type for a column based on data in a String value. This may be a column for which no type is yet known, or one where a type has been found but still needs to be validated against additional data entries from the data source; e.g., a float column might need to be upgraded to double as larger values are found.

Parameters:
    currentColumn - The properties of the column for which the data type is to be set or validated
    value - A String value to be parsed for data to determine an appropriate data type
    bestFit - Whether (true) or not (false, the default) to use the smallest numeric type that will fit the data. When false, floating point values use double, and integer values use long.
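The validate-and-widen idea described above can be sketched as a standalone function: each new value may only widen a column's inferred type, never narrow it. This models column details as a simple type name rather than SchemaCreatorColumnDetails, and the widening lattice shown (long to double to String) is an illustrative assumption.

```java
public class TypeInferenceSketch {
    // Returns the (possibly widened) type after observing one more value.
    static String widen(String currentType, String value) {
        if (value == null || value.isEmpty()) {
            return currentType; // empty values carry no type information
        }
        if (value.matches("[+-]?\\d+")) {
            // Integer data: keep a wider existing type if one was already found.
            return currentType == null || currentType.equals("long") ? "long" : currentType;
        }
        if (value.matches("[+-]?\\d*\\.\\d+")) {
            // Floating point data upgrades long to double, but never narrows String.
            return currentType == null || currentType.equals("long")
                    || currentType.equals("double") ? "double" : currentType;
        }
        return "String"; // non-numeric data forces String
    }

    public static void main(String[] args) {
        String type = null;
        for (String v : new String[] {"12", "42", "3.5"}) {
            type = widen(type, v);
        }
        System.out.println(type); // double
    }
}
```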
-
createTableImportSchema
public static String createTableImportSchema(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn)

Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

Parameters:
    columnNames - the names of the columns to be created in the schema
    namespace - namespace for the table
    tableName - name for the table
    groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
    partitionColumn - the column to mark as columnType="Partitioning" in the schema; defaults to Date
    sourceName - if not null, a specific name to use for the ImportSource block; defaults to Iris+type
    columnProperties - a map of details about the columns to be added, such as data types and formulae
    type - the type for the ImportSource block, typically CSV or JDBC
    sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
Returns:
    a String with the complete table schema XML
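For orientation, a schema of the kind this method emits might look roughly like the following. This fragment is illustrative only: element and attribute names are assumptions and should be verified against output actually generated by the utility.

```xml
<Table namespace="ExampleNS" name="Trades" storageType="NestedPartitionedOnDisk">
  <!-- Partitioning column (defaults to Date) and an optional grouping column -->
  <Column name="Date" dataType="String" columnType="Partitioning" />
  <Column name="Sym" dataType="String" columnType="Grouping" />
  <Column name="Price" dataType="double" />
  <!-- ImportSource block mapping source names to schema columns -->
  <ImportSource name="IrisCSV" type="CSV">
    <ImportColumn name="Price" sourceName="price_usd" />
  </ImportSource>
</Table>
```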
-
createTableImportSchema
public static String createTableImportSchema(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, CasingStyle casingStyle, String replacement)

Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

Parameters:
    columnNames - the names of the columns to be created in the schema
    namespace - namespace for the table
    tableName - name for the table
    groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
    partitionColumn - the column to mark as columnType="Partitioning" in the schema; defaults to Date
    sourceName - if not null, a specific name to use for the ImportSource block; defaults to Iris+type
    columnProperties - a map of details about the columns to be added, such as data types and formulae
    type - the type for the ImportSource block, typically CSV or JDBC
    sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
    casingStyle - if not null, the CasingStyle to apply to column names; None or null means no change to casing
    replacement - character, or empty String, to use for replacements of space or hyphen in source column names
Returns:
    a String with the complete table schema XML
-
createTableImportSchema
public static String createTableImportSchema(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerInputClassName, boolean useNanos, int maxError, String arrayDelimiter)

Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

Parameters:
    columnNames - the names of the columns to be created in the schema
    namespace - namespace for the table
    tableName - name for the table
    groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
    partitionColumn - the column to mark as columnType="Partitioning" in the schema; defaults to Date
    sourceName - if not null, a specific name to use for the ImportSource block; defaults to Iris+type
    columnProperties - a map of details about the columns to be added, such as data types and formulae
    type - the type for the ImportSource block, typically CSV or JDBC
    sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
    inputModel - style of logger/listener generation (null for none)
    outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
    loggerClass - custom logger class name (optional)
    listenerClass - custom listener class name (optional)
    loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
    logFormat - the format to use for generated loggers/listeners
    loggerInputClassName - if not null, the name of a class to be used as the Logger argument
    useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
    maxError - maximum number of errors to tolerate during schema inference
    arrayDelimiter - a string to use when parsing string values as arrays
Returns:
    a String with the complete table schema XML
-
createTableImportSchema
public static String createTableImportSchema(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerInputClassName, boolean useNanos, int maxError, String arrayDelimiter, CasingStyle casingStyle, String replacement)

Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

Parameters:
    columnNames - the names of the columns to be created in the schema
    namespace - namespace for the table
    tableName - name for the table
    groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
    partitionColumn - the column to mark as columnType="Partitioning" in the schema; defaults to Date
    sourceName - if not null, a specific name to use for the ImportSource block; defaults to Iris+type
    columnProperties - a map of details about the columns to be added, such as data types and formulae
    type - the type for the ImportSource block, typically CSV or JDBC
    sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
    inputModel - style of logger/listener generation (null for none)
    outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
    loggerClass - custom logger class name (optional)
    listenerClass - custom listener class name (optional)
    loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
    logFormat - the format to use for generated loggers/listeners
    loggerInputClassName - if not null, the name of a class to be used as the Logger argument
    useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
    maxError - maximum number of errors to tolerate during schema inference
    arrayDelimiter - a string to use when parsing string values as arrays
    casingStyle - if not null, the CasingStyle to apply to column names; None or null means no change to casing
    replacement - character, or empty String, to use for replacements of space or hyphen in source column names
Returns:
    a String with the complete table schema XML
-
createTableImportSchemaDocument
public static org.jdom2.Document createTableImportSchemaDocument(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerSystemInputClass, boolean useNanos, int maxError, String arrayDelimiter)

Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

Parameters:
    columnNames - the names of the columns to be created in the schema
    namespace - namespace for the table
    tableName - name for the table
    groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
    partitionColumn - the column to mark as columnType="Partitioning" in the schema; defaults to Date
    sourceName - if not null, a specific name to use for the ImportSource block; defaults to Iris+type
    columnProperties - a map of details about the columns to be added, such as data types and formulae
    type - the type for the ImportSource block, typically CSV or JDBC
    sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
    inputModel - style of logger/listener generation (null for none)
    outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
    loggerClass - custom logger class name (optional)
    listenerClass - custom listener class name (optional)
    loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
    logFormat - the format to use for generated loggers/listeners
    loggerSystemInputClass - if not null, the name of a class to be used as the Logger argument
    useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
    maxError - maximum number of errors to tolerate during schema inference
    arrayDelimiter - a string to use when parsing string values as arrays
Returns:
    a Document with the complete table schema XML
-
createTableImportSchemaDocument
public static org.jdom2.Document createTableImportSchemaDocument(List<String> columnNames, String namespace, String tableName, String groupingColumn, String partitionColumn, String sourceName, Map<String,SchemaCreatorColumnDetails> columnProperties, String type, String sourcePartitionColumn, SchemaDescriptor.InputModel inputModel, String outPackage, String loggerClass, String listenerClass, String loggerInterfaceClass, int logFormat, String loggerSystemInputClass, boolean useNanos, int maxError, String arrayDelimiter, CasingStyle casingStyle, String replacement)

Creates the schema for a set of columns, including an ImportSource block to map things like column renames, formulae, and default values.

Parameters:
    columnNames - the names of the columns to be created in the schema
    namespace - namespace for the table
    tableName - name for the table
    groupingColumn - if not null, the column to mark as columnType="Grouping" in the schema
    partitionColumn - the column to mark as columnType="Partitioning" in the schema; defaults to Date
    sourceName - if not null, a specific name to use for the ImportSource block; defaults to Iris+type
    columnProperties - a map of details about the columns to be added, such as data types and formulae
    type - the type for the ImportSource block, typically CSV or JDBC
    sourcePartitionColumn - if not null, the name of a source column that maps to the Deephaven partition column
    inputModel - style of logger/listener generation (null for none)
    outPackage - package name for generated loggers/listeners (required if generating loggers and listeners)
    loggerClass - custom logger class name (optional)
    listenerClass - custom listener class name (optional)
    loggerInterfaceClass - if not null, the name of a class that a generated Logger should implement
    logFormat - the format to use for generated loggers/listeners
    loggerSystemInputClass - if not null, the name of a class to be used as the Logger argument
    useNanos - whether to use nanoseconds (instead of millis) for logging timestamp values
    maxError - maximum number of errors to tolerate during schema inference
    arrayDelimiter - a string to use when parsing string values as arrays
    casingStyle - if not null, the CasingStyle to apply to column names; None or null means no change to casing
    replacement - character, or empty String, to use for replacements of space or hyphen in source column names
Returns:
    a Document with the complete table schema XML
-
resolveRemainingColumns
public static void resolveRemainingColumns(List<String> unresolvedColumns, Map<String,SchemaCreatorColumnDetails> columnProperties, com.fishlib.io.logger.Logger log)

Sets remaining unresolved columns to String, and warns if a column is being left as String because of ambiguous matches for converters.

Parameters:
    unresolvedColumns - A list of names of unresolved columns
    columnProperties - A map of column names and schema creator details for the columns
    log - The process logger object owned by the caller
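The fallback step described above amounts to defaulting unresolved columns to String and logging a warning, roughly as sketched below. Column details are modeled here as a plain type map instead of SchemaCreatorColumnDetails, and the warning text is illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResolveRemainingSketch {
    // Defaults every still-unresolved column to String, warning as it goes.
    static void resolveRemaining(List<String> unresolved, Map<String, String> columnTypes) {
        for (String column : unresolved) {
            columnTypes.put(column, "String");
            System.out.println("WARN: column " + column
                    + " defaulted to String (no unambiguous converter match)");
        }
    }

    public static void main(String[] args) {
        Map<String, String> types = new HashMap<>();
        types.put("Price", "double"); // already resolved
        resolveRemaining(List.of("Notes", "Tags"), types);
        System.out.println(types.get("Notes")); // String
    }
}
```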
-
checkDestinationFile
public static String checkDestinationFile(String schemaPath, @NotNull String namespace, @NotNull String table, boolean replace)

Prepares the output directory for a schema file; fails if a file already exists and output mode is not set to REPLACE.

Parameters:
    schemaPath - Explicit path to write the target file to, or null if no path was specified
    namespace - Namespace to use for the schema file
    table - Table name to use for the schema file
    replace - When true, overwrite an existing file; when false, throw a runtime exception if a file already exists
Returns:
    The string path and file name for the output file
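The prepare-then-guard logic described above can be sketched with java.nio. The default directory and the "namespace.table.schema" file-name convention below are assumptions for illustration; the real method's conventions may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckDestinationSketch {
    static String checkDestinationFile(String schemaPath, String namespace,
                                       String table, boolean replace) throws IOException {
        // Prepare the output directory (current directory if no path was given).
        Path dir = Paths.get(schemaPath == null ? "." : schemaPath);
        Files.createDirectories(dir);
        // Hypothetical naming convention for the target schema file.
        Path target = dir.resolve(namespace + "." + table + ".schema");
        // Fail when the file exists and replace mode is off.
        if (Files.exists(target) && !replace) {
            throw new RuntimeException("Schema file already exists: " + target);
        }
        return target.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("schemas");
        System.out.println(checkDestinationFile(tmp.toString(), "ExampleNS", "Trades", false));
    }
}
```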
-