Class DataQualityTestCase

java.lang.Object
com.illumon.iris.validation.DataQualityTestCase
All Implemented Interfaces:
DataQualityTestCaseInterface
Direct Known Subclasses:
AuditEventLogValidator, DynamicValidator, NonEmptyValidator, NullValidator, PersistentQueryConfigurationLogV2Validator, PersistentQueryStateLogValidator, ProcessEventLogValidator, QueryOperationPerformanceLogValidator, QueryPerformanceLogValidator, UpdatePerformanceLogValidator

public abstract class DataQualityTestCase extends Object implements DataQualityTestCaseInterface
Base class for tests that check the quality of the data in a table. All data quality tests should inherit from this class.
  • Field Details

    • validationTableDescription

      protected final ValidationTableDescription validationTableDescription
      Description of the table to validate.
    • table

      protected final Table table
      Table to validate.
  • Constructor Details

    • DataQualityTestCase

      public DataQualityTestCase(ValidationTableDescription validationTableDescription)
      Create a test case for use in validation.
      Parameters:
      validationTableDescription - description of the table to validate.
  • Method Details

    • getPartitionColumnNames

      protected static String[] getPartitionColumnNames(Table t)
      Gets the names of the partitioning columns from a table.
      Parameters:
      t - table
      Returns:
      names of the partitioning columns
    • getPartitionColumnNames

      protected static String[] getPartitionColumnNames(TableDefinition td)
      Gets the names of the partitioning columns from a table definition.
      Parameters:
      td - table definition
      Returns:
      names of the partitioning columns
    • getPartitionTable

      protected static Table getPartitionTable(Database database, FullTableLocationKey.AggregateTableLocationKey location)
      Gets the table from a database for a given partition.
    • setUp

      public void setUp()
      Description copied from interface: DataQualityTestCaseInterface
      Set up the test.
      Specified by:
      setUp in interface DataQualityTestCaseInterface
    • tearDown

      public void tearDown()
      Description copied from interface: DataQualityTestCaseInterface
      Tear down the test.
      Specified by:
      tearDown in interface DataQualityTestCaseInterface
    • clearMessages

      public void clearMessages()
      Description copied from interface: DataQualityTestCaseInterface
      Empties the list of messages.
      Specified by:
      clearMessages in interface DataQualityTestCaseInterface
    • getMessages

      public List<String> getMessages()
      Description copied from interface: DataQualityTestCaseInterface
      Gets the messages to log.
      Specified by:
      getMessages in interface DataQualityTestCaseInterface
      Returns:
      messages to log
    • message

      public void message(String m)
      Description copied from interface: DataQualityTestCaseInterface
      Add a message to log.
      Specified by:
      message in interface DataQualityTestCaseInterface
      Parameters:
      m - message to log.
    • clean

      public static Table clean(Table t, String column, boolean removeNull, boolean removeNaN, boolean removeInf)
      Removes rows from a table whose value in the given column is NULL, NaN, or infinite, as selected by the flags.
      Parameters:
      t - table
      column - column
      removeNull - true to remove rows where column is NULL
      removeNaN - true to remove rows where column is NaN
      removeInf - true to remove rows where column is infinite
      Returns:
      table with rows matching the indicated filters removed
    • clean

      public static Table clean(Table t, String[] columns, boolean removeNull, boolean removeNaN, boolean removeInf)
      Removes rows from a table where the given columns contain NULL, NaN, or infinite values, as selected by the flags.
      Parameters:
      t - table
      columns - columns
      removeNull - true to remove rows where columns are NULL
      removeNaN - true to remove rows where columns are NaN
      removeInf - true to remove rows where columns are infinite
      Returns:
      table with rows matching the indicated filters removed
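      To make the three flags concrete, the following minimal sketch applies the same row filter to a single column held in a plain Java list rather than a Table. CleanSketch is a hypothetical class, and modelling a NULL cell as a Java null is an assumption for illustration; the real engine represents nulls in its own way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the row filter performed by clean(...).
// Assumption: a NULL cell is modelled as a Java null, not an engine sentinel.
final class CleanSketch {
    private CleanSketch() {}

    static List<Double> clean(List<Double> column, boolean removeNull,
                              boolean removeNaN, boolean removeInf) {
        List<Double> kept = new ArrayList<>();
        for (Double v : column) {
            if (v == null) {
                if (!removeNull) kept.add(null); // keep NULL rows unless asked to drop them
                continue;
            }
            if (removeNaN && v.isNaN()) continue;      // drop NaN rows
            if (removeInf && v.isInfinite()) continue; // drop +Inf and -Inf rows
            kept.add(v);
        }
        return kept;
    }
}
```

      With all three flags set, a column [1.0, NULL, NaN, Inf, 2.0] reduces to [1.0, 2.0]; clearing removeNull keeps the NULL row.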
    • message

      public void message(String message, Table t, int nRows)
      Writes a message and the first nRows rows of a table to the message queue.
      Parameters:
      message - message
      t - table to log
      nRows - number of rows to log
    • message

      public void message(String message, Table t)
      Writes a message and the first rows of a table to the message queue.
      Parameters:
      message - message
      t - table to log
    • messageIfNotEmpty

      public void messageIfNotEmpty(String message, Table t, int nRows)
      Writes a message and the first nRows rows of a table to the message queue. Output is only generated if the table is not empty.
      Parameters:
      message - message
      t - table to log
      nRows - number of rows to log
    • messageIfNotEmpty

      public void messageIfNotEmpty(String message, Table t)
      Writes a message and the first rows of a table to the message queue. Output is only generated if the table is not empty.
      Parameters:
      message - message
      t - table to log
    • fail

      public static void fail(String message)
      Fails the test.
      Parameters:
      message - message describing the failure
    • fail

      public static void fail()
      Fails the test.
    • assertTrue

      public static void assertTrue(String message, boolean value)
      Asserts that value is true.
      Parameters:
      message - message describing the failure
      value - value to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFalse

      public static void assertFalse(String message, boolean value)
      Asserts that value is false.
      Parameters:
      message - message describing the failure
      value - value to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertEquals

      public static void assertEquals(String testName, String tableName, String column, Object value, Object target)
      Asserts that a value equals a target.
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - value to test
      target - target value
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertNotEquals

      public static void assertNotEquals(String testName, String tableName, String column, Object value, Object target)
      Asserts that a value does not equal a target.
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - value to test
      target - target value
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertInRange

      public static <T extends Comparable<T>> void assertInRange(String testName, String tableName, String column, T value, T min, T max)
      Asserts that a value is in the inclusive range [min,max].
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - value to test
      min - minimum value for the value range
      max - maximum value for the value range
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertInRange

      public static void assertInRange(String testName, String tableName, String column, long value, long min, long max)
      Asserts that a value is in the inclusive range [min,max].
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - to test
      min - minimum value for the value range
      max - maximum value for the value range
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertInRange

      public static void assertInRange(String testName, String tableName, String column, double value, double min, double max)
      Asserts that a value is in the inclusive range [min,max].
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - to test
      min - minimum value for the value range
      max - maximum value for the value range
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
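      The inclusive [min,max] contract shared by the three assertInRange overloads can be sketched as below. RangeSketch and AssertionFailedSketch are hypothetical stand-ins; whether DataQualityTestCase.AssertionFailed is an unchecked exception, and how its message is worded, are assumptions made only for this illustration.

```java
// Hypothetical stand-in for DataQualityTestCase.AssertionFailed.
// Assumption: modelled as an unchecked exception.
class AssertionFailedSketch extends RuntimeException {
    AssertionFailedSketch(String message) { super(message); }
}

final class RangeSketch {
    private RangeSketch() {}

    // Inclusive check: passes when min <= value <= max, otherwise throws.
    static <T extends Comparable<T>> void assertInRange(String testName, String tableName,
            String column, T value, T min, T max) {
        if (value.compareTo(min) < 0 || value.compareTo(max) > 0) {
            // Hypothetical message format that names the test, table and column.
            throw new AssertionFailedSketch(testName + ": " + tableName + "." + column
                    + " value " + value + " not in [" + min + "," + max + "]");
        }
    }
}
```

      Because the range is inclusive, a value equal to min or max passes; only values strictly outside the bounds fail.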
    • assertSize

      public void assertSize(Table t, long min, long max)
      Asserts the number of rows in the table is in the inclusive range [min,max].
      Parameters:
      t - table to validate
      min - minimum number of table rows
      max - maximum number of table rows
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertColumnType

      public void assertColumnType(Table t, String column, Class type)
      Asserts that a column is of the expected type.
      Parameters:
      t - table to validate
      column - column to validate
      type - expected type
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertColumnGrouped

      public void assertColumnGrouped(Table t, String column)
      Asserts that a column is grouped.
      Parameters:
      t - table to validate
      column - column to validate
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesEqual

      public void assertAllValuesEqual(Table t, String column)
      Asserts that a column only contains a single value.
      Parameters:
      t - table to validate
      column - column to validate
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesNotEqual

      public void assertAllValuesNotEqual(Table t, String column)
      Asserts that a column does not contain repeated values.
      Parameters:
      t - table to validate
      column - column to validate
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
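      The two single-column conditions above (one distinct value versus no repeated values) can be illustrated with a set-based sketch over a plain Java list. UniquenessSketch is hypothetical, and treating an empty column as satisfying both conditions is an assumption; the real methods operate on a Table column.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UniquenessSketch {
    private UniquenessSketch() {}

    // True when no value occurs more than once: the condition
    // assertAllValuesNotEqual(t, column) describes.
    static boolean allValuesDistinct(List<?> column) {
        Set<Object> seen = new HashSet<>();
        for (Object v : column) {
            if (!seen.add(v)) return false; // add() returns false on a repeat
        }
        return true;
    }

    // True when the column holds at most one distinct value: the condition
    // assertAllValuesEqual(t, column) describes.
    static boolean allValuesEqual(List<?> column) {
        return new HashSet<Object>(column).size() <= 1;
    }
}
```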
    • assertAllValuesEqual

      public void assertAllValuesEqual(Table t, String column, Object value)
      Asserts that every value in a column equals the specified value.
      Parameters:
      t - table to validate
      column - column to validate
      value - make sure the column only contains this value
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesNotEqual

      public void assertAllValuesNotEqual(Table t, String column, Object value)
      Asserts that a column does not contain a specified value.
      Parameters:
      t - table to validate
      column - column to validate
      value - make sure the column does not contain this value
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertEqual

      public void assertEqual(Table t, String column1, String column2)
      Asserts that all values in column1 are equal to all values in column2.
      Parameters:
      t - table to validate
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertNotEqual

      public void assertNotEqual(Table t, String column1, String column2)
      Asserts that all values in column1 are not equal to all values in column2.
      Parameters:
      t - table to validate
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertLess

      public void assertLess(Table t, String column1, String column2)
      Asserts that all values in column1 are less than all values in column2.
      Parameters:
      t - Table
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertLessEqual

      public void assertLessEqual(Table t, String column1, String column2)
      Asserts that all values in column1 are less than or equal to all values in column2.
      Parameters:
      t - Table
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertGreater

      public void assertGreater(Table t, String column1, String column2)
      Asserts that all values in column1 are greater than all values in column2.
      Parameters:
      t - Table
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertGreaterEqual

      public void assertGreaterEqual(Table t, String column1, String column2)
      Asserts that all values in column1 are greater than or equal to all values in column2.
      Parameters:
      t - Table
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertNumberDistinctValues

      public void assertNumberDistinctValues(Table t, String columns, long min, long max)
      Asserts the number of distinct values is in the inclusive range [min,max].
      Parameters:
      t - Table
      columns - comma separated list of columns to test
      min - minimum number of distinct values
      max - maximum number of distinct values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
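      When several columns are listed, the natural reading is that distinct combinations of their values are counted. The sketch below shows that interpretation under stated assumptions: rows are modelled as column-name to value maps, the comma-separated list is split and trimmed, and DistinctCountSketch is a hypothetical class, not part of this API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DistinctCountSketch {
    private DistinctCountSketch() {}

    // Counts distinct combinations of the listed columns.
    // Assumption: each row is a column-name -> value map.
    static long countDistinct(List<Map<String, Object>> rows, String columns) {
        String[] names = columns.split(",");
        Set<List<Object>> keys = new HashSet<>();
        for (Map<String, Object> row : rows) {
            List<Object> key = new ArrayList<>(names.length);
            for (String name : names) {
                key.add(row.get(name.trim())); // tolerate spaces after commas
            }
            keys.add(key); // List equality is element-wise, so tuples dedupe
        }
        return keys.size();
    }

    // Inclusive check on the count, mirroring the [min,max] contract.
    static boolean inRange(long count, long min, long max) {
        return count >= min && count <= max;
    }
}
```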
    • assertAllValuesInDistinctSet

      public void assertAllValuesInDistinctSet(Table t, String column, Object... expectedValues)
      Asserts that all values in a column are present in a set of expected values.
      Parameters:
      t - Table
      column - column to test
      expectedValues - set of expected values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesInArrayInDistinctSet

      public void assertAllValuesInArrayInDistinctSet(Table t, String column, Object... expectedValues)
      Asserts that all values contained in arrays in a column are present in a set of expected values.
      Parameters:
      t - Table
      column - column to test
      expectedValues - set of expected values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
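      The array variant differs from assertAllValuesInDistinctSet in that each cell holds several values, every one of which must belong to the expected set. A minimal sketch, assuming array cells are modelled as Object[] and that NULL cells are skipped (both assumptions); ArrayMembershipSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ArrayMembershipSketch {
    private ArrayMembershipSketch() {}

    // Every element of every array cell must be one of expectedValues.
    static boolean allArrayValuesIn(List<Object[]> column, Object... expectedValues) {
        Set<Object> allowed = new HashSet<>(Arrays.asList(expectedValues));
        for (Object[] cell : column) {
            if (cell == null) continue; // assumption: NULL cells are ignored
            for (Object v : cell) {
                if (!allowed.contains(v)) return false;
            }
        }
        return true;
    }
}
```

      The NotInDistinctSet variants invert the membership test: the check fails as soon as any element is found in the set.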
    • assertAllValuesInStringSetInDistinctSet

      public void assertAllValuesInStringSetInDistinctSet(Table t, String column, Object... expectedValues)
      Asserts that all values contained in string sets in a column are present in a set of expected values.
      Parameters:
      t - Table
      column - column to test
      expectedValues - set of expected values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesNotInDistinctSet

      public void assertAllValuesNotInDistinctSet(Table t, String column, Object... values)
      Asserts that all values in a column are not present in a set of values.
      Parameters:
      t - Table
      column - column to test
      values - set of values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesInArrayNotInDistinctSet

      public void assertAllValuesInArrayNotInDistinctSet(Table t, String column, Object... values)
      Asserts that all values contained in arrays in a column are not present in a set of values.
      Parameters:
      t - Table
      column - column to test
      values - set of values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesInStringSetNotInDistinctSet

      public void assertAllValuesInStringSetNotInDistinctSet(Table t, String column, Object... values)
      Asserts that all values contained in string sets in a column are not present in a set of values.
      Parameters:
      t - Table
      column - column to test
      values - set of values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFracWhere

      public void assertFracWhere(Table t, String filter, double min, double max)
      Asserts the fraction of a table's rows matching the provided filter falls within a defined range.
      Parameters:
      t - table to validate
      filter - filter
      min - minimum fraction of values remaining after the filter. Between 0 and 1.
      max - maximum fraction of values remaining after the filter. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
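      The fraction tested here is the number of rows passing the filter divided by the total number of rows. The sketch below computes it with a Java Predicate standing in for the filter string the real method accepts; FracWhereSketch is hypothetical, and rejecting an empty table is an assumption, since the behaviour for zero rows is not documented.

```java
import java.util.List;
import java.util.function.Predicate;

final class FracWhereSketch {
    private FracWhereSketch() {}

    // Fraction of rows satisfying the filter. The real method takes a filter
    // string; here it is modelled as a Java Predicate.
    static <R> double fracWhere(List<R> rows, Predicate<? super R> filter) {
        if (rows.isEmpty()) {
            // Assumption: the fraction is treated as undefined for zero rows.
            throw new IllegalArgumentException("fraction is undefined for an empty table");
        }
        long matching = rows.stream().filter(filter).count();
        return (double) matching / rows.size();
    }

    // Inclusive check against [min,max], both between 0 and 1.
    static boolean inRange(double frac, double min, double max) {
        return frac >= min && frac <= max;
    }
}
```

      The same ratio, with a fixed filter such as "value is NULL" or "value is NaN", underlies the assertFracNull, assertFracNan, assertFracInf, and assertFracZero checks that follow.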
    • assertFracNull

      public void assertFracNull(Table t, String column, double min, double max)
      Asserts that the fraction of NULL values is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum fraction of NULL values. Between 0 and 1.
      max - maximum fraction of NULL values. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertNotNull

      public void assertNotNull(Table t, String... columns)
    • assertFracNan

      public void assertFracNan(Table t, String column, double min, double max)
      Asserts that the fraction of NaN values is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum fraction of NaN values. Between 0 and 1.
      max - maximum fraction of NaN values. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFracInf

      public void assertFracInf(Table t, String column, double min, double max)
      Asserts that the fraction of infinite values is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum fraction of infinite values. Between 0 and 1.
      max - maximum fraction of infinite values. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFracZero

      public void assertFracZero(Table t, String column, double min, double max)
      Asserts that the fraction of zero values is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum fraction of zero values. Between 0 and 1.
      max - maximum fraction of zero values. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFracValuesBetween

      public void assertFracValuesBetween(Table t, String column, Comparable minValue, Comparable maxValue, double min, double max)
      Asserts that the fraction of values between [minValue,maxValue] is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      minValue - minimum value for the value range
      maxValue - maximum value for the value range
      min - minimum fraction of values in [minValue,maxValue]. Between 0 and 1.
      max - maximum fraction of values in [minValue,maxValue]. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesBetween

      public void assertAllValuesBetween(Table t, String column, Comparable minValue, Comparable maxValue)
      Asserts that all values in the column are in the inclusive range [minValue,maxValue].
      Parameters:
      t - Table
      column - column to test
      minValue - minimum value for the value range
      maxValue - maximum value for the value range
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertMin

      public void assertMin(Table t, String column, Comparable min, Comparable max, String... groupByColumns)
      Asserts that the minimum value of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating groups; the minimum is computed and tested within each group
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertMax

      public void assertMax(Table t, String column, Comparable min, Comparable max, String... groupByColumns)
      Asserts that the maximum value of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating groups; the maximum is computed and tested within each group
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAvg

      public void assertAvg(Table t, String column, double min, double max, String... groupByColumns)
      Asserts that the average of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating groups; the average is computed and tested within each group
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
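      With groupByColumns supplied, the statistic is presumably computed once per group and each group's result compared against [min,max]. The sketch below illustrates that reading for the average; GroupAvgSketch is hypothetical, the group of each row is passed as a precomputed key rather than derived from groupByColumns, and requiring every group to pass is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupAvgSketch {
    private GroupAvgSketch() {}

    // Mean of the values per group; keys.get(i) is the group of values[i].
    static Map<String, Double> avgByGroup(List<String> keys, double[] values) {
        Map<String, double[]> acc = new HashMap<>(); // group -> {sum, count}
        for (int i = 0; i < values.length; i++) {
            double[] sc = acc.computeIfAbsent(keys.get(i), k -> new double[2]);
            sc[0] += values[i];
            sc[1] += 1;
        }
        Map<String, Double> out = new HashMap<>();
        acc.forEach((k, sc) -> out.put(k, sc[0] / sc[1]));
        return out;
    }

    // Assumption: the assertion passes only if every group's average is in [min,max].
    static boolean allInRange(Map<String, Double> avgs, double min, double max) {
        return avgs.values().stream().allMatch(a -> a >= min && a <= max);
    }
}
```

      The same per-group pattern applies to assertMin, assertMax, assertStd, and assertPercentile, with the mean replaced by the corresponding statistic.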
    • assertStd

      public void assertStd(Table t, String column, double min, double max, String... groupByColumns)
      Asserts that the standard deviation of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating groups; the standard deviation is computed and tested within each group
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertPercentile

      public void assertPercentile(Table t, String column, double percentile, double min, double max, String... groupByColumns)
      Asserts that the defined percentile of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      percentile - percentile of the column to test. Between 0 and 1.
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating groups; the percentile is computed and tested within each group
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAscending

      public void assertAscending(Table t, String column, String... groupByColumns)
      Asserts that sub-groups of a column have monotonically increasing values. Consecutive values within a group must be equal or increasing.
      Parameters:
      t - Table
      column - column to test
      groupByColumns - columns delineating groups for testing monotonicity
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertStrictlyAscending

      public void assertStrictlyAscending(Table t, String column, String... groupByColumns)
      Asserts that sub-groups of a column have monotonically strictly increasing values. Consecutive values within a group must be increasing.
      Parameters:
      t - Table
      column - column to test
      groupByColumns - columns delineating groups for testing monotonicity
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertDescending

      public void assertDescending(Table t, String column, String... groupByColumns)
      Asserts that sub-groups of a column have monotonically decreasing values. Consecutive values within a group must be equal or decreasing.
      Parameters:
      t - Table
      column - column to test
      groupByColumns - columns delineating groups for testing monotonicity
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertStrictlyDescending

      public void assertStrictlyDescending(Table t, String column, String... groupByColumns)
      Asserts that sub-groups of a column have monotonically strictly decreasing values. Consecutive values within a group must be decreasing.
      Parameters:
      t - Table
      column - column to test
      groupByColumns - columns delineating groups for testing monotonicity
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
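      The four monotonicity checks compare each value with the previous value of the same group, in row order. A minimal sketch of the ascending case, assuming non-null values and a precomputed group key per row; MonotonicSketch is hypothetical and not this class's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MonotonicSketch {
    private MonotonicSketch() {}

    // Within each group, consecutive values (in row order) must not decrease;
    // strict=true additionally forbids equal neighbours.
    // Assumption: values are non-null, so a null from put() means "first row of group".
    static <T extends Comparable<T>> boolean ascendingByGroup(
            List<String> groupKeys, List<T> values, boolean strict) {
        Map<String, T> last = new HashMap<>(); // previous value seen per group
        for (int i = 0; i < values.size(); i++) {
            T prev = last.put(groupKeys.get(i), values.get(i));
            if (prev == null) continue; // first row of this group
            int cmp = values.get(i).compareTo(prev);
            if (cmp < 0 || (strict && cmp == 0)) return false;
        }
        return true;
    }
}
```

      The descending variants follow by flipping the comparison (fail on cmp > 0, and on cmp == 0 when strict). Rows from different groups may interleave freely, since only same-group neighbours are compared.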
    • assertExpectedTableSize

      public void assertExpectedTableSize(int partitionsBefore, int partitionsAfter, double min, double max)
      Asserts that the number of rows in the table is within a specified fraction of the expected size, determined by looking at the other tables in the database. For example, min=0.8 and max=1.2 assert that the table size is between 80% and 120% of the expected table size.
      Parameters:
      partitionsBefore - number of partitions before the current partition to compute expectation
      partitionsAfter - number of partitions after the current partition to compute expectation
      min - minimum fraction of expected rows in this table
      max - maximum fraction of expected rows in this table
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
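      The ratio test can be illustrated with a sketch that takes the row counts of neighbouring partitions as a baseline. How the real method aggregates those partitions is not documented here; using the mean of up to partitionsBefore earlier and partitionsAfter later partitions, excluding the current one, is an assumption, and ExpectedSizeSketch is hypothetical.

```java
import java.util.List;

final class ExpectedSizeSketch {
    private ExpectedSizeSketch() {}

    // Assumption: expectation = mean size of the neighbouring partitions
    // (sizes ordered by partition), excluding the current partition itself.
    static double expectedSize(List<Long> sizes, int current,
                               int partitionsBefore, int partitionsAfter) {
        int from = Math.max(0, current - partitionsBefore);
        int to = Math.min(sizes.size() - 1, current + partitionsAfter);
        long sum = 0;
        int n = 0;
        for (int i = from; i <= to; i++) {
            if (i == current) continue;
            sum += sizes.get(i);
            n++;
        }
        if (n == 0) throw new IllegalStateException("no neighbouring partitions");
        return (double) sum / n;
    }

    // Passes when actual / expected lies in [min,max], e.g. [0.8,1.2].
    static boolean withinFraction(long actual, double expected, double min, double max) {
        double ratio = actual / expected;
        return ratio >= min && ratio <= max;
    }
}
```

      For partition sizes [100, 110, 90, 105, 95] with the current partition at index 2 and two partitions on each side, the baseline is (100 + 110 + 105 + 95) / 4 = 102.5, so 90 rows (about 88%) pass with min=0.8 and max=1.2.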
    • getActualTableSize

      protected static long getActualTableSize(ValidationTableDescription validationTableDescription)
      Gets the size of the table.
      Parameters:
      validationTableDescription - description of the table to validate
      Returns:
      size of the table for the partition.
    • getActualTableSize

      protected static long getActualTableSize(Database database, String namespace, String tableName, String partition)
      Gets the size of the table.
      Parameters:
      database - database
      namespace - namespace of the table
      tableName - name of the table
      partition - partition to get the size for.
      Returns:
      size of the table for the partition.
    • getExpectedTableSize

      protected static long getExpectedTableSize(ValidationTableDescription validationTableDescription, int partitionsBefore, int partitionsAfter)
      Gets the expected size of the table, estimated by looking at other tables in the database.
      Parameters:
      validationTableDescription - description of the table to validate
      partitionsBefore - number of partitions before the current partition to use as a baseline
      partitionsAfter - number of partitions after the current partition to use as a baseline
      Returns:
      expected size of the table for the partition.
    • getExpectedTableSize

      protected static long getExpectedTableSize(Database database, String namespace, String tableName, String partition, int partitionsBefore, int partitionsAfter)
      Gets the expected size of the table, estimated by looking at other tables in the database.
      Parameters:
      database - database
      namespace - namespace of the table
      tableName - name of the table
      partition - partition to get the expected size for.
      partitionsBefore - number of partitions before the current partition to use as a baseline
      partitionsAfter - number of partitions after the current partition to use as a baseline
      Returns:
      expected size of the table for the partition.
    • assertCountEqual

      public void assertCountEqual(Table t, String column, Object value1, Object value2)
      Asserts that a column contains the same number of rows for two given values.
      Parameters:
      t - table to validate
      column - column to validate
      value1 - first value whose rows are counted
      value2 - second value whose rows are counted; the assertion passes when both counts are equal
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
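      A typical use is checking that paired events balance, for example that a column contains as many "Buy" rows as "Sell" rows. The sketch below counts occurrences with a null-safe equality test over a plain Java list; CountEqualSketch and the example values are hypothetical.

```java
import java.util.List;
import java.util.Objects;

final class CountEqualSketch {
    private CountEqualSketch() {}

    // Counts rows whose value equals the target (null-safe comparison).
    static long count(List<?> column, Object target) {
        return column.stream().filter(v -> Objects.equals(v, target)).count();
    }

    // The condition assertCountEqual describes: value1 and value2 occur equally often.
    static boolean countEqual(List<?> column, Object value1, Object value2) {
        return count(column, value1) == count(column, value2);
    }
}
```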
    • assertColumnTypes

      public void assertColumnTypes(Table t)
      Asserts that the column types in the table match the column types in the schema.
      Parameters:
      t - table to validate