Class DataQualityTestCase

java.lang.Object
com.illumon.iris.validation.DataQualityTestCase
All Implemented Interfaces:
DataQualityTestCaseInterface
Direct Known Subclasses:
AuditEventLogValidator, DynamicValidator, NonEmptyValidator, NullValidator, PersistentQueryConfigurationLogV2Validator, PersistentQueryStateLogValidator, ProcessEventLogValidator, QueryOperationPerformanceLogValidator, QueryPerformanceLogValidator, UpdatePerformanceLogValidator

public abstract class DataQualityTestCase
extends Object
implements DataQualityTestCaseInterface
Tests to assure the quality of the data in a table. All data quality tests should inherit from this class.
  • Constructor Details

    • DataQualityTestCase

      public DataQualityTestCase​(ValidationTableDescription validationTableDescription)
      Create a test case for use in validation.
      Parameters:
      validationTableDescription - description of the table to validate.
  • Method Details

    • getPartitionColumnNames

      protected static String[] getPartitionColumnNames​(Table t)
      Gets the names of the partitioning columns from a table.
      Parameters:
      t - table
      Returns:
      names of the partitioning columns
    • getPartitionColumnNames

      protected static String[] getPartitionColumnNames​(TableDefinition td)
      Gets the names of the partitioning columns from a table definition.
      Parameters:
      td - table definition
      Returns:
      names of the partitioning columns
    • getPartitionTable

      protected static Table getPartitionTable​(Database database, FullTableLocationKey.AggregateTableLocationKey location)
      Gets the table from a database for a given partition.
    • setUp

      public void setUp()
      Description copied from interface: DataQualityTestCaseInterface
      Set up the test.
      Specified by:
      setUp in interface DataQualityTestCaseInterface
    • tearDown

      public void tearDown()
      Description copied from interface: DataQualityTestCaseInterface
      Tear down the test.
      Specified by:
      tearDown in interface DataQualityTestCaseInterface
    • clearMessages

      public void clearMessages()
      Description copied from interface: DataQualityTestCaseInterface
      Empties the list of messages.
      Specified by:
      clearMessages in interface DataQualityTestCaseInterface
    • getMessages

      public List<String> getMessages()
      Description copied from interface: DataQualityTestCaseInterface
      Gets the messages to log.
      Specified by:
      getMessages in interface DataQualityTestCaseInterface
      Returns:
      messages to log
    • message

      public void message​(String m)
      Description copied from interface: DataQualityTestCaseInterface
      Add a message to log.
      Specified by:
      message in interface DataQualityTestCaseInterface
      Parameters:
      m - message to log.
    • clean

      public static Table clean​(Table t, String column, boolean removeNull, boolean removeNaN, boolean removeInf)
      Remove rows containing various values from a table.
      Parameters:
      t - table
      column - column
      removeNull - true to remove rows where column is NULL
      removeNaN - true to remove rows where column is NaN
      removeInf - true to remove rows where column is Inf
      Returns:
      table with rows matching the indicated filters removed
    • clean

      public static Table clean​(Table t, String[] columns, boolean removeNull, boolean removeNaN, boolean removeInf)
      Remove rows containing various values from a table.
      Parameters:
      t - table
      columns - columns
      removeNull - true to remove rows where columns are NULL
      removeNaN - true to remove rows where columns are NaN
      removeInf - true to remove rows where columns are Inf
      Returns:
      table with rows matching the indicated filters removed
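      The three flags can be read as independent row filters. The following is a minimal sketch of that behavior on a plain Java list; the class CleanSketch, the use of List<Double> as a stand-in for a column, and Java null as a stand-in for NULL are assumptions made for illustration and are not part of this API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: a "column" is modelled as a List<Double>; this is not the Table API.
final class CleanSketch {
    // Returns the values that survive the requested removals, preserving order.
    static List<Double> clean(List<Double> column, boolean removeNull, boolean removeNaN, boolean removeInf) {
        List<Double> kept = new ArrayList<>();
        for (Double v : column) {
            if (v == null) {
                // Nulls are kept only when removeNull is false.
                if (!removeNull) kept.add(null);
                continue;
            }
            if (removeNaN && v.isNaN()) continue;
            if (removeInf && v.isInfinite()) continue;
            kept.add(v);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(1.0, null, Double.NaN, Double.POSITIVE_INFINITY, 2.5);
        System.out.println(clean(column, true, true, true));   // [1.0, 2.5]
        System.out.println(clean(column, true, false, false)); // [1.0, NaN, Infinity, 2.5]
    }
}
```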
    • message

      public void message​(String message, Table t, int nRows)
      Write a message and the first rows of a table to the message queue.
      Parameters:
      message - message
      t - table to log
      nRows - number of rows to log
    • message

      public void message​(String message, Table t)
      Write a message and the first rows of a table to the message queue.
      Parameters:
      message - message
      t - table to log
    • messageIfNotEmpty

      public void messageIfNotEmpty​(String message, Table t, int nRows)
      Write a message and the first rows of a table to the message queue. Output is only generated if the table is not empty.
      Parameters:
      message - message
      t - table to log
      nRows - number of rows to log
    • messageIfNotEmpty

      public void messageIfNotEmpty​(String message, Table t)
      Write a message and the first rows of a table to the message queue. Output is only generated if the table is not empty.
      Parameters:
      message - message
      t - table to log
    • fail

      public static void fail​(String message)
      Fails the test.
      Parameters:
      message - message describing the failure
    • fail

      public static void fail()
      Fails the test.
    • assertTrue

      public static void assertTrue​(String message, boolean value)
      Asserts that value is true.
      Parameters:
      message - message describing the failure
      value - value to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFalse

      public static void assertFalse​(String message, boolean value)
      Asserts that value is false.
      Parameters:
      message - message describing the failure
      value - value to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertEquals

      public static void assertEquals​(String testName, String tableName, String column, Object value, Object target)
      Asserts that a value equals a target.
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - value to test
      target - target value
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertNotEquals

      public static void assertNotEquals​(String testName, String tableName, String column, Object value, Object target)
      Asserts that a value does not equal a target.
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - value to test
      target - target value
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertInRange

      public static <T extends Comparable<T>> void assertInRange​(String testName, String tableName, String column, T value, T min, T max)
      Asserts that a value is in the inclusive range [min,max].
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - value to test
      min - minimum value for the value range
      max - maximum value for the value range
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertInRange

      public static void assertInRange​(String testName, String tableName, String column, long value, long min, long max)
      Asserts that a value is in the inclusive range [min,max].
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - value to test
      min - minimum value for the value range
      max - maximum value for the value range
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertInRange

      public static void assertInRange​(String testName, String tableName, String column, double value, double min, double max)
      Asserts that a value is in the inclusive range [min,max].
      Parameters:
      testName - test name
      tableName - table name
      column - column name
      value - value to test
      min - minimum value for the value range
      max - maximum value for the value range
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
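      The inclusive range [min,max] accepts both endpoints. The sketch below models that comparison for the generic and the double overloads; InRangeSketch and its boolean-returning helpers are hypothetical and only illustrate the documented semantics, not the library's code. How the real methods treat NaN is not stated here, so the NaN behavior shown is a property of this sketch only.

```java
// Hypothetical stand-in for the documented inclusive-range semantics; not the library's code.
final class InRangeSketch {
    // True when min <= value <= max under the type's natural ordering.
    static <T extends Comparable<T>> boolean inRange(T value, T min, T max) {
        return value.compareTo(min) >= 0 && value.compareTo(max) <= 0;
    }

    // Primitive overload. In this sketch NaN is never in range, because every
    // ordered comparison involving NaN evaluates to false.
    static boolean inRange(double value, double min, double max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        System.out.println(inRange("m", "a", "z"));          // true
        System.out.println(inRange(10.0, 0.0, 10.0));        // true: the upper bound is included
        System.out.println(inRange(10.5, 0.0, 10.0));        // false
        System.out.println(inRange(Double.NaN, 0.0, 10.0));  // false
    }
}
```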
    • assertSize

      public void assertSize​(Table t, long min, long max)
      Asserts the number of rows in the table is in the inclusive range [min,max].
      Parameters:
      t - table to validate
      min - minimum number of table rows
      max - maximum number of table rows
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertColumnType

      public void assertColumnType​(Table t, String column, Class type)
      Asserts that a column is of the expected type.
      Parameters:
      t - table to validate
      column - column to validate
      type - expected type
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertColumnGrouped

      public void assertColumnGrouped​(Table t, String column)
      Asserts that a column is grouped.
      Parameters:
      t - table to validate
      column - column to validate
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesEqual

      public void assertAllValuesEqual​(Table t, String column)
      Asserts that a column only contains a single value.
      Parameters:
      t - table to validate
      column - column to validate
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesNotEqual

      public void assertAllValuesNotEqual​(Table t, String column)
      Asserts that a column does not contain repeated values.
      Parameters:
      t - table to validate
      column - column to validate
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesEqual

      public void assertAllValuesEqual​(Table t, String column, Object value)
      Asserts that a column only contains a single value.
      Parameters:
      t - table to validate
      column - column to validate
      value - make sure the column only contains this value
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesNotEqual

      public void assertAllValuesNotEqual​(Table t, String column, Object value)
      Asserts that a column does not contain a specified value.
      Parameters:
      t - table to validate
      column - column to validate
      value - make sure the column does not contain this value
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertEqual

      public void assertEqual​(Table t, String column1, String column2)
      Asserts that all values in column1 are equal to all values in column2.
      Parameters:
      t - table to validate
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertNotEqual

      public void assertNotEqual​(Table t, String column1, String column2)
      Asserts that all values in column1 are not equal to all values in column2.
      Parameters:
      t - table to validate
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertLess

      public void assertLess​(Table t, String column1, String column2)
      Asserts that all values in column1 are less than all values in column2.
      Parameters:
      t - Table
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertLessEqual

      public void assertLessEqual​(Table t, String column1, String column2)
      Asserts that all values in column1 are less than or equal to all values in column2.
      Parameters:
      t - Table
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertGreater

      public void assertGreater​(Table t, String column1, String column2)
      Asserts that all values in column1 are greater than all values in column2.
      Parameters:
      t - Table
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertGreaterEqual

      public void assertGreaterEqual​(Table t, String column1, String column2)
      Asserts that all values in column1 are greater than or equal to all values in column2.
      Parameters:
      t - Table
      column1 - column to test
      column2 - column to test
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertNumberDistinctValues

      public void assertNumberDistinctValues​(Table t, String columns, long min, long max)
      Asserts the number of distinct values is in the inclusive range [min,max].
      Parameters:
      t - Table
      columns - comma separated list of columns to test
      min - minimum number of distinct values
      max - maximum number of distinct values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
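      When several columns are listed, it is natural to count distinct combinations of their values. The sketch below assumes that interpretation; DistinctCountSketch and the row-as-list representation are hypothetical and are not the Table API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: each row is the list of values taken from the selected columns.
final class DistinctCountSketch {
    // Counts distinct rows; two rows are equal when all of their values are equal.
    static long distinctCount(List<List<Object>> rows) {
        Set<List<Object>> seen = new HashSet<>(rows);
        return seen.size();
    }

    // True when the distinct count lies in the inclusive range [min, max].
    static boolean distinctCountInRange(List<List<Object>> rows, long min, long max) {
        long n = distinctCount(rows);
        return n >= min && n <= max;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("AAPL", "NYSE"),
                Arrays.<Object>asList("AAPL", "NYSE"),
                Arrays.<Object>asList("AAPL", "ARCA"),
                Arrays.<Object>asList("MSFT", "NYSE"));
        System.out.println(distinctCount(rows));              // 3
        System.out.println(distinctCountInRange(rows, 1, 3)); // true
    }
}
```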
    • assertAllValuesInDistinctSet

      public void assertAllValuesInDistinctSet​(Table t, String column, Object... expectedValues)
      Asserts that all values in a column are present in a set of expected values.
      Parameters:
      t - Table
      column - column to test
      expectedValues - set of expected values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
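      This check amounts to a subset test: every value in the column must be one of the expected values. A minimal sketch on a plain list follows; DistinctSetSketch and unexpectedValues are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in: a "column" is modelled as a List; this is not the Table API.
final class DistinctSetSketch {
    // Returns the column values that are not among the expected values.
    // An empty result means the check would pass.
    static Set<Object> unexpectedValues(List<?> column, Object... expectedValues) {
        Set<Object> expected = new HashSet<>(Arrays.asList(expectedValues));
        Set<Object> unexpected = new LinkedHashSet<>();
        for (Object v : column) {
            if (!expected.contains(v)) unexpected.add(v);
        }
        return unexpected;
    }

    public static void main(String[] args) {
        List<String> side = Arrays.asList("BUY", "SELL", "BUY", "HOLD");
        System.out.println(unexpectedValues(side, "BUY", "SELL"));         // [HOLD]
        System.out.println(unexpectedValues(side, "BUY", "SELL", "HOLD")); // []
    }
}
```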
    • assertAllValuesInArrayInDistinctSet

      public void assertAllValuesInArrayInDistinctSet​(Table t, String column, Object... expectedValues)
      Asserts that all values contained in arrays in a column are present in a set of expected values.
      Parameters:
      t - Table
      column - column to test
      expectedValues - set of expected values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesInStringSetInDistinctSet

      public void assertAllValuesInStringSetInDistinctSet​(Table t, String column, Object... expectedValues)
      Asserts that all values contained in string sets in a column are present in a set of expected values.
      Parameters:
      t - Table
      column - column to test
      expectedValues - set of expected values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesNotInDistinctSet

      public void assertAllValuesNotInDistinctSet​(Table t, String column, Object... values)
      Asserts that all values in a column are not present in a set of values.
      Parameters:
      t - Table
      column - column to test
      values - set of values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesInArrayNotInDistinctSet

      public void assertAllValuesInArrayNotInDistinctSet​(Table t, String column, Object... values)
      Asserts that all values contained in arrays in a column are not present in a set of values.
      Parameters:
      t - Table
      column - column to test
      values - set of values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesInStringSetNotInDistinctSet

      public void assertAllValuesInStringSetNotInDistinctSet​(Table t, String column, Object... values)
      Asserts that all values contained in string sets in a column are not present in a set of expected values.
      Parameters:
      t - Table
      column - column to test
      values - set of expected values
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFracWhere

      public void assertFracWhere​(Table t, String filter, double min, double max)
      Asserts the fraction of a table's rows matching the provided filter falls within a defined range.
      Parameters:
      t - table to validate
      filter - filter
      min - minimum fraction of values remaining after the filter. Between 0 and 1.
      max - maximum fraction of values remaining after the filter. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
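      The quantity being tested is (rows matching the filter) / (total rows). The sketch below computes it with a Java Predicate in place of a filter string; FracWhereSketch is hypothetical, and returning 0 for an empty input is an assumption of the sketch, since the behavior for empty tables is not documented here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in: rows are list elements and the filter is a Predicate.
final class FracWhereSketch {
    // Fraction of rows that satisfy the filter; defined as 0 for an empty input in this sketch.
    static <T> double fractionWhere(List<T> rows, Predicate<T> filter) {
        if (rows.isEmpty()) return 0.0;
        long matching = rows.stream().filter(filter).count();
        return (double) matching / rows.size();
    }

    // True when the matching fraction lies in the inclusive range [min, max].
    static <T> boolean fracWhereInRange(List<T> rows, Predicate<T> filter, double min, double max) {
        double frac = fractionWhere(rows, filter);
        return frac >= min && frac <= max;
    }

    public static void main(String[] args) {
        List<Integer> sizes = Arrays.asList(0, 100, 200, 0, 300);
        // Two of the five rows are zero, so the fraction is 0.4.
        System.out.println(fractionWhere(sizes, x -> x == 0));
        System.out.println(fracWhereInRange(sizes, x -> x == 0, 0.0, 0.5)); // true
    }
}
```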
    • assertFracNull

      public void assertFracNull​(Table t, String column, double min, double max)
      Asserts that the fraction of NULL values is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum fraction of values that are NULL. Between 0 and 1.
      max - maximum fraction of values that are NULL. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertNotNull

      public void assertNotNull​(Table t, String... columns)
    • assertFracNan

      public void assertFracNan​(Table t, String column, double min, double max)
      Asserts that the fraction of NaN values is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum fraction of values that are NaN. Between 0 and 1.
      max - maximum fraction of values that are NaN. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFracInf

      public void assertFracInf​(Table t, String column, double min, double max)
      Asserts that the fraction of infinite values is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum fraction of values that are infinite. Between 0 and 1.
      max - maximum fraction of values that are infinite. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFracZero

      public void assertFracZero​(Table t, String column, double min, double max)
      Asserts that the fraction of zero values is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum fraction of values that are zero. Between 0 and 1.
      max - maximum fraction of values that are zero. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertFracValuesBetween

      public void assertFracValuesBetween​(Table t, String column, Comparable minValue, Comparable maxValue, double min, double max)
      Asserts that the fraction of values between [minValue,maxValue] is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      minValue - minimum value for the value range
      maxValue - maximum value for the value range
      min - minimum fraction of values that fall within [minValue,maxValue]. Between 0 and 1.
      max - maximum fraction of values that fall within [minValue,maxValue]. Between 0 and 1.
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAllValuesBetween

      public void assertAllValuesBetween​(Table t, String column, Comparable minValue, Comparable maxValue)
      Asserts that all values in the column are in the inclusive range [minValue,maxValue].
      Parameters:
      t - Table
      column - column to test
      minValue - minimum value for the value range
      maxValue - maximum value for the value range
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertMin

      public void assertMin​(Table t, String column, Comparable min, Comparable max, String... groupByColumns)
      Asserts that the minimum value of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating the groups over which the minimum is computed
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertMax

      public void assertMax​(Table t, String column, Comparable min, Comparable max, String... groupByColumns)
      Asserts that the maximum value of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating the groups over which the maximum is computed
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAvg

      public void assertAvg​(Table t, String column, double min, double max, String... groupByColumns)
      Asserts that the average of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating the groups over which the average is computed
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
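      With groupByColumns supplied, the aggregate is presumably computed once per group and each result is checked against [min,max]. The sketch below assumes exactly that; GroupAvgSketch, the single String group key, and the all-groups-must-pass rule are hypothetical choices for illustration, not documented behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: parallel lists hold the group key and the value of each row.
final class GroupAvgSketch {
    // Computes the mean of the values for each group key, in first-seen key order.
    static Map<String, Double> averageByGroup(List<String> groupKeys, List<Double> values) {
        Map<String, double[]> sumAndCount = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            double[] acc = sumAndCount.computeIfAbsent(groupKeys.get(i), k -> new double[2]);
            acc[0] += values.get(i); // running sum
            acc[1] += 1;             // running count
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    // Assumption: the check passes only if every group's average lies in [min, max].
    static boolean allGroupAveragesInRange(List<String> groupKeys, List<Double> values, double min, double max) {
        for (double avg : averageByGroup(groupKeys, values).values()) {
            if (avg < min || avg > max) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "A", "B", "B");
        List<Double> values = Arrays.asList(1.0, 3.0, 10.0, 20.0);
        System.out.println(averageByGroup(keys, values));                     // {A=2.0, B=15.0}
        System.out.println(allGroupAveragesInRange(keys, values, 0.0, 10.0)); // false: B averages 15.0
    }
}
```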
    • assertStd

      public void assertStd​(Table t, String column, double min, double max, String... groupByColumns)
      Asserts that the standard deviation of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating the groups over which the standard deviation is computed
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertPercentile

      public void assertPercentile​(Table t, String column, double percentile, double min, double max, String... groupByColumns)
      Asserts that the defined percentile of the column is in the inclusive range [min,max].
      Parameters:
      t - Table
      column - column to test
      percentile - percentile of the column to test. Between 0 and 1.
      min - minimum value for the value range
      max - maximum value for the value range
      groupByColumns - columns delineating the groups over which the percentile is computed
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertAscending

      public void assertAscending​(Table t, String column, String... groupByColumns)
      Asserts that sub-groups of a column have monotonically increasing values. Consecutive values within a group must be equal or increasing.
      Parameters:
      t - Table
      column - column to test
      groupByColumns - columns delineating groups for testing monotonicity
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertStrictlyAscending

      public void assertStrictlyAscending​(Table t, String column, String... groupByColumns)
      Asserts that sub-groups of a column have monotonically strictly increasing values. Consecutive values within a group must be increasing.
      Parameters:
      t - Table
      column - column to test
      groupByColumns - columns delineating groups for testing monotonicity
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertDescending

      public void assertDescending​(Table t, String column, String... groupByColumns)
      Asserts that sub-groups of a column have monotonically decreasing values. Consecutive values within a group must be equal or decreasing.
      Parameters:
      t - Table
      column - column to test
      groupByColumns - columns delineating groups for testing monotonicity
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
    • assertStrictlyDescending

      public void assertStrictlyDescending​(Table t, String column, String... groupByColumns)
      Asserts that sub-groups of a column have monotonically strictly decreasing values. Consecutive values within a group must be decreasing.
      Parameters:
      t - Table
      column - column to test
      groupByColumns - columns delineating groups for testing monotonicity
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
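      The four monotonicity checks differ only in direction and in whether equal consecutive values are allowed, and the comparison is made within each group rather than across the whole column. The sketch below illustrates the ascending cases; MonotonicSketch and its single String group key are hypothetical, and the descending cases follow by reversing the comparison.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: parallel lists hold the group key and the value of each row.
final class MonotonicSketch {
    // Checks that values ascend within each group by comparing each value with the
    // previous value seen for the same group key. strict=true rejects equal neighbours.
    static boolean isAscendingWithinGroups(List<String> groupKeys, List<Long> values, boolean strict) {
        Map<String, Long> lastSeen = new HashMap<>();
        for (int i = 0; i < values.size(); i++) {
            long current = values.get(i);
            Long previous = lastSeen.put(groupKeys.get(i), current);
            if (previous != null) {
                if (strict ? current <= previous : current < previous) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Rows of groups A and B are interleaved; the column as a whole is not sorted.
        List<String> keys = Arrays.asList("A", "B", "A", "B", "A");
        List<Long> values = Arrays.asList(1L, 10L, 2L, 10L, 3L);
        System.out.println(isAscendingWithinGroups(keys, values, false)); // true
        System.out.println(isAscendingWithinGroups(keys, values, true));  // false: B repeats 10
    }
}
```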
    • assertExpectedTableSize

      public void assertExpectedTableSize​(int partitionsBefore, int partitionsAfter, double min, double max)
      Asserts the number of rows in the table is within a specified fraction of the expected size, determined by looking at the other tables in the database. For example, min=0.8 and max=1.2 would assert that the table size is between 80% and 120% of the typical table size.
      Parameters:
      partitionsBefore - number of partitions before the current partition to compute expectation
      partitionsAfter - number of partitions after the current partition to compute expectation
      min - minimum fraction of expected rows in this table
      max - maximum fraction of expected rows in this table
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
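      The check reduces to testing whether actual / expected lies in [min,max]. How the expected size is derived from the neighboring partitions is not specified here; the sketch below assumes a simple mean of their row counts, and ExpectedSizeSketch and its helpers are hypothetical.

```java
// Hypothetical stand-in for the ratio test; the averaging rule is an assumption.
final class ExpectedSizeSketch {
    // Assumption: the expected size is the mean row count of the neighbouring partitions.
    static double expectedSize(long[] neighbourSizes) {
        if (neighbourSizes.length == 0) return 0.0;
        long total = 0;
        for (long s : neighbourSizes) total += s;
        return (double) total / neighbourSizes.length;
    }

    // True when actual / expected lies in the inclusive range [min, max].
    static boolean sizeWithinFraction(long actual, double expected, double min, double max) {
        if (expected <= 0) return false; // no baseline to compare against in this sketch
        double ratio = actual / expected;
        return ratio >= min && ratio <= max;
    }

    public static void main(String[] args) {
        double expected = expectedSize(new long[] {900, 1000, 1100}); // 1000.0
        System.out.println(sizeWithinFraction(850, expected, 0.8, 1.2)); // true: ratio is 0.85
        System.out.println(sizeWithinFraction(700, expected, 0.8, 1.2)); // false: ratio is 0.7
    }
}
```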
    • getActualTableSize

      protected static long getActualTableSize​(ValidationTableDescription validationTableDescription)
      Gets the size of the table.
      Parameters:
      validationTableDescription - description of the table to validate
      Returns:
      size of the table for the partition.
    • getActualTableSize

      protected static long getActualTableSize​(Database database, String namespace, String tableName, String partition)
      Gets the size of the table.
      Parameters:
      database - database
      namespace - namespace for the database
      tableName - name of the table
      partition - partition to get the size for.
      Returns:
      size of the table for the partition.
    • getExpectedTableSize

      protected static long getExpectedTableSize​(ValidationTableDescription validationTableDescription, int partitionsBefore, int partitionsAfter)
      Gets the size of the table we expect by looking at other tables in the database.
      Parameters:
      validationTableDescription - description of the table to validate
      partitionsBefore - number of partitions before the current partition to use as a baseline
      partitionsAfter - number of partitions after the current partition to use as a baseline
      Returns:
      expected size of the table for the partition.
    • getExpectedTableSize

      protected static long getExpectedTableSize​(Database database, String namespace, String tableName, String partition, int partitionsBefore, int partitionsAfter)
      Gets the size of the table we expect by looking at other tables in the database.
      Parameters:
      database - database
      namespace - namespace for the database
      tableName - name of the table
      partition - partition to get the expected size for.
      partitionsBefore - number of partitions before the current partition to use as a baseline
      partitionsAfter - number of partitions after the current partition to use as a baseline
      Returns:
      expected size of the table for the partition.
    • assertCountEqual

      public void assertCountEqual​(Table t, String column, Object value1, Object value2)
      Asserts that a column contains the same number of rows for two given values.
      Parameters:
      t - table to validate
      column - column to validate
      value1 - make sure the column has the same number of value1 and value2 entries
      value2 - make sure the column has the same number of value1 and value2 entries
      Throws:
      DataQualityTestCase.AssertionFailed - the assertion failed
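      A typical use is checking that paired events balance, for example that every open has a matching close. The sketch below counts occurrences on a plain list; CountEqualSketch and the sample event names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in: a "column" is modelled as a List; this is not the Table API.
final class CountEqualSketch {
    // Number of rows whose value equals the given value (null-safe).
    static long count(List<?> column, Object value) {
        return column.stream().filter(v -> Objects.equals(v, value)).count();
    }

    // True when value1 and value2 occur the same number of times.
    static boolean countsEqual(List<?> column, Object value1, Object value2) {
        return count(column, value1) == count(column, value2);
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("OPEN", "CLOSE", "OPEN", "CLOSE", "HEARTBEAT");
        System.out.println(countsEqual(events, "OPEN", "CLOSE"));     // true: 2 and 2
        System.out.println(countsEqual(events, "OPEN", "HEARTBEAT")); // false: 2 and 1
    }
}
```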
    • assertColumnTypes

      public void assertColumnTypes​(Table t)
      Asserts that the column types in the table match the column types in the schema.
      Parameters:
      t - table to validate