com.illumon.iris.db.v2.by.ApproximatePercentile

public class ApproximatePercentile extends Object

Generate approximate percentile aggregations of a table.

The underlying data structure and algorithm used is a t-digest as described at https://github.com/tdunning/t-digest, which has a "compression" parameter that determines the size of the retained values. From the t-digest documentation, "100 is a common value for normal uses. 1000 is extremely large. The number of centroids retained will be a smallish (usually less than 10) multiple of this number."

All input columns are cast to doubles and the result columns are doubles.

The input table must be add-only; if modifications or removals take place, an UnsupportedOperationException is thrown. For tables that are not add-only, you must use exact percentiles with ComboAggregateFactory.AggPct(double, java.lang.String...).

You may compute either one approximate percentile or several approximate percentiles at once. For example, to compute the 95th percentile of every other column, grouped by the "Sym" column, you would call:

ApproximatePercentile.approximatePercentile(input, 0.95, "Sym")
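
To make concrete what the grouped call above approximates, the following plain-Java sketch computes an exact per-group percentile. It does not use the library: the class ExactGroupedPercentile and its methods are hypothetical names, and the nearest-rank percentile definition is an assumption chosen for simplicity. A t-digest interpolates between centroids, so its results will generally differ slightly from these exact values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of the library: exact per-group percentile for comparison. */
final class ExactGroupedPercentile {

    /** Nearest-rank percentile (an assumed definition); p is a fraction in [0, 1]. */
    static double percentile(double[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest value with at least p * n values at or below it.
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Groups the values by key (for example "Sym") and computes the percentile of each group. */
    static Map<String, Double> percentileBy(String[] keys, double[] values, double p) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            groups.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> e : groups.entrySet()) {
            double[] group = e.getValue().stream().mapToDouble(Double::doubleValue).toArray();
            result.put(e.getKey(), percentile(group, p));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sym = {"AAPL", "AAPL", "AAPL", "AAPL", "MSFT", "MSFT"};
        double[] latency = {10, 20, 30, 40, 5, 15};
        // prints {AAPL=40.0, MSFT=15.0}
        System.out.println(percentileBy(sym, latency, 0.95));
    }
}
```

An exact computation like this has to retain and sort every value in each group, which is the cost the approximate aggregation avoids.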

If you need several percentiles, it is more efficient to compute them simultaneously. The following example uses a builder pattern to compute the 75th, 95th, and 99th percentiles of the "Latency" column and the 95th and 99th percentiles of the "Size" column, by "Sym":

     final ApproximatePercentile.PercentileDefinition definition = new ApproximatePercentile.PercentileDefinition("Latency")
             .add(0.75, "L75").add(0.95, "L95").add(0.99, "L99")
             .nextColumn("Size").add(0.95, "S95").add(0.99, "S99");
     final Table aggregated = ApproximatePercentile.approximatePercentiles(input, definition, "Sym");

When parallelizing a workload, you may want to divide it based on natural partitioning and then compute an overall percentile. In these cases, you should use the ApproximatePercentile.PercentileDefinition.exposeDigest(java.lang.String) method to expose the internal t-digest structure as a column. If you then perform an array aggregation (Table.by(com.illumon.iris.db.v2.by.AggregationStateFactory, com.illumon.iris.db.v2.select.SelectColumn...)), you can call the accumulateDigests(com.illumon.iris.db.tables.dbarrays.DbArray<com.tdunning.math.stats.TDigest>) function to produce a single digest that represents all of the constituent digests. The amount of error introduced is related to the compression factor that you have selected for the digests. Once you have a combined digest object, you can call the quantile or other functions to extract the desired percentile.
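
The partition-then-merge pattern described above can be sketched in plain Java. The HistogramDigest class below is a hypothetical stand-in for a mergeable quantile summary: it is a fixed-width histogram, not a t-digest, and it does not use the library. It only illustrates that per-partition summaries can be combined into one summary and then queried for a quantile; the bin count plays a role loosely analogous to the compression factor, which is an analogy rather than a description of how t-digest works.

```java
/**
 * Hypothetical stand-in for a mergeable quantile summary. This is a fixed-width
 * histogram, NOT a t-digest; it only illustrates the partition-then-merge pattern.
 */
final class HistogramDigest {
    private final double min;
    private final double max;
    private final long[] counts;
    private long total;

    HistogramDigest(double min, double max, int bins) {
        this.min = min;
        this.max = max;
        this.counts = new long[bins];
    }

    /** Records one value, clamping it into [min, max] before choosing a bin. */
    void add(double value) {
        double clamped = Math.min(Math.max(value, min), max);
        int bin = (int) ((clamped - min) / (max - min) * counts.length);
        counts[Math.min(bin, counts.length - 1)]++;
        total++;
    }

    long size() {
        return total;
    }

    /** Combines per-partition summaries into a new summary; the inputs are left unchanged. */
    static HistogramDigest accumulate(HistogramDigest... parts) {
        if (parts.length == 0) {
            throw new IllegalArgumentException("nothing to accumulate");
        }
        HistogramDigest merged =
                new HistogramDigest(parts[0].min, parts[0].max, parts[0].counts.length);
        for (HistogramDigest part : parts) {
            if (part.min != merged.min || part.max != merged.max
                    || part.counts.length != merged.counts.length) {
                throw new IllegalArgumentException("bin layouts differ");
            }
            for (int i = 0; i < merged.counts.length; i++) {
                merged.counts[i] += part.counts[i];
            }
            merged.total += part.total;
        }
        return merged;
    }

    /** Approximate quantile: upper edge of the bin where the cumulative count reaches q * total. */
    double quantile(double q) {
        if (total == 0) {
            return Double.NaN;
        }
        long target = Math.max(1, (long) Math.ceil(q * total));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= target) {
                return min + (i + 1) * (max - min) / counts.length;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // Two "partitions" of the same data set, summarized independently.
        HistogramDigest even = new HistogramDigest(0, 100, 20);
        HistogramDigest odd = new HistogramDigest(0, 100, 20);
        for (int v = 1; v <= 100; v++) {
            (v % 2 == 0 ? even : odd).add(v);
        }
        HistogramDigest overall = accumulate(even, odd);
        System.out.println("overall p95 is approximately " + overall.quantile(0.95));
    }
}
```

Because histogram counts are additive, merging the partition summaries gives the same answer as summarizing all the data at once. A t-digest merge is not exact in this way, which is why the paragraph above ties the introduced error to the compression factor.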

Nested Class Summary

- static class ApproximatePercentile.PercentileDefinition
  
  A builder class for an approximate percentile definition to be used with approximatePercentiles(com.illumon.iris.db.tables.Table, com.illumon.iris.db.v2.by.ApproximatePercentile.PercentileDefinition, com.illumon.iris.db.v2.select.SelectColumn...).

Field Summary

- static double DEFAULT_COMPRESSION

Method Summary

- static com.tdunning.math.stats.TDigest accumulateDigests(DbArray<com.tdunning.math.stats.TDigest> array)
  
  Accumulate a DbArray of TDigests into a single new TDigest.
- static Table approximatePercentile(Table input, double percentile)
  
  Compute the approximate percentiles for the table.
- static Table approximatePercentile(Table input, double compression, double percentile, SelectColumn... groupByColumns)
  
  Compute the approximate percentiles for the table.
- static Table approximatePercentile(Table input, double percentile, SelectColumn... groupByColumns)
  
  Compute the approximate percentiles for the table.
- static Table approximatePercentile(Table input, double percentile, String... groupByColumns)
  
  Compute the approximate percentiles for the table.
- static Table approximatePercentiles(Table input, ApproximatePercentile.PercentileDefinition percentileDefinitions)
  
  Compute a set of approximate percentiles for input according to the definitions in percentileDefinitions.
- static Table approximatePercentiles(Table input, ApproximatePercentile.PercentileDefinition percentileDefinitions, SelectColumn... groupByColumns)
  
  Compute a set of approximate percentiles for input according to the definitions in percentileDefinitions.
- static Table approximatePercentiles(Table input, ApproximatePercentile.PercentileDefinition percentileDefinitions, String... groupByColumns)
  
  Compute a set of approximate percentiles for input according to the definitions in percentileDefinitions.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- DEFAULT_COMPRESSION
  
  public static double DEFAULT_COMPRESSION
Method Details
- approximatePercentile
  
  public static Table approximatePercentile(Table input, double percentile)
  
  Compute the approximate percentiles for the table.
  
  Parameters:
  
  input - the input table
  
  percentile - the percentile to compute for each column
  
  Returns:
  
  a single row table with double columns representing the approximate percentile for each column of the input table
- approximatePercentile
  
  public static Table approximatePercentile(Table input, double percentile, String... groupByColumns)
  
  Compute the approximate percentiles for the table.
  
  Parameters:
  
  input - the input table
  
  percentile - the percentile to compute for each column
  
  groupByColumns - the columns to group by
  
  Returns:
  
  a table with the groupByColumns and double columns representing the approximate percentile for each remaining column of the input table
- approximatePercentile
  
  public static Table approximatePercentile(Table input, double percentile, SelectColumn... groupByColumns)
  
  Compute the approximate percentiles for the table.
  
  Parameters:
  
  input - the input table
  
  percentile - the percentile to compute for each column
  
  groupByColumns - the columns to group by
  
  Returns:
  
  a table with the groupByColumns and double columns representing the approximate percentile for each remaining column of the input table
- approximatePercentile
  
  public static Table approximatePercentile(Table input, double compression, double percentile, SelectColumn... groupByColumns)
  
  Compute the approximate percentiles for the table.
  
  Parameters:
  
  input - the input table
  
  compression - the t-digest compression parameter
  
  percentile - the percentile to compute for each column
  
  groupByColumns - the columns to group by
  
  Returns:
  
  a table with the groupByColumns and double columns representing the approximate percentile for each remaining column of the input table
- approximatePercentiles
  
  public static Table approximatePercentiles(Table input, ApproximatePercentile.PercentileDefinition percentileDefinitions, SelectColumn... groupByColumns)
  
  Compute a set of approximate percentiles for input according to the definitions in percentileDefinitions.
  
  Parameters:
  
  input - the table to compute approximate percentiles for
  
  percentileDefinitions - the compression factor, and map of input columns to output columns
  
  groupByColumns - the columns to group by
  
  Returns:
  
  a table containing the groupByColumns and the approximate percentiles
- approximatePercentiles
  
  public static Table approximatePercentiles(Table input, ApproximatePercentile.PercentileDefinition percentileDefinitions, String... groupByColumns)
  
  Compute a set of approximate percentiles for input according to the definitions in percentileDefinitions.
  
  Parameters:
  
  input - the table to compute approximate percentiles for
  
  percentileDefinitions - the compression factor, and map of input columns to output columns
  
  groupByColumns - the columns to group by
  
  Returns:
  
  a table containing the groupByColumns and the approximate percentiles
- approximatePercentiles
  
  public static Table approximatePercentiles(Table input, ApproximatePercentile.PercentileDefinition percentileDefinitions)
  
  Compute a set of approximate percentiles for input according to the definitions in percentileDefinitions.
  
  Parameters:
  
  input - the table to compute approximate percentiles for
  
  percentileDefinitions - the compression factor, and map of input columns to output columns
  
  Returns:
  
  a table containing a single row with the approximate percentiles
- accumulateDigests
  
  public static com.tdunning.math.stats.TDigest accumulateDigests(DbArray<com.tdunning.math.stats.TDigest> array)
  
  Accumulate the digests within a DbArray into a single new TDigest. The compression factor is one third of the compression factor of the first digest within the array. If the array has only a single element, then that element is returned. If a null array is passed in, null is returned.
  
  This function is intended to be used for parallelization. The first step is to independently compute approximate percentiles with an exposed digest column using your desired buckets. Next, call Table.by(String...) to produce arrays of digests for each relevant bucket. Once the arrays are created, use this function to accumulate the arrays of digests within a Table.update(String...) statement. Finally, you may call the TDigest quantile function (or others) to produce the desired approximate percentile.
  
  Parameters:
  
  array - an array of TDigests
  
  Returns:
  
  the accumulated TDigest
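
The documented edge cases of accumulateDigests (null in, null out; a single element returned as-is; otherwise a new digest whose compression is one third of the first digest's) can be mirrored on a toy type. The ToyDigest class below is hypothetical and keeps every value, so it is not a t-digest and not the library's implementation. Throwing on an empty array is an assumption, since the documentation above does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy digest that keeps every value; a stand-in for TDigest, for illustration only. */
final class ToyDigest {
    final double compression;
    final List<Double> values = new ArrayList<>();

    ToyDigest(double compression, double... initial) {
        this.compression = compression;
        for (double v : initial) {
            values.add(v);
        }
    }

    /** Mirrors the documented contract of accumulateDigests on the toy type. */
    static ToyDigest accumulate(ToyDigest[] array) {
        if (array == null) {
            return null; // documented: a null array yields null
        }
        if (array.length == 0) {
            // ASSUMPTION: empty-array behavior is not documented; this sketch rejects it.
            throw new IllegalArgumentException("empty array");
        }
        if (array.length == 1) {
            return array[0]; // documented: a single element is returned as-is
        }
        // documented: compression is one third of the first digest's compression
        ToyDigest result = new ToyDigest(array[0].compression / 3.0);
        for (ToyDigest digest : array) {
            result.values.addAll(digest.values);
        }
        return result;
    }

    public static void main(String[] args) {
        ToyDigest first = new ToyDigest(300, 1, 2);
        ToyDigest second = new ToyDigest(50, 3);
        ToyDigest merged = accumulate(new ToyDigest[] {first, second});
        System.out.println(merged.compression + " " + merged.values.size());
    }
}
```

Because the single-element case returns the input itself rather than a copy, callers of a function with this contract should avoid mutating the result if the original digest is still in use.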
