com.illumon.iris.db.tables.utils.AutoTuningSimulation

public class AutoTuningSimulation
extends Object

A tool to consistently filter several input tables by timestamp.

When assessing the performance of an intraday query, it can be useful to replay historical data to determine how many rows per second can be processed. When processing in micro-batches, it is important to release a realistic number of rows per batch. When too few rows are released in each batch, there is additional overhead; when too many rows are released per batch, the results may give an overly optimistic picture of performance. In Deephaven, it makes sense to release as many rows as we can expect to process within the LiveTableMonitor cycle time.

For a single table the AutoTuningIncrementalReleaseFilter will release an initial set of rows and measure the cycle time. Using the computed ratio of rows to cycle time, it adjusts the number of rows released for the next cycle. However, this does not provide any mechanism for keeping multiple tables in sync, which very much matters for use cases that involve joins.
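
The rows-to-cycle-time adjustment described above can be sketched as follows. This is only an illustration of the idea, not the AutoTuningIncrementalReleaseFilter implementation: the class `RowReleaseSketch`, the method `nextRowCount`, and the notion of an explicit target cycle time parameter are hypothetical names introduced here.

```java
// Hypothetical sketch; NOT the Deephaven implementation.
final class RowReleaseSketch {

    /**
     * Estimate how many rows to release on the next cycle.
     *
     * @param rowsReleased     rows released on the cycle just measured
     * @param cycleNanos       how long that cycle took, in nanoseconds
     * @param targetCycleNanos the cycle time we would like to fill (assumed parameter)
     * @return the suggested row count for the next cycle, never less than 1
     */
    static long nextRowCount(long rowsReleased, long cycleNanos, long targetCycleNanos) {
        if (rowsReleased <= 0 || cycleNanos <= 0 || targetCycleNanos <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Observed throughput: rows processed per nanosecond of cycle time.
        double rowsPerNano = (double) rowsReleased / cycleNanos;
        // Scale the throughput to the target cycle; always release at least one row.
        return Math.max(1L, Math.round(rowsPerNano * targetCycleNanos));
    }

    public static void main(String[] args) {
        // 1,000 rows took 0.5 s, so a 1 s target cycle suggests roughly twice as many rows.
        System.out.println(nextRowCount(1_000L, 500_000_000L, 1_000_000_000L));
    }
}
```

Because each table would be tuned from its own row counts in a scheme like this, nothing ties the release of one table to another, which is the gap the next paragraph addresses.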

The AutoTuningSimulation bridges that gap by releasing data based on a timestamp and the ratio of "data time" processed to the time that the LiveTableMonitor takes to process that data. The general pattern is to create the simulation, add several tables, and then run the simulation. The simulation provides a statistics table that can then be used to analyze performance.

In this simple example, quotes are joined with trades:

 import com.illumon.iris.db.tables.utils.AutoTuningSimulation;

 quotes = db.t("MarketUs", "QuoteNbboStock").where("Date >= `2020-06-01` && Date <= `2020-06-01`")
 trades = db.t("MarketUs", "TradeNbboStock").where("Date >= `2020-06-01` && Date <= `2020-06-01`")

 sim = new AutoTuningSimulation();

 quotesTuned = sim.addTable(quotes, "Timestamp")
 tradesTuned = sim.addTable(trades, "Timestamp")

 joined = quotesTuned.aj(tradesTuned, "Sym,Timestamp", "Price,Size,TradeTime=Timestamp").reverse()

 stats = sim.getStatisticsTable()

After establishing the simulation, no tables tick until it is run:

 sim.run()

You can put the run in the same command as the simulation setup, but then you will not be able to watch the results tick as the system processes them.

Constructor Summary

Constructors
Constructor	Description
`AutoTuningSimulation()`	Create an AutoTuningSimulation with the default log.
`AutoTuningSimulation(com.fishlib.io.logger.Logger log)`	Create an AutoTuningSimulation with the specified logger.

Method Summary

Modifier and Type	Method	Description
`Table`	`addTable(Table table)`	Add a table using the "Timestamp" column for our data timestamp.
`Table`	`addTable(Table table, String tsColumn)`	Add a table using the given timestamp column for our data timestamp.
`Table`	`getStatisticsTable()`	Get the performance results of this simulation.
`void`	`run()`	Start the simulation, and run to completion.
`AutoTuningSimulation`	`setMaximumStepNanos(long maximumStepNanos)`	Sets the maximum step in nanoseconds.
`AutoTuningSimulation`	`setMinimumStepNanos(long minimumStepNanos)`	Sets the minimum step in nanoseconds.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- AutoTuningSimulation
  
  public AutoTuningSimulation()
  
  Create an AutoTuningSimulation with the default log.
- AutoTuningSimulation
  
  public AutoTuningSimulation(com.fishlib.io.logger.Logger log)
  
  Create an AutoTuningSimulation with the specified logger.

Method Details

addTable

public Table addTable(Table table)

Add a table using the "Timestamp" column for our data timestamp.

Parameters:

table - the table to add

Returns:

the table filtered by this simulation's clock

addTable

public Table addTable(Table table, String tsColumn)

Add a table using the given timestamp column for our data timestamp.
Note that the table's timestamp column must be fully read in order to compute its minimum and maximum values, so this method may take significant time.

Parameters:

table - the table to add

tsColumn - the name of the data timestamp column

Returns:

the table filtered by this simulation's clock

run

public void run() throws InterruptedException

Start the simulation, and run to completion.

Throws:

InterruptedException - if the thread is interrupted

setMaximumStepNanos

public AutoTuningSimulation setMaximumStepNanos(long maximumStepNanos)

Sets the maximum step in nanoseconds. Defaults to 1 hour.
This is useful because, if there is a segment of time with very little data or with data that is quick to process, the simulation could determine that an overly large amount of data should be released. Setting this to a smaller value ensures that the simulation will not take a small number of big steps.

Parameters:

maximumStepNanos - the maximum number of nanoseconds of data released on each step

Returns:

this simulation

setMinimumStepNanos

public AutoTuningSimulation setMinimumStepNanos(long minimumStepNanos)

Sets the minimum step in nanoseconds. Defaults to 1 millisecond.
This is useful to prevent the simulation from stalling by releasing too little data on each step and failing to make any progress.

Parameters:

minimumStepNanos - the minimum number of nanoseconds of data released on each step

Returns:

this simulation
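
The effect of the two step bounds can be sketched as follows. This is an illustration under stated assumptions, not the AutoTuningSimulation implementation: the class `StepClampSketch`, the method `nextStepNanos`, and the target cycle time parameter are hypothetical. Only the default bounds (1 millisecond and 1 hour) come from the descriptions above.

```java
// Hypothetical sketch; NOT the Deephaven implementation.
final class StepClampSketch {
    static final long DEFAULT_MIN_STEP_NANOS = 1_000_000L;         // 1 millisecond
    static final long DEFAULT_MAX_STEP_NANOS = 3_600_000_000_000L; // 1 hour

    /**
     * Propose the next data-time step from the last cycle, then bound it.
     *
     * @param lastStepNanos    nanoseconds of data released on the last cycle
     * @param cycleNanos       wall-clock nanoseconds the last cycle took
     * @param targetCycleNanos wall-clock time we would like a cycle to take (assumed parameter)
     * @param minStepNanos     lower bound on the step
     * @param maxStepNanos     upper bound on the step
     */
    static long nextStepNanos(long lastStepNanos, long cycleNanos, long targetCycleNanos,
                              long minStepNanos, long maxStepNanos) {
        if (cycleNanos <= 0 || minStepNanos > maxStepNanos) {
            throw new IllegalArgumentException("cycleNanos must be positive and min <= max");
        }
        // Nanoseconds of data processed per nanosecond of real time.
        double dataNanosPerRealNano = (double) lastStepNanos / cycleNanos;
        long proposed = Math.round(dataNanosPerRealNano * targetCycleNanos);
        // The maximum stops a quiet period from producing one huge step;
        // the minimum stops a slow cycle from stalling progress.
        return Math.max(minStepNanos, Math.min(maxStepNanos, proposed));
    }

    public static void main(String[] args) {
        // A quiet hour processed in 1 ms would propose far more than 1 hour; the bound caps it.
        System.out.println(nextStepNanos(3_600_000_000_000L, 1_000_000L, 1_000_000_000L,
                DEFAULT_MIN_STEP_NANOS, DEFAULT_MAX_STEP_NANOS));
    }
}
```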

getStatisticsTable

public Table getStatisticsTable()

Get the performance results of this simulation.

This table can be used to analyze the results of the simulation. The following columns are available:

Column	Description
Timestamp	Wall clock time of cycle completion
DataTimestamp	Timestamp of released data
CycleNanos	How many nanoseconds the cycle just completed took
NextCycleDataStepNanos	How many nanoseconds of data will be released on the next cycle
Complete	True if the simulation is complete, false otherwise
RowsReleased	An array of rows released per table, in the order they were added to the simulation
FractionComplete	The fraction of each table that has been released, in the order they were added to the simulation
TotalRowsReleased	The sum of the values in RowsReleased
ETANanos	How many wall clock nanoseconds are expected to elapse before completion
DataTimeRealTimeRatio	How many nanoseconds of data were processed per nanosecond of real time
CompletionTarget	Wall clock time at which the simulation is expected to be complete

Returns:

a table with statistics that updates for each cycle
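
To make the relationship between the timing columns concrete, the sketch below computes a data-time to real-time ratio and a rough completion estimate from it. This is an assumption about how such values could be derived, not a description of how the statistics table is actually populated; the class `SimulationStatsSketch` and both methods are hypothetical.

```java
// Hypothetical sketch; NOT how the statistics table is actually computed.
final class SimulationStatsSketch {

    /** Nanoseconds of data processed per nanosecond of real (wall clock) time. */
    static double dataTimeRealTimeRatio(long dataNanosProcessed, long cycleNanos) {
        if (cycleNanos <= 0) {
            throw new IllegalArgumentException("cycleNanos must be positive");
        }
        return (double) dataNanosProcessed / cycleNanos;
    }

    /**
     * Rough wall-clock estimate of the time left, assuming the ratio stays constant
     * for the rest of the run (an assumption of this sketch).
     */
    static long etaNanos(long remainingDataNanos, double ratio) {
        if (ratio <= 0) {
            throw new IllegalArgumentException("ratio must be positive");
        }
        return Math.round(remainingDataNanos / ratio);
    }

    public static void main(String[] args) {
        // 60 s of data processed in a 2 s cycle.
        double ratio = dataTimeRealTimeRatio(60_000_000_000L, 2_000_000_000L);
        // One hour of data remaining at that ratio.
        long eta = etaNanos(3_600_000_000_000L, ratio);
        System.out.println(ratio + " " + eta);
    }
}
```

A ratio above 1 means the query processes data faster than it originally arrived; a ratio below 1 suggests the query could not keep up with the same data in real time.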
