Class AutoTuningSimulation

java.lang.Object
com.illumon.iris.db.tables.utils.AutoTuningSimulation

public class AutoTuningSimulation
extends Object

A tool to consistently filter several input tables by timestamp.

When assessing the performance of an intraday query, it can be useful to replay historical data to determine how many rows per second can be processed. When processing in micro-batches, it is important to release a realistic number of rows per batch. When too few rows are released in each batch, there is additional overhead; when too many rows are released per batch, the results may paint an overly optimistic picture of performance. In Deephaven, it makes sense to release as many rows as we can expect to process within the LiveTableMonitor cycle time.

For a single table, the AutoTuningIncrementalReleaseFilter releases an initial set of rows and measures the cycle time. Using the computed ratio of rows to cycle time, it adjusts the number of rows released for the next cycle. However, it provides no mechanism for keeping multiple tables in sync, which matters for use cases that involve joins.
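The row-based feedback loop described above can be illustrated with a minimal sketch. This is not the AutoTuningIncrementalReleaseFilter implementation; the class name RowReleaseTuner, the method nextRows, and the targetCycleNanos parameter are hypothetical names chosen for illustration, and the sketch assumes a simple proportional adjustment.

```java
// Hypothetical sketch only; not the AutoTuningIncrementalReleaseFilter source.
final class RowReleaseTuner {
    /**
     * Estimate how many rows to release on the next cycle so that processing
     * them takes roughly targetCycleNanos, based on the last observed cycle.
     */
    static long nextRows(long rowsReleased, long cycleNanos, long targetCycleNanos) {
        if (rowsReleased <= 0 || cycleNanos <= 0) {
            // No usable measurement yet: keep the previous size, but at least one row.
            return Math.max(1L, rowsReleased);
        }
        // Scale the last batch by (target cycle time / observed cycle time).
        double scaled = (double) rowsReleased * targetCycleNanos / cycleNanos;
        return Math.max(1L, Math.round(scaled));
    }

    public static void main(String[] args) {
        // 1,000 rows took 0.5 s to process; the assumed target cycle is 1 s.
        System.out.println(nextRows(1_000L, 500_000_000L, 1_000_000_000L));
    }
}
```

A loop like this tunes each table independently, so nothing keeps several tables aligned in time.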

The AutoTuningSimulation bridges that gap by releasing data based on a timestamp and the ratio of "data time" processed to the time that the LiveTableMonitor takes to process that data. The general pattern is to create the simulation, add several tables, and then run the simulation. The simulation provides a statistics table that can then be used to analyze performance.

In this simple example, quotes are joined with trades:

 import com.illumon.iris.db.tables.utils.AutoTuningSimulation;

 quotes = db.t("MarketUs", "QuoteNbboStock").where("Date >= `2020-06-01` && Date <= `2020-06-01`")
 trades = db.t("MarketUs", "TradeNbboStock").where("Date >= `2020-06-01` && Date <= `2020-06-01`")

 sim = new AutoTuningSimulation();

 quotesTuned = sim.addTable(quotes, "Timestamp")
 tradesTuned = sim.addTable(trades, "Timestamp")

 joined = quotesTuned.aj(tradesTuned, "Sym,Timestamp", "Price,Size,TradeTime=Timestamp").reverse()

 stats = sim.getStatisticsTable()
 
After establishing the simulation, no tables tick until it is run:

 sim.run()

You can call run in the same command that sets up the simulation, but then you will not be able to watch the results tick as the system processes them.
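The timestamp-based release described in this overview can be sketched as a small calculation: scale the last data-time step by how quickly it was processed, then keep the result within the minimum and maximum step. This is a sketch under stated assumptions, not the simulation's actual code. The class DataStepTuner, the method nextStepNanos, and the targetCycleNanos parameter are hypothetical; only the default bounds (1 millisecond and 1 hour) come from the method documentation.

```java
// Hypothetical sketch only; not the AutoTuningSimulation source.
final class DataStepTuner {
    static final long DEFAULT_MIN_STEP_NANOS = 1_000_000L;         // 1 millisecond
    static final long DEFAULT_MAX_STEP_NANOS = 3_600_000_000_000L; // 1 hour

    /**
     * Propose how many nanoseconds of data to release on the next cycle.
     *
     * @param lastStepNanos    data time released on the previous cycle
     * @param cycleNanos       real time the previous cycle took to process it
     * @param targetCycleNanos assumed target for the next cycle's real time
     */
    static long nextStepNanos(long lastStepNanos, long cycleNanos, long targetCycleNanos,
                              long minStepNanos, long maxStepNanos) {
        if (cycleNanos <= 0) {
            // No measurement available: reuse the last step, kept within bounds.
            return Math.max(minStepNanos, Math.min(maxStepNanos, lastStepNanos));
        }
        // Nanoseconds of data processed per nanosecond of real time.
        double dataTimeRealTimeRatio = (double) lastStepNanos / cycleNanos;
        double proposed = dataTimeRealTimeRatio * targetCycleNanos;
        if (proposed >= maxStepNanos) {
            return maxStepNanos; // a quiet period must not trigger one huge step
        }
        if (proposed <= minStepNanos) {
            return minStepNanos; // a slow cycle must not stall the simulation
        }
        return Math.round(proposed);
    }

    public static void main(String[] args) {
        // 10 s of data took 0.5 s to process; the assumed target cycle is 1 s.
        System.out.println(nextStepNanos(10_000_000_000L, 500_000_000L, 1_000_000_000L,
                DEFAULT_MIN_STEP_NANOS, DEFAULT_MAX_STEP_NANOS));
    }
}
```

Because every added table is filtered against the same data timestamp, a single step value of this kind can advance all tables together, which is what keeps joined tables in sync.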
  • Constructor Details

    • AutoTuningSimulation

      public AutoTuningSimulation()
      Create an AutoTuningSimulation with the default logger.
    • AutoTuningSimulation

      public AutoTuningSimulation(com.fishlib.io.logger.Logger log)
      Create an AutoTuningSimulation with the specified logger.
  • Method Details

    • addTable

      public Table addTable(Table table)
      Add a table using the "Timestamp" column for our data timestamp.
      Parameters:
      table - the table to add
      Returns:
      the table filtered by this simulation's clock
    • addTable

      public Table addTable(Table table, String tsColumn)
      Add a table using the given timestamp column for our data timestamp.

      Note that the table's timestamp column must be fully read in order to compute the minimum and maximum value, so this method may take significant time.

      Parameters:
      table - the table to add
      tsColumn - the name of the data timestamp column
      Returns:
      the table filtered by this simulation's clock
    • run

      public void run() throws InterruptedException
      Start the simulation, and run to completion.
      Throws:
      InterruptedException - if the thread is interrupted
    • setMaximumStepNanos

      public AutoTuningSimulation setMaximumStepNanos(long maximumStepNanos)
      Sets the maximum step in nanoseconds. Defaults to 1 hour.

      This is useful because, if a segment of time has very little data or data that is quick to process, the simulation could determine that an overly large amount of data should be released. Setting this to a smaller value ensures that the simulation does not complete in a small number of big steps.

      Parameters:
      maximumStepNanos - the maximum number of nanoseconds of data released on each step
      Returns:
      this simulation
    • setMinimumStepNanos

      public AutoTuningSimulation setMinimumStepNanos(long minimumStepNanos)
      Sets the minimum step in nanoseconds. Defaults to 1 millisecond.

      This is useful to prevent the simulation from stalling by releasing too little data on each step and failing to make any progress.

      Parameters:
      minimumStepNanos - the minimum number of nanoseconds of data released on each step
      Returns:
      this simulation
    • getStatisticsTable

      public Table getStatisticsTable()
      Get the performance results of this simulation.

      This table can be used to analyze the results of the simulation. The following columns are available:

      Timestamp - wall clock time of cycle completion
      DataTimestamp - timestamp of released data
      CycleNanos - how many nanoseconds the cycle just completed took
      NextCycleDataStepNanos - how many nanoseconds of data will be released on the next cycle
      Complete - true if the simulation is complete, false otherwise
      RowsReleased - an array of rows released per table, in the order they were added to the simulation
      FractionComplete - the fraction of each table that has been released, in the order they were added to the simulation
      TotalRowsReleased - the sum of the values in RowsReleased
      ETANanos - how many wall clock nanoseconds are expected to elapse before completion
      DataTimeRealTimeRatio - how many nanoseconds of data were processed per nanosecond of real time
      CompletionTarget - wall clock time at which the simulation is expected to be complete
      Returns:
      a table with statistics that updates for each cycle.
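
As a worked illustration of how two of these columns relate, the sketch below computes a data-time to real-time ratio from one cycle and derives a completion estimate from it. The ratio follows the DataTimeRealTimeRatio description above. The ETA formula (remaining data time divided by the ratio) is an assumption made for illustration and may differ from how the simulation computes ETANanos. The class and method names are hypothetical.

```java
// Hypothetical sketch only; the ETA formula is an assumption, not the library's.
final class SimulationStatsSketch {
    /** Nanoseconds of data processed per nanosecond of real time in one cycle. */
    static double dataTimeRealTimeRatio(long dataNanosProcessed, long cycleNanos) {
        return (double) dataNanosProcessed / cycleNanos;
    }

    /** Assumed estimate: remaining data time divided by the observed ratio. */
    static long etaNanos(long remainingDataNanos, double ratio) {
        return Math.round(remainingDataNanos / ratio);
    }

    public static void main(String[] args) {
        // One cycle released 20 s of data and took 1 s of real time.
        double ratio = dataTimeRealTimeRatio(20_000_000_000L, 1_000_000_000L);
        // One hour of data remains to be released.
        long eta = etaNanos(3_600_000_000_000L, ratio);
        System.out.println(ratio + " " + eta);
    }
}
```

Under these assumptions, a ratio above 1 means the query processes historical data faster than it arrived, and a ratio below 1 suggests the query would fall behind a live feed.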