Class AutoTuningSimulation
public class AutoTuningSimulation extends Object
A tool to consistently filter several input tables by timestamp.
When assessing the performance of an intraday query, it can be useful to replay historical data to determine how many rows per second can be processed. When processing in micro-batches, it is important to release a realistic number of rows per batch. When too few rows are released in each batch, there is additional overhead; when too many rows are released per batch, the results may give an overly optimistic picture of performance. In Deephaven, it makes sense to release as many rows as we can expect to process within the LiveTableMonitor cycle time.
For a single table, the AutoTuningIncrementalReleaseFilter will release an initial set of rows and measure the cycle time. Using the computed ratio of rows to cycle time, it adjusts the number of rows released for the next cycle. However, this does not provide any mechanism for keeping multiple tables in sync, which very much matters for use cases that involve joins.
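The feedback loop described above can be sketched in plain Java. This is only an illustration of the rows-to-cycle-time ratio, not the filter's actual implementation; the class name `RowReleaseTuner`, the method `nextRows`, and the `targetCycleNanos` parameter are assumptions made for this sketch.

```java
public final class RowReleaseTuner {
    private RowReleaseTuner() {}

    /**
     * Estimate how many rows to release on the next cycle (hypothetical helper).
     * rowsReleased / cycleNanos is the measured throughput; scaling it by the
     * target cycle time gives the rows we expect to process in one cycle.
     */
    public static long nextRows(long rowsReleased, long cycleNanos, long targetCycleNanos) {
        if (rowsReleased <= 0 || cycleNanos <= 0) {
            // No usable measurement yet: release a single row and measure again.
            return 1L;
        }
        // Multiply before dividing to keep the intermediate value exact for typical inputs.
        double estimate = (double) rowsReleased * targetCycleNanos / cycleNanos;
        return Math.max(1L, Math.round(estimate));
    }

    public static void main(String[] args) {
        // 1,000 rows took 250 ms to process; the assumed target cycle is 1 s.
        System.out.println(nextRows(1_000L, 250_000_000L, 1_000_000_000L));
    }
}
```

With these inputs the measured throughput is 4 rows per millisecond, so the sketch proposes 4,000 rows for the next one-second cycle.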
The AutoTuningSimulation bridges that gap by releasing data based on a timestamp and the ratio of "data time" processed to the time that the LiveTableMonitor takes to process that data. The general pattern is to create the simulation, add several tables, and then run the simulation. The simulation provides a statistics table that can then be used to analyze performance.
In this simple example, quotes are joined with trades:
import com.illumon.iris.db.tables.utils.AutoTuningSimulation;

quotes = db.t("MarketUs", "QuoteNbboStock").where("Date >= `2020-06-01` && Date <= `2020-06-01`")
trades = db.t("MarketUs", "TradeNbboStock").where("Date >= `2020-06-01` && Date <= `2020-06-01`")

sim = new AutoTuningSimulation();
quotesTuned = sim.addTable(quotes, "Timestamp")
tradesTuned = sim.addTable(trades, "Timestamp")

joined = quotesTuned.aj(tradesTuned, "Sym,Timestamp", "Price,Size,TradeTime=Timestamp").reverse()
stats = sim.getStatisticsTable()

After establishing the simulation, no tables tick until it is run:

sim.run()

You can put the run in the same command as setting up the simulation, but then you will not be able to watch the results tick as the system processes them.
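To make the idea of filtering several tables by one clock concrete, here is a minimal, self-contained Java sketch. It does not use the Deephaven API: tables are stood in for by sorted arrays of timestamps, and the class `SharedClockSketch` and method `rowsReleased` are hypothetical names chosen for illustration.

```java
public final class SharedClockSketch {
    private SharedClockSketch() {}

    /**
     * Count the rows whose timestamp is at or before the shared clock.
     * The timestamps must be sorted in ascending order.
     */
    public static int rowsReleased(long[] sortedTimestamps, long clockNanos) {
        int lo = 0;
        int hi = sortedTimestamps.length;
        // Binary search for the first timestamp strictly greater than the clock.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTimestamps[mid] <= clockNanos) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] quoteTimes = {10L, 20L, 30L, 40L};
        long[] tradeTimes = {15L, 35L};
        long clock = 30L; // one clock value applied to every table

        System.out.println("quotes released: " + rowsReleased(quoteTimes, clock));
        System.out.println("trades released: " + rowsReleased(tradeTimes, clock));
    }
}
```

Because both arrays are cut at the same clock value, a join between the released portions never sees a trade from "the future" relative to the released quotes, which is the property that row-count-based release cannot guarantee.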
Constructor Summary
Constructors
Constructor | Description
AutoTuningSimulation() | Create an AutoTuningSimulation with the default log.
AutoTuningSimulation(com.fishlib.io.logger.Logger log) | Create an AutoTuningSimulation with the specified logger.
Method Summary
Modifier and Type | Method | Description
Table | addTable(Table table) | Add a table using the "Timestamp" column for our data timestamp.
Table | addTable(Table table, String tsColumn) | Add a table using the given timestamp column for our data timestamp.
Table | getStatisticsTable() | Get the performance results of this simulation.
void | run() | Start the simulation, and run to completion.
AutoTuningSimulation | setMaximumStepNanos(long maximumStepNanos) | Sets the maximum step in nanoseconds.
AutoTuningSimulation | setMinimumStepNanos(long minimumStepNanos) | Sets the minimum step in nanoseconds.
Constructor Details
AutoTuningSimulation
public AutoTuningSimulation()
Create an AutoTuningSimulation with the default log.
AutoTuningSimulation
public AutoTuningSimulation(com.fishlib.io.logger.Logger log)
Create an AutoTuningSimulation with the specified logger.
Method Details
addTable
Add a table using the "Timestamp" column for our data timestamp.
Parameters:
table - the table to add
Returns:
the table filtered by this simulation's clock
addTable
Add a table using the given timestamp column for our data timestamp.
Note that the table's timestamp column must be fully read in order to compute the minimum and maximum value, so this method may take significant time.
Parameters:
table - the table to add
tsColumn - the name of the data timestamp column
Returns:
the table filtered by this simulation's clock
run
Start the simulation, and run to completion.
Throws:
InterruptedException - if the thread is interrupted
setMaximumStepNanos
Sets the maximum step in nanoseconds. Defaults to 1 hour.
This is useful because, if there is a segment of time with very little data or data that is quick to process, the simulation could determine that an overly large amount of data should be released. Setting this to a smaller value ensures that the simulation will not take a small number of big steps.
Parameters:
maximumStepNanos - the maximum number of nanoseconds of data released on each step
Returns:
this simulation
setMinimumStepNanos
Sets the minimum step in nanoseconds. Defaults to 1 millisecond.
This is useful to prevent the simulation from stalling by releasing too little data on each step and failing to make any progress.
Parameters:
minimumStepNanos - the minimum number of nanoseconds of data released on each step
Returns:
this simulation
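The effect of the minimum and maximum step can be illustrated with a small Java sketch. It is an assumption about how such bounds are typically applied, not the simulation's actual code: the class `StepClampSketch`, the method `nextStepNanos`, and the `targetCycleNanos` parameter are hypothetical, while the two default values come from the descriptions above.

```java
public final class StepClampSketch {
    /** Documented default minimum step: 1 millisecond. */
    public static final long DEFAULT_MIN_STEP_NANOS = 1_000_000L;
    /** Documented default maximum step: 1 hour. */
    public static final long DEFAULT_MAX_STEP_NANOS = 3_600_000_000_000L;

    private StepClampSketch() {}

    /**
     * Propose the amount of data time to release on the next cycle, then clamp
     * it to [minStepNanos, maxStepNanos] (hypothetical helper).
     */
    public static long nextStepNanos(long lastStepNanos, long cycleNanos, long targetCycleNanos,
                                     long minStepNanos, long maxStepNanos) {
        if (cycleNanos <= 0) {
            // The last step took no measurable time; the upper bound is all that limits us.
            return maxStepNanos;
        }
        // Scale the last step by how much faster or slower than the target the cycle ran.
        double proposed = (double) lastStepNanos * targetCycleNanos / cycleNanos;
        long rounded = Math.round(proposed);
        return Math.max(minStepNanos, Math.min(maxStepNanos, rounded));
    }

    public static void main(String[] args) {
        long target = 1_000_000_000L; // assumed 1 s target cycle
        // A sparse stretch: 1 s of data processed in 1 microsecond would propose a huge step.
        System.out.println(nextStepNanos(1_000_000_000L, 1_000L, target,
                DEFAULT_MIN_STEP_NANOS, DEFAULT_MAX_STEP_NANOS));
        // A dense stretch: 1 ms of data took 10 s, so the proposal falls below the minimum.
        System.out.println(nextStepNanos(1_000_000L, 10_000_000_000L, target,
                DEFAULT_MIN_STEP_NANOS, DEFAULT_MAX_STEP_NANOS));
    }
}
```

In the first call the unclamped proposal would be far more than an hour of data, so the maximum applies; in the second the proposal is a tenth of a millisecond, so the minimum keeps the simulation moving.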
getStatisticsTable
Get the performance results of this simulation.
This table can be used to analyze the results of the simulation. The following columns are available:
Column | Description
Timestamp | Wall clock time of cycle completion
DataTimestamp | Timestamp of released data
CycleNanos | How many nanoseconds the cycle just completed took
NextCycleDataStepNanos | How many nanoseconds of data will be released on the next cycle
Complete | True if the simulation is complete, false otherwise
RowsReleased | An array of rows released per table, in the order they were added to the simulation
FractionComplete | The fraction of each table that has been released, in the order they were added to the simulation
TotalRowsReleased | The sum of the values in RowsReleased
ETANanos | How many wall clock nanoseconds are expected to elapse before completion
DataTimeRealTimeRatio | How many nanoseconds of data were processed per nanosecond of real time
CompletionTarget | Wall clock time that the simulation is expected to be complete
Returns:
a table with statistics that updates for each cycle.
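As a rough guide to reading DataTimeRealTimeRatio and ETANanos, the arithmetic below shows one plausible relationship between them. This is an assumption for illustration only; the source does not state how the simulation computes these columns, and the class `SimulationStatsSketch` and its methods are hypothetical.

```java
public final class SimulationStatsSketch {
    private SimulationStatsSketch() {}

    /** Nanoseconds of data processed per nanosecond of real (wall clock) time. */
    public static double dataTimeRealTimeRatio(long dataNanosProcessed, long realNanosElapsed) {
        if (realNanosElapsed <= 0) {
            throw new IllegalArgumentException("realNanosElapsed must be positive");
        }
        return (double) dataNanosProcessed / realNanosElapsed;
    }

    /** Estimated wall clock nanoseconds left, assuming the ratio stays constant. */
    public static long etaNanos(long remainingDataNanos, double ratio) {
        if (ratio <= 0.0) {
            throw new IllegalArgumentException("ratio must be positive");
        }
        return Math.round(remainingDataNanos / ratio);
    }

    public static void main(String[] args) {
        // 60 s of data were replayed in 2 s of real time.
        double ratio = dataTimeRealTimeRatio(60_000_000_000L, 2_000_000_000L);
        // 300 s of data remain to be released.
        long eta = etaNanos(300_000_000_000L, ratio);
        System.out.println("ratio = " + ratio + ", eta nanos = " + eta);
    }
}
```

A ratio above 1 means the query keeps up with the data faster than real time; a ratio below 1 suggests the query would fall behind a live feed at the same data rate.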