Class CDCReplicator

java.lang.Object
com.illumon.iris.replication.mssql.CDCReplicator

public class CDCReplicator
extends Object
This class reads from SQL Server "Change Data Capture (CDC)" instances and writes the results into corresponding Iris binary logs. Each INSERT/UPDATE/DELETE statement on the underlying SQL Server table results in log entries in a special CDC table. By reading and replaying these entries into Iris, we replicate the SQL Server state asynchronously.
The state of the replication sequence is represented by an SQL Server "LSN" (log sequence number), which is stored in each output row, along with a few other special columns, including one indicating INSERT/UPDATE/DELETE. When replication starts for a table (handled by the CDCTableReplicator class), it attempts to load the most recently written LSN from the most recent Iris log file and (re)starts from that point. If no state is found, and depending on configuration, the replicator will typically also take an initial snapshot of the source table and play it into the Iris log as an INSERT for each row before beginning replication.
A single CDCReplicator instance can replicate any number of tables, as specified in the configuration. For each configured table, an instance of CDCTableReplicator is created to execute the replication process. Each table has a number of required configuration properties, including the details of the CDC instance and the Iris logger class name. The Iris loggers are loaded by reflection and must implement SQLReplicationLogger.
The CDC replicator uses a ScheduledExecutorService, whose underlying thread pool runs the replication tasks. The number of threads in the pool (configuration property "replicationThreads") is important, since it is the maximum number of replication tasks that may run in parallel, and each of these tasks opens a connection to the database.
The pool size must be consistent with the maximum number of connections SQL Server is configured to accept and with the needs of other applications accessing the server. For example, if you are replicating 100 tables and SQL Server can only handle 50 concurrent connections, you should have 50 or fewer threads in the pool to avoid possible blocking or refused connection attempts. There is a tradeoff here between concurrency and load on the SQL Server database. A thread pool of around 10 is a reasonable compromise for a typical situation. SQL Server connection pooling may be implemented in the future to further mitigate this issue.
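The relationship between pool size and concurrent connections can be illustrated with a minimal, self-contained sketch. It does not use CDCReplicator or any Iris class: the class name ReplicationPoolSketch, the method peakConcurrency, and the sleeping task that stands in for a replication task holding a database connection are all hypothetical. It only shows that a ScheduledExecutorService created with N threads never runs more than N tasks at once, however many tables are scheduled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReplicationPoolSketch {

    /**
     * Schedules tableCount simulated replication tasks on a pool of poolSize
     * threads and returns the highest number of tasks observed running at once.
     * Hypothetical helper for illustration only; not part of the Iris API.
     */
    public static int peakConcurrency(int poolSize, int tableCount) throws InterruptedException {
        // Plays the role of the "replicationThreads" setting described above.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tableCount);

        for (int i = 0; i < tableCount; i++) {
            pool.schedule(() -> {
                // Each running task represents one open database connection.
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stand-in for reading CDC rows
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    done.countDown();
                }
            }, 0, TimeUnit.MILLISECONDS);
        }

        done.await();      // wait until every simulated table has been processed
        pool.shutdown();
        return peak.get(); // never exceeds poolSize
    }

    public static void main(String[] args) throws InterruptedException {
        // 12 simulated tables on 3 threads: at most 3 "connections" at a time.
        System.out.println("peak concurrent tasks: " + peakConcurrency(3, 12));
    }
}
```

Under these assumptions, the remaining tasks simply wait in the executor's queue until a thread is free, which is why the pool size, rather than the number of replicated tables, determines the load placed on the database.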
  • Method Details

    • main

      public static void main(String... args) throws IOException
      Regular main entry point, used when this module is invoked from a Java command line or from an IntelliJ run configuration.
      Parameters:
      args - Varargs list of arguments in Apache Commons CLI format
      Throws:
      IOException