Class CDCReplicator
java.lang.Object
com.illumon.iris.replication.mssql.CDCReplicator
public class CDCReplicator extends Object
This class is designed to read from SQL Server "Change Data Capture (CDC)" instances and output the results into
corresponding Iris binary logs. Each INSERT/UPDATE/DELETE statement on the underlying SQL Server table results in
log entries in a special CDC table. By reading and replaying these into Iris we replicate the SQL Server state
asynchronously. The state of the replication sequence is represented by an SQL Server "LSN" (log sequence number),
which is stored in each output row (as well as a few other special columns, including a column indicating
INSERT/UPDATE/DELETE). When replication starts for a table (handled by the CDCTableReplicator class), it attempts to
load the most recent LSN written (from the most recent Iris log file) and (re)starts from that point. If no state is
found, and depending on configuration, the replicator will typically also take an initial snapshot of the source
table and play it into the Iris log as an INSERT for each row before beginning replication.
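The restart decision described above can be sketched as follows. This is a minimal illustration, not the CDCTableReplicator implementation: the class ResumeSketch, the StartMode enum, and the snapshotIfNoState flag are hypothetical names, and the behavior when no state is found and snapshots are disabled is an assumption. It relies only on the fact that SQL Server LSNs are binary(10) values that order as unsigned bytes.

```java
import java.util.List;
import java.util.Optional;

public class ResumeSketch {
    /** Hypothetical start modes; the real replicator's options are not documented here. */
    public enum StartMode { RESUME_FROM_LSN, SNAPSHOT_THEN_REPLICATE, REPLICATE_WITHOUT_SNAPSHOT }

    /** Compares two LSNs byte by byte, treating each byte as unsigned. */
    public static int compareLsn(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    /** Returns the greatest LSN among previously logged rows, if there are any. */
    public static Optional<byte[]> latestLsn(List<byte[]> loggedLsns) {
        return loggedLsns.stream().max(ResumeSketch::compareLsn);
    }

    /** Chooses how to start: resume if state exists, otherwise follow the (assumed) snapshot flag. */
    public static StartMode chooseStartMode(Optional<byte[]> lastLsn, boolean snapshotIfNoState) {
        if (lastLsn.isPresent()) {
            return StartMode.RESUME_FROM_LSN;
        }
        return snapshotIfNoState ? StartMode.SNAPSHOT_THEN_REPLICATE
                                 : StartMode.REPLICATE_WITHOUT_SNAPSHOT;
    }

    public static void main(String[] args) {
        byte[] older = {0, 0, 0, 1, 0, 0, 0, 2, 0, 1};
        byte[] newer = {0, 0, 0, 1, 0, 0, (byte) 0x80, 0, 0, 1};
        Optional<byte[]> last = latestLsn(List.of(older, newer));
        System.out.println(chooseStartMode(last, true));
        System.out.println(chooseStartMode(Optional.empty(), true));
    }
}
```

The unsigned comparison matters because Java bytes are signed; a naive signed comparison would order an LSN containing the byte 0x80 before one containing 0x00.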
A single CDCReplicator instance can replicate any number of tables, as specified in the configuration. For each
configured table, an instance of CDCTableReplicator is created to execute the replication process.
For each table, there are a number of required configuration properties, including the details of the CDC instance
and the Iris logger class name. The Iris loggers are loaded by reflection, and must implement SQLReplicationLogger.
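Loading a logger by class name and checking that it implements the required interface can be sketched as below. The methods of SQLReplicationLogger are not shown in this documentation, so the nested ReplicationLogger interface is a hypothetical stand-in, as are ConsoleLogger and the load method. Requiring a public no-argument constructor is also an assumption.

```java
public class LoggerLoadingSketch {
    /** Hypothetical stand-in for SQLReplicationLogger; its real methods are not documented here. */
    public interface ReplicationLogger {
        void log(String row);
    }

    /** Example implementation that a configuration property could name. */
    public static class ConsoleLogger implements ReplicationLogger {
        public ConsoleLogger() {
        }

        @Override
        public void log(String row) {
            System.out.println(row);
        }
    }

    /** Loads the named class reflectively and rejects it unless it implements the interface. */
    public static ReplicationLogger load(String className) {
        try {
            Class<?> raw = Class.forName(className);
            if (!ReplicationLogger.class.isAssignableFrom(raw)) {
                throw new IllegalArgumentException(className + " does not implement ReplicationLogger");
            }
            // Assumes a public no-argument constructor.
            return raw.asSubclass(ReplicationLogger.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load logger " + className, e);
        }
    }

    public static void main(String[] args) {
        // In practice the class name would come from the per-table configuration.
        ReplicationLogger logger = load(ConsoleLogger.class.getName());
        logger.log("example row");
    }
}
```

Checking the interface before instantiating gives a clear configuration error instead of a ClassCastException later in the replication task.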
The CDC replicator uses a ScheduledExecutorService, which has an underlying thread pool that is used
to run the replication tasks. The number of threads in the pool (configuration property "replicationThreads") is
important, since that is the maximum number of replication tasks that may run in parallel, and each of these tasks
opens a connection to the database. This must be consistent with the maximum number of connections SQL Server is
configured to accept and with the needs of other applications accessing the server (for example, if you are
replicating 100 tables and SQL Server can only handle 50 concurrent connections, you should have 50 or fewer threads
in the pool to avoid possible blocking or refused connection attempts). There is a tradeoff here between concurrency
and load on the SQL Server database. A thread pool of perhaps 10 would not be an unreasonable compromise for a
typical situation. We may in the future want to implement SQL Server connection pooling to further mitigate this issue.
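To illustrate how the pool size caps the number of simultaneous database connections, here is a minimal sketch using the standard ScheduledExecutorService. It is not the replicator's own scheduling code: ReplicationPoolSketch and peakConcurrency are hypothetical names, and each task merely sleeps where a real replication task would hold a connection.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReplicationPoolSketch {
    /**
     * Schedules taskCount simulated replication tasks on a pool of poolSize threads
     * and returns the highest number of tasks observed running at the same time.
     */
    public static int peakConcurrency(int poolSize, int taskCount) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.schedule(() -> {
                // A real task would open its database connection here.
                int now = active.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                active.decrementAndGet();
            }, 0, TimeUnit.MILLISECONDS);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 100 tables replicated through 10 threads: at most 10 tasks run at once.
        System.out.println("peak concurrent tasks: " + peakConcurrency(10, 100));
    }
}
```

Because a scheduled pool created this way never grows beyond its configured size, the observed peak cannot exceed poolSize, which is why the "replicationThreads" value acts as an upper bound on connections opened by the replicator.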
Method Details
main
Regular main entry point, used when this module is called from a java command line, or from an IntelliJ run configuration.
Parameters:
args - Varargs list of arguments in Apache CLI format
Throws:
IOException