Modern information systems place high demands on how quickly search queries are processed. These systems must also scale quickly without losing performance. One way to meet this need is to build the database around a data replication mechanism.
Replication is a set of technologies that allows users to copy, distribute, and synchronize specific types of database objects and associated data from one database to one or more other databases.
Almost every critical database has at least one replica. Among the tasks solved by replication are:
- maintaining a backup database in case the main one is lost;
- reducing the load on the main database by routing some requests to replicas;
- transferring data to archival or analytical systems.
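To make the second task concrete, here is a minimal sketch of routing read requests to replicas. The `ReplicaRouter` class, the node names, and the "starts with SELECT" rule are illustrative assumptions, not part of any particular DBMS or driver.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas (round-robin).

    Hypothetical helper for illustration; real drivers and proxies
    use more careful statement classification.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def route(self, sql):
        # Crude rule (an assumption): only plain SELECT statements are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
```

Calling `router.route("UPDATE ...")` returns the primary, while successive `SELECT` statements alternate between the two replicas. Replicas can lag behind the primary, so a real router would likely keep reads that must see the latest writes on the primary.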
There are different methods of replication. One of the most popular patterns is Change Data Capture (CDC).
What is CDC in replication?
A CDC platform is an application that processes change logs, extracts data change events from them, and notifies the consumers that implement business logic. Businesses can then use the up-to-date data for Business Intelligence (BI) and data analysis.
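The flow described above (read the change log, pick out data change events, notify consumers) can be sketched as follows. The log entry format, the `ChangeEvent` type, and the function names are assumptions made for this example; real platforms read DBMS-specific log formats.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

DATA_OPS = {"insert", "update", "delete"}


@dataclass(frozen=True)
class ChangeEvent:
    table: str
    op: str                 # one of DATA_OPS
    key: int                # primary key of the affected row
    changes: dict = field(default_factory=dict)  # column -> new value


def extract_events(log_entries: Iterable[dict]) -> List[ChangeEvent]:
    """Keep only data-change entries; skip others such as commit markers."""
    return [
        ChangeEvent(e["table"], e["op"], e["key"], e.get("changes", {}))
        for e in log_entries
        if e.get("op") in DATA_OPS
    ]


def notify(events: Iterable[ChangeEvent],
           consumers: Iterable[Callable[[ChangeEvent], None]]) -> None:
    """Deliver every event to every registered consumer."""
    consumers = list(consumers)
    for event in events:
        for consumer in consumers:
            consumer(event)


# Hypothetical, already-parsed log entries.
SAMPLE_LOG = [
    {"op": "begin"},
    {"table": "orders", "op": "insert", "key": 1, "changes": {"status": "new"}},
    {"table": "orders", "op": "update", "key": 1, "changes": {"status": "paid"}},
    {"op": "commit"},
]

received = []                      # stands in for a BI or analytics consumer
notify(extract_events(SAMPLE_LOG), [received.append])
```

After this runs, `received` holds the two `orders` events in log order, and the `begin` and `commit` markers have been dropped.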
The most famous platforms of this class are:
• IBM InfoSphere Data Replication
• Oracle GoldenGate
• Qlik Data Integration
• Informatica PowerExchange CDC.
The platform's task is to read the database logs, transform the change data, and deliver it to a replica. For this to work, the log has to record which fields changed. Because replication runs in a separate application, users can apply complex transformations to the replicated data on the fly and build sophisticated replication topologies.
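As a rough illustration of the read, transform, and transfer steps, the sketch below applies log entries to an in-memory replica. The entry format, the `transform` rule (lowercasing an `email` column), and the dictionary standing in for the replica are all assumptions for this example.

```python
def transform(changes):
    """Example transformation (an assumption): normalize e-mail addresses."""
    out = dict(changes)
    if "email" in out:
        out["email"] = out["email"].lower()
    return out


def apply_entry(replica, entry):
    """Apply one log entry to the replica: {table: {key: row}}."""
    table = replica.setdefault(entry["table"], {})
    key = entry["key"]
    if entry["op"] == "delete":
        table.pop(key, None)
    else:
        # Inserts and updates merge only the fields recorded in the log,
        # which is why the log must state which fields changed.
        table.setdefault(key, {}).update(transform(entry["changes"]))


# Hypothetical, already-parsed log entries.
log = [
    {"table": "users", "op": "insert", "key": 1,
     "changes": {"name": "Ann", "email": "Ann@Example.com"}},
    {"table": "users", "op": "update", "key": 1,
     "changes": {"email": "ANN@example.org"}},
    {"table": "users", "op": "insert", "key": 2,
     "changes": {"name": "Bob", "email": "bob@example.com"}},
    {"table": "users", "op": "delete", "key": 2},
]

replica = {}
for entry in log:
    apply_entry(replica, entry)
```

After the loop, the replica contains a single user, Ann, with the updated and normalized address. The update entry touched only `email`, so `name` is left as it was inserted.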
This approach has several advantages:
• the ability to replicate between different DBMSs, including loading data into reporting systems;
• the broadest data processing and conversion capabilities;
• minimal traffic between nodes (the platform filters out unnecessary data and can compress traffic);
• built-in monitoring of replication status.
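The traffic-reduction point can be sketched with the standard `json` and `zlib` modules: drop what the replica does not need, then compress the rest before sending. The table allow-list, the dropped column, and the event format are hypothetical choices for this example.

```python
import json
import zlib

REPLICATED_TABLES = {"orders"}        # hypothetical allow-list of tables
DROPPED_COLUMNS = {"internal_note"}   # hypothetical columns not replicated


def pack(events):
    """Filter out unneeded tables and columns, then compress the batch."""
    kept = [
        {**e, "changes": {c: v for c, v in e["changes"].items()
                          if c not in DROPPED_COLUMNS}}
        for e in events
        if e["table"] in REPLICATED_TABLES
    ]
    return zlib.compress(json.dumps(kept).encode("utf-8"))


def unpack(payload):
    """Inverse of pack() on the receiving node."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


batch = [
    {"table": "orders", "op": "update", "key": 7,
     "changes": {"status": "paid", "internal_note": "checked by ops"}},
    {"table": "sessions", "op": "insert", "key": 99,
     "changes": {"token": "abc"}},
]
payload = pack(batch)
```

Here `unpack(payload)` yields only the `orders` event, without the `internal_note` column. Compression pays off mainly on larger batches; a very small payload may not shrink at all.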
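Compared with other replication methods, CDC is fast and comprehensive, and it puts little extra load on the source database because it reads the logs rather than querying the tables.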
The drawbacks are:
• larger logs, as with the DBMS's own logical replication;
• additional software that can be difficult to configure, expensive to license, or both.
CDC platforms are traditionally used to update corporate data warehouses in near real time. This technology is the right way to feed an analytical data warehouse from an operational DBMS. Otherwise, the extraction load can slow the source database down or even make it unusable.