Data Replication

When designing a system, we usually replicate data at the database level so that if one database node goes down, another is available to serve client requests. The replicas generally run on separate hardware and follow the master-slave (primary-replica) architecture.

Write operations are generally performed on the master (primary) database, while read operations can be distributed across the slave (replica) databases. If the master goes out of service, the slaves elect a new master from among themselves. When the original master becomes active again, the changes it missed are synchronized back to it, and it typically rejoins the cluster as a slave of the new master.
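The routing described above can be sketched in a few lines of Python. This is only a toy in-memory model, not a real database driver: the class name ReplicatedStore, its methods, and the round-robin read policy are assumptions made for illustration.

```python
import itertools


class ReplicatedStore:
    """Toy model (hypothetical): one primary dict plus read replicas."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Round-robin iterator over replica indexes for spreading reads.
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Every write goes to the primary first, then is copied to each replica.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads never touch the primary; they rotate across the replicas.
        index = next(self._next_replica)
        return self.replicas[index].get(key)


store = ReplicatedStore(replica_count=2)
store.write("user:1", "alice")
first_read = store.read("user:1")   # served by replica 0
second_read = store.read("user:1")  # served by replica 1
```

In a real deployment the copy to the replicas would go over the network, which is where the synchronous and asynchronous modes differ.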

Synchronous replication:

In this type of replication, every write or update operation is performed on both the primary and the secondary database within the same I/O transaction. The client receives a write acknowledgment only after both copies are successfully committed, ensuring zero or near-zero data loss and strong data consistency. This method is ideal for mission-critical applications like financial systems, healthcare databases, and global inventory management where data integrity is a priority. However, it is slower and more expensive: every write waits for a round trip to the secondary, so latency is higher, and it usually requires a fast, reliable link between the two databases.
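As a rough sketch of the acknowledgment rule, the snippet below only reports success once both copies hold the value, and undoes the primary write if the secondary cannot be reached. The Replica class and the synchronous_write function are hypothetical names, and plain dictionaries stand in for the databases.

```python
class Replica:
    """Hypothetical secondary database that may be unreachable."""

    def __init__(self, healthy=True):
        self.data = {}
        self.healthy = healthy

    def apply(self, key, value):
        if not self.healthy:
            raise ConnectionError("replica unreachable")
        self.data[key] = value


def synchronous_write(primary, replica, key, value):
    """Return True (the acknowledgment) only if both copies were written."""
    had_key = key in primary
    previous = primary.get(key)
    primary[key] = value
    try:
        replica.apply(key, value)
    except ConnectionError:
        # The secondary did not commit, so roll the primary back and refuse.
        if had_key:
            primary[key] = previous
        else:
            del primary[key]
        return False
    return True


primary = {}
ok = synchronous_write(primary, Replica(), "balance", 100)
failed = synchronous_write(primary, Replica(healthy=False), "balance", 50)
```

The second call is rejected and the primary keeps the old value, which is the cost of this mode: a slow or unavailable secondary blocks writes.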

Asynchronous replication:

Asynchronous replication writes data to the primary system first, then transmits it to the secondary system later, often in batches or with a delay. The primary system acknowledges the write immediately, enabling faster performance and lower latency, making it suitable for high-throughput applications like AdTech, fraud detection, and real-time personalization services. The trade-off is replication lag: the secondary can be slightly behind the primary, so writes that have not been transmitted yet may be lost if the primary fails.
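A minimal sketch of this behavior, assuming a simple in-memory queue of pending changes (the AsyncPrimary class and its flush method are invented for illustration):

```python
from collections import deque


class AsyncPrimary:
    """Hypothetical primary that ships changes to a secondary later."""

    def __init__(self):
        self.data = {}
        self.replica = {}       # stands in for the secondary system
        self.pending = deque()  # changes not yet sent to the secondary

    def write(self, key, value):
        # Commit locally and acknowledge right away; replication is deferred.
        self.data[key] = value
        self.pending.append((key, value))
        return True

    def flush(self):
        # Ship the queued changes as one batch; returns how many were sent.
        shipped = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value
            shipped += 1
        return shipped


node = AsyncPrimary()
node.write("click:1", "ad-42")
lagging = "click:1" not in node.replica  # True until the batch is shipped
node.flush()
```

Anything still sitting in pending when the primary crashes never reaches the secondary, which is why this mode trades some durability for speed.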

Brain Split (Split-Brain) Issue:

The brain split issue, more commonly called split-brain, is a critical distributed system failure that occurs when a network partition cuts communication between nodes, causing the cluster to divide into isolated groups. For example, when two nodes are connected and the link between them goes down (without either node actually failing), each node may assume the other has failed and keep accepting write operations on its own. The two copies then diverge, which leads to serious data inconsistency.

One common way to solve the brain split problem is to use a 3-node cluster instead of just two nodes. In a 3-node setup, the system follows a majority voting (quorum-based) approach to decide which node should remain active as the primary.

In this approach, at least two out of three nodes must agree before any node can continue accepting write operations. If a network partition happens and one node gets isolated, it will not have the majority, so it automatically stops acting as the master. The other two nodes, which are still connected, form the majority and continue to operate normally. This prevents both sides from thinking they are the primary and avoids data inconsistency.
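The majority rule can be written as a small check. This is a sketch under the assumption that each node can count how many cluster members it can still reach, itself included; the function name has_quorum is hypothetical.

```python
def has_quorum(reachable_nodes, cluster_size):
    """True if this side of the partition holds a strict majority.

    reachable_nodes counts the node itself plus every peer it can still contact.
    """
    return reachable_nodes > cluster_size // 2


# 3-node cluster split into a pair and an isolated node.
pair_keeps_writing = has_quorum(2, cluster_size=3)      # True
isolated_keeps_writing = has_quorum(1, cluster_size=3)  # False

# 2-node cluster split in half: neither side has a majority.
two_node_side = has_quorum(1, cluster_size=2)           # False
```

The last line also shows why two nodes are not enough: after a split, neither side can prove it is the majority, so the only safe options are to stop both or to risk both continuing.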

Another technique used in a 3-node system is to make the third node a witness or arbitrator node. This node does not store actual data; it only votes on which side of the partition should remain active. It breaks the tie between the two data nodes and ensures that only one side can act as the master.
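A witness setup can be sketched the same way, assuming two data nodes plus one witness that each carry one vote. The function side_stays_active and its parameters are illustrative names, not part of any particular product.

```python
def side_stays_active(data_nodes_on_side, witness_on_side, total_data_nodes=2):
    """Decide whether one side of a partition may keep acting as master.

    Each data node and the single witness carry one vote; a side needs a
    strict majority of all votes to stay active.
    """
    total_votes = total_data_nodes + 1  # the +1 is the witness
    votes = data_nodes_on_side + (1 if witness_on_side else 0)
    return votes > total_votes // 2


# The link between the two data nodes fails; the witness can reach only node A.
node_a_active = side_stays_active(1, witness_on_side=True)   # True
node_b_active = side_stays_active(1, witness_on_side=False)  # False
```

Only the side that can still reach the witness keeps accepting writes, even though the witness itself holds no data.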

Using this majority-based method ensures that at any given time, only one node or group of nodes can accept write operations, which prevents the brain split scenario as long as every node respects the majority decision.
