{"id":103976,"date":"2024-10-18T08:27:00","date_gmt":"2024-10-18T08:27:00","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=103976"},"modified":"2024-09-25T20:40:55","modified_gmt":"2024-09-25T20:40:55","slug":"consistency-and-concurrency-in-newsql-database-systems","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/cloud\/big-data\/consistency-and-concurrency-in-newsql-database-systems\/","title":{"rendered":"Consistency and Concurrency in NewSQL Database Systems"},"content":{"rendered":"\n<p>Companies today require database systems that are reliable and capable of efficiently handling large volumes of data and numerous transactions. Traditional relational databases, once the foundation of data management, often struggle to meet these modern demands, leading to delays and program slowdowns. In response, <a href=\"https:\/\/www.dremio.com\/wiki\/newsql\/#:~:text=NewSQL%20is%20a%20class%20of,Durability)%20of%20traditional%20database%20systems.\">NewSQL<\/a> databases, a new class of <a href=\"https:\/\/aws.amazon.com\/what-is\/sql\/#:~:text=Structured%20query%20language%20(SQL)%20is,information%20in%20a%20relational%20database.\">SQL<\/a> systems, has emerged. These databases combine the best features of contemporary <a href=\"https:\/\/www.ibm.com\/topics\/nosql-databases#:~:text=IBM-,What%20is%20a%20NoSQL%20database%3F,structures%20found%20in%20relational%20databases.\">NoSQL<\/a> systems and traditional SQL databases, aiming to deliver the scalability and performance associated with NoSQL while maintaining the reliability and consistency of SQL.<\/p>\n\n\n\n<p>NewSQL is a term that describes a new class of databases that aim to deliver the same reliable and predictable performance as traditional SQL databases but with the scalability needed for modern applications. These databases are designed to handle large-scale, high-transaction environments, making them ideal for applications that require both high availability and high performance.<\/p>\n\n\n\n<p>This means they can spread their workload across multiple servers without sacrificing the benefits of the structured query language (SQL) and transactional integrity. When you make a transaction in a database, whether it&#8217;s transferring money between bank accounts or updating a user profile, you want to ensure that the transaction is completed reliably. This is where ACID properties come in. ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These properties are essential for ensuring that database transactions are processed reliably, and that the database stays in a correct and consistent state even in the face of errors, crashes, or other issues.<\/p>\n\n\n\n<p>Maintaining consistency in distributed environments, where data is spread across multiple servers or data centers is challenging. In such environments, ensuring that all copies of the data remain in sync and reflect the same state at any given time is difficult due to various factors like network latency, server failures, and data replication delays.<\/p>\n\n\n\n<p>NewSQL databases aim to maintain ACID properties, which are critical for ensuring data integrity, while also scaling out across distributed systems, but there are limitations. One of the main challenges is defined in the <a href=\"https:\/\/en.wikipedia.org\/wiki\/CAP_theorem\"><strong>CAP theorem<\/strong><\/a>, which states that a distributed database can only guarantee two out of the following three properties simultaneously: Consistency, Availability, and Partition tolerance. Consistency means that every reader receives the most recent write. Availability means that every request receives a response in a reasonably amount of time, even if some of the data is out of date. Partition tolerance means that the system continues to operate despite network partitions. In practice, this means that achieving strong consistency often requires sacrificing some availability or partition tolerance.<\/p>\n\n\n\n<p>Another challenge is network latency and speed, including the time it takes data to be processed and the time it takes for data to travel across the network. In a distributed system, data must often be synchronized across multiple servers, which can introduce delays. These delays can lead to temporary inconsistencies where different servers have different versions of the data.<\/p>\n\n\n\n<p>Network partitions, where communication between servers is temporarily broken, also pose a challenge. During a partition, some servers may not be able to communicate with others, leading to inconsistent states. Once the partition is resolved, the system must reconcile these differences and ensure all servers are brought back to a consistent state.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.30j0zll\"><\/a>Understanding ACID Properties of Transactions<\/h2>\n\n\n\n<p>Ensuring these databases work reliably is important, and this is where the ACID properties come in. These properties ensure that database transactions are processed in a reliable and predictable manner. In this section, I will explain each of these properties to help you understand why they are important and how they work.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.1fob9te\"><\/a>Atomicity<\/h3>\n\n\n\n<p>Atomicity in database terms means that a transaction is an indivisible unit. All steps are either completed in full or not at all. If any part of the transaction fails, the entire transaction fails, and the database is left unchanged. Atomicity prevents partial updates, which could lead to inconsistencies. Even single SQL statements that change just one column in one row will have multiple steps to change and log rows for recovery.<\/p>\n\n\n\n<p>Imagine you are transferring money from one bank account to another. This transaction involves two operations: deducting money from one account and adding it to another. If the transaction only completes the first operation (deducting money) but fails to complete the second (adding money), the money will disappear. Atomicity ensures that this doesn\u2019t happen.<\/p>\n\n\n\n<p>For example, let&#8217;s say we have a database of bank accounts, and we want to transfer $200 from Account A to Account B.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">BEGIN TRANSACTION;\n\nUPDATE accounts\nSET balance = balance - 200\nWHERE account_id = 'A';\n\nUPDATE accounts\nSET balance = balance + 200\nWHERE account_id = 'B';\n\nCOMMIT;<\/pre>\n\n\n\n<p>In this example, the transaction starts with:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">BEGIN TRANSACTION. \n\nUPDATE accounts \nSET balance = balance - 200 \nWHERE account_id = \u2018A\u2019;<\/pre>\n\n\n\n<p>Which deducts $200 from Account A. while:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">UPDATE accounts \nSET balance = balance - 100 \nWHERE account_id = \u2018B\u2019;<\/pre>\n\n\n\n<p>Adds $200 to Account B. If any part of this transaction fails:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">COMMIT<\/pre>\n\n\n\n<p>Will not be executed; it finalizes the transaction, and the database will roll back to its previous state, ensuring atomicity. Note that you may need additional error handling, depending on the client tools that are executing these statements.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.3znysh7\"><\/a>Consistency<\/h3>\n\n\n\n<p>Consistency ensures that a transaction brings the database from one valid state to another, adhering to all predefined rules, such as constraints, cascades, and triggers. This means that any transaction will leave the database in a valid state. Consistency is important for maintaining the integrity of the database. Consistency prevents data from becoming corrupt or illogical. Sadly, it only keeps it as consistent as the rules you implement!<\/p>\n\n\n\n<p>For instance, if a database has a rule that the balance of any bank account cannot be negative, a transaction that would result in a negative balance should be rejected.<\/p>\n\n\n\n<p>Continuing with our bank example, let&#8217;s enforce a rule that no account can have a negative balance.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">BEGIN TRANSACTION;\n\n-- Deduct $200 from Account A\nUPDATE accounts\nSET balance = balance - 200\nWHERE account_id = 'A';\n\n-- Check if Account A's balance is still \n-- non-negative after the deduction\nIF (SELECT balance FROM accounts WHERE account_id = 'A') &gt;= 0 THEN\n\n    -- If the balance is non-negative, add $200 to Account B\n    UPDATE accounts\n    SET balance = balance + 200\n    WHERE account_id = 'B';\n\n    -- Finalize the transaction and make changes permanent\n    COMMIT;\n\nELSE\n\n    -- If the balance is negative, undo the \n    -- transaction to prevent inconsistency\n    ROLLBACK;\n\nEND IF;<\/pre>\n\n\n\n<p>In the example above, the transaction begins with the <code>BEGIN TRANSACTION<\/code> command, which marks the start of a series of operations that need to be treated as a single unit. The first operation is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">UPDATE accounts \nSET balance = balance - 200 \nWHERE account_id = 'A';<\/pre>\n\n\n\n<p>Which deducts $200 from Account A. After this deduction, the transaction checks if the balance of Account A remains non-negative with the condition<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">IF (SELECT balance FROM accounts WHERE account_id = 'A') &gt;= 0.<\/pre>\n\n\n\n<p>If this condition is true, meaning Account A still has a non-negative balance, the subsequent operation<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">UPDATE accounts \nSET balance = balance + 200 \nWHERE account_id = 'B';<\/pre>\n\n\n\n<p>is performed, adding $200 to Account B, and the transaction is finalized with <code>COMMIT<\/code>.<\/p>\n\n\n\n<p>If the condition is false, indicating that the balance would be negative. In that case, the transaction is aborted with <code>ROLLBACK<\/code>, undoing any changes made during the transaction and thus maintaining the database&#8217;s consistency by preventing an invalid state where an account balance is negative. However, it\u2019s important to note that under certain isolation levels, there is no guarantee that Account A&#8217;s balance will remain non-negative by the time the second <code>UPDATE<\/code> occurs. This is because other concurrent transactions could modify the balance of Account A after the initial check but before the <code>UPDATE<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.2et92p0\"><\/a>Isolation<\/h3>\n\n\n\n<p>Isolation actively shields concurrent transactions from interfering with each other. This ensures each transaction runs as if it&#8217;s the only one happening, even when multiple transactions occur simultaneously. Without isolation, concurrent transactions could cause inconsistencies. For example, if two people simultaneously attempt to withdraw money from the same bank account, both transactions might read the same initial balance, leading to an overdraft. Isolation protects against such issues by managing how transactions interact, allowing you to fine-tune the level of interference permitted between them. This fine-tuning is important for maintaining data consistency and integrity in systems where multiple transactions occur simultaneously.<\/p>\n\n\n\n<p>Consider two transactions attempting to withdraw $200 from Account A simultaneously.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">-- Transaction 1\nBEGIN TRANSACTION;\n\n-- Lock the row for Account A to prevent other \n-- transactions from modifying it until this \n-- transaction completes.\nSELECT balance \nFROM accounts \nWHERE account_id = 'A' FOR UPDATE;\n\n-- Deduct $200 from Account A\nUPDATE accounts\nSET balance = balance - 200\nWHERE account_id = 'A';\n\n-- Finalize the transaction and make changes permanent\nCOMMIT;\n\n\n-- Transaction 2 (a different connection\nBEGIN TRANSACTION;\n\n-- Attempt to lock the same row for Account A. \n-- This will wait if Transaction 1 hasn\u2019t finished yet.\nSELECT balance FROM accounts WHERE account_id = 'A' FOR UPDATE;\n\n-- Deduct $200 from Account A after the lock is acquired\nUPDATE accounts\nSET balance = balance - 200\nWHERE account_id = 'A';\n\n-- Finalize the transaction and make changes permanent\nCOMMIT;<\/pre>\n\n\n\n<p>In this example, the process begins with initiating each transaction using the <code>BEGIN TRANSACTION<\/code> command. Following this, the database locks the row corresponding to Account A using<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT balance\nFROM accounts\nWHERE account_id = 'A' FOR UPDATE;<code><\/code><\/pre>\n\n\n\n<p>This lock ensures that only one transaction at a time can access and modify the balance of Account A, thereby preventing potential conflicts and maintaining isolation between concurrent transactions. Next, the transaction deducts $200 from Account A with:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">UPDATE accounts \nSET balance = balance - 200 \nWHERE account_id = 'A';.<\/pre>\n\n\n\n<p>Once the deduction is made, the transaction is finalized and committed using <code>COMMIT<\/code>, ensuring that all changes are permanently saved in the database.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.tyjcwt\"><\/a>Durability<\/h3>\n\n\n\n<p>Durability guarantees that once a transaction is committed, it will remain so, even in the event of a system crash. This guarantee is typically achieved through logging changes that have been made in the data file. Durability is important for ensuring that data is not lost once a transaction is completed. For instance, once the bank confirms a money transfer, it should be permanent, even if the system crashes immediately after.<\/p>\n\n\n\n<p>Of course, this only works if the crashed system does not lose its data and log drives simultaneously. Then, you need to have backups to go back to a point in time, and you likely lose very recent transactions.<\/p>\n\n\n\n<p>Let&#8217;s discuss how the engine ensures that our bank transactions are durable.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">-- Start the transaction\nBEGIN TRANSACTION;\n\n-- Deduct $200 from Account A\nUPDATE accounts\nSET balance = balance - 200\nWHERE account_id = 'A';\n\n-- Add $200 to Account B\nUPDATE accounts\nSET balance = balance + 200\nWHERE account_id = 'B';\n\n-- Commit the transaction to make changes permanent\nCOMMIT;<\/pre>\n\n\n\n<p>Here, <code>BEGIN TRANSACTION<\/code> starts a new transaction, signalling that all following actions should be treated as part of this single transaction. The code continues as:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">UPDATE accounts \nSET balance = balance - 200 \nWHERE account_id = \u2018A\u2019<\/pre>\n\n\n\n<p>The statement deducts $200 from Account A, and then:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">UPDATE accounts \nSET balance = balance + 200 \nWHERE account_id = \u2018B\u2019<\/pre>\n\n\n\n<p>Will add $200 to Account B. These changes are temporarily staged and will not be visible to others until COMMIT is executed, which finalizes the transaction and makes all changes permanent. Internally, databases use Write-Ahead Logging (<a href=\"https:\/\/hevodata.com\/learn\/write-ahead-logging\/\">WAL<\/a>) to ensure durability. WAL involves writing changes to a log file before updating the actual database files. This log is stored on durable storage, allowing the database to recover and apply committed transactions even if a failure occurs. In SQL Server, for example, this process is handled automatically by the system, so users do not need to manually insert log records. This mechanism ensures that all changes made in a transaction are preserved and will survive system failures.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.3dy6vkm\"><\/a>NewSQL Architecture<\/h2>\n\n\n\n<p>In this section, let&#8217;s look at the key components of NewSQL architecture.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.1t3h5sf\"><\/a>Key components of NewSQL architecture<\/h3>\n\n\n\n<p>NewSQL databases are built on several core components that enable them to handle large-scale, high-performance workloads while maintaining ACID properties. Here are the key components:<\/p>\n\n\n\n<p><strong>Distributed Data Storage<\/strong>: One of the fundamental aspects of NewSQL databases is distributed data storage. Instead of storing all data on a single server, NewSQL databases distribute data across multiple nodes or servers. This distribution allows the system to balance the load and scale horizontally, meaning more servers can be added to handle increasing amounts of data and traffic.<\/p>\n\n\n\n<p><strong>Automatic Sharding<\/strong>: Sharding is the process of splitting a database into smaller, more manageable pieces called shards. NewSQL databases automatically handle sharding, distributing data across nodes to optimize performance and storage.<\/p>\n\n\n\n<p><strong>Distributed Transaction Management<\/strong>: NewSQL databases handle transactions in a distributed manner while maintaining ACID properties. They often use sophisticated algorithms like <a href=\"https:\/\/medium.com\/@mani.saksham12\/raft-and-paxos-consensus-algorithms-for-distributed-systems-138cd7c2d35a\">Paxos or Raft<\/a> for distributed consensus, ensuring that all nodes agree on the transaction&#8217;s outcome before it is committed.<\/p>\n\n\n\n<p><strong>Concurrency Control<\/strong>: To manage simultaneous transactions, NewSQL databases employ advanced concurrency control mechanisms. Techniques like optimistic concurrency control and multi-version concurrency control (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Multiversion_concurrency_control\">MVCC<\/a>) helps prevent conflicts and ensure that transactions are processed smoothly without interfering with each other.<\/p>\n\n\n\n<p><strong>Replication<\/strong>: Replication is crucial for high availability and fault tolerance. NewSQL databases replicate data across multiple nodes so that if one node fails, others can continue to provide access to the data. This replication can be synchronous or asynchronous, depending on the specific needs of the application.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.4d34og8\"><\/a>Examples of NewSQL Database Systems<\/h3>\n\n\n\n<p>Two examples of NewSQL databases are Google Spanner and CockroachDB. Let&#8217;s look at each of these in detail.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.2s8eyo1\"><\/a>Google Spanner<\/h3>\n\n\n\n<p><a href=\"https:\/\/cloud.google.com\/spanner?hl=en\">Google Spanner<\/a> is a globally distributed NewSQL database that provides strong consistency and horizontal scalability. It uses <a href=\"https:\/\/cloud.google.com\/spanner\/docs\/true-time-external-consistency\">Google&#8217;s TrueTime API<\/a> to achieve external consistency, which ensures that transactions are consistently ordered across distributed nodes. This makes it possible to have a single, globally consistent view of the data.<\/p>\n\n\n\n<p><strong>Key Features<\/strong><\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li><strong>Global Distribution:<\/strong> Spanner can distribute data across data centers worldwide, providing low-latency access to data from any location.<\/li>\n\n\n\n<li><strong>Strong Consistency:<\/strong> Using the <a href=\"https:\/\/www.educative.io\/courses\/grokking-the-principles-and-practices-of-advanced-system-design\/truetime-api-in-spanner\">TrueTime API<\/a>, Spanner ensures that all nodes agree on the exact order of transactions, maintaining strong consistency.<\/li>\n\n\n\n<li><strong>Scalability:<\/strong> Spanner can handle massive amounts of data and high transaction volumes, making it ideal for large-scale applications.<\/li>\n\n\n\n<li><strong>Works Like Relational Databases:<\/strong> Supports the relational model, making it less of a learning curve for users of typical relational databases. Supports both Google SQL, and PostgreSQL\u2019s SQL Dialect. For more detail, see <a href=\"https:\/\/cloud.google.com\/spanner\/docs\/reference\/dialect-differences\">this article<\/a> in the Google Cloud documentation.<\/li>\n<\/ul>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.17dp8vu\"><\/a>CockroachDB<\/h3>\n\n\n\n<p><a href=\"https:\/\/www.cockroachlabs.com\/\">CockroachDB<\/a> is an open-source NewSQL database designed to be globally distributed and resilient to failures. It uses a distributed consensus protocol based on Raft to manage transaction coordination and maintain consistency across nodes. CockroachDB is known for its ease of deployment and ability to scale horizontally.<\/p>\n\n\n\n<p><strong>Key Features<\/strong><\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Resilience to Failures: CockroachDB is designed to handle node failures gracefully, ensuring that data remains accessible and consistent.<\/li>\n\n\n\n<li>Horizontal Scalability: The database can scale out by adding more nodes, distributing data, and load to maintain performance.<\/li>\n\n\n\n<li>Strong Consistency: CockroachDB uses the Raft consensus algorithm to ensure that all nodes agree on the state of the database, maintaining strong consistency.<\/li>\n\n\n\n<li>Standard SQL Support: Supports a wide range of SQL statements that developers and users will be familiar with.<\/li>\n\n\n\n<li>Global Distribution: Allows for multiple region deployment and geo-partitioning for applications that have users around the world.<\/li>\n<\/ul>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.3rdcrjn\"><\/a>Consistency in Distributed Environments<\/h2>\n\n\n\n<p>Data isn&#8217;t stored in just one place. Instead, it&#8217;s spread across multiple servers, sometimes scattered around the globe. This distribution helps with scaling and performance but introduces a significant challenge: maintaining consistency. Consistency in a distributed environment means ensuring that all copies of the data reflect the same information, no matter where they are or how many transactions are happening at once. Consistency in the context of distributed systems ensures that any read operation retrieves the most recent write operation&#8217;s result. If you update a piece of data, all subsequent reads should reflect this update, regardless of where they happen in the system. However, achieving this in a distributed setup is not straightforward due to various factors like network delays, server failures, and concurrent transactions.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.26in1rg\"><\/a>Types of Consistency<\/h3>\n\n\n\n<p><strong>Strong Consistency:<\/strong> Strong consistency guarantees that after a write is acknowledged, all subsequent reads will reflect that write. This type of consistency is the easiest to reason about but can be challenging to achieve in a distributed environment.<\/p>\n\n\n\n<p><strong>Eventual Consistency:<\/strong> Eventual consistency guarantees that, given enough time, all copies of the data will converge to the same value. It doesn&#8217;t guarantee immediate consistency but ensures that the data will be consistent eventually. This is often used in systems where high availability is more critical than immediate consistency.<\/p>\n\n\n\n<p><strong>Causal Consistency:<\/strong> Causal consistency is about preserving the order of operations based on their causal relationships. It ensures that if one operation causally influences another, the system maintains this sequence, but it does not require a strict global order across all operations.<\/p>\n\n\n\n<p>Several factors make maintaining consistency in distributed environments difficult:<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li><strong>Network Latency:<\/strong> The time it takes for data to travel across the network can cause delays in synchronization.<\/li>\n\n\n\n<li><strong>Partition Tolerance:<\/strong> In a distributed system, network partitions can occur, isolating parts of the system. Ensuring consistency during partitions is challenging.<\/li>\n\n\n\n<li><strong>Concurrency:<\/strong> Multiple transactions occurring simultaneously can lead to conflicts and inconsistencies if not managed properly.<\/li>\n<\/ul>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\">Ways To Ensure Consistency<\/h3>\n\n\n\n<p>In this section I will cover several of the major tactics that are used to ensure consistency when saving data in more than one location.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Two-Phase Commit (2PC)<\/strong><\/h4>\n\n\n\n<p>The <a href=\"https:\/\/www.dremio.com\/wiki\/two-phase-commit\/#:~:text=Two%2DPhase%20Commit%20consists%20of,transaction%20based%20on%20participant%20responses.\">Two-Phase Commit protocol<\/a> is a classic algorithm used to ensure atomicity and consistency across distributed systems. It involves two phases: the prepare phase and the commit phase.<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li><strong>Prepare Phase:<\/strong> The coordinator asks all participating nodes if they can commit the transaction.<\/li>\n\n\n\n<li><strong>Commit Phase:<\/strong> If all nodes agree, the coordinator sends a commit message. Otherwise, it sends a rollback message.<\/li>\n<\/ul>\n<\/div>\n\n\n<p>Example:<\/p>\n\n\n\n<p>Let&#8217;s say we have a distributed banking system where we need to transfer money between accounts on different servers.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">class TwoPhaseCommit:\n    def __init__(self):\n        # Initialize the coordinator with an empty list of participants\n        self.participants = []\n\n    def add_participant(self, participant):\n        # Add a participant (server or database) to the coordinator list\n        self.participants.append(participant)\n\n    def prepare(self):\n        # Phase 1: Prepare each participant for committing\n        for participant in self.participants:\n            if not participant.prepare():\n                # If any participant cannot prepare, roll back and return false\n                self.rollback()\n                return False\n\n        # All participants prepared successfully, return true\n        return True\n\n    def commit(self):\n        # Phase 2: Commit all participants if preparation was successful\n        for participant in self.participants:\n            participant.commit()\n\n    def rollback(self):\n        # Rollback all participants in case of failure during preparation\n        for participant in self.participants:\n            participant.rollback()\n\nclass Participant:\n    def __init__(self, name):\n        # Initialize participant with a name and an initial state\n        self.name = name\n        self.state = 'INIT'\n\n    def prepare(self):\n        # Prepare the participant for committing\n        print(f'{self.name}: Preparing...')\n        self.state = 'PREPARED'\n        return True  # Assume preparation always succeeds in this example\n\n    def commit(self):\n        # Commit the participant's transaction\n        print(f'{self.name}: Committing...')\n        self.state = 'COMMITTED'\n\n    def rollback(self):\n        # Roll back the participant's transaction\n        print(f'{self.name}: Rolling back...')\n        self.state = 'ROLLED BACK'\n\n# Example usage\ncoordinator = TwoPhaseCommit()\nparticipant1 = Participant('Server1')\nparticipant2 = Participant('Server2')\n\n# Add participants to the coordinator\ncoordinator.add_participant(participant1)\ncoordinator.add_participant(participant2)\n\n# Start the two-phase commit process\nif coordinator.prepare():\n    # If all participants are prepared, commit the transaction\n    coordinator.commit()\nelse:\n    # If any participant fails preparation, roll back the transaction\n    coordinator.rollback()<\/pre>\n\n\n\n<p>In this example, the <strong>TwoPhaseCommit<\/strong> class coordinates a distributed transaction by managing a list of participants, which are represented by the <strong>Participant<\/strong> class. During initialization, the <strong>TwoPhaseCommit <\/strong>object starts with an empty participant list, while each <strong>Participant <\/strong>begins in the <strong>&#8216;INIT&#8217;<\/strong> state.<\/p>\n\n\n\n<p>When participants are added using <strong>add_participant<\/strong>(), the coordinator initiates the prepare phase with <strong>coordinator.prepare()<\/strong>. Each participant responds by setting its state to &#8216;PREPARED&#8217; if it can commit; otherwise, it returns <strong>False<\/strong>, prompting the coordinator to call rollback() and abort the transaction. If all participants are prepared, the coordinator proceeds to the commit phase with <strong>coordinator.commit(), <\/strong>where each participant finalizes the transaction by changing its state to &#8216;COMMITTED&#8217;.<\/p>\n\n\n\n<p>If any participant fails preparation, the coordinator will roll back the transaction, calling rollback() for each participant to revert their state to &#8216;ROLLED BACK&#8217;. The<strong> TwoPhaseCommit Class m<\/strong>anages the 2PC protocol, coordinating prepare, commit, and rollback operations. The Participant Class represents a node in the distributed system. Each participant can prepare, commit, or rollback a transaction. <strong>Example Usage <\/strong>creates a coordinator and two participants. The coordinator prepares the transaction. If all participants are ready, it commits the transaction; otherwise, it rolls back.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.44sinio\"><\/a>Consensus Algorithms<\/h4>\n\n\n\n<p><a href=\"https:\/\/www.techtarget.com\/whatis\/definition\/consensus-algorithm#:~:text=A%20consensus%20algorithm%20is%20a,among%20distributed%20processes%20or%20systems.\">Consensus algorithms<\/a> like <a href=\"https:\/\/medium.com\/@mani.saksham12\/raft-and-paxos-consensus-algorithms-for-distributed-systems-138cd7c2d35a\">Paxos and Raft<\/a> are used to ensure consistency in distributed systems by achieving agreement among distributed nodes. Paxos ensures that a majority of nodes agree on the same value, which is important for maintaining consistency, while Raft provides a more straightforward and practical approach to consensus, which can lead to fewer implementation issues and better overall performance. For end users, this means that data should not be lost, and consistency should be maintained, but there might be delays if network issues or leader failures occur.<\/p>\n\n\n\n<p><a id=\"post-103976-_heading=h.2jxsxqh\"><\/a><strong>Example:<\/strong><\/p>\n\n\n\n<p>Here&#8217;s an example of a consensus algorithm using the Raft protocol.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import random\n# RaftNode represents an individual node in a Raft cluster\nclass RaftNode:\n    def __init__(self, name):\n        self.name = name           # Unique identifier for the node\n        self.term = 0              # Current term of the node (logical time period)\n        self.voted_for = None      # Keeps track of the candidate the node voted \n                                   # for in the current term\n        self.state = 'FOLLOWER'    # Nodes state: Follower, Candidate, or Leader\n        self.log = []              # Log entries that the node maintains (not \n                                   # used in this simple example)\n\n    # Method to request a vote from this node\n    def request_vote(self, candidate_term):\n        # Node grants its vote if the candidates term is higher than the \n        # nodes current term\n        if candidate_term > self.term:\n            self.term = candidate_term  # Update the nodes term to the \n                                        # candidate's term\n            self.voted_for = 'CANDIDATE'  # Record that this node has voted \n                                          #for the candidate\n            return True\n        return False  # If the candidate term is not higher, do not grant vote\n\n# RaftCluster represents the entire Raft cluster composed of multiple nodes\nclass RaftCluster:\n    def __init__(self):\n        self.nodes = []  # List to hold all nodes in the cluster\n\n    # Method to add a node to the cluster\n    def add_node(self, node):\n        self.nodes.append(node)\n\n    # Method to start an election process in the cluster\n    def start_election(self):\n        candidate = random.choice(self.nodes)  # Randomly select a candidate \n                                               # from the nodes\n        candidate.state = 'CANDIDATE'  # Change the candidate's state to CANDIDATE\n        candidate.term += 1  # Increment the candidate's term as it \n                             # starts a new election\n        votes = 0  # Counter to track the number of votes received\n        # Loop through all nodes in the cluster to request votes\n        for node in self.nodes:\n            if node.request_vote(candidate.term):  # If a node grants a vote\n                votes += 1  # Increment the vote counter\n\n        # If the candidate receives more than half of the \n        # votes, it becomes the leader\n        if votes > len(self.nodes) \/ 2:\n            candidate.state = 'LEADER'  # Update the candidate's state to LEADER\n            print(f'{candidate.name} is elected as leader with term {candidate.term}')\n        else:\n            # If the candidate fails to get a majority of votes, \n            # it remains a candidate or follower\n            print(f'{candidate.name} failed to become leader')\n\n# Example usage of the RaftCluster and RaftNode classes\ncluster = RaftCluster()\nnode1 = RaftNode('Node1')\nnode2 = RaftNode('Node2')\nnode3 = RaftNode('Node3')\n\n# Adding nodes to the cluster\ncluster.add_node(node1)\ncluster.add_node(node2)\ncluster.add_node(node3)\n\n# Starting an election in the cluster\ncluster.start_election()<\/pre>\n\n\n\n<p>The Raft algorithm is a way for multiple computers (or nodes) in a network to agree on a leader who will manage tasks for everyone. In this example, each computer is represented by a &#8220;<strong>RaftNode<\/strong>,&#8221; which has a &#8220;<strong>term<\/strong>&#8221; (the current election period), &#8220;<strong>voted_for<\/strong>&#8221; (who it voted for), and &#8220;<strong>state<\/strong>&#8221; (whether it is a <strong>&#8216;FOLLOWER&#8217;<\/strong>, <strong>&#8216;CANDIDATE&#8217;<\/strong>, or <strong>&#8216;LEADER&#8217;<\/strong>). All the nodes are managed by a &#8220;<strong>RaftCluster<\/strong>&#8221; class, which can add nodes to the group and start an election. When an election starts, one node is randomly picked to be a candidate; it then increases its term and asks the other nodes for votes. If it gets more than half of the votes, it becomes the leader; if not, the election fails, and another election might be needed later. Normally, each node would run on a different machine and communicate over a network. The leader ensures that all nodes follow the same instructions and records.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.z337ya\"><\/a>Quorum-based Approaches<\/h4>\n\n\n\n<p><a href=\"https:\/\/www.educative.io\/answers\/what-is-quorum-in-distributed-systems\">Quorum<\/a>-based approaches ensure that a transaction is committed only if a minimum number of nodes (a quorum) agree on it. This means a user doesn&#8217;t have to wait for every node to agree before seeing their action (like a post or an item added to a shopping cart). The system responds quickly as soon as the quorum is reached, providing a balance between speed and consistency. If not, all nodes agree or some are temporarily down, the system can still proceed as long as the quorum is achieved. This is why you might see your action reflected immediately, even if some nodes are lagging. However, in such cases, there is a small risk of inconsistency, like losing an item in your shopping cart. This approach is faster but may sacrifice some consistency in favor of a smoother user experience.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"># Class representing a simple quorum system for managing distributed writes\nclass QuorumSystem:\n    def __init__(self, nodes):\n        self.nodes = nodes  # List of nodes (servers\/databases) in the system\n\n    # Method to write data to the quorum system\n    def write(self, data):\n        acknowledgements = 0  # Count of nodes that acknowledge the write\n\n        for node in self.nodes:\n\n            # Attempt to write data to each node\n            if node.write(data):\n                acknowledgements += 1  # Increment if the write is successful\n\n        # Check if a majority (quorum) of nodes acknowledged the write\n        if acknowledgements &gt;= len(self.nodes) \/\/ 2 + 1:\n            print('Write successful')  # Quorum achieved, write is successful\n        else:\n            print('Write failed')  # Quorum not achieved, write failed\n\n# Class representing an individual node (server\/database) in the system\nclass Node:\n    def __init__(self, name):\n        self.name = name  # Name of the node for identification\n        self.data = None  # Placeholder for data storage\n\n    # Method to simulate writing data to the node\n    def write(self, data):\n        self.data = data  # Store the data in the node\n        print(f'{self.name}: Write {data}')  # Print the write action for \n                                             # demonstration\n        return True  # Assume write is always successful for simplicity\n\n# Example usage\nnodes = [Node('Node1'), Node('Node2'), Node('Node3')]  # Create a list of nodes\nquorum_system = QuorumSystem(nodes)  # Initialize the quorum system with the nodes\n\n# Write data to the quorum system, which attempts to write to all nodes\nquorum_system.write('Hello World') # 'Hello World' is the data being written<\/pre>\n\n\n\n<p>Here<strong>, <\/strong>the<strong> QuorumSystem<\/strong> class is a simple system that manages writing data across multiple servers or databases, called nodes. It makes sure that a write operation is only considered successful if a majority of the nodes (more than half) confirm that they received the data. The <strong>Node<\/strong> class represents each server or database in this system, and each node can write data to itself. In this example, the <strong>write<\/strong> method of each node just prints the data being written and always returns <strong>True<\/strong> to indicate success. When the <strong>write<\/strong> method of the <strong>QuorumSystem<\/strong> is called, it tries to write the data to all its nodes. It counts how many nodes acknowledge (or confirm) the write. If more than half of the nodes (a quorum) acknowledge it, the write is considered successful, and it prints &#8220;Write successful&#8221;; otherwise, it prints &#8220;Write failed.&#8221; The process involves initializing the quorum system with three nodes, attempting to write the data to all nodes, checking if a majority of nodes acknowledge the write, and then printing the result based on whether the quorum is achieved or not.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.3j2qqm3\"><\/a>Concurrency Control Mechanisms<\/h2>\n\n\n\n<p><a href=\"https:\/\/www.geeksforgeeks.org\/concurrency-control-in-dbms\/\">Concurrency control<\/a> refers to the management of simultaneous operations on a database without conflicting with each other. The goal is to ensure that transactions are executed so that the end result is correct and consistent, even if they occur concurrently.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.1y810tw\"><\/a>Techniques<\/h3>\n\n\n\n<p>In this section I will cover some of the techniques that are applied to data processing to ensure that access to data is isolated on different connections in such a way that we can be certain that changes to data doesn\u2019t corrupt anyone\u2019s view.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.4i7ojhp\"><\/a>Optimistic Concurrency Control (OCC)<\/h3>\n\n\n\n<p><a href=\"https:\/\/www.tutorialspoint.com\/what-is-an-optimistic-concurrency-control-in-dbms\">Optimistic concurrency control<\/a> allows transactions to move forward without locks, assuming conflicts are rare. At commit time, it checks for conflicts. If a conflict is detected, the transaction is rolled back and may be retried. OCC focuses on detecting conflicts after transactions execute, leading to potential rollbacks.<\/p>\n\n\n\n<p>Steps in OCC<\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Read Phase: The transaction reads the data it needs without acquiring any locks.<\/li>\n\n\n\n<li>Validation Phase: Before committing, the transaction checks if the data it read has been modified by another transaction.<\/li>\n\n\n\n<li>Write Phase: If no conflict is detected, the transaction writes its changes to the database. If a conflict is detected, the transaction is rolled back and retried.<\/li>\n<\/ul>\n<\/div>\n\n\n<p>Let&#8217;s use the bank example to demonstrate OCC.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">class Account:\n    def __init__(self, balance):\n        self.balance = balance\n\nclass Transaction:\n    def __init__(self):\n        # Initialize read and write sets\n        self.read_set = {}  # Keeps track of the initial balances \n                            # read during the transaction\n\n        self.write_set = {} # Keeps track of the new balances to be written\n\n    def read(self, account):\n        # Record the current balance of the account\n        self.read_set[account] = account.balance\n        return account.balance\n\n    def write(self, account, amount):\n        # Schedule a new balance for the account\n        self.write_set[account] = amount\n\n    def commit(self):\n        # Check if the balance has changed since the read operation\n        for account, initial_balance in self.read_set.items():\n            if account.balance != initial_balance:\n                print('Transaction failed due to conflict')\n                return False  # Transaction failed due to a conflict\n        \n        # Apply all scheduled writes\n        for account, new_balance in self.write_set.items():\n            account.balance = new_balance\n        \n        print('Transaction committed successfully')\n        return True  # Transaction committed successfully\n\n# Example usage\naccount_A = Account(1000)  # Initialize account A with $1000\naccount_B = Account(500)   # Initialize account B with $500\n\n# Transaction 1\ntransaction1 = Transaction()  # Create a new transaction\nbalance_A = transaction1.read(account_A)  # Read initial balance of account A\nbalance_B = transaction1.read(account_B)  # Read initial balance of account B\n# Schedule $100 deduction from account A \ntransaction1.write(account_A, balance_A - 100)  \n# Schedule $100 addition to account B\ntransaction1.write(account_B, balance_B + 100)  \ntransaction1.commit()  # Attempt to commit the transaction\n\n# Transaction 2\ntransaction2 = Transaction()  # Create another new transaction\nbalance_B = transaction2.read(account_B)  # Read initial balance of account B\nbalance_A = transaction2.read(account_A)  # Read initial balance of account A\n\n# Schedule $50 deduction from account B\ntransaction2.write(account_B, balance_B - 50)  \n# Schedule $50 addition to account A\ntransaction2.write(account_A, balance_A + 50)  \ntransaction2.commit()  # Attempt to commit the transaction<\/pre>\n\n\n\n<p>The code initializes two classes, <strong>Account<\/strong> and <strong>Transaction<\/strong>, to manage and handle balance changes in a simplified way. The <strong>Account<\/strong> class represents an account with a balance, while the <strong>Transaction<\/strong> class is responsible for reading and writing balances. The <strong>Transaction<\/strong> class has methods to record initial balances (<strong>read<\/strong>), schedule new balances (<strong>write<\/strong>), and commit changes (<strong>commit<\/strong>). The <strong>commit<\/strong> method checks for conflicts by comparing the current balance with the initially recorded balance; if no discrepancies are found, it applies the new balances. If conflicts are detected, the transaction fails to maintain consistency. In the example, two accounts are created with starting balances, and two transactions are executed. Each transaction reads the current balances, schedules updates, and attempts to commit. Success or failure is printed based on whether the transaction encountered conflicts, demonstrating basic transaction management with conflict detection.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.3whwml4\"><\/a>Pessimistic Concurrency Control (PCC)<\/h3>\n\n\n\n<p><a href=\"https:\/\/embeddedcomputing.com\/technology\/storage\/pessimistic-optimistic-generic-ref-to-images\">Pessimistic Concurrency<\/a> Control is a technique used to manage concurrent transactions by preventing conflicts through locking. In this approach, when a transaction wants to access a resource (such as a row in a database), it locks that resource to prevent other transactions from modifying it until the lock is released. This ensures data integrity and consistency but can lead to reduced performance due to waiting times for lock acquisition.<\/p>\n\n\n\n<p><a id=\"post-103976-_heading=h.2bn6wsx\"><\/a><strong>Steps in PCC:<\/strong><\/p>\n\n\n<div class=\"block-core-list\">\n<ul class=\"wp-block-list\">\n<li>Lock Acquisition: Before accessing a resource, a transaction acquires a lock. If the lock is already held by another transaction, it must wait.<\/li>\n\n\n\n<li>Data Access: The transaction reads or modifies the resource.<\/li>\n\n\n\n<li>Lock Release: After completing its operations, the transaction releases the lock, making the resource available for other transactions.<\/li>\n<\/ul>\n<\/div>\n\n\n<p><a id=\"post-103976-_heading=h.qsh70q\"><\/a><strong>Example:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import threading\n\nclass PessimisticAccount:\ndef __init__(self, balance):\nself.balance = balance\n\n# Initialize a lock to manage concurrent access\nself.lock = threading.Lock()\n\ndef transfer(self, target_account, amount):\n# Acquire the lock on the source account to ensure exclusive access\nwith self.lock:\nprint(f'Lock acquired for {threading.current_thread().name} on source account')\n\n# Acquire the lock on the target account to ensure exclusive access\nwith target_account.lock:\nprint(f'Lock acquired for {threading.current_thread().name} on target account')\n\n# Perform the balance transfer\nself.balance -= amount\ntarget_account.balance += amount\nprint(f'{threading.current_thread().name} transferred {amount} from Source to Target')\n\n# Example usage\naccount_A = PessimisticAccount(1000)\naccount_B = PessimisticAccount(500)\n\ndef transaction1():\n# Transaction 1: Transfer $100 from account_A to account_B\naccount_A.transfer(account_B, 100)\n\ndef transaction2():\n# Transaction 2: Transfer $50 from account_B to account_A\naccount_B.transfer(account_A, 50)\n\n# Create and start threads for concurrent transactions\nthread1 = threading.Thread(name='Transaction1', target=transaction1)\nthread2 = threading.Thread(name='Transaction2', target=transaction2)\n\nthread1.start()\nthread2.start()\n\n# Wait for both threads to complete\nthread1.join()\nthread2.join()\n\n# Print final balances after all transactions\nprint(f'Final Balance A: {account_A.balance}, Final Balance B: {account_B.balance}')\n\n<\/pre>\n\n\n\n<p>In this example, each transaction acquires locks on the accounts it needs to access, ensuring that no other transaction can interfere. This prevents inconsistencies but can cause delays if transactions need to wait for locks to be released. The <strong>PessimisticAccount<\/strong> class represents an account with a balance and a lock to manage concurrent access. When the <strong>transfer<\/strong> method is called, it first acquires the lock on the source account to ensure that no other transaction can access or modify it while the current transaction is in progress. Next, it acquires the lock on the target account to prevent other transactions from interfering with the target account during the transfer. After both locks are obtained, the method transfers funds between the accounts and prints the transaction details. In the example usage, two accounts are initialized, and two functions (<strong>transaction1<\/strong> and <strong>transaction2<\/strong>) simulate concurrent transactions. Threads are created and started to execute these functions concurrently, and the <strong>join<\/strong> method ensures that the main thread waits for both transactions to complete before printing the final balances. This example uses pessimistic locking to prevent concurrent modifications, and delays can occur if transactions are blocked while waiting for locks. Optimistic techniques, such as retrying or versioning, could complement this approach to handle delays and potential deadlocks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.3as4poj\"><\/a>Multiversion Concurrency Control (MVCC)<\/h3>\n\n\n\n<p>Multiversion Concurrency Control (MVCC) allows multiple versions of data to exist simultaneously. This technique improves read performance by allowing reads to occur without waiting for writes to complete. Each transaction sees a snapshot (refers to the state of the database at the time a transaction starts. It is a consistent view of the data that allows transactions to operate without being affected by concurrent changes) of the database at a specific point in time, ensuring consistency without locking.<\/p>\n\n\n\n<p><strong>Example<\/strong>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">class VersionedAccount:\n    def __init__(self, balance):\n        # Initialize with one version of the data\n        self.versions = [(balance, 0)]  # List of (balance, transaction_id) tuples\n\n    def read(self, transaction_id):\n        # Read the balance as of the given transaction ID\n        for balance, txn_id in reversed(self.versions):\n            if txn_id &lt;= transaction_id:\n                return balance\n        return self.versions[0][0]  # Return the initial balance if \n                                    #no versions match\n\n    def write(self, transaction_id, balance):\n        # Write a new version with the updated balance\n        self.versions.append((balance, transaction_id))\n\n# Example usage\naccount_A = VersionedAccount(1000)\naccount_B = VersionedAccount(500)\n\n# Transaction 1\ntxn_id1 = 1\nbalance_A = account_A.read(txn_id1)  # Reads balance as of txn_id1\nbalance_B = account_B.read(txn_id1)  # Reads balance as of txn_id1\naccount_A.write(txn_id1, balance_A - 100)  # Creates a new version with updated \n                                           # balance for Account A\naccount_B.write(txn_id1, balance_B + 100)  # Creates a new version with updated \n                                           # balance for Account B\n\n# Transaction 2\ntxn_id2 = 2\nbalance_B = account_B.read(txn_id2)  # Reads balance as of txn_id2\nbalance_A = account_A.read(txn_id2)  # Reads balance as of txn_id2\naccount_B.write(txn_id2, balance_B - 50)  # Creates a new version with updated \n                                          # balance for Account B\naccount_A.write(txn_id2, balance_A + 50)  # Creates a new version with updated \n                                          # balance for Account A\n\n# Reading the final balance for a new transaction\ntxn_id3 = 3\nfinal_balance_A = account_A.read(txn_id3)  # Reads the latest version of \n                                           # balance for Account A\nfinal_balance_B = account_B.read(txn_id3)  # Reads the latest version of balance \n                                           # for Account B\nprint(f'Final Balance A: {final_balance_A}, Final Balance B: {final_balance_B}')<\/pre>\n\n\n\n<p>In this example, each transaction reads the account balances as of its <strong>transaction ID<\/strong> and writes a new balance, creating a new version. Subsequent transactions see the appropriate version (Each version of the data is associated with a <strong>transaction ID<\/strong>, representing the state of the data after each transaction. New versions are created for every write operation, ensuring that each transaction sees a consistent view of the data as of its transaction ID) of the data based on their transaction ID, ensuring consistency without locking.<\/p>\n\n\n\n<p>The `<strong>VersionedAccount<\/strong>` is initialized with a starting balance and a single version of the data, then the `<strong>read<\/strong>` method returns the balance as of the provided transaction ID. It scans through the versions list in reverse order, checking for versions with a transaction ID less than or equal to the provided ID. This ensures the transaction sees the correct snapshot of the data; the `write` method creates a new version of the data with the updated balance and associates it with the current transaction ID.<\/p>\n\n\n\n<p>This new version is appended to the versions list. `<strong>Transaction 1<\/strong>`reads the initial balance of both accounts, updates the balances, and writes new versions with transaction ID 1. `<strong>Transaction 2`<\/strong> reads the updated balances from the latest versions, makes further updates, and writes new versions with transaction ID 2. `<strong>Transaction 3<\/strong>` reads the final balances from the latest versions, reflecting all updates made by previous transactions. These techniques ensure that transactions are processed in a way that maintains data integrity and consistency, even in complex distributed systems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.1pxezwc\"><\/a>ACID Compliance in NewSQL<\/h2>\n\n\n\n<p>ACID compliance is a crucial aspect of database management, ensuring that transactions are processed reliably and consistently. In the context of NewSQL databases, maintaining ACID properties is paramount to guarantee data integrity, even in distributed environments.<\/p>\n\n\n\n<p>Let&#8217;s look into what ACID compliance means in NewSQL, even though these terms have been introduced a bit deeper earlier in the article..<\/p>\n\n\n\n<p><strong>Atomicity<\/strong>: Atomicity ensures that each transaction is treated as a single, indivisible unit. In NewSQL databases, transactions are either fully completed or fully aborted, ensuring that no partial changes are made to the database.<\/p>\n\n\n\n<p><strong>Consistency<\/strong>: Consistency ensures that the database transitions from one valid state to another valid state after each transaction. NewSQL databases maintain consistency by enforcing integrity constraints and ensuring that transactions adhere to predefined rules.<\/p>\n\n\n\n<p><strong>Isolation<\/strong>: Isolation ensures that transactions are executed independently of each other, preventing interference between concurrent transactions. NewSQL databases employ concurrency control mechanisms to maintain isolation, allowing multiple transactions to occur simultaneously without affecting each other&#8217;s outcomes. Two-Phase Locking (2PL) ensures transactions acquire locks before changing data and release them only after completion, preventing conflicts.<\/p>\n\n\n\n<p><em>Multiversion Concurrency Control (MVCC):<\/em> allows transactions to read from and write to different data versions without locking, ensuring consistent views of data.<\/p>\n\n\n\n<p><em>Optimistic Concurrency Control (OCC):<\/em> lets transactions proceed without locks, checking for conflicts only at commit time and rolling back if inconsistencies are detected. These mechanisms prevent conflicts and inconsistencies by carefully managing how transactions access and modify data.<\/p>\n\n\n\n<p><strong>Durability<\/strong>: Durability guarantees that once a transaction is committed, its changes are permanently stored in the database, even in the event of system failures. NewSQL databases ensure durability by writing transaction logs and data to disk or other persistent storage mediums.<\/p>\n\n\n\n<p>NewSQL databases maintain durability across geographic or computational boundaries by combining transaction logging with distributed replication and consensus protocols. It ensures that committed transactions are reliably stored and can be recovered, preserving data integrity and consistency despite node failures or system crashes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.49x2ik5\"><\/a>Advantages and Limitations of NewSQL<\/h2>\n\n\n\n<p>NewSQL databases have lots of advantages; one of the most significant advantages is their ability to scale horizontally while maintaining ACID compliance. This means that NewSQL databases can handle increasing amounts of data and traffic by adding more servers to the system, rather than just upgrading a single server&#8217;s hardware. This scalability is important for modern applications that experience rapid growth and need to maintain consistent performance and reliability. Additionally, NewSQL databases use advanced algorithms and distributed architectures to ensure data integrity and high availability, which are necessary for enterprise applications that require robust transaction processing.<\/p>\n\n\n\n<p>Another key advantage of NewSQL is its support for SQL, a language that is widely known and used by developers and database administrators. This compatibility allows organizations to leverage their existing SQL knowledge and tools, reducing the learning curve and making it easier to integrate NewSQL databases into their current systems. Furthermore, NewSQL databases offer improved performance by optimizing query execution and minimizing latency. They achieve this through techniques like in-memory processing, distributed transactions, and parallel query execution, which collectively enhance the speed and efficiency of data operations.<\/p>\n\n\n\n<p>However, NewSQL databases are not without their limitations. One of the primary challenges is the complexity involved in setting up and managing a distributed database system. Ensuring data consistency across multiple nodes, handling network partitions, and managing distributed transactions require sophisticated infrastructure and expertise. This complexity can increase the operational overhead and necessitate specialized skills, which may not be readily available in all organizations. Moreover, the cost associated with NewSQL databases can be higher than that of traditional databases, particularly for enterprise-grade solutions that demand high availability and fault tolerance. Licensing fees, advanced hardware requirements, and the need for ongoing maintenance can add to the overall expense, potentially making it a significant investment.<\/p>\n\n\n\n<p>Another limitation of NewSQL is its relative novelty and limited adoption compared to more established database technologies. While it offers many advantages, some organizations may be hesitant to adopt NewSQL due to concerns about compatibility with existing systems, vendor lock-in, and the maturity of the technology. This cautious approach can slow down the widespread adoption of NewSQL, as businesses may prefer to stick with familiar solutions that have a longer track record. Additionally, maintaining strong consistency in a distributed environment can introduce performance trade-offs, as ensuring that all nodes in the system agree on the state of the data may require additional time and resources, potentially impacting overall throughput.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><a id=\"post-103976-_heading=h.2p2csry\"><\/a>Conclusion<\/h2>\n\n\n\n<p>Understanding these advantages and limitations is important for making informed decisions about adopting NewSQL technologies and effectively integrating them into an organization&#8217;s data management strategy. In distributed environments, NewSQL databases guarantee reliable and predictable transactions. They provide durability against system failures, preserve data consistency and integrity, and manage concurrent transactions effectively. NewSQL is, therefore, a solid option for contemporary applications that require strong transactional guarantees in addition to scalability. Devs can build strong and reliable database systems by comprehending and putting these principles into practice.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Companies today require database systems that are reliable and capable of efficiently handling large volumes of data and numerous transactions. Traditional relational databases, once the foundation of data management, often struggle to meet these modern demands, leading to delays and program slowdowns. In response, NewSQL databases, a new class of SQL systems, has emerged. These&#8230;&hellip;<\/p>\n","protected":false},"author":342096,"featured_media":104004,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[137094,53,143539],"tags":[5109,159195],"coauthors":[159002],"class_list":["post-103976","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data","category-featured","category-theory-and-design","tag-google","tag-newsql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/103976","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/342096"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=103976"}],"version-history":[{"count":29,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/103976\/revisions"}],"predecessor-version":[{"id":104010,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/103976\/revisions\/104010"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media\/104004"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=103976"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=103976"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=103976"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=103976"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}