Consistency and Concurrency in NewSQL Database Systems

Companies today require database systems that are reliable and capable of efficiently handling large volumes of data and numerous transactions. Traditional relational databases, once the foundation of data management, often struggle to meet these modern demands, leading to delays and application slowdowns. In response, a new class of SQL systems known as NewSQL databases has emerged. These databases combine the best features of contemporary NoSQL systems and traditional SQL databases, aiming to deliver the scalability and performance associated with NoSQL while maintaining the reliability and consistency of SQL.

NewSQL is a term that describes a new class of databases that aim to deliver the same reliable and predictable performance as traditional SQL databases but with the scalability needed for modern applications. These databases are designed to handle large-scale, high-transaction environments, making them ideal for applications that require both high availability and high performance.

This means they can spread their workload across multiple servers without sacrificing the benefits of the structured query language (SQL) and transactional integrity. When you make a transaction in a database, whether it’s transferring money between bank accounts or updating a user profile, you want to ensure that the transaction is completed reliably. This is where ACID properties come in. ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These properties are essential for ensuring that database transactions are processed reliably, and that the database stays in a correct and consistent state even in the face of errors, crashes, or other issues.

Maintaining consistency in distributed environments, where data is spread across multiple servers or data centers, is challenging. In such environments, ensuring that all copies of the data remain in sync and reflect the same state at any given time is difficult due to factors like network latency, server failures, and data replication delays.

NewSQL databases aim to maintain ACID properties, which are critical for ensuring data integrity, while also scaling out across distributed systems, but there are limitations. One of the main challenges is described by the CAP theorem, which states that a distributed database can only guarantee two out of the following three properties simultaneously: Consistency, Availability, and Partition tolerance. Consistency means that every read receives the most recent write. Availability means that every request receives a response in a reasonable amount of time, even if some of the data is out of date. Partition tolerance means that the system continues to operate despite network partitions. In practice, this means that achieving strong consistency often requires sacrificing some availability or partition tolerance.

Another challenge is network latency and speed, including the time it takes data to be processed and the time it takes for data to travel across the network. In a distributed system, data must often be synchronized across multiple servers, which can introduce delays. These delays can lead to temporary inconsistencies where different servers have different versions of the data.

Network partitions, where communication between servers is temporarily broken, also pose a challenge. During a partition, some servers may not be able to communicate with others, leading to inconsistent states. Once the partition is resolved, the system must reconcile these differences and ensure all servers are brought back to a consistent state.

Understanding ACID Properties of Transactions

Ensuring these databases work reliably is important, and this is where the ACID properties come in. These properties ensure that database transactions are processed in a reliable and predictable manner. In this section, I will explain each of these properties to help you understand why they are important and how they work.

Atomicity

Atomicity in database terms means that a transaction is an indivisible unit. All steps are either completed in full or not at all. If any part of the transaction fails, the entire transaction fails, and the database is left unchanged. Atomicity prevents partial updates, which could lead to inconsistencies. Even a single SQL statement that changes just one column in one row involves multiple internal steps to change and log rows for recovery.

Imagine you are transferring money from one bank account to another. This transaction involves two operations: deducting money from one account and adding it to another. If the transaction only completes the first operation (deducting money) but fails to complete the second (adding money), the money will disappear. Atomicity ensures that this doesn’t happen.

For example, let’s say we have a database of bank accounts, and we want to transfer $200 from Account A to Account B.
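A minimal T-SQL sketch of that transfer might look like the following; the table and column names (Accounts, account_id, balance) are assumed for illustration:

```sql
BEGIN TRANSACTION;

-- Step 1: deduct $200 from Account A
UPDATE Accounts SET balance = balance - 200 WHERE account_id = 'A';

-- Step 2: add $200 to Account B
UPDATE Accounts SET balance = balance + 200 WHERE account_id = 'B';

-- Finalize the transaction; until this runs, the work can be rolled back
COMMIT;
```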

In this example, the transaction starts with BEGIN TRANSACTION. The first UPDATE deducts $200 from Account A, while the second UPDATE adds $200 to Account B. If any part of this transaction fails, COMMIT, which finalizes the transaction, will not be executed, and the database will roll back to its previous state, ensuring atomicity. Note that you may need additional error handling, depending on the client tools that are executing these statements.

Consistency

Consistency ensures that a transaction brings the database from one valid state to another, adhering to all predefined rules, such as constraints, cascades, and triggers. This means that any transaction will leave the database in a valid state, which is essential for maintaining the integrity of the database and preventing data from becoming corrupt or illogical. Sadly, it only keeps the data as consistent as the rules you implement!

For instance, if a database has a rule that the balance of any bank account cannot be negative, a transaction that would result in a negative balance should be rejected.

Continuing with our bank example, let’s enforce a rule that no account can have a negative balance.
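One way to express that rule is the following T-SQL sketch, again using the assumed Accounts table (in a real system, a CHECK constraint would typically enforce this rule as well):

```sql
BEGIN TRANSACTION;

-- Deduct $200 from Account A
UPDATE Accounts SET balance = balance - 200 WHERE account_id = 'A';

-- Only complete the transfer if Account A is still non-negative
IF (SELECT balance FROM Accounts WHERE account_id = 'A') >= 0
BEGIN
    UPDATE Accounts SET balance = balance + 200 WHERE account_id = 'B';
    COMMIT;
END
ELSE
BEGIN
    -- The rule would be violated, so undo the deduction
    ROLLBACK;
END
```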

In the example above, the transaction begins with the BEGIN TRANSACTION command, which marks the start of a series of operations that need to be treated as a single unit. The first UPDATE deducts $200 from Account A. After this deduction, the transaction checks whether the balance of Account A remains non-negative. If it does, the second UPDATE adds $200 to Account B, and the transaction is finalized with COMMIT.

If the condition is false, indicating that the balance would be negative, the transaction is aborted with ROLLBACK, undoing any changes made during the transaction and thus maintaining the database’s consistency by preventing an invalid state where an account balance is negative. However, it’s important to note that under certain isolation levels, there is no guarantee that Account A’s balance will remain non-negative by the time the second UPDATE occurs. This is because other concurrent transactions could modify the balance of Account A after the initial check but before the UPDATE.

Isolation

Isolation actively shields concurrent transactions from interfering with each other. This ensures each transaction runs as if it’s the only one happening, even when multiple transactions occur simultaneously. Without isolation, concurrent transactions could cause inconsistencies. For example, if two people simultaneously attempt to withdraw money from the same bank account, both transactions might read the same initial balance, leading to an overdraft. Isolation protects against such issues by managing how transactions interact, allowing you to fine-tune the level of interference permitted between them. This fine-tuning is important for maintaining data consistency and integrity in systems where multiple transactions occur simultaneously.

Consider two transactions attempting to withdraw $200 from Account A simultaneously.
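Each session might run a sketch like the one below. The SELECT ... FOR UPDATE syntax shown here is used by engines such as PostgreSQL and MySQL; SQL Server expresses the same intent with a WITH (UPDLOCK) table hint instead:

```sql
BEGIN TRANSACTION;

-- Lock the row for Account A so the other session must wait
SELECT balance FROM Accounts WHERE account_id = 'A' FOR UPDATE;

-- Deduct $200 from Account A
UPDATE Accounts SET balance = balance - 200 WHERE account_id = 'A';

-- Finalize the transaction and release the lock
COMMIT;
```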

In this example, each session begins its transaction with the BEGIN TRANSACTION command. Following this, the database locks the row corresponding to Account A with the SELECT ... FOR UPDATE statement. This lock ensures that only one transaction at a time can access and modify the balance of Account A, thereby preventing potential conflicts and maintaining isolation between concurrent transactions. Next, the transaction deducts $200 from Account A with the UPDATE statement. Once the deduction is made, the transaction is finalized and committed using COMMIT, ensuring that all changes are permanently saved in the database and the lock is released.

Durability

Durability guarantees that once a transaction is committed, it will remain so, even in the event of a system crash. This guarantee is typically achieved by logging changes before they are applied to the data files. Durability is important for ensuring that data is not lost once a transaction is completed. For instance, once the bank confirms a money transfer, it should be permanent, even if the system crashes immediately after.

Of course, this only works if the crashed system does not lose its data and log drives simultaneously. In that case, you need backups to restore to a point in time, and you will likely lose the most recent transactions.

Let’s discuss how the engine ensures that our bank transactions are durable.
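The transfer itself is the same sketch as before, using the assumed Accounts table:

```sql
BEGIN TRANSACTION;

UPDATE Accounts SET balance = balance - 200 WHERE account_id = 'A';
UPDATE Accounts SET balance = balance + 200 WHERE account_id = 'B';

COMMIT;
```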

Here, BEGIN TRANSACTION starts a new transaction, signaling that all following actions should be treated as part of this single transaction. The first UPDATE statement deducts $200 from Account A, and the second adds $200 to Account B. These changes are temporarily staged and will not be visible to others until COMMIT is executed, which finalizes the transaction and makes all changes permanent. Internally, databases use Write-Ahead Logging (WAL) to ensure durability. WAL involves writing changes to a log file before updating the actual database files. This log is stored on durable storage, allowing the database to recover and apply committed transactions even if a failure occurs. In SQL Server, for example, this process is handled automatically by the system, so users do not need to manually insert log records. This mechanism ensures that all changes made in a transaction are preserved and will survive system failures.

NewSQL Architecture

In this section, let’s look at the key components of NewSQL architecture.

Key components of NewSQL architecture

NewSQL databases are built on several core components that enable them to handle large-scale, high-performance workloads while maintaining ACID properties. Here are the key components:

Distributed Data Storage: One of the fundamental aspects of NewSQL databases is distributed data storage. Instead of storing all data on a single server, NewSQL databases distribute data across multiple nodes or servers. This distribution allows the system to balance the load and scale horizontally, meaning more servers can be added to handle increasing amounts of data and traffic.

Automatic Sharding: Sharding is the process of splitting a database into smaller, more manageable pieces called shards. NewSQL databases automatically handle sharding, distributing data across nodes to optimize performance and storage (a minimal sketch of hash-based shard routing follows this list).

Distributed Transaction Management: NewSQL databases handle transactions in a distributed manner while maintaining ACID properties. They often use sophisticated algorithms like Paxos or Raft for distributed consensus, ensuring that all nodes agree on the transaction’s outcome before it is committed.

Concurrency Control: To manage simultaneous transactions, NewSQL databases employ advanced concurrency control mechanisms. Techniques like optimistic concurrency control and multi-version concurrency control (MVCC) help prevent conflicts and ensure that transactions are processed smoothly without interfering with each other.

Replication: Replication is crucial for high availability and fault tolerance. NewSQL databases replicate data across multiple nodes so that if one node fails, others can continue to provide access to the data. This replication can be synchronous or asynchronous, depending on the specific needs of the application.
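To make the sharding idea concrete, here is a minimal Python sketch of hash-based shard routing. Real NewSQL systems use far more sophisticated placement and rebalancing; the node names and keys here are illustrative:

```python
import hashlib


class ShardedStore:
    def __init__(self, nodes):
        self.nodes = nodes                          # e.g., ['node-1', 'node-2', 'node-3']
        self.data = {node: {} for node in nodes}    # one key-value map per node

    def _shard_for(self, key):
        # Hash the key and map it deterministically to one of the nodes
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        node = self._shard_for(key)
        self.data[node][key] = value
        print(f"{key!r} stored on {node}")

    def get(self, key):
        return self.data[self._shard_for(key)].get(key)


store = ShardedStore(['node-1', 'node-2', 'node-3'])
store.put('account:A', {'balance': 1000})
store.put('account:B', {'balance': 500})
print(store.get('account:A'))
```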

Examples of NewSQL Database Systems

Two examples of NewSQL databases are Google Spanner and CockroachDB. Let’s look at each of these in detail.

Google Spanner

Google Spanner is a globally distributed NewSQL database that provides strong consistency and horizontal scalability. It uses Google’s TrueTime API to achieve external consistency, which ensures that transactions are consistently ordered across distributed nodes. This makes it possible to have a single, globally consistent view of the data.

Key Features

  • Global Distribution: Spanner can distribute data across data centers worldwide, providing low-latency access to data from any location.
  • Strong Consistency: Using the TrueTime API, Spanner ensures that all nodes agree on the exact order of transactions, maintaining strong consistency.
  • Scalability: Spanner can handle massive amounts of data and high transaction volumes, making it ideal for large-scale applications.
  • Works Like Relational Databases: Supports the relational model, reducing the learning curve for users of typical relational databases. Supports both the GoogleSQL and PostgreSQL SQL dialects. For more detail, see this article in the Google Cloud documentation.

CockroachDB

CockroachDB is an open-source NewSQL database designed to be globally distributed and resilient to failures. It uses a distributed consensus protocol based on Raft to manage transaction coordination and maintain consistency across nodes. CockroachDB is known for its ease of deployment and ability to scale horizontally.

Key Features

  • Resilience to Failures: CockroachDB is designed to handle node failures gracefully, ensuring that data remains accessible and consistent.
  • Horizontal Scalability: The database can scale out by adding more nodes, distributing data and load to maintain performance.
  • Strong Consistency: CockroachDB uses the Raft consensus algorithm to ensure that all nodes agree on the state of the database, maintaining strong consistency.
  • Standard SQL Support: Supports a wide range of SQL statements that developers and users will be familiar with.
  • Global Distribution: Allows for multi-region deployments and geo-partitioning for applications that have users around the world.

Consistency in Distributed Environments

In a distributed environment, data isn’t stored in just one place. Instead, it’s spread across multiple servers, sometimes scattered around the globe. This distribution helps with scaling and performance but introduces a significant challenge: maintaining consistency. Consistency in a distributed environment means ensuring that all copies of the data reflect the same information, no matter where they are or how many transactions are happening at once: any read operation should retrieve the result of the most recent write. If you update a piece of data, all subsequent reads should reflect this update, regardless of where they happen in the system. However, achieving this in a distributed setup is not straightforward due to factors like network delays, server failures, and concurrent transactions.

Types of Consistency

Strong Consistency: Strong consistency guarantees that after a write is acknowledged, all subsequent reads will reflect that write. This type of consistency is the easiest to reason about but can be challenging to achieve in a distributed environment.

Eventual Consistency: Eventual consistency guarantees that, given enough time, all copies of the data will converge to the same value. It doesn’t guarantee immediate consistency but ensures that the data will be consistent eventually. This is often used in systems where high availability is more critical than immediate consistency.

Causal Consistency: Causal consistency is about preserving the order of operations based on their causal relationships. It ensures that if one operation causally influences another, the system maintains this sequence, but it does not require a strict global order across all operations.

Several factors make maintaining consistency in distributed environments difficult:

  • Network Latency: The time it takes for data to travel across the network can cause delays in synchronization.
  • Partition Tolerance: In a distributed system, network partitions can occur, isolating parts of the system. Ensuring consistency during partitions is challenging.
  • Concurrency: Multiple transactions occurring simultaneously can lead to conflicts and inconsistencies if not managed properly.

Ways To Ensure Consistency

In this section, I will cover several of the major tactics that are used to ensure consistency when saving data in more than one location.

Two-Phase Commit (2PC)

The Two-Phase Commit protocol is a classic algorithm used to ensure atomicity and consistency across distributed systems. It involves two phases: the prepare phase and the commit phase.

  • Prepare Phase: The coordinator asks all participating nodes if they can commit the transaction.
  • Commit Phase: If all nodes agree, the coordinator sends a commit message. Otherwise, it sends a rollback message.

Example:

Let’s say we have a distributed banking system where we need to transfer money between accounts on different servers.
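Here is a minimal, single-process Python sketch of the protocol. The class and method names match the walkthrough below; the can_commit flag is an illustrative stand-in for a real readiness check:

```python
class Participant:
    """A node in the distributed system taking part in the transaction."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether this node can commit
        self.state = 'INIT'

    def prepare(self):
        # Vote yes only if this node is able to commit
        if self.can_commit:
            self.state = 'PREPARED'
            return True
        return False

    def commit(self):
        self.state = 'COMMITTED'

    def rollback(self):
        self.state = 'ROLLED BACK'


class TwoPhaseCommit:
    """Coordinator for the two-phase commit protocol."""
    def __init__(self):
        self.participants = []

    def add_participant(self, participant):
        self.participants.append(participant)

    def prepare(self):
        # Phase 1: ask every participant whether it can commit
        return all(p.prepare() for p in self.participants)

    def commit(self):
        # Phase 2 (success): all nodes voted yes, so finalize everywhere
        for p in self.participants:
            p.commit()

    def rollback(self):
        # Phase 2 (failure): at least one node voted no, so abort everywhere
        for p in self.participants:
            p.rollback()


# Example usage: a transfer touching accounts on two different servers
coordinator = TwoPhaseCommit()
coordinator.add_participant(Participant('server-a'))
coordinator.add_participant(Participant('server-b'))

if coordinator.prepare():
    coordinator.commit()
    print("Transaction committed on all nodes")
else:
    coordinator.rollback()
    print("Transaction rolled back on all nodes")
```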

In this example, the TwoPhaseCommit class coordinates a distributed transaction by managing a list of participants, which are represented by the Participant class. During initialization, the TwoPhaseCommit object starts with an empty participant list, while each Participant begins in the ‘INIT’ state.

When participants are added using add_participant(), the coordinator initiates the prepare phase with coordinator.prepare(). Each participant responds by setting its state to ‘PREPARED’ if it can commit; otherwise, it returns False, prompting the coordinator to call rollback() and abort the transaction. If all participants are prepared, the coordinator proceeds to the commit phase with coordinator.commit(), where each participant finalizes the transaction by changing its state to ‘COMMITTED’.

If any participant fails preparation, the coordinator rolls back the transaction, calling rollback() for each participant to revert its state to ‘ROLLED BACK’. The TwoPhaseCommit class manages the 2PC protocol, coordinating prepare, commit, and rollback operations, while the Participant class represents a node in the distributed system; each participant can prepare, commit, or roll back a transaction. The example usage creates a coordinator and two participants: the coordinator prepares the transaction, and if all participants are ready, it commits; otherwise, it rolls back.

Consensus Algorithms

Consensus algorithms like Paxos and Raft are used to ensure consistency in distributed systems by achieving agreement among distributed nodes. Paxos ensures that a majority of nodes agree on the same value, which is important for maintaining consistency, while Raft provides a more straightforward and practical approach to consensus, which can lead to fewer implementation issues and better overall performance. For end users, this means that data should not be lost, and consistency should be maintained, but there might be delays if network issues or leader failures occur.

Example:

Here’s an example of a consensus algorithm using the Raft protocol.
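What follows is a highly simplified, single-process Python sketch of leader election only; real Raft adds heartbeats, log replication, and network RPCs. The names match the explanation below:

```python
import random


class RaftNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.term = 0            # current election period
        self.voted_for = None    # who this node voted for in its term
        self.state = 'FOLLOWER'  # 'FOLLOWER', 'CANDIDATE', or 'LEADER'

    def request_vote(self, candidate_id, term):
        # Grant the vote if the candidate's term is newer than ours
        if term > self.term:
            self.term = term
            self.voted_for = candidate_id
            return True
        return False


class RaftCluster:
    def __init__(self):
        self.nodes = []

    def add_node(self, node):
        self.nodes.append(node)

    def start_election(self):
        # A randomly chosen node times out and becomes a candidate
        candidate = random.choice(self.nodes)
        candidate.state = 'CANDIDATE'
        candidate.term += 1
        candidate.voted_for = candidate.node_id

        # The candidate votes for itself, then asks every other node
        votes = 1
        for node in self.nodes:
            if node is not candidate and node.request_vote(candidate.node_id, candidate.term):
                votes += 1

        # More than half of the votes makes the candidate the leader
        if votes > len(self.nodes) // 2:
            candidate.state = 'LEADER'
            print(f"Node {candidate.node_id} is the leader for term {candidate.term}")
        else:
            candidate.state = 'FOLLOWER'
            print("Election failed; another election may be needed later")


cluster = RaftCluster()
for i in range(5):
    cluster.add_node(RaftNode(i))
cluster.start_election()
```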

The Raft algorithm is a way for multiple computers (or nodes) in a network to agree on a leader who will manage tasks for everyone. In this example, each computer is represented by a “RaftNode,” which has a “term” (the current election period), “voted_for” (who it voted for), and “state” (whether it is a ‘FOLLOWER’, ‘CANDIDATE’, or ‘LEADER’). All the nodes are managed by a “RaftCluster” class, which can add nodes to the group and start an election. When an election starts, one node is randomly picked to be a candidate; it then increases its term and asks the other nodes for votes. If it gets more than half of the votes, it becomes the leader; if not, the election fails, and another election might be needed later. Normally, each node would run on a different machine and communicate over a network. The leader ensures that all nodes follow the same instructions and records.

Quorum-based Approaches

Quorum-based approaches ensure that a transaction is committed only if a minimum number of nodes (a quorum) agree on it. This means a user doesn’t have to wait for every node to agree before seeing their action (like a post or an item added to a shopping cart). The system responds quickly as soon as the quorum is reached, providing a balance between speed and consistency. If not all nodes agree, or some are temporarily down, the system can still proceed as long as the quorum is achieved. This is why you might see your action reflected immediately, even if some nodes are lagging. However, in such cases, there is a small risk of inconsistency, like losing an item in your shopping cart. This approach is faster but may sacrifice some consistency in favor of a smoother user experience.
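Here is a minimal Python sketch of a quorum write, matching the explanation below; real nodes would persist data and could fail or time out:

```python
class Node:
    def __init__(self, name):
        self.name = name

    def write(self, data):
        # A real node would persist the data and might fail;
        # here the write always succeeds for simplicity
        print(f"{self.name} writing: {data}")
        return True


class QuorumSystem:
    def __init__(self, nodes):
        self.nodes = nodes

    def write(self, data):
        # Attempt the write on every node and count acknowledgements
        acks = sum(1 for node in self.nodes if node.write(data))

        # Succeed only if a majority of nodes (a quorum) acknowledged
        if acks > len(self.nodes) // 2:
            print("Write successful")
            return True
        print("Write failed")
        return False


# Example usage with three nodes; the quorum is two
quorum = QuorumSystem([Node('node-1'), Node('node-2'), Node('node-3')])
quorum.write({'cart': 'add book'})
```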

Here, the QuorumSystem class is a simple system that manages writing data across multiple servers or databases, called nodes. It makes sure that a write operation is only considered successful if a majority of the nodes (more than half) confirm that they received the data. The Node class represents each server or database in this system, and each node can write data to itself. In this example, the write method of each node just prints the data being written and always returns True to indicate success. When the write method of the QuorumSystem is called, it tries to write the data to all its nodes. It counts how many nodes acknowledge (or confirm) the write. If more than half of the nodes (a quorum) acknowledge it, the write is considered successful, and it prints “Write successful”; otherwise, it prints “Write failed.” The process involves initializing the quorum system with three nodes, attempting to write the data to all nodes, checking if a majority of nodes acknowledge the write, and then printing the result based on whether the quorum is achieved or not.

Concurrency Control Mechanisms

Concurrency control refers to the management of simultaneous operations on a database without conflicting with each other. The goal is to ensure that transactions are executed so that the end result is correct and consistent, even if they occur concurrently.

Techniques

In this section, I will cover some of the techniques that are applied to data processing to ensure that access to data is isolated across different connections, so that we can be certain that changes to data don’t corrupt anyone’s view of it.

Optimistic Concurrency Control (OCC)

Optimistic concurrency control allows transactions to move forward without locks, assuming conflicts are rare. At commit time, it checks for conflicts. If a conflict is detected, the transaction is rolled back and may be retried. OCC focuses on detecting conflicts after transactions execute, leading to potential rollbacks.

Steps in OCC

  • Read Phase: The transaction reads the data it needs without acquiring any locks.
  • Validation Phase: Before committing, the transaction checks if the data it read has been modified by another transaction.
  • Write Phase: If no conflict is detected, the transaction writes its changes to the database. If a conflict is detected, the transaction is rolled back and retried.

Let’s use the bank example to demonstrate OCC.
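Below is a simplified Python sketch of OCC, matching the description that follows; the account balances and transfer amounts are illustrative:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance


class Transaction:
    def __init__(self):
        self.read_set = {}   # balances as first observed by this transaction
        self.write_set = {}  # new balances scheduled by this transaction

    def read(self, account):
        # Record the balance seen at read time; no lock is taken
        self.read_set[account] = account.balance
        return account.balance

    def write(self, account, new_balance):
        # Stage the new balance; nothing is applied until commit
        self.write_set[account] = new_balance

    def commit(self):
        # Validation phase: did anyone change what we read?
        for account, seen_balance in self.read_set.items():
            if account.balance != seen_balance:
                return False  # conflict detected; the caller should retry

        # Write phase: apply all staged changes
        for account, new_balance in self.write_set.items():
            account.balance = new_balance
        return True


account_a = Account(1000)
account_b = Account(500)

# Both transactions read before either commits
t1, t2 = Transaction(), Transaction()
t1.read(account_a); t1.read(account_b)
t2.read(account_a); t2.read(account_b)

# Transaction 1 transfers 200 from A to B and commits first
t1.write(account_a, 800); t1.write(account_b, 700)
print("Transaction 1:", "committed" if t1.commit() else "failed")

# Transaction 2's validation fails: the balances changed after it read them
t2.write(account_a, 900)
print("Transaction 2:", "committed" if t2.commit() else "failed")
```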

The code initializes two classes, Account and Transaction, to manage and handle balance changes in a simplified way. The Account class represents an account with a balance, while the Transaction class is responsible for reading and writing balances. The Transaction class has methods to record initial balances (read), schedule new balances (write), and commit changes (commit). The commit method checks for conflicts by comparing the current balance with the initially recorded balance; if no discrepancies are found, it applies the new balances. If conflicts are detected, the transaction fails, which preserves consistency. In the example, two accounts are created with starting balances, and two transactions are executed. Each transaction reads the current balances, schedules updates, and attempts to commit. Success or failure is printed based on whether the transaction encountered conflicts, demonstrating basic transaction management with conflict detection.

Pessimistic Concurrency Control (PCC)

Pessimistic Concurrency Control is a technique used to manage concurrent transactions by preventing conflicts through locking. In this approach, when a transaction wants to access a resource (such as a row in a database), it locks that resource to prevent other transactions from modifying it until the lock is released. This ensures data integrity and consistency but can lead to reduced performance due to waiting times for lock acquisition.

Steps in PCC:

  • Lock Acquisition: Before accessing a resource, a transaction acquires a lock. If the lock is already held by another transaction, it must wait.
  • Data Access: The transaction reads or modifies the resource.
  • Lock Release: After completing its operations, the transaction releases the lock, making the resource available for other transactions.

Example:
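Here is a simplified Python sketch using threads, matching the description that follows. The lock ordering stays safe here because both example transfers run in the same direction:

```python
import threading


class PessimisticAccount:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()

    def transfer(self, target, amount):
        # Acquire the source lock first, then the target lock.
        # (A fixed global lock order would be needed to avoid deadlock
        # if transfers could run in opposite directions.)
        with self.lock:
            with target.lock:
                self.balance -= amount
                target.balance += amount
                print(f"Transferred {amount} from {self.name} to {target.name}")


account_a = PessimisticAccount('A', 1000)
account_b = PessimisticAccount('B', 500)


def transaction1():
    account_a.transfer(account_b, 200)


def transaction2():
    account_a.transfer(account_b, 300)


# Run both transactions concurrently
t1 = threading.Thread(target=transaction1)
t2 = threading.Thread(target=transaction2)
t1.start(); t2.start()
t1.join(); t2.join()  # wait for both transactions to complete

print(f"Final balances: A={account_a.balance}, B={account_b.balance}")
```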

In this example, each transaction acquires locks on the accounts it needs to access, ensuring that no other transaction can interfere. This prevents inconsistencies but can cause delays if transactions need to wait for locks to be released. The PessimisticAccount class represents an account with a balance and a lock to manage concurrent access. When the transfer method is called, it first acquires the lock on the source account to ensure that no other transaction can access or modify it while the current transaction is in progress. Next, it acquires the lock on the target account to prevent other transactions from interfering with the target account during the transfer. After both locks are obtained, the method transfers funds between the accounts and prints the transaction details. In the example usage, two accounts are initialized, and two functions (transaction1 and transaction2) simulate concurrent transactions. Threads are created and started to execute these functions concurrently, and the join method ensures that the main thread waits for both transactions to complete before printing the final balances. This example uses pessimistic locking to prevent concurrent modifications, and delays can occur if transactions are blocked while waiting for locks. Optimistic techniques, such as retrying or versioning, could complement this approach to handle delays and potential deadlocks.

Multiversion Concurrency Control (MVCC)

Multiversion Concurrency Control (MVCC) allows multiple versions of data to exist simultaneously. This technique improves read performance by allowing reads to occur without waiting for writes to complete. Each transaction sees a snapshot of the database, that is, the state of the database at the time the transaction starts: a consistent view of the data that allows the transaction to operate without being affected by concurrent changes, ensuring consistency without locking.

Example:
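The following is a simplified Python sketch of versioned reads and writes, matching the walkthrough below; the balances and transaction IDs are illustrative:

```python
class VersionedAccount:
    def __init__(self, balance):
        # Each version is a (transaction_id, balance) pair;
        # transaction ID 0 represents the initial state
        self.versions = [(0, balance)]

    def read(self, tx_id):
        # Return the balance as of the given transaction ID by scanning
        # the versions from newest to oldest
        for version_tx_id, balance in reversed(self.versions):
            if version_tx_id <= tx_id:
                return balance
        raise ValueError("no visible version for this transaction")

    def write(self, tx_id, new_balance):
        # Writes never overwrite old data; they append a new version
        self.versions.append((tx_id, new_balance))


account_a = VersionedAccount(1000)
account_b = VersionedAccount(500)

# Transaction 1 transfers 200 from A to B
account_a.write(1, account_a.read(1) - 200)
account_b.write(1, account_b.read(1) + 200)

# Transaction 2 sees transaction 1's versions and transfers another 100
account_a.write(2, account_a.read(2) - 100)
account_b.write(2, account_b.read(2) + 100)

# Transaction 3 reads the final balances: 700 and 800
print(account_a.read(3), account_b.read(3))
```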

In this example, each transaction reads the account balances as of its transaction ID and writes a new balance, creating a new version. Each version of the data is associated with a transaction ID, representing the state of the data after that transaction, and a new version is created for every write operation. Subsequent transactions see the appropriate version of the data based on their transaction ID, ensuring consistency without locking.

The `VersionedAccount` is initialized with a starting balance and a single version of the data. The `read` method returns the balance as of the provided transaction ID: it scans through the versions list in reverse order, looking for the first version with a transaction ID less than or equal to the provided ID, which ensures the transaction sees the correct snapshot of the data. The `write` method creates a new version of the data with the updated balance, associates it with the current transaction ID, and appends it to the versions list.

Transaction 1 reads the initial balance of both accounts, updates the balances, and writes new versions with transaction ID 1. Transaction 2 reads the updated balances from the latest versions, makes further updates, and writes new versions with transaction ID 2. Transaction 3 reads the final balances from the latest versions, reflecting all updates made by previous transactions. These techniques ensure that transactions are processed in a way that maintains data integrity and consistency, even in complex distributed systems.

ACID Compliance in NewSQL

ACID compliance is a crucial aspect of database management, ensuring that transactions are processed reliably and consistently. In the context of NewSQL databases, maintaining ACID properties is paramount to guarantee data integrity, even in distributed environments.

Let’s look at what ACID compliance means in NewSQL, even though these terms were introduced in more depth earlier in the article.

Atomicity: Atomicity ensures that each transaction is treated as a single, indivisible unit. In NewSQL databases, transactions are either fully completed or fully aborted, ensuring that no partial changes are made to the database.

Consistency: Consistency ensures that the database transitions from one valid state to another valid state after each transaction. NewSQL databases maintain consistency by enforcing integrity constraints and ensuring that transactions adhere to predefined rules.

Isolation: Isolation ensures that transactions are executed independently of each other, preventing interference between concurrent transactions. NewSQL databases employ concurrency control mechanisms to maintain isolation, allowing multiple transactions to occur simultaneously without affecting each other’s outcomes:

Two-Phase Locking (2PL): ensures transactions acquire locks before changing data and release them only after completion, preventing conflicts.

Multiversion Concurrency Control (MVCC): allows transactions to read from and write to different data versions without locking, ensuring consistent views of data.

Optimistic Concurrency Control (OCC): lets transactions proceed without locks, checking for conflicts only at commit time and rolling back if inconsistencies are detected.

These mechanisms prevent conflicts and inconsistencies by carefully managing how transactions access and modify data.

Durability: Durability guarantees that once a transaction is committed, its changes are permanently stored in the database, even in the event of system failures. NewSQL databases ensure durability by writing transaction logs and data to disk or other persistent storage mediums.

NewSQL databases maintain durability across geographic or computational boundaries by combining transaction logging with distributed replication and consensus protocols. This combination ensures that committed transactions are reliably stored and can be recovered, preserving data integrity and consistency despite node failures or system crashes.

Advantages and Limitations of NewSQL

NewSQL databases offer many advantages; one of the most significant is their ability to scale horizontally while maintaining ACID compliance. This means that NewSQL databases can handle increasing amounts of data and traffic by adding more servers to the system, rather than just upgrading a single server’s hardware. This scalability is important for modern applications that experience rapid growth and need to maintain consistent performance and reliability. Additionally, NewSQL databases use advanced algorithms and distributed architectures to ensure data integrity and high availability, which are necessary for enterprise applications that require robust transaction processing.

Another key advantage of NewSQL is its support for SQL, a language that is widely known and used by developers and database administrators. This compatibility allows organizations to leverage their existing SQL knowledge and tools, reducing the learning curve and making it easier to integrate NewSQL databases into their current systems. Furthermore, NewSQL databases offer improved performance by optimizing query execution and minimizing latency. They achieve this through techniques like in-memory processing, distributed transactions, and parallel query execution, which collectively enhance the speed and efficiency of data operations.

However, NewSQL databases are not without their limitations. One of the primary challenges is the complexity involved in setting up and managing a distributed database system. Ensuring data consistency across multiple nodes, handling network partitions, and managing distributed transactions require sophisticated infrastructure and expertise. This complexity can increase the operational overhead and necessitate specialized skills, which may not be readily available in all organizations. Moreover, the cost associated with NewSQL databases can be higher than that of traditional databases, particularly for enterprise-grade solutions that demand high availability and fault tolerance. Licensing fees, advanced hardware requirements, and the need for ongoing maintenance can add to the overall expense, potentially making it a significant investment.

Another limitation of NewSQL is its relative novelty and limited adoption compared to more established database technologies. While it offers many advantages, some organizations may be hesitant to adopt NewSQL due to concerns about compatibility with existing systems, vendor lock-in, and the maturity of the technology. This cautious approach can slow down the widespread adoption of NewSQL, as businesses may prefer to stick with familiar solutions that have a longer track record. Additionally, maintaining strong consistency in a distributed environment can introduce performance trade-offs, as ensuring that all nodes in the system agree on the state of the data may require additional time and resources, potentially impacting overall throughput.

Conclusion

Understanding these advantages and limitations is important for making informed decisions about adopting NewSQL technologies and effectively integrating them into an organization’s data management strategy. In distributed environments, NewSQL databases guarantee reliable and predictable transactions. They provide durability against system failures, preserve data consistency and integrity, and manage concurrent transactions effectively. NewSQL is, therefore, a solid option for contemporary applications that require strong transactional guarantees in addition to scalability. Developers can build robust and reliable database systems by understanding these principles and putting them into practice.

About the author

Chisom Kanu

I am a software developer and technical writer with excellent writing skills. I am dedicated to producing clear and concise documentation, and I also enjoy solving problems, reading, and learning.