Mastering Database Fundamentals: Read Phenomena and Isolation

Read Phenomena in Isolation

In the context of database transactions, “read phenomena” refer to the various inconsistencies that can arise when transactions are executed concurrently. These phenomena are critical to understanding the isolation property of transactions, as they illustrate the potential conflicts that isolation levels aim to prevent. There are four primary types of read phenomena:

Dirty Reads:

  • Definition: A dirty read occurs when a transaction reads data that has been modified by another transaction but not yet committed. If the other transaction rolls back, the data read by the first transaction becomes invalid.
  • Example: Transaction A updates a row in a table but has not yet committed the change. Transaction B reads the updated row. If Transaction A rolls back, Transaction B has read data that never actually existed in the database.

Here’s a practical, real-time example of dirty reads in a database context:

Scenario:

Imagine an e-commerce platform with a database table Orders that contains order information.

Table: Orders

OrderID | Status    | Amount
--------|-----------|-------
101     | Pending   | 100
102     | Completed | 150

Transactions:

  1. Transaction A:
    • A customer support agent is updating the Status of OrderID 101 to Cancelled because the customer requested to cancel the order.
    • The query executed is:
      UPDATE Orders SET Status = 'Cancelled' WHERE OrderID = 101;
    • Important: Transaction A has not yet committed the change.
  2. Transaction B:
    • A different process (perhaps a reporting dashboard) reads the Orders table to calculate how many Pending orders exist.
    • The query executed is: SELECT COUNT(*) FROM Orders WHERE Status = 'Pending';
    • Outcome: If Transaction B runs at the Read Uncommitted isolation level, it sees OrderID 101 as Cancelled rather than Pending, because it reads the uncommitted update from Transaction A.
  3. Rollback in Transaction A:
    • If Transaction A rolls back its change (e.g., the customer changed their mind and decided not to cancel), the Status of OrderID 101 remains Pending.

Problem:

Transaction B has calculated its report based on incorrect data because it relied on the uncommitted change made by Transaction A. This is a dirty read, as Transaction B read data that was later invalidated.

Why It’s a Problem:

  • Dirty reads can lead to inconsistencies in reports, audits, or downstream processes.
  • If the reporting system stores the incorrect calculation, it might propagate invalid data to decision-makers.

How to Prevent Dirty Reads:

  • Use a higher isolation level, such as Read Committed or Repeatable Read, which ensures that a transaction only reads committed changes; the sketch below shows how this plays out across the two sessions.
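Here is a minimal sketch of the two sessions above in plain SQL. It assumes an engine that actually honors Read Uncommitted (for example SQL Server or MySQL/InnoDB; PostgreSQL silently treats it as Read Committed), and the exact placement of the SET TRANSACTION statement relative to BEGIN varies by engine, so treat it as an illustration rather than a recipe.

  -- Session 1 (Transaction A): cancel the order but do not commit yet
  BEGIN;
  UPDATE Orders SET Status = 'Cancelled' WHERE OrderID = 101;
  -- ... no COMMIT yet ...

  -- Session 2 (Transaction B): a dirty read is possible at Read Uncommitted
  SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
  BEGIN;
  SELECT COUNT(*) FROM Orders WHERE Status = 'Pending';
  -- OrderID 101 already appears as 'Cancelled', so it is missing from the count
  -- even though Transaction A has not committed.
  COMMIT;

  -- Session 1 (Transaction A): the customer changes their mind
  ROLLBACK;
  -- OrderID 101 is Pending again; Session 2 counted data that never officially existed.

  -- Prevention: run the report at Read Committed (or higher) so it only sees committed rows
  SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
  BEGIN;
  SELECT COUNT(*) FROM Orders WHERE Status = 'Pending';  -- counts OrderID 101 as Pending
  COMMIT;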

Non-Repeatable Reads

  • Definition: A non-repeatable read occurs when a transaction reads the same row twice and gets different data each time. This happens if another transaction modifies the row and commits the change between the two reads.
  • Example: Transaction A reads a row. Transaction B then updates or deletes the same row and commits the change. When Transaction A reads the row again, it sees the updated data, leading to inconsistencies in its processing.

Here’s a practical, real-time example of non-repeatable reads in a database context:

Scenario: Stock Replenishment Process

Imagine an inventory management system used by a warehouse to manage the stock of products.

Table: Inventory

ProductID | ProductName       | StockQuantity
----------|-------------------|--------------
5001      | Headphones        | 100
5002      | Bluetooth Speaker | 50

Transactions:

  1. Transaction A (Single Transaction):
    • A warehouse manager is updating the stock after a replenishment order and checks the stock for Headphones twice within the same transaction, once at the beginning and again in the middle of the process.
    • First Read (Step 1):
      • The manager checks the stock of Headphones at the beginning of the process.
      • The query executed is: SELECT StockQuantity FROM Inventory WHERE ProductID = 5001;
      • The result is StockQuantity = 100.
    • Stock Update (Step 2):
      • Another employee (Transaction B) is also updating the inventory at the same time. They update the stock by adding 50 new Headphones to the inventory.
      • The query executed is: UPDATE Inventory SET StockQuantity = StockQuantity + 50 WHERE ProductID = 5001;
      • Transaction B commits, and the new stock level for Headphones becomes 150.
    • Second Read (Step 3):
      • Later in the same transaction, the warehouse manager checks the stock of Headphones again, assuming it hasn’t changed.
      • The query executed is: SELECT StockQuantity FROM Inventory WHERE ProductID = 5001;
      • This time, the result is StockQuantity = 150.

Problem:

  • Transaction A (the warehouse manager’s transaction) reads the stock quantity for Headphones twice and gets different results:
    • The first read shows 100 available.
    • The second read shows 150 available, as another transaction (Transaction B) updated the stock in between.

This inconsistency in reading the same data twice is a non-repeatable read, because the stock data was modified by another transaction after the first read and before the second read.

Why It’s a Problem:

  • Transaction A may rely on the first read (a stock of 100 units) to decide how many units to order or process. The second read then shows 150 units, which makes the decisions taken within the same transaction inconsistent.
  • This can lead to incorrect planning, such as the warehouse manager thinking they need to order more stock when the inventory was already replenished in the meantime.

How to Prevent Non-Repeatable Reads:

  • To prevent non-repeatable reads in such scenarios, the system should use the Repeatable Read isolation level, which ensures that data read once within a transaction returns the same values if it is read again, even if other transactions modify it in the meantime; a sketch follows below.
    • In this case, Transaction A would keep seeing the value from its first read (100) for the entire transaction, ensuring reliable decision-making.
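The following is a minimal sketch of the scenario under Repeatable Read, using the queries from the example. How the guarantee is enforced differs between engines, as the comments note, but the observable result for Transaction A is the same.

  -- Session 1 (Transaction A): warehouse manager's transaction at Repeatable Read
  SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  BEGIN;
  SELECT StockQuantity FROM Inventory WHERE ProductID = 5001;  -- returns 100

  -- Session 2 (Transaction B): replenishment committed in the meantime
  BEGIN;
  UPDATE Inventory SET StockQuantity = StockQuantity + 50 WHERE ProductID = 5001;
  COMMIT;  -- committed stock is now 150

  -- Session 1 (Transaction A): second read inside the same transaction
  SELECT StockQuantity FROM Inventory WHERE ProductID = 5001;
  -- Still returns 100: MVCC engines (MySQL/InnoDB, PostgreSQL) keep reading from the
  -- snapshot taken at the first read, while lock-based engines (SQL Server) would
  -- instead have blocked Transaction B's UPDATE until Transaction A finished.
  COMMIT;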

Phantom Reads

  • Definition: A phantom read occurs when a transaction re-executes a query returning a set of rows that satisfy a certain condition and finds that the set of rows has changed due to another recently committed transaction. Essentially, new rows that match the query criteria appear (or existing rows disappear) between reads.
  • Example: Transaction A reads all rows where the account balance is greater than $1000. Transaction B then inserts a new row with an account balance of $1500 and commits. When Transaction A re-executes the query, it sees the new row, which was not there before.

Here’s an example of phantom reads in a product pricing system:

Scenario: Product Price Update

Imagine an e-commerce system where product prices are updated regularly based on promotions, market conditions, or stock levels.

Table: Products

ProductID | ProductName       | Price | StockQuantity
----------|-------------------|-------|--------------
1001      | Smartwatch        | 250   | 150
1002      | Bluetooth Speaker | 100   | 75
1003      | Laptop            | 1000  | 30
1004      | Tablet            | 300   | 50

Transactions:

  1. Transaction A (Price Update Query):
    • A marketing team member (Transaction A) is applying a price update for products whose price is greater than $200 to run a promotion.
    • Transaction A executes a query to get the products eligible for the promotion: SELECT * FROM Products WHERE Price > 200;
    • The result includes the Smartwatch and Laptop:
      • Smartwatch (Price: 250)
      • Laptop (Price: 1000)
  2. Transaction B (Price Insertion):
    • Meanwhile, another team (Transaction B) adds a new high-priced product to the catalog whose price exceeds $200, changing the set of products that should be eligible for the promotion.
    • Transaction B executes: INSERT INTO Products (ProductName, Price, StockQuantity) VALUES ('Smart TV', 500, 20);
    • Transaction B commits the new product: Smart TV (Price: 500).
  3. Transaction A (Re-executes the Query):
    • After the first price update decision is made, Transaction A later re-executes the same query to apply the updated prices for the promotion: SELECT * FROM Products WHERE Price > 200;
    • However, Transaction A now sees a new product, the Smart TV (Price: 500), which was inserted by Transaction B, and which was not present in the first read.

Problem:

  • Initially, Transaction A retrieves two products eligible for the promotion: Smartwatch and Laptop.
  • After Transaction B inserts the new Smart TV into the database, Transaction A re-runs the same query and unexpectedly includes this new product, leading to an incorrect promotion strategy.

Why It’s a Problem:

  • Transaction A is performing operations based on a set of products, but the result set changes between the two reads because of the new product inserted by Transaction B.
  • The marketing team could mistakenly apply the promotion to more products than initially planned, resulting in incorrect pricing and discounts being applied to unintended products.

How to Prevent Phantom Reads:

  • To prevent phantom reads, the system should use the Serializable isolation level, which ensures that once a transaction has read a set of rows, no other transaction can insert, update, or delete rows that would change that result set until the transaction completes; a sketch follows below.
    • This guarantees that Transaction A sees the same set of products in both reads, so the newly inserted Smart TV cannot appear unexpectedly in the second read, keeping the promotion logic consistent.
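Here is a minimal sketch of the promotion scenario under Serializable. Engines enforce serializability differently, so the comments describe two common behaviors rather than one definitive outcome.

  -- Session 1 (Transaction A): promotion query at Serializable
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  BEGIN;
  SELECT * FROM Products WHERE Price > 200;  -- Smartwatch, Laptop

  -- Session 2 (Transaction B): tries to add a product that matches the predicate
  BEGIN;
  INSERT INTO Products (ProductName, Price, StockQuantity) VALUES ('Smart TV', 500, 20);
  -- Depending on the engine, this INSERT either blocks until Transaction A finishes
  -- (range/predicate locking, e.g. MySQL/InnoDB) or proceeds while Transaction A keeps
  -- reading from its own snapshot and may be asked to retry on a serialization error
  -- (e.g. PostgreSQL). Either way, Transaction A does not observe a phantom row.
  COMMIT;

  -- Session 1 (Transaction A): the repeated query returns the same two products
  SELECT * FROM Products WHERE Price > 200;  -- still Smartwatch and Laptop
  COMMIT;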

Lost Updates:

  • Definition: A lost update occurs when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of the other transactions, leading to the last update overwriting updates made by the other transactions without any warning.
  • Example: Transaction A reads a row and updates its value. Before Transaction A commits, Transaction B reads the same row and updates its value based on the original data read. Transaction A then commits, followed by Transaction B, resulting in the update made by Transaction A being lost.

Here’s a real-time example of a lost update scenario, commonly found in systems like banking or ticket booking:

Scenario: Bank Account Balance Update

Imagine a banking system where users can transfer money from one account to another. Two customers, Customer A and Customer B, have separate transactions that modify the balance of the same account, Account 123.

Table: Accounts

AccountID | CustomerName | Balance
----------|--------------|--------
123       | Customer A   | 1000

Transactions:

  1. Transaction A (Customer A’s Transfer):
    • Transaction A starts by transferring $200 from Account 123 to Account 456.
    • Transaction A reads the current balance of Account 123, which is $1000.
    • Based on this value, Transaction A updates the balance: UPDATE Accounts SET Balance = 1000 - 200 WHERE AccountID = 123;
  2. Transaction B (Customer B’s Transfer):
    • Meanwhile, Transaction B also starts and is transferring $150 from Account 123 to Account 789.
    • Transaction B also reads the current balance of Account 123, which is still $1000, unaware that Transaction A is also updating the same balance.
    • Transaction B updates the balance based on the value it read: UPDATE Accounts SET Balance = 1000 - 150 WHERE AccountID = 123;
  3. Transaction A Commits:
    • Transaction A commits its update, reducing the balance of Account 123 by $200. The new balance is $800.
  4. Transaction B Commits:
    • Transaction B then commits its update, reducing the balance of Account 123 by $150. The new balance is $850.

Problem:

  • The final balance of Account 123 is $850, but the correct balance should be $650 ($1000 minus the $200 and $150 transfers).
  • Transaction A and Transaction B both started from the same balance of $1000 and computed their new values from it without seeing each other’s changes, so Transaction B’s write overwrote Transaction A’s. Only $150 was effectively deducted instead of $350 in total: the deduction from Transaction A is lost.

Why It’s a Problem:

  • The system should have recognized that two transactions were concurrently updating the same account and handled it in a way that prevents one update from overwriting the other.
  • In this case, Transaction A’s update is lost, and the final balance is inconsistent with the expected outcome.

How to Prevent Lost Updates:

  • To prevent lost updates, the system can use locking mechanisms or implement optimistic concurrency control.
    • Optimistic concurrency control: Each transaction checks if the data it read has changed before it commits. If the data has changed (i.e., if another transaction has updated the same row), the transaction is retried or aborted.
    • Pessimistic locking: Lock the row for the duration of the transaction to prevent other transactions from modifying it until the first transaction completes.

In this case, using either of these methods would ensure that Transaction A and Transaction B do not overwrite each other’s updates, and the correct final balance of $650 would be computed.
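Below is a minimal sketch of both approaches for the banking scenario. SELECT ... FOR UPDATE is widely supported (MySQL, PostgreSQL, Oracle; SQL Server uses locking hints instead), and the Version column used for the optimistic check is a hypothetical addition to the Accounts table, not part of the original schema.

  -- Pessimistic locking: lock the row while reading so a concurrent writer must wait.
  BEGIN;
  SELECT Balance FROM Accounts WHERE AccountID = 123 FOR UPDATE;  -- returns 1000, row is now locked
  UPDATE Accounts SET Balance = Balance - 200 WHERE AccountID = 123;
  COMMIT;
  -- A concurrent transaction issuing the same SELECT ... FOR UPDATE blocks until this
  -- COMMIT, then reads the updated balance of 800 and correctly computes 650.

  -- Optimistic concurrency control (assumes a hypothetical Version column on Accounts):
  -- each writer re-checks that the row has not changed since it was read.
  BEGIN;
  SELECT Balance, Version FROM Accounts WHERE AccountID = 123;  -- e.g. Balance = 1000, Version = 7
  UPDATE Accounts
  SET Balance = 800, Version = Version + 1
  WHERE AccountID = 123 AND Version = 7;
  -- If another transaction committed first, Version is no longer 7, zero rows are updated,
  -- and the application retries with the fresh balance instead of silently overwriting it.
  COMMIT;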


Isolation Levels

To manage these read phenomena, databases implement various isolation levels, each providing a different balance between consistency and concurrency, while also addressing the issue of lost updates. The ANSI SQL standard defines four isolation levels:

  1. Read Uncommitted:
    • Behavior: Allows dirty reads. Transactions can read data modified by other transactions even if those changes have not been committed.
    • Phenomena Allowed: Dirty reads, non-repeatable reads, phantom reads, and lost updates.
    • Use Case: Suitable for scenarios where performance is more critical than data consistency, and temporary inconsistencies are acceptable.
  2. Read Committed:
    • Behavior: Prevents dirty reads by ensuring that transactions can only read committed data. However, non-repeatable reads, phantom reads, and lost updates are still possible.
    • Phenomena Allowed: Non-repeatable reads, phantom reads, and lost updates.
    • Use Case: Commonly used isolation level that provides a balance between performance and consistency.
  3. Repeatable Read:
    • Behavior: Prevents dirty reads and non-repeatable reads by ensuring that if a transaction reads a row, no other transaction can modify that row until the first transaction completes. However, phantom reads and lost updates can still occur.
    • Phenomena Allowed: Phantom reads and lost updates.
    • Use Case: Suitable for scenarios where consistent reads of the same data are crucial, such as financial applications.
  4. Serializable:
    • Behavior: Provides the highest level of isolation by ensuring that transactions are serializable, meaning they are executed in such a way that their results are equivalent to some serial (one after the other) execution of those transactions.
    • Phenomena Allowed: None. It prevents dirty reads, non-repeatable reads, phantom reads, and lost updates.
    • Use Case: Used in scenarios where absolute consistency is required, even at the expense of concurrency and performance.

By understanding and selecting the appropriate isolation level, database systems can manage the trade-offs between data consistency and system performance, ensuring that transactions operate correctly and efficiently in a multi-user environment.
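As a practical footnote, the isolation level is usually chosen per transaction or per session. The statements below use standard SQL keywords, but defaults and exact syntax vary by engine (for example, PostgreSQL and Oracle default to Read Committed while MySQL/InnoDB defaults to Repeatable Read), so consult your database’s documentation before relying on them.

  -- Apply an isolation level to the next transaction
  SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  BEGIN;
  SELECT StockQuantity FROM Inventory WHERE ProductID = 5001;
  -- ... all reads in this transaction now see a consistent view ...
  COMMIT;

  -- Or change the default for the rest of the session (MySQL syntax shown)
  SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;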
