intro
As the amount of data in databases continues to grow, so does the need for efficient and effective concurrency control techniques. There are different concurrency control techniques including Timestamp-based concurrency control, Lock-based concurrency control, and Multiversion Concurrency Control (MVCC.) MVCC is a widely used approach, supported by databases like PostgreSQL, MySQL with InnoDB, Oracle, Firebird, Microsoft SQL (optional, disabled by default), and IBM DB2 (partial.) In this blog, we tell you everything you need to know about Multiversion Concurrency Control (MVCC), or Multiversion Technique in PostgreSQL and how it works.
Introduction to MVCC in PostgreSQL
MVCC is one of the magic pieces that makes Postgres and other relational databases what they are. It is a key aspect of PostgreSQL's framework and is used to implement transaction partitioning and concurrency control. As such, it is important for developers and database administrators to have a strong understanding of how MVCC works and how it can impact database performance and scalability.
A database user has the option of reading from or writing to a database, or doing both. However, a problem while creating (updating) databases is that our databases lock the data causing concurrency issues and making it challenging for readers to view the updated data. The readers might only see a partially updated version of the data or a version with errors. These concurrent problems could result in deadlocks and unsuccessful transactions. PostgreSQL uses Multi-Version Concurrency Control to control access to data in a database and ensures consistency while allowing multiple transactions to happen simultaneously in the database. In simple terms, MVCC creates several versions of a single database record, enabling various transactions to access different versions of one database record without conflicting with one another. This allows several transactions to run simultaneously, and eliminates the need for locks or blocking operations to enhance the database's performance and responsiveness.
The Behavior of MVCC
The most important aspect in Multiversion Concurrency Control in terms of user-visible behavior is that “readers never block writers” and “writers never block readers.” This behavior is primarily controlled by snapshots which determine what a user can see in a database. A snapshot in MVCC is a virtual read-only, point-in-time copy of the data recorded at the start of each SQL statement in a READ COMMITTED transaction isolation mode, and at the transaction start in a SERIALIZABLE translation isolation mode. MVCC provides a consistent view of the data to each transaction, even as other transactions are modifying the data. Each transaction has its own snapshot, which represents the state of the database at the beginning of the transaction. The snapshot comprises all information about all the table rows (tuples) in the database and controls the table rows that are visible to the transaction at the time the snapshot is taken.
When a new snapshot is taken, the following information is gathered:
Using this information from a snapshot, Postgres can determine if a transaction’s actions should be visible to an executing statement.
How Does PostgreSQL Handle Concurrency?
MVCC allows multiple transactions to read and write to the database at the same time without interfering with each other while each transaction operates on a "snapshot" of the database which is a consistent view of the data at a specific time.
For an example illustrating how MVCC works in PostgreSQL, let’s assume we have a database and a table called Inventory
within it - the table has the following columns:
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Let’s say that two of the database transactions (Transaction 1 and Transaction 2) above are trying to update the amount of the product which has an ID of 100121 at the same time. The two transactions are taking place simultaneously so Transaction 1 begins.
If you are looking for an easy and powerful SQL client and database manager, then you've got to try DbVisualizer. It connects to nearly any database.
Creating a Snapshot of the Database Table
When Transaction 1 (T1) starts, it creates a snapshot of the "Inventory" table as shown below:
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Concurrent Transactions Creating Their Own Snapshots
Now, when Transaction 2 begins, it also creates its own snapshot of the database table as shown below:
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Utilizing Multiversion Concurrency Control
If the Transaction 1 tries to update the amount of product with an ID of 100121 to $550.00 and Transaction 2 tries to update the amount of the same product to $580.00 at the same time, PostgreSQL will utilize the technique of Multiversion Concurrency Control to create a new version of the record for each transaction as shown below:
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 550.00 | Raymond Ryan |
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 580.00 | Raymond Ryan |
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Execution and Commit of Transactions
After T1 executes its updates, the new version of the database record with an amount of $550.00 becomes the current version.
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 550.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Now, when T2 tries to query the database for the row to commit changes, it sees the old value of 500.00 for the amount of product with an ID of 100121. This is because T2's snapshot of the data was taken at the time it started, which was before T1 committed its changes. PostgreSQL will recognize that the record it is trying to update has already been modified by T1 and hence cannot update it. Here, T2's transaction will create a serialization error, indicating that it conflicted with T1's transaction because T2’s snapshot is outdated.
Refreshing the Snapshot
T2 will have to refresh its snapshot by starting a new transaction and taking a new snapshot of the database which will include the row of the product with the ID of 100121 and a price of 550.00.
Creating a New Version of the Row with a New Transaction ID
T2 will now modify the row of the product with an ID of 100121 by changing the amount of the product from 550.00 to 580.00 and then commit its changes. PostgreSQL will then create a new version of the row with the new product amount value and an internal transaction ID of T2.
PostgreSQL MVCC Internal Process Flow
MVCC maintains multiple versions of each row in the database, so that different transactions can see different versions of the same data without blocking each other. Here, we'll explore the internal process flow of MVCC in PostgreSQL.
Suppose we have a table named inventory with columns product id, product description, amount and customer name:
1
CREATE TABLE inventory (
2
product_id INT,
3
product_description VARCHAR(255),
4
amount DECIMAL(10,2),
5
customer_name VARCHAR(255)
6
);
Here's how the MVCC behavior in PostgreSQL works for transactions T1 and T2:
Transaction T1
1
BEGIN TRANSACTION;
2
3
-- Take a snapshot of the database
4
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
5
6
-- Read the current version of row with ID 100121
7
SELECT amount FROM inventory WHERE product_id = 100121;
8
9
-- Modify the row with ID 100121
10
UPDATE inventory SET amount = 550 WHERE product_id = 100121;
11
12
-- Commit the transaction
13
COMMIT;
1
BEGIN TRANSACTION;
2
3
-- Take a snapshot of the database
4
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
5
6
-- Read the current version of row with ID 100121
7
SELECT amount FROM inventory WHERE product_id = 100121;
8
9
-- Modify the row with ID 100121
10
UPDATE inventory SET amount = 580 WHERE product_id = 100121;
11
12
-- Commit the transaction
13
COMMIT;
As a result, the inventory table will now have two versions of the row with ID 100121: one with an amount of 550 and a transaction ID of T1, and another with an amount of 580 and a transaction ID of T2. Each transaction sees a consistent view of the data at the time it started, and PostgreSQL ensures that transactions do not interfere with each other, even during modification of the same data.
Pros & Cons Of MVCC
Pros
Cons
Conclusion
In this article, you have learnt about the general idea of what MVCC is and how it functions in PostgreSQL by using snapshots to manage the visibility of transactions to avoid the need for locking. MVCC permits higher levels of concurrency, which reduces contention in a database application and improves performance. By understanding the principles and best practices of MVCC, developers can design more efficient and scalable applications that take full advantage of the power of PostgreSQL.
Make sure to follow the DbVisualizer Table blog to learn more about database development, and until next time.