In this article, we will understand the reasons and solutions for the emergence of MySQL phantom rea

Mondo Technology Updated on 2024-01-29

There are four levels of mysql transaction isolation:

Read not submitted: When a transaction has not yet been committed, the changes made by the transaction can be seen by other transactions. This isolation level will have dirty reads and non-repeatable reads.

Read the submission: After a transaction is committed, the changes made in the transaction will be seen by other transactions. This isolation level solves dirty reads, which will cause non-repeatable reads.

Repeatable readability: refers to the data seen during the execution of a transaction, which is always consistent with the data seen at the start of the transaction. This isolation level solves the problem of non-repeatable reads, which will cause phantom reads.

Serialization: For the same row of records, write locks are added, and read locks are added. In the event of a read/write lock conflict, the transaction that is accessed later must wait for the previous transaction to be executed before it can continue to be executed. This isolation level solves phantom reads.

Dirty read: refers to a transaction that reads data that is not committed by other transactions.

Non-repeatable read: refers to reading the same data multiple times, and the result set is inconsistent (i.e., reading data after other transactions have been committed).

Phantom read refers to when a transaction queries the same range twice, and the later query sees rows that were not seen in the previous query.

Create a t user table.

-- Create table T User (User ID bigint not null comment.)'User ID', user_name varchar(100) null comment 'Username', user_age int null, primary key (`user_id`),key `idx_user_name`(`user_name`))engine=innodb comment 'User Information Table';--Insert into t user values(1,'nq1',37),(5,'nq5',32),(10,'nq10',20),(15,'nq25',25);
Where:

The user ID is the primary key.

user name field to create an index.

The user age field did not create an index.

Execution statement:

begin; select * from t_user where user_age=37 for update;commit;
Therefore, after the SELECT statement is executed, a write lock will be added to the line of user ID=1, and according to the two-stage lock protocol, this write lock will be released when the commit statement is executed.

Scenario analysis

At the repeatable isolation level, assume that the select * from t user where user age=37 for update statement is locked only on the user age=37 line.

As shown in:

Three queries are executed in Session A: Step 1, Step 2, and Step 3. The third query statement is select * from t where user age=37 for update: query all rows with user age=37 using the current read+write lock.

Query result: At t1, step1 only returns the line record (1,) of user id=1'nq1',37)。

At T2, Session B changes the user age value of the user id=5 line to 37, so at T3, the query result of step2 is the user id=1 and user id=5 rows (1,'nq1',37)、(5,'nq5',32)。

T4, Session C inserts a row (2,) into the T User table'nq2',37), so at T5, the query result of step3 is user id=1, user id=5, and user id=2'nq1',37)、(5,'nq5',32)、(2'nq2',37)。

In the query result, the record of user id=2 is called phantom read.

Special note: Under the repeatable read isolation level, normal queries are snapshot reads and do not see data inserted by other transactions. Therefore, the phantom read will only appear when the current read is undertaken.

The modification result of Session B in the above ** cannot be called phantom read when seen by the current read of Session A. Phantom reads refer only to newly inserted lines.

In ** above, the statement in Session A:

select * from t where user_age=37 for update;
What it means: When querying, use for update to lock the user age=37 record. In this case, add two update statements to Session B and Session C.

As shown in:

Among them, the second statement of Session B:

update t_user set user_name = 'nq55555' where user_id = 5;
The semantics are as follows: change the value of the user name field in the line user id=5 and user age=37 to nq55555.

At T1, Session A only adds a row lock to the line User ID=1, so Session B can execute the two update statements in ** at time T2, which destroys the lock semantics of the Step 1 statement in Session A that locks all the lines of User age=37. Session C, like Session B, breaks the locking semantics of the step1 statement in Session A that locks all lines with user age=37.

Locks are designed to ensure data consistency. This consistency is not only the consistency of the data status within the database, but also the logical consistency of data and logs.

In the above **, add the following statement to Session A:

update t_user set user_age = 40 where user_age = 37;
As shown in:

The addition statement of Step 1 in Session A:

update t_user set user_age = 40 where user_age = 37;
The semantics are as follows: lock the statement user age=37 and update the value of user age to 40.

Processing steps: 1) At T1, Session A updates the user id=1 record to (1,).'nq1',40), the update result was finally officially submitted at the T6 moment.

2) At T2, Session B updates the user id=5 line to (5,'nq55555',37)。

3) At T4, Session C adds a new row of records (2,) to the table'nq55555',37)。

4) The other rows are not related to this execution sequence and remain unchanged.

After the above statement is executed, the execution sequence in the binlog:

/*(5,'nq5',37)*/update t_user set user_age = 37 where user_id = 5;/*(5,'nq55555',37)*/update t_user set user_name = 'nq55555' where user_id = 5;/*(2,'nq2',37)*/insert into t_user values(2,'nq2',37);/*(2,'nq55555',37)*/update t_user set user_name = 'nq55555' where user_id = 2;* All lines with user age=37 are updated to 40* update t user set user age=40 where user age=37;
Eventually, the user age field values for rows with user IDs 2 and 5 in the table were updated to 40, and data inconsistencies occurred.

The above analysis is based on the select * from t user where user age=37 for update statement, which locks only the user age=37 (i.e., user id=1) line. Therefore, this situation is clearly unreasonable.

Suppose all rows scanned by the select * from t user where user age=37 for update statement are locked.

As shown in:

Since Session A has a write lock on all the lines it scans, Session B goes into blocking when it executes the first update statement. Session B cannot continue until Session A is committed at T6.

After the above statement is executed, the execution sequence in the binlog:

/*(2,'nq2',37)*/insert into t_user values(2,'nq2',37);/*(2,'nq55555',37)*/update t_user set user_name = 'nq55555' where user_id = 2;* All lines with user age=37 are updated to 40* update t user set user age=40 where user age=37;/*(5,'nq5',37)*/update t_user set user_age = 37 where user_id = 5;/*(5,'nq55555',37)*/update t_user set user_name = 'nq55555' where user_id = 5;
As you can see, the row user id=5 in the table has been executed correctly, but the result of the row user id=2 recorded in the database is (2,'nq55555',37), and the result in the binlog is (2,'nq55555',40), there is still a phantom reading problem.

The reason why phantom reads still occurs: at t3, when all scanned rows are locked by the statement select * from t user where user age=37 for update, the record of user id=2 does not exist yet. Therefore, even if all scanned records are locked, it will not prevent the insertion of new records (i.e., phantom reads).

Based on the above analysis, the reason for phantom reads is that row locks can only lock the rows, but cannot lock the gap between rows. To solve the phantom read problem, InnoDB introduces a new lock, the gap lock.

Gap lock, which locks the gap between two values. For example, when t user is initialized, (1,'nq1',37),(5,'nq5',32),(10,'nq10',20),(15,'nq25',25) and so on, resulting in a total of 5 gaps.

As shown in Fig

When you run the select * from t user where user age=37 for update statement, row locks are added to the four initialization records and gaps are added to the five gaps corresponding to the four records.

Where: The one that conflicts with the row lock is another row lock.

What conflicts with the gap lock is the operation of inserting a record into the gap, and there is no conflict between the gap locks.

The impact of gap locks

The introduction of gap locks solves the phantom read problem at repeatable isolation levels, but it also has some implications.

Scenario analysis

Procedure: 1) Session A executes select ....for update statement, since the line user id=8 does not exist, a gap lock (5,10) will be added.

2) Session B is executed to select ....For update statement, gap locks (5 and 10) will also be added, and the gap locks will not conflict with each other, so the statement can be executed successfully.

3) Session B tries to insert a row (8,) into the table'nq8',28), due to the existence of a gap lock in Session A, the insertion statement can only go into waiting.

4) Session A tries to insert a row (8,) into the table'nq8',28), and due to the existence of the gap lock in Session B, the insertion statement can only enter the wait.

At this point, the two sessions will enter a state of waiting for each other, forming a deadlock. The deadlock detection of INNODB detects the deadlock relationship, so the insert statement of Ssession A returns an error message and the corresponding message is returned.

The introduction of gap locking may cause the same statement to lock a larger range, which will inevitably affect the concurrency of table operations. Of course, gap locks only take effect at the repeatable read isolation level, so if you set the isolation level to read commits, there will be no gap locks, but you need to solve the problem of data and log inconsistencies that may occur at the read commit level, that is, set the binlog format to row.

The gap lock and the row lock are collectively called next-key locks, and each next-key lock is a front-open and back-closed section. After the table t user is initialized, if you use the select * from t user for update statement to lock all records in the entire table, five next-key locks are formed: (1], (1,5], (5,10], (10,15], (15,+supremum). where supremum refers to a non-existent maximum value added by innodb to each index, so that it meets the definition of the open-closed interval.

The main content of the phantom reading:

1) Phantom read definition: Phantom read refers to when a transaction queries the same range twice, and the later query sees the rows that the previous query did not see.

2) Scenario where phantom reading appears: Phantom reading will only appear under the current reading, and phantom reading only refers to the newly inserted line.

3) Problems of phantom reading: semantic problems and data consistency problems.

4) Phantom read problem solution: Under the current read conditions, MySQL solves the phantom read problem by introducing gap locks. However, gap locks can have an impact on the concurrency of table operations. Gap locks only take effect at the repeatable read isolation level, so if you set the isolation level to read commits, there will be no gap locks, but you need to solve the data and log inconsistencies that may occur at the read commit level, that is, set the binlog format to row.

Read Recommended] More exciting content, such as:

Redis series.

Data Structures & Algorithms.

NACOS series.

MySQL series.

JVM series.

Kafka series.

Please move to the personal homepage of [Nanqiu] for reference. The content is constantly being updated.

About the author] An old babe who loves technology and life, focuses on the field Xi of J**A, and pays attention to [Nanqiu] to take you to learn and grow together

Related Pages