txn: try to fix the possible deadlock caused by scan lock #16342

Merged
merged 2 commits into from
Jan 10, 2024

Conversation

cfzjywxk
Collaborator

@cfzjywxk cfzjywxk commented Jan 9, 2024

What is changed and how it works?

Issue Number: Ref #16340

What's Changed:

Try to fix the possible deadlock caused by scan lock.

Related changes

Check List

Tests

  • Manual test (add detailed scripts or steps below)
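// Assumes parking_lot's RwLock and std::sync::atomic::Ordering are in scope,
// along with PeerPessimisticLocks and the lock_with_key test helper.
#[test]
fn test_rw_lock() {
    // Stand-in for check_term_version_status: it only reads through the guard.
    fn test_check_term_version_status(locks: &PeerPessimisticLocks) -> Result<(), ()> {
        let _version = locks.version;
        Ok(())
    }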
    let peer_locks = PeerPessimisticLocks::from_locks(vec![
        lock_with_key(b"key1", false),
        lock_with_key(b"key2", false),
        lock_with_key(b"key3", false),
    ]);
    let lock = std::sync::Arc::new(RwLock::new(peer_locks));
    let running = std::sync::Arc::new(std::sync::atomic::AtomicBool::new(true));
    let l_read = lock.clone();
    let r = running.clone();
    let t0 = std::thread::spawn(move || {
        while r.load(Ordering::SeqCst) {
            // Impl 1.
            // let res = match test_check_term_version_status(&l_read.read()) {
            //     Ok(_) => {
            //         // Scan locks within the specified range and filter by max_ts.
            //         Ok(l_read
            //             .read()
            //             .scan_locks(None, None, |_, _ |true, 0))
            //     }
            //     Err(e) => Err(e),
            // };
            // Impl 2.
            let pessimistic_locks_guard = l_read.read();
            let res = match test_check_term_version_status(&pessimistic_locks_guard) {
                Ok(_) => {
                    // Scan locks within the specified range and filter by max_ts.
                    Ok(pessimistic_locks_guard.scan_locks(None, None, |_,_| true, 0))
                }
                Err(e) => Err(e),
            };
        }
    });
    let l_write = lock.clone();
    let r = running.clone();
    let t1 = std::thread::spawn(move || {
        while r.load(Ordering::SeqCst) {
            let mut a = l_write.write();
            a.version += 1;
        }
    });
    std::thread::sleep(std::time::Duration::from_millis(1000));
    running.store(false, Ordering::SeqCst);
    t0.join().unwrap();
    t1.join().unwrap();
}
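
With Impl 1 enabled, the two threads park forever, so the test above hangs in join() instead of failing. One way to turn such a hang into a reported failure is a small watchdog. The following is a minimal sketch using only the standard library; the helper name finishes_within is hypothetical and is not part of TiKV or of this PR.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `work` on its own thread and reports whether it
/// completed within `timeout`. A hung (or panicked) worker yields `false`.
/// Note that a worker that never finishes is leaked, not killed.
fn finishes_within<F>(timeout: Duration, work: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        work();
        // The receiver may already have given up; ignore the send error.
        let _ = done_tx.send(());
    });
    // Ok(()) only if the worker signalled completion before the deadline.
    done_rx.recv_timeout(timeout).is_ok()
}
```

The body of the manual test could be passed as the closure, with the assertion made on the returned flag, so that a deadlock shows up as a failed assertion after the timeout.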

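Impl 1, which has the same shape as the current code:

let res = match self.check_term_version_status(&txn_ext.pessimistic_locks.read()) {
    Ok(_) => {
        // Scan locks within the specified range and filter by max_ts.
        Ok(txn_ext
            .pessimistic_locks
            .read()
            .scan_locks(start_key, end_key, filter, scan_limit))
    }
    Err(e) => Err(e),
};

can deadlock, while Impl 2 cannot. The read guard created in the match scrutinee is a temporary that lives until the end of the match statement, so the second read() inside the Ok arm is a recursive read on the same thread. If a writer starts waiting between the two calls, the parking_lot RwLock queues the second read() behind that writer, and the writer in turn waits for the first read guard, which is never released.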
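
The difference between the two implementations comes down to how long the first guard lives. The sketch below illustrates this with a hypothetical Guard type that records when it is acquired and dropped; the names Guard, acquire, check, and scan are stand-ins invented for this example, not TiKV or parking_lot APIs.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a read guard: logs acquisition and drop.
struct Guard<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

impl Guard<'_> {
    /// Placeholder for scan_locks; returns a dummy count.
    fn scan(&self) -> usize {
        0
    }
}

fn acquire<'a>(name: &'static str, log: &'a RefCell<Vec<String>>) -> Guard<'a> {
    log.borrow_mut().push(format!("acquire {}", name));
    Guard { name, log }
}

fn check(_guard: &Guard<'_>) -> Result<(), ()> {
    Ok(())
}

/// Impl 1 shape: the scrutinee guard is a temporary that outlives the arm.
fn impl1_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let _res: Result<usize, ()> = match check(&acquire("first", &log)) {
        Ok(_) => Ok(acquire("second", &log).scan()),
        Err(e) => Err(e),
    };
    log.into_inner()
}

/// Impl 2 shape: one named guard is reused for both the check and the scan.
fn impl2_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let guard = acquire("only", &log);
        let _res: Result<usize, ()> = match check(&guard) {
            Ok(_) => Ok(guard.scan()),
            Err(e) => Err(e),
        };
    } // `guard` is dropped here
    log.into_inner()
}
```

In the Impl 1 shape the second acquisition is logged before the first guard is dropped, which is exactly the window in which a waiting writer can wedge both threads. In the Impl 2 shape there is only one acquisition, so no such window exists.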

The read thread (parked in lock_shared_slow):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000555557aceb97 in parking_lot_core::thread_parker::imp::ThreadParker::futex_wait (self=0x7ffff6ffe560, ts=...)
    at parking_lot_core-0.9.1/src/thread_parker/linux.rs:112
#2  0x0000555557ace9c4 in parking_lot_core::thread_parker::imp::{impl#0}::park (self=0x7ffff6ffe560)
    at parking_lot_core-0.9.1/src/thread_parker/linux.rs:66
#3  0x0000555557ac36ea in parking_lot_core::parking_lot::park::{closure#0}<parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#0}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>, parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#1}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>, parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#2}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>> (
    thread_data=0x7ffff6ffe540) at parking_lot_core-0.9.1/src/parking_lot.rs:635
#4  0x0000555557ac1ceb in parking_lot_core::parking_lot::with_thread_data<parking_lot_core::parking_lot::ParkResult, parking_lot_core::parking_lot::park::{closure_env#0}<parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#0}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>, parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#1}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>, parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#2}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>>> (f=...)
    at parking_lot_core-0.9.1/src/parking_lot.rs:207
#5  parking_lot_core::parking_lot::park<parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#0}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>, parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#1}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>, parking_lot::raw_rwlock::{impl#10}::lock_common::{closure_env#2}<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}>> (key=140737219925040, validate=..., 
    before_sleep=..., timed_out=..., park_token=..., timeout=...)
    at parking_lot_core-0.9.1/src/parking_lot.rs:600
#6  0x0000555557ac620d in parking_lot::raw_rwlock::RawRwLock::lock_common<parking_lot::raw_rwlock::{impl#10}::lock_shared_slow::{closure_env#0}> (

The write thread (parked in wait_for_readers, reached from lock_exclusive_slow):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000555557aceb97 in parking_lot_core::thread_parker::imp::ThreadParker::futex_wait (self=0x7ffff6dfd560, ts=...)
    at parking_lot_core-0.9.1/src/thread_parker/linux.rs:112
#2  0x0000555557ace9c4 in parking_lot_core::thread_parker::imp::{impl#0}::park (self=0x7ffff6dfd560)
    at parking_lot_core-0.9.1/src/thread_parker/linux.rs:66
#3  0x0000555557ac3f36 in parking_lot_core::parking_lot::park::{closure#0}<parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#0}, parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#1}, parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#2}> (thread_data=0x7ffff6dfd540)
    at parking_lot_core-0.9.1/src/parking_lot.rs:635
#4  0x0000555557ac2255 in parking_lot_core::parking_lot::with_thread_data<parking_lot_core::parking_lot::ParkResult, parking_lot_core::parking_lot::park::{closure_env#0}<parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#0}, parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#1}, parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#2}>> (f=...)
    at parking_lot_core-0.9.1/src/parking_lot.rs:207
#5  parking_lot_core::parking_lot::park<parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#0}, parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#1}, parking_lot::raw_rwlock::{impl#10}::wait_for_readers::{closure_env#2}> (key=140737219925041, validate=..., before_sleep=..., timed_out=..., 
    park_token=..., timeout=...) at parking_lot_core-0.9.1/src/parking_lot.rs:600
#6  0x0000555557ac5b23 in parking_lot::raw_rwlock::RawRwLock::wait_for_readers (self=0x7ffff0001430, timeout=..., prev_value=0) at src/raw_rwlock.rs:1013
#7  0x0000555557ac5035 in parking_lot::raw_rwlock::RawRwLock::lock_exclusive_slow 
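
The two traces show a reader stuck acquiring a shared lock and a writer stuck waiting for readers to leave. That interleaving can be replayed deterministically with a toy model of a writer-preferring reader-writer lock. This is a simplified sketch for illustration only: FairRwLockModel and its methods are hypothetical and do not reflect parking_lot's actual implementation.

```rust
/// Toy model of a writer-preferring reader-writer lock (illustration only).
#[derive(Default)]
struct FairRwLockModel {
    readers: usize,
    writer_waiting: bool,
    writer_active: bool,
}

impl FairRwLockModel {
    /// A new reader is admitted only if no writer holds or is queued for the lock.
    fn try_read(&mut self) -> bool {
        if self.writer_active || self.writer_waiting {
            return false;
        }
        self.readers += 1;
        true
    }

    /// A writer is admitted when there are no readers; otherwise it queues.
    fn try_write(&mut self) -> bool {
        if self.writer_active {
            return false;
        }
        if self.readers == 0 {
            self.writer_waiting = false;
            self.writer_active = true;
            true
        } else {
            self.writer_waiting = true;
            false
        }
    }

    fn release_read(&mut self) {
        self.readers -= 1;
    }
}

/// Impl 1 interleaving: read, writer queues, nested read on the same thread.
fn impl1_interleaving_deadlocks() -> bool {
    let mut lock = FairRwLockModel::default();
    let first_read = lock.try_read(); // guard held by the match scrutinee
    let write = lock.try_write(); // writer arrives and queues behind the reader
    let second_read = lock.try_read(); // nested read is refused: writer is queued
    // Reader cannot proceed to release its first guard, writer cannot proceed either.
    first_read && !write && !second_read
}

/// Impl 2 interleaving: a single read guard is held, used, then released.
fn impl2_interleaving_progresses() -> bool {
    let mut lock = FairRwLockModel::default();
    let read = lock.try_read();
    let write_blocked = !lock.try_write(); // writer queues while the scan runs
    lock.release_read(); // the scan finishes with the same guard
    let write_after = lock.try_write(); // writer now gets the lock
    read && write_blocked && write_after
}
```

In the model, the Impl 1 sequence ends with neither side able to make progress, matching the two parked threads above, while the Impl 2 sequence lets the writer in as soon as the single read guard is released.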

Side effects

Release note

None

Signed-off-by: cfzjywxk <cfzjywxk@gmail.com>
@cfzjywxk cfzjywxk added the type/bugfix This PR fixes a bug. label Jan 9, 2024
Contributor

ti-chi-bot bot commented Jan 9, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • ekexium
  • zyguan

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed do-not-merge/needs-triage-completed labels Jan 9, 2024
@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jan 10, 2024
Contributor

ti-chi-bot bot commented Jan 10, 2024

@crazycs520: Thanks for your review. The bot only counts approvals from reviewers and higher roles in list, but you're still welcome to leave your comments.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jan 10, 2024
@cfzjywxk
Collaborator Author

/merge

Contributor

ti-chi-bot bot commented Jan 10, 2024

@cfzjywxk: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Contributor

ti-chi-bot bot commented Jan 10, 2024

This pull request has been accepted and is ready to merge.

Commit hash: b3a172c

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Jan 10, 2024
Contributor

ti-chi-bot bot commented Jan 10, 2024

@cfzjywxk: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

@ti-chi-bot ti-chi-bot bot merged commit d447120 into tikv:master Jan 10, 2024
7 checks passed
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Jan 10, 2024
ti-chi-bot bot added a commit that referenced this pull request Jan 10, 2024
ref #16340

Try to fix the possible deadlock caused by scan lock.

Signed-off-by: cfzjywxk <cfzjywxk@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@cfzjywxk cfzjywxk deleted the try_fix_deadlock branch January 12, 2024 03:41
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.