Pending run_pending_tasks may cause busy loop in schedule_write_op #412
Because … So, this line doesn't yield to break the busy loop (line 681 in ae33f8d).
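The retry loop described above can be pictured with a simplified model that uses only the standard library. When the bounded write-op channel is full, the writer tries to take the maintenance lock with `try_lock`. If that fails, it retries immediately without yielding. This is a sketch, not moka's actual `schedule_write_op`. The names `schedule_write_op_model` and `max_spins` are hypothetical, and `max_spins` exists only so the demo terminates.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Mutex;

/// Hypothetical model of a writer that records write ops in a bounded channel.
/// When the channel is full, the writer tries to run maintenance (drain the
/// channel) itself. If the maintenance lock is unavailable, it retries at once
/// without yielding. `max_spins` bounds the loop for the demo only.
/// Returns Ok(spins) on success, or Err(spins) if the bound is hit or the
/// receiver is gone.
fn schedule_write_op_model(
    tx: &SyncSender<u32>,
    rx_lock: &Mutex<Receiver<u32>>,
    op: u32,
    max_spins: usize,
) -> Result<usize, usize> {
    let mut op = op;
    for spins in 0..max_spins {
        match tx.try_send(op) {
            Ok(()) => return Ok(spins),
            Err(TrySendError::Full(returned)) => {
                op = returned;
                // Try to become the maintainer; on success, drain the channel.
                if let Ok(rx) = rx_lock.try_lock() {
                    while rx.try_recv().is_ok() {}
                }
                // No yield here: if `try_lock` failed, we spin straight back.
            }
            Err(TrySendError::Disconnected(_)) => return Err(spins),
        }
    }
    Err(max_spins)
}
```

In this model, the writer makes no progress while another party holds the lock and the channel stays full. If that party can only make progress after the spinning thread gives up control, the real (unbounded) loop never ends. Yielding before each retry is what breaks that cycle.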
It seems to me that …
Signed-off-by: Yilin Chen <sticnarf@gmail.com>
Hi. Thank you for reporting the issue. So, if I understand correctly, (1) unlocking … Can you please try it by modifying … When I implemented the loop in …
Oh, I realized that you added …
Yes, I have tried inserting a …
That's great! I still do not want to add an async runtime dependency to …
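The reply above mentions avoiding an async runtime dependency. One common runtime-agnostic technique is a future that wakes itself and returns `Pending` exactly once. This hands control back to the executor, or to an enclosing combinator such as `join_all`, so that other futures can be polled. It is shown here only as a sketch and is not necessarily what the eventual fix uses. `YieldNow`, `yield_now`, and `poll_to_completion` are hypothetical names, and the small poll loop only counts polls; it is not a real executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical runtime-agnostic "yield once" future, similar in spirit to
/// `tokio::task::yield_now`. The first poll schedules a wake-up and returns
/// `Pending`; the second poll completes.
struct YieldNow {
    yielded: bool,
}

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            // Ask to be polled again, then give control back to the caller.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

fn yield_now() -> YieldNow {
    YieldNow { yielded: false }
}

/// A waker that does nothing; enough for the single-future demo below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` repeatedly and returns its output plus the number of polls.
fn poll_to_completion<F: Future>(fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

Each `yield_now().await` costs exactly one extra poll. A retry loop that awaits it between attempts therefore returns `Pending` at least once per iteration, which gives a starved task a chance to be polled.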
Do you mean sometimes calling …
I see. So, without calling … I will try to reproduce the problem and try some ideas to fix it. I run mokabench before every release. Although the write op channel often gets full during a mokabench run, I have never had the busy loop problem. I think this is because mokabench never calls … I will modify mokabench to do something like the following to see whether it reproduces the problem: …
I created a potential fix for the busy loop issue. Can you please try it if you have time?

```toml
[dependencies]
moka = { git = "https://github.com/moka-rs/moka", branch = "avoid-async-scheduler-busy-loop", features = ["future"] }
```

If you do not have time, that is okay. I have not been able to reproduce the issue using mokabench, but I will continue trying.
I found that the following program reproduces the problem. I ran it on a MacBook Air with an Apple M2 chip.
```toml
[dependencies]
moka = { version = "0.12.5", features = ["future"] }
# moka = { git = "https://github.com/moka-rs/moka", branch = "avoid-async-scheduler-busy-loop", features = ["future"] }
futures-util = "0.3.30"
rand = "0.8.5"
rand_pcg = "0.3.1"
tokio = { version = "1.37.0", features = ["rt-multi-thread", "macros"] }
```
```rust
use moka::future::Cache;
use rand::{distributions::Uniform, prelude::*};
use rand_pcg::Pcg64;

const NUM_TASKS: usize = 48;
const NUM_KEYS: usize = 500_000;
const NUM_OPERATIONS_PER_TASK: usize = 1_000_000;
const NUM_INSERTS_PER_STEP: usize = 16;
const MAX_CAPACITY: u64 = 100_000;

// Use a small number of the worker threads to make it easy to reproduce the problem.
#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    let cache = Cache::builder().max_capacity(MAX_CAPACITY).build();

    // Spawn `NUM_TASKS` tasks, each performing `NUM_OPERATIONS_PER_TASK` operations.
    let tasks = (0..NUM_TASKS).map(|task_id| {
        let cache = cache.clone();
        tokio::spawn(async move {
            let mut rng = Pcg64::seed_from_u64(task_id as u64);
            let distr = Uniform::new_inclusive(0, NUM_KEYS - 1);

            // Each task will perform `NUM_OPERATIONS_PER_TASK` operations.
            for op_number in 0..NUM_OPERATIONS_PER_TASK {
                // Each task will insert `NUM_INSERTS_PER_STEP` entries per operation in
                // parallel, and call `run_pending_tasks()` if the random number `n` is 0.
                let ns = (&mut rng)
                    .sample_iter(distr)
                    .take(NUM_INSERTS_PER_STEP)
                    .collect::<Vec<_>>();

                let ops = ns
                    .into_iter()
                    .map(|n| {
                        let key = format!("key-{}", n);
                        let value = format!("value-{}", n);
                        let cache = cache.clone();

                        async move {
                            cache.insert(key, value).await;
                            if n == 0 {
                                cache.run_pending_tasks().await;
                            }
                        }
                    })
                    .collect::<Vec<_>>();

                futures_util::future::join_all(ops).await;

                if (op_number + 1) % 1000 == 0 {
                    println!("Task {:2} - {:7}", task_id, op_number + 1);
                    // Give other tasks some chances to run.
                    tokio::task::yield_now().await;
                }
            }

            println!("Task {:2} - {:7}", task_id, NUM_OPERATIONS_PER_TASK);
        })
    });

    // Run all tasks and wait for them to complete.
    futures_util::future::join_all(tasks).await;

    cache.run_pending_tasks().await;
    println!("Done! Cache size: {}", cache.entry_count());
}
```
Just to be sure, I also ran the above program on a Linux x86_64 PC with an Intel Core i7-12700F, and I got the same result.
I merged #415 into the …

```toml
[dependencies]
moka = { git = "https://github.com/moka-rs/moka", branch = "main", features = ["future"] }
```

I will try to release …
I published … Thanks again for reporting the bug. Your analysis really helped me reproduce and fix it.
Sorry, I didn't have time to try the new fix until today. I can confirm that the problem no longer occurs in my application's tests with moka 0.12.6. Thank you for the quick fix!
Thank you for trying out moka 0.12.6 while you are busy!
It's not easy to create a minimal reproducible example, so I will try to describe my code and my findings.

As a workaround for #408, I run `run_pending_tasks` manually when the cache size exceeds the capacity by a wide margin. However, I occasionally find that my tokio worker threads are stuck in the busy loop of `schedule_write_op`.

I examined what happens with gdb:

- … the `current_task` mutex in `try_run_pending_tasks`.
- The `current_task` mutex's state is `2`, which means the mutex is not locked but there is a starved acquirer. In this state, `try_lock` will fail.

I further added some logs to moka and found that my manual call to `run_pending_tasks` tries to acquire the `current_task` mutex but never actually gets it. This suggests that the starved mutex acquirer is the manual `run_pending_tasks` task.

There are various reasons why the `run_pending_tasks` task is not woken up to acquire the lock:

- … `schedule_write_op`. The runtime never schedules the `run_pending_tasks` task to run.
- … the `run_pending_tasks` task together with other `insert` tasks in a `join_all` future. So, when the `join_all` future is doing the busy loop, the `run_pending_tasks` task cannot run either.
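The gdb observation above can be illustrated with a small model of the mutex state word. This sketch assumes, as the description of state `2` implies, that bit 0 means "locked" and the remaining bits count starved acquirers. `StateWord` is a hypothetical type and not the mutex type moka actually uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical model of a mutex state word.
/// Assumption: bit 0 = "locked"; the remaining bits count starved acquirers.
struct StateWord(AtomicUsize);

impl StateWord {
    /// Succeeds only from the fully idle state 0. A pending starved acquirer
    /// (state 2) therefore makes it fail even though nobody holds the lock.
    fn try_lock(&self) -> bool {
        self.0
            .compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    fn is_locked(&self) -> bool {
        (self.0.load(Ordering::Acquire) & 1) == 1
    }

    fn starved_acquirers(&self) -> usize {
        self.0.load(Ordering::Acquire) >> 1
    }
}
```

In this model, state `2` means "unlocked, one starved acquirer". Every `try_lock` in a non-yielding loop fails until that acquirer is polled and takes the lock. That cannot happen while the spinning code monopolizes the worker thread or the enclosing `join_all` future.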