Don't auto-advance time when a spawn_blocking task is running. #5115
Conversation
This could conceivably break someone's existing code. If that's too big a breaking change, my problem could alternatively be addressed with a new opt-in API for inhibiting auto-advance.
Of course, there's the question of whether such APIs would actually work well. Intuitively, I think the second option is pretty good. I think async code generally "knows" locally there's a future that represents work that's happening outside of tokio's managed tasks, that will complete and unblock a tokio task. When such a future is polled, I think that's when auto-advance should be inhibited.
Hmm, yes, the server example is annoying.
OK, how about a function that wraps a future, say

```rust
// (in tokio::time)

/// Annotate a future as doing work, from when it is first polled to when it becomes `Ready`, even if no
/// tokio thread is currently polling it.
///
/// This suppresses the auto-advance feature when time is paused. When time is not paused, or
/// the **test-util** feature is not enabled, this is a no-op.
async fn inhibit_auto_advance<F: Future>(fut: F) -> F::Output;
```

?
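For concreteness, a sketch of how a test might use such a wrapper if it existed. This is only a proposal in this thread, not an existing tokio API, so `inhibit_auto_advance` below is hypothetical:

```rust
use std::time::Duration;
use tokio::time;

// Requires tokio's `macros` and `test-util` features for `start_paused`.
#[tokio::test(start_paused = true)]
async fn external_work_is_not_skipped() {
    let (tx, rx) = tokio::sync::oneshot::channel();

    // Work done by a plain OS thread, invisible to the tokio scheduler.
    std::thread::spawn(move || {
        std::thread::sleep(Duration::from_millis(50));
        let _ = tx.send(42);
    });

    // Wrapping the receive future would keep paused time from auto-advancing
    // (and instantly expiring the timeout) while the thread is still working.
    let value = time::timeout(
        Duration::from_secs(3),
        time::inhibit_auto_advance(rx), // hypothetical API from the comment above
    )
    .await
    .expect("timed out")
    .expect("sender dropped");

    assert_eq!(value, 42);
}
```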
I haven't reviewed enough, but I am OK considering this a "bug fix" and not a "breaking change". The arguments are valid. Also, risk is lower due to it being a test-time feature.
In that case, this is ready for review.
I had a quick look and it looks reasonable enough. There are some CI failures though.
OK, I fixed it—CI is green now. |
(force-pushed from 976004f to 1013452)
There's still a race condition, apparently. Loom runs the test about 600 times before detecting it.
I wonder if this is the second problem I mentioned a couple days ago: if the main thread parks itself while the blocking task is still running, nothing unparks it when that task completes. I don't know why we don't see task B in the failing interleaving, though.
Considering that loom reports a deadlock, it seems reasonable to conclude that this is due to the issue you mention. As for why you don't see task B, I assume it's because it was scheduled to run later?
(force-pushed from 969b50b to f23610f)
Time shouldn't auto-advance unless there's really nothing else going on. In general that is hard to fix, but this special case is relatively easy. Fixes: tokio-rs#4614
This uses a destructor, so it will also work if tokio machinery panics while trying to e.g. spawn a thread.
The auto-advance inhibit count must be decremented after the blocking task is finished, and crucially after waking the JoinHandle, to prevent the tokio thread from auto-advancing before tasks awaiting that JoinHandle are rescheduled.
Specifically, when a blocking task is spawned but no thread immediately awaits the JoinHandle, and the main thread parks itself, we previously failed to unpark it when the blocking task completed. Auto-advance would be re-enabled, but with the main thread parked it could never actually happen: a potential deadlock.
It does not always work; and as it happens, the caller has a handle already and can just pass it in.
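A minimal sketch of the ordering constraint described in these commit messages, with made-up names; the real counter and wakeup live inside tokio's clock and blocking-pool internals:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the counter the paused clock checks; while it is
// non-zero, time must not auto-advance.
static AUTO_ADVANCE_INHIBIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Decrements the counter in its destructor, so the inhibit is released even
/// if the blocking machinery panics after the increment.
struct InhibitGuard;

impl InhibitGuard {
    fn new() -> Self {
        AUTO_ADVANCE_INHIBIT_COUNT.fetch_add(1, Ordering::SeqCst);
        InhibitGuard
    }
}

impl Drop for InhibitGuard {
    fn drop(&mut self) {
        AUTO_ADVANCE_INHIBIT_COUNT.fetch_sub(1, Ordering::SeqCst);
    }
}

// Schematic of the blocking worker finishing a task.
fn finish_blocking_task(guard: InhibitGuard, wake_join_handle: impl FnOnce()) {
    // Wake whoever is awaiting the JoinHandle first...
    wake_join_handle();
    // ...and only then drop the guard. In the reverse order, the runtime
    // thread could see the inhibit released and auto-advance, or park with
    // nothing left to unpark it, before the awaiting task is rescheduled.
    drop(guard);
}
```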
(force-pushed from cbb4c38 to 939841f)
The loom issue is fixed and has a test now. This is ready for review. (I don't know why there are still jobs that are "Expected" after all this time; I think I've seen that before with gh actions when new jobs are added...?)
You can fix CI by merging in master.
OK, this is ready for a CI run.
Thanks! Tests are green. I think this is ready for review.
👋 hi, friendly review ping (or, what should I do next here)
Sorry about the silence. We discussed this on the discord back when you said CI was green, and I think we came to the conclusion that this looks fine. I took another look just now, and I have one comment regarding your tests.
tokio/tests/task_blocking.rs
```rust
time::timeout(
    Duration::from_secs(3),
    task::spawn_blocking(|| thread::sleep(Duration::from_millis(250))),
)
.await
```
Please reduce the sleeps. Our test suite already takes long enough.
Time shouldn't auto-advance unless there's really nothing else going on. In general that is hard to fix, but this special case is relatively easy.
Fixes: #4614
Motivation
I'm writing tests for a complex distributed system. In a single test process, a "server" task runs CPU-bound units of work using `spawn_blocking`, while a "client" task sends requests and waits for responses using `.await`, with a generous timeout. Issue #4614 causes the timeout to elapse instantly; no responses ever succeed. There seems to be consensus in #4614 that this behavior is a bug, and I agree.
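A reduced sketch of that failing pattern under paused time (names and durations are illustrative; before this change, the timeout elapses immediately because the paused clock auto-advances while the blocking task runs):

```rust
use std::time::Duration;
use tokio::{task, time};

// Requires tokio's `macros` and `test-util` features for `start_paused`.
#[tokio::test(start_paused = true)]
async fn client_request_with_generous_timeout() {
    // "Server": CPU-bound work runs off the async worker threads.
    let request = task::spawn_blocking(|| {
        std::thread::sleep(Duration::from_millis(50)); // stand-in for real work
        "response"
    });

    // "Client": waits with a generous timeout. If the paused clock is allowed
    // to auto-advance while the blocking task runs, it jumps straight past the
    // deadline and this returns Err(Elapsed) without any real time passing.
    let reply = time::timeout(Duration::from_secs(60), request)
        .await
        .expect("timed out waiting for the blocking task")
        .unwrap();

    assert_eq!(reply, "response");
}
```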
Solution
I added a field in the clock that tells how many `spawn_blocking` tasks are running/scheduled. The code increments this counter when spawning such a task and decrements it when the task is done. This means the increment and the decrement are not conveniently side by side in the code, but I think it might be important to bump the counter synchronously, to avoid a race condition if the new task doesn't actually start running right away.
I named the counter `auto_advance_inhibit_count`, instead of something obvious like `num_blocking_tasks`, because it might make sense in the future to provide an API whereby users can manually inhibit auto-advance (... if I'm right in thinking that blocking tasks are not the only kind of work Tokio "can't see").
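A rough sketch of that bookkeeping; the counter name follows the text above, but the methods and spawn path are illustrative stand-ins, not tokio's actual internals:

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};

// Illustrative stand-in for the test-util clock.
#[derive(Default)]
struct Clock {
    auto_advance_inhibit_count: AtomicUsize,
}

impl Clock {
    fn inhibit_auto_advance(&self) {
        self.auto_advance_inhibit_count.fetch_add(1, Ordering::SeqCst);
    }
    fn allow_auto_advance(&self) {
        self.auto_advance_inhibit_count.fetch_sub(1, Ordering::SeqCst);
    }
    /// The paused clock would consult this before jumping ahead to the next timer.
    fn can_auto_advance(&self) -> bool {
        self.auto_advance_inhibit_count.load(Ordering::SeqCst) == 0
    }
}

// Schematic spawn path: the increment happens synchronously, on the spawning
// thread, *before* the task is queued, so there is no window in which the
// task exists but the clock may still auto-advance.
fn spawn_blocking_schematic(clock: Arc<Clock>, work: impl FnOnce() + Send + 'static) {
    clock.inhibit_auto_advance();
    std::thread::spawn(move || {
        work();
        // In the real change the decrement runs in a destructor, after the
        // JoinHandle's waker has been notified (see the commit messages above).
        clock.allow_auto_advance();
    });
}
```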