Fix starvation when using LIFO slot without consuming budget #5686

Merged 4 commits on May 15, 2023

Conversation

Darksonn
Contributor

This fixes a problem where using the LIFO slot without consuming budget can result in starvation.
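To make the failure mode concrete, here is a minimal toy model, not Tokio's actual scheduler code. It assumes two tasks that wake each other so the LIFO slot is refilled after every poll, and a worker that falls back to its local queue only once the budget reaches zero. The function name `lifo_polls_before_queue` and its parameters are hypothetical and exist only for this illustration.

```rust
/// Toy model (not Tokio's implementation): two tasks ping-pong through the
/// LIFO slot, so the slot is never empty after a poll. The worker only falls
/// back to its local queue when the budget is exhausted.
///
/// Returns `Some(n)` if the worker reaches the local queue after `n` LIFO
/// polls, or `None` if it is still stuck on the LIFO slot after
/// `give_up_after` polls (the queued tasks are starved).
fn lifo_polls_before_queue(
    initial_budget: u32,
    consume_on_lifo_poll: bool,
    give_up_after: u32,
) -> Option<u32> {
    let mut budget = initial_budget;
    let mut lifo_polls = 0;
    loop {
        if budget == 0 {
            // Budget exhausted: the worker stops favoring the LIFO slot.
            return Some(lifo_polls);
        }
        if lifo_polls == give_up_after {
            // Still polling from the LIFO slot: queued tasks never ran.
            return None;
        }
        if consume_on_lifo_poll {
            // The fix sketched here: every LIFO poll costs one unit, even if
            // the polled task performs no budget-consuming operation itself.
            budget -= 1;
        }
        lifo_polls += 1;
    }
}
```

With an initial budget of 128 and consumption enabled, the model reaches the local queue after 128 LIFO polls. With consumption disabled, it never does within the observation window, which is the starvation this PR addresses.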

@Darksonn Darksonn added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime M-coop Module: tokio/coop labels May 12, 2023
@github-actions github-actions bot added the R-loom Run loom tests on this PR label May 12, 2023
@Darksonn Darksonn added R-loom Run loom tests on this PR and removed R-loom Run loom tests on this PR labels May 12, 2023
@Darksonn Darksonn changed the title Add failing test. Fix starvation when using LIFO slot without consuming budget May 12, 2023
@hawkw hawkw self-requested a review May 12, 2023 20:53
Review comment on tokio/src/runtime/coop.rs (outdated, resolved)
@Darksonn Darksonn enabled auto-merge (squash) May 15, 2023 11:54
@Darksonn Darksonn merged commit 70364b7 into master May 15, 2023
@Darksonn Darksonn deleted the alice/lifo-consume-coop branch May 15, 2023 12:40
carllerche added a commit that referenced this pull request May 23, 2023
As an optimization to improve locality, the multi-threaded scheduler
maintains a single slot (LIFO slot). When a task is scheduled, it goes
into the LIFO slot. The scheduler will run tasks in the LIFO slot first,
before checking the local queue.

In ping-pong style workloads where task A notifies task B, which
notifies task A again, this can cause starvation as these two tasks will
repeatedly schedule the other in the LIFO slot. #5686, a first
attempt at solving this problem, consumes a unit of budget each time a
task is scheduled from the LIFO slot. However, at the time of this
commit, the scheduler allocates 128 units of budget for each chunk of
work. This is quite high in situations where tasks do not perform many
async operations, yet have meaningful poll times (even 5-10 microsecond
poll times can have an outsized impact on the scheduler).

In an ideal world, the scheduler would adapt to the workload it is
executing. However, as a stopgap, this commit limits the number of times
the LIFO slot is prioritized per scheduler tick.
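The stopgap described above can be sketched with a toy worker. This is an illustrative model under stated assumptions, not the code from the commit: the `Worker` type, its `schedule` and `tick` methods, the task names, and the cap value in `MAX_LIFO_POLLS_PER_TICK` are all hypothetical. Tasks "A" and "B" wake each other, and any other task wakes nobody.

```rust
use std::collections::VecDeque;

/// Hypothetical cap on LIFO polls per tick; the value is illustrative only.
const MAX_LIFO_POLLS_PER_TICK: usize = 3;

/// Toy worker: one LIFO slot plus a FIFO local queue of task names.
struct Worker {
    lifo_slot: Option<&'static str>,
    local_queue: VecDeque<&'static str>,
}

impl Worker {
    /// Schedule a woken task. While the LIFO slot is enabled the task takes
    /// the slot and any previous occupant moves to the back of the queue.
    /// Once disabled, the task goes straight to the back of the queue.
    fn schedule(&mut self, task: &'static str, lifo_enabled: bool) {
        if lifo_enabled {
            if let Some(prev) = self.lifo_slot.replace(task) {
                self.local_queue.push_back(prev);
            }
        } else {
            self.local_queue.push_back(task);
        }
    }

    /// Run one tick: start from the LIFO slot (or the queue if it is empty),
    /// then keep following the LIFO slot until the cap is reached.
    /// Returns the names of the tasks polled, in order.
    fn tick(&mut self) -> Vec<&'static str> {
        let mut polled = Vec::new();
        let mut lifo_polls = 0;
        let mut next = self
            .lifo_slot
            .take()
            .or_else(|| self.local_queue.pop_front());
        while let Some(task) = next {
            polled.push(task);
            // Ping-pong: A wakes B and B wakes A; other tasks wake nobody.
            let woken = match task {
                "A" => Some("B"),
                "B" => Some("A"),
                _ => None,
            };
            // Stop prioritizing the LIFO slot once the cap is reached.
            let lifo_enabled = lifo_polls < MAX_LIFO_POLLS_PER_TICK;
            if let Some(w) = woken {
                self.schedule(w, lifo_enabled);
            }
            next = self.lifo_slot.take();
            if next.is_some() {
                lifo_polls += 1;
            }
        }
        polled
    }
}
```

Starting with "A" in the LIFO slot and "C" in the local queue, the first tick in this model polls A, B, A, B and then pushes the next wakeup to the back of the queue, so the second tick polls C. Without the cap, the ping-pong pair would keep the slot occupied until the budget ran out.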
carllerche added a commit that referenced this pull request May 23, 2023
As an optimization to improve locality, the multi-threaded scheduler
maintains a single slot (LIFO slot). When a task is scheduled, it goes
into the LIFO slot. The scheduler will run tasks in the LIFO slot first
before checking the local queue.

Ping-pong style workloads where task A notifies task B, which
notifies task A again, can cause starvation as these two tasks
repeatedly schedule the other in the LIFO slot. #5686, a first
attempt at solving this problem, consumes a unit of budget each time a
task is scheduled from the LIFO slot. However, at the time of this
commit, the scheduler allocates 128 units of budget for each chunk of
work. This is relatively high in situations where tasks do not perform many
async operations yet have meaningful poll times (even 5-10 microsecond
poll times can have an outsized impact on the scheduler).

In an ideal world, the scheduler would adapt to the workload it is
executing. However, as a stopgap, this commit limits the number of times
the LIFO slot is prioritized per scheduler tick.
crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this pull request Jul 6, 2023
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dependencies | minor | `1.28.2` -> `1.29.1` |
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dev-dependencies | minor | `1.28.2` -> `1.29.1` |

---

### Release Notes

<details>
<summary>tokio-rs/tokio (tokio)</summary>

### [`v1.29.1`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.1): Tokio v1.29.1

[Compare Source](tokio-rs/tokio@tokio-1.29.0...tokio-1.29.1)

##### Fixed

-   rt: fix nesting two `block_in_place` with a `block_on` between ([#5837])

[#5837]: tokio-rs/tokio#5837

### [`v1.29.0`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.29.0): Tokio v1.29.0

[Compare Source](tokio-rs/tokio@tokio-1.28.2...tokio-1.29.0)

Technically a breaking change, the `Send` implementation is removed from
`runtime::EnterGuard`. This change fixes a bug and should not impact most users.

##### Breaking

-   rt: `EnterGuard` should not be `Send` ([#5766])

##### Fixed

-   fs: reduce blocking ops in `fs::read_dir` ([#5653])
-   rt: fix possible starvation ([#5686], [#5712])
-   rt: fix stacked borrows issue in `JoinSet` ([#5693])
-   rt: panic if `EnterGuard` dropped incorrect order ([#5772])
-   time: do not overflow to signal value ([#5710])
-   fs: wait for in-flight ops before cloning `File` ([#5803])

##### Changed

-   rt: reduce time to poll tasks scheduled from outside the runtime ([#5705], [#5720])

##### Added

-   net: add uds doc alias for unix sockets ([#5659])
-   rt: add metric for number of tasks ([#5628])
-   sync: implement more traits for channel errors ([#5666])
-   net: add nodelay methods on TcpSocket ([#5672])
-   sync: add `broadcast::Receiver::blocking_recv` ([#5690])
-   process: add `raw_arg` method to `Command` ([#5704])
-   io: support PRIORITY epoll events ([#5566])
-   task: add `JoinSet::poll_join_next` ([#5721])
-   net: add support for Redox OS ([#5790])

##### Unstable

-   rt: add the ability to dump task backtraces ([#5608], [#5676], [#5708], [#5717])
-   rt: instrument task poll times with a histogram ([#5685])

[#5766]: tokio-rs/tokio#5766
[#5653]: tokio-rs/tokio#5653
[#5686]: tokio-rs/tokio#5686
[#5712]: tokio-rs/tokio#5712
[#5693]: tokio-rs/tokio#5693
[#5772]: tokio-rs/tokio#5772
[#5710]: tokio-rs/tokio#5710
[#5803]: tokio-rs/tokio#5803
[#5705]: tokio-rs/tokio#5705
[#5720]: tokio-rs/tokio#5720
[#5659]: tokio-rs/tokio#5659
[#5628]: tokio-rs/tokio#5628
[#5666]: tokio-rs/tokio#5666
[#5672]: tokio-rs/tokio#5672
[#5690]: tokio-rs/tokio#5690
[#5704]: tokio-rs/tokio#5704
[#5566]: tokio-rs/tokio#5566
[#5721]: tokio-rs/tokio#5721
[#5790]: tokio-rs/tokio#5790
[#5608]: tokio-rs/tokio#5608
[#5676]: tokio-rs/tokio#5676
[#5708]: tokio-rs/tokio#5708
[#5717]: tokio-rs/tokio#5717
[#5685]: tokio-rs/tokio#5685

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1958
Reviewed-by: crapStone <crapstone01@gmail.com>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Labels
A-tokio Area: The main tokio crate M-coop Module: tokio/coop M-runtime Module: tokio/runtime R-loom Run loom tests on this PR
4 participants