
rt: add optional throttling to BasicScheduler #2016

Closed
fengalin wants to merge 4 commits from the fengalin/throttling branch

Conversation

fengalin

Motivation

This PR is a continuation of #1887.

gst-plugin-threadshare is a framework and a collection of GStreamer
elements which are developed in the Rust programming language and use the
gstreamer-rs bindings to the GStreamer C-based libraries.

In the regular GStreamer model, pipeline elements spawn their own threads to
deal with asynchronous processing. Most of the time, these threads wait for
something to happen, which causes multiple overheads. The gst-plugin-threadshare
model allows elements to share threads, which reduces context switches and system
calls, thus lowering CPU load.

The framework executor uses tokio. Until release 0.2.0, tokio featured
functions that gave control over the iterations of the executor:

  • turn would execute one iteration and indicate whether tasks were handled by
    the executor.
  • The user could request the executor to park & unpark.

Thanks to these features, the framework could benefit from the tokio executor
while implementing a throttling strategy. The throttling strategy consists of
grouping tasks, timers and I/O handling by forcing the thread to sleep for a
short duration. For use cases where a large number of events must be handled at
random times (e.g. 2,000+ live multimedia streams), this results in even more
efficient CPU use and higher bandwidth.

Since the throttling strategy introduces pauses in the execution thread, timers
need special treatment so that they are fired close enough to their target
instant. To avoid timers being triggered too late, gst-plugin-threadshare
implemented its own timer management.

See this blog post for a more in-depth explanation of the throttling strategy.

Two solutions are considered for gst-plugin-threadshare to keep up with
new tokio versions (see this thread):

  1. Introduce the throttling strategy as an option in tokio's executor.
  2. Re-introduce an API in tokio's executor so that gst-plugin-threadshare
    can keep its throttling strategy.

Solution (1) presents the following benefits:

  • Other projects could activate the throttling strategy depending on their
    use cases.
  • The special handling of timers with regard to throttling would be implemented
    in tokio, meaning gst-plugin-threadshare would not need to implement its own
    timer management.

This PR is an implementation of solution (1). Follow this link
for an evaluation of the resulting behavior for the BasicScheduler.

Solution

This PR introduces a new optional duration parameter max_throttling. If the
user provides a non-zero duration and selects the basic scheduler, the
throttling strategy is activated. max_throttling is the maximum duration
of the thread pauses and the minimum duration between two consecutive timer &
I/O polls.
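
As a usage sketch (assuming the option is exposed as a max_throttling method
on the 0.2 runtime Builder; the exact builder API is not spelled out above),
activating the throttling strategy could look like this:

```rust
use std::time::Duration;
use tokio::runtime;

fn main() -> std::io::Result<()> {
    let mut rt = runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        // Never poll timers & I/O more often than every 10 ms; the thread
        // may sleep up to 10 ms between two consecutive polls.
        // Hypothetical builder method added by this PR.
        .max_throttling(Duration::from_millis(10))
        .build()?;

    rt.block_on(async {
        // application code
    });

    Ok(())
}
```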

When throttling is activated, the BasicScheduler loops on the new
tick_throttling function instead of tick. The tick_throttling function
uses a state machine to control the descent to the state where the thread is
actually paused when max_throttling hasn't elapsed yet since the instant the
timers & I/O were last polled.

The purpose of the thread pause is for the thread not to be woken before the
intended duration has elapsed, so the actual pause doesn't rely on a driver's
park implementation but on a call to std::thread::sleep.
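
The following is a simplified, free-standing sketch of that loop shape;
run_ready_tasks and poll_timers_and_io are illustrative placeholders, not the
actual scheduler internals:

```rust
use std::time::{Duration, Instant};

// Illustrative placeholders for the scheduler internals.
fn run_ready_tasks() {}
fn poll_timers_and_io() {}

// Sketch of the throttled loop: ready tasks are drained, then, if less than
// `max_throttling` has elapsed since the last timers & I/O poll, the thread
// sleeps for the remainder with std::thread::sleep (instead of parking on the
// driver, which could be woken early). Timers & I/O are then polled and a new
// time frame starts.
fn throttled_loop(max_throttling: Duration) {
    let mut last_poll = Instant::now();
    loop {
        run_ready_tasks();

        let elapsed = last_poll.elapsed();
        if elapsed < max_throttling {
            std::thread::sleep(max_throttling - elapsed);
        }

        poll_timers_and_io();
        last_poll = Instant::now();
    }
}
```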

Since timers & I/O are never polled more often than every max_throttling ms,
timers need special care. The idea is to consider time frames which start at a
polling instant. Consider a time frame starting now: since the current time
frame will last at least max_throttling ms, we need to fire the timers that
will elapse within the first half of the current time frame, otherwise they
would fire further from their intended instant and always too late.
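
A minimal sketch of this half-frame rule, with illustrative names rather than
the actual time::Driver code:

```rust
use std::time::{Duration, Instant};

// Half-frame rule sketch: a timer whose deadline falls within the first half
// of the current time frame is fired at the start of the frame. Waiting for
// the next poll (a full `max_throttling` later) would make it fire later than
// firing it slightly early now.
fn fire_in_current_frame(
    deadline: Instant,
    frame_start: Instant,
    max_throttling: Duration,
) -> bool {
    deadline <= frame_start + max_throttling / 2
}
```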

In order to implement this strategy with minimal impact on existing code, it was
decided to add a new mode to the time::Driver. When this mode is activated,
the park_timeout method fires the timers as described above, polls the I/O
via the inner driver and returns immediately. This way the scheduler can still
trigger tasks that depend on the timers before going to sleep.

Tests

In order to ensure behavior on par with the regular tick algorithm,
basic_scheduler_throttling was added to rt_common. A new test file,
rt_basic_throttling.rs, takes care of checking the time frame strategy.

Tests were also conducted as part of this evaluation,
which covers a tokio-only benchmark as well as a comparison of the
implementations used for the gst-plugin-threadshare framework.

@fengalin fengalin changed the title Fengalin/throttling Optional BasicScheduler throttling Dec 22, 2019
@fengalin
Author

Failure is due to elapsed duration checks on macOS. I know elapsed duration checks are not ideal in CI, but I thought they were important here. I already relaxed the boundary compared to the limit I tested on my box, but it doesn't seem to be enough. Unfortunately, I don't have a macOS machine to ascertain this is not a bug in my implementation.

Will try to figure out a solution. Any suggestion welcome! :)

@fengalin
Author

fengalin commented Dec 23, 2019

Still no luck with CI on macOS. It's the only platform which shows inconsistent
elapsed durations for the rt_basic_throttling tests:

Test                         Elapsed (ms)   Max. boundary (ms)
at_root_one_time_frame (X)   214.6          200
in_spawn_one_time_frame      191.9          200
in_spawn_two_time_frames     191.4          300
at_root_two_time_frames      191.9          300

Note: it's not always the same test in error, and it seems very suspicious to me
that they all show a similar elapsed duration when the two_time_frames
variants should take around twice as long as the one_time_frame tests.

Windows & Linux show expected behavior:

Test                         Elapsed (ms)   Max. boundary (ms)
at_root_one_time_frame       100.2          200
in_spawn_one_time_frame      100.1          200
in_spawn_two_time_frames     200.3          300
at_root_two_time_frames      200.4          300

These results were extracted from this CI execution, which was triggered by
1178fb9 to investigate them.

@fengalin fengalin force-pushed the fengalin/throttling branch 6 times, most recently from fc1e764 to 23e4dc2 Compare December 24, 2019 12:06
@fengalin
Copy link
Author

I added a commit to account for the dispersion in timing for CI
on macOS, using the rt_common timer-based tests.

Commit: 23e4dc2, CI run: https://dev.azure.com/tokio-rs/Tokio/_build/results?buildId=4297&view=results

Test                                          Linux (ms)   macOS (ms)   Windows (ms)
basic_scheduler::delay_at_root                51.2         51.4         61.2
basic_scheduler::delay_in_spawn               51.3         160.3        61.4
basic_scheduler_throttling::delay_at_root     51.0         55.3         60.5
basic_scheduler_throttling::delay_in_spawn    51.4         53.3         61.9
threaded_scheduler::delay_at_root             51.4         78.3         59.2
threaded_scheduler::delay_in_spawn            51.3         73.3         61.3

I chose these tests because they were not added by this PR
and some of them don't make use of the feature introduced in
this PR. With CI on Linux and Windows, the timing remains in
the same range. With CI on macOS, the timing rises up to
160.3 ms where it should be close to 50 ms.

My conclusion is that we can't check timing bounds with CI on
macOS. As a consequence, I decided to discard the checks
on this OS for the new tests in rt_basic_throttling.

@fengalin fengalin force-pushed the fengalin/throttling branch 4 times, most recently from 99c7b5a to a05771c Compare December 25, 2019 10:06
@fengalin fengalin changed the title Optional BasicScheduler throttling rt: add optional throttling to BasicScheduler Dec 26, 2019
@fengalin fengalin force-pushed the fengalin/throttling branch 2 times, most recently from 73852c3 to e759c81 Compare January 8, 2020 16:09
@ghost

ghost commented Jan 10, 2020

Could anybody tell which issue is blocking the PR? Is there any chance that it can be merged in an upcoming release?

@sdroege
Contributor

sdroege commented Jan 10, 2020

Could anybody tell which issue is blocking the PR?

It needs one of the maintainers to actually review it and first of all consider the idea good and something they'd like to have as part of tokio.

@fengalin fengalin requested a review from carllerche January 22, 2020 13:58
@fengalin fengalin force-pushed the fengalin/throttling branch 2 times, most recently from aad2846 to 3136c44 Compare February 28, 2020 15:58
@fengalin fengalin force-pushed the fengalin/throttling branch from 4ef1a66 to 5f95c38 Compare April 11, 2020 17:58
@vorot93 vorot93 added A-tokio Area: The main tokio crate C-enhancement Category: A PR with an enhancement or bugfix. M-runtime Module: tokio/runtime labels Apr 24, 2020
With some tasks, we don't want to take the risk of waiting for the
max_throttling time to elapse before they are handled. This is the case
when a user starts many tasks in a serialized way, waiting for an
acknowledgment that each task is indeed spawned. In this case, we
want the task to be handled immediately so that we don't risk
waiting max_throttling time for each task.
@fengalin fengalin force-pushed the fengalin/throttling branch from 5f95c38 to 37cb931 Compare April 24, 2020 12:55
@carllerche
Member

Sorry for the radio silence.

I have read your description a few times and glanced at the source.

I am not disputing that batching/throttling is a useful optimization technique. What I am not following is why batching/throttling needs to be implemented in the Tokio runtime rather than in a layer between Tokio and your application. Could you clarify this?

@fengalin
Author

We used to implement this as a layer, but since the removal of turn and user-callable park & unpark, we couldn't do so any more (see #1887).

We currently use a fork and plan on moving to a dedicated runtime. Sorry for accidentally pushing to this branch and reviving this PR.

Closing it now.

@fengalin fengalin closed this Apr 28, 2020
@carllerche
Member

It's fine.

However, what I am trying to understand is why this capability requires any specific runtime support. Batching / throttling of actions can happen purely at the futures level. I could be missing something, so I am trying to understand better and make sure I am not overlooking anything.

@carllerche
Member

For example, you could implement throttled_spawn(impl Future) that throttles execution of the future without any specific runtime support. Would this handle your case?

@carllerche
Member

It is also possible to implement a fully self-contained executor that implements throttling. For example, LocalSet is a self-contained scheduler: https://github.com/tokio-rs/tokio/blob/master/tokio/src/task/local.rs

@sdroege
Contributor

sdroege commented Apr 29, 2020

While throttling can be implemented on the futures level, this has the problem that the reactor is still woken up whenever e.g. a new UDP packet is received. The main goal here is to throttle the reactor to (as a result) throttle the overall rate of wakeups and batch them all together.

A self-contained executor/reactor (both combined) might be an option; I guess we missed that tokio 0.2 exposes enough API for implementing that from the outside. That might be worth trying. However, as @fengalin mentioned above, we're currently looking into moving to a custom, minimal runtime instead. This issue (and generally quite custom behaviour) is of course one big reason for that, but overall we need more direct control over the runtime behaviour than what tokio can provide, and we also only really need maybe 10% of the features tokio offers :) It's a very specific use case and it seems unrealistic that a general-purpose runtime would be fully adaptable to it, or to expect that niche features like this one could be merged into tokio (and increase the maintenance burden for everyone).
