buffer, limit: use tokio-util's PollSemaphore #556

Merged: hawkw merged 5 commits into master from eliza/buffer-fix on Feb 10, 2021

Conversation

@hawkw (Member) commented on Feb 10, 2021

The Semaphore implementation in tokio::sync doesn't expose a
poll_acquire method to acquire owned semaphore permits in a
non-async-await context. Currently, the limit::concurrency and
buffer middleware use our own pollable wrapper for
tokio::sync::Semaphore. This works by creating a new
Semaphore::acquire_owned future and boxing it every time the semaphore
starts acquiring a new permit.
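
For illustration, a minimal sketch of that per-acquisition allocation pattern (not tower's exact code; `start_acquire` is a hypothetical helper):

```rust
use std::{future::Future, pin::Pin, sync::Arc};
use tokio::sync::{AcquireError, OwnedSemaphorePermit, Semaphore};

type BoxAcquire =
    Pin<Box<dyn Future<Output = Result<OwnedSemaphorePermit, AcquireError>> + Send>>;

// Each time a service begins acquiring a permit, a fresh `acquire_owned`
// future is created and boxed: one heap allocation per acquisition.
fn start_acquire(semaphore: &Arc<Semaphore>) -> BoxAcquire {
    Box::pin(Arc::clone(semaphore).acquire_owned())
}
```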

Recently, the tokio_util crate introduced its own PollSemaphore
wrapper type. This provides the same functionality as our semaphore
wrapper: a pollable version of tokio::sync's Semaphore.
However, tokio_util's version is significantly more efficient: rather
than allocating a new Box for each acquire_owned future, it uses
the ReusableBoxFuture type to reuse a single allocation every
time a new acquire_owned future is needed. This means that rather than
allocating every time a Buffer or ConcurrencyLimit service starts
acquiring a new permit, there's a single allocation for each clone of
the service. Unless services are cloned per-request, this means that the
allocation is moved out of the request path in most cases.
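
As a rough sketch of the poll-based acquisition this enables (hypothetical `Limited` type and `poll_permit` method, assuming the `tokio-util` "sync" feature; this is not the actual `Buffer`/`ConcurrencyLimit` internals):

```rust
use std::sync::Arc;
use std::task::{Context, Poll};

use tokio::sync::{OwnedSemaphorePermit, Semaphore};
use tokio_util::sync::PollSemaphore;

struct Limited {
    semaphore: PollSemaphore,
    // The permit held for the next call, once acquired.
    permit: Option<OwnedSemaphorePermit>,
}

impl Limited {
    fn new(max_in_flight: usize) -> Self {
        Self {
            semaphore: PollSemaphore::new(Arc::new(Semaphore::new(max_in_flight))),
            permit: None,
        }
    }

    // Analogous to `poll_ready`: the semaphore is only polled while no
    // permit is held, and `PollSemaphore` reuses a single allocation
    // internally rather than boxing a new future per acquisition.
    fn poll_permit(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if self.permit.is_none() {
            match self.semaphore.poll_acquire(cx) {
                Poll::Ready(Some(permit)) => self.permit = Some(permit),
                // `None` means the semaphore was closed; a real
                // middleware would surface an error to the caller.
                Poll::Ready(None) => panic!("semaphore closed"),
                Poll::Pending => return Poll::Pending,
            }
        }
        Poll::Ready(())
    }
}
```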

I had originally considered an approach similar to this, but I didn't
think the reduced allocations were worth adding a bunch of unsafe code
in tower (which presently contains no unsafe code). However, this type
fits in perfectly in tokio-util, and now that there's an upstream
implementation, we should use it.

This introduces a dependency on tokio-util when the "limit" or "buffer"
features are enabled.

Additionally, I've added a new test for Buffer asserting that once an
individual Buffer service has been driven to readiness but not called,
additional poll_ready calls won't acquire additional buffer capacity.
This reproduces a bug that existed in earlier versions of
tower::buffer, which could result in starvation of buffer waiters.
This bug doesn't exist in 0.4, but I wanted to ensure that changing the
buffer internals here didn't introduce any new regressions.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This reproduces an issue in previous versions of `tower::buffer`, where
multiple calls to `poll_ready` when the buffer is already ready would
acquire additional channel capacity that was never used. This
essentially leaked channel capacity.

This issue doesn't exist with the current implementation, but this test
should help ensure it isn't reintroduced when changing the buffer
internals (such as switching to `tokio-util`'s pollable channel).

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
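
To illustrate the invariant that new test checks (a hedged sketch, not the actual test added in this PR; it reuses the hypothetical `Limited` type from the sketch above):

```rust
use std::task::Context;

use futures::task::noop_waker_ref;

#[test]
fn repeated_polls_consume_one_permit() {
    // Two permits available, but only one should ever be taken here.
    let mut svc = Limited::new(2);
    let mut cx = Context::from_waker(noop_waker_ref());

    // The first poll acquires a permit...
    assert!(svc.poll_permit(&mut cx).is_ready());
    assert_eq!(svc.semaphore.available_permits(), 1);

    // ...and polling again while already ready must not take another.
    // Acquiring more here is the capacity leak described above.
    assert!(svc.poll_permit(&mut cx).is_ready());
    assert!(svc.poll_permit(&mut cx).is_ready());
    assert_eq!(svc.semaphore.available_permits(), 1);
}
```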
@hawkw added the C-enhancement (Category: A PR with an enhancement or a proposed one in an issue), A-buffer (Area: The tower "buffer" middleware), and A-tower (Area: the tower crate itself) labels on Feb 10, 2021
@hawkw requested a review from a team on February 10, 2021 19:02
@hawkw self-assigned this on Feb 10, 2021
@hawkw merged commit ccfaffc into master on Feb 10, 2021
@hawkw deleted the eliza/buffer-fix branch on February 10, 2021 19:52
hawkw added a commit to linkerd/linkerd2-proxy that referenced this pull request Feb 17, 2021
…922)

The proxy currently has its own implementation of a `tower` `Service`
that makes an inner service `Clone`able by driving it in a spawned task
and buffering requests on a channel. This also exists upstream, as
`tower::buffer`.
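
For context, a minimal (hypothetical) use of the upstream middleware; the service and types here are placeholders rather than the proxy's actual stack:

```rust
use tower::{buffer::Buffer, ServiceExt};

#[tokio::main]
async fn main() -> Result<(), tower::BoxError> {
    // A trivial inner service standing in for the proxy's inner stack.
    let svc = tower::service_fn(|name: String| async move {
        Ok::<_, tower::BoxError>(format!("hello, {}", name))
    });

    // `Buffer` spawns a worker task that drives `svc`, and hands out
    // cloneable handles that enqueue requests on a bounded channel.
    let buffered = Buffer::new(svc, 32);

    let rsp = buffered.clone().oneshot("tower".to_string()).await?;
    assert_eq!(rsp, "hello, tower");
    Ok(())
}
```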

We implemented our own version for a couple of reasons: to avoid an
upstream issue where memory was leaked when a buffered request was
cancelled, and to implement an idle timeout when the buffered service
has been unready for too long. However, it's no longer necessary to
reimplement our own buffer service for these reasons: the upstream bug
was fixed in `tower` 0.4 (see tower-rs/tower#476, tower-rs/tower#480,
and tower-rs/tower#556); and we no longer actually use the buffer idle
timeout (instead, we idle out unresponsive services with the separate
`Failfast` middleware; note that `push_spawn_buffer_with_idle_timeout`
is never actually used).

Therefore, we can remove our _sui generis_ implementation in favour of
`tower::buffer` from upstream. This eliminates dead code for the idle
timeout, which we never actually use, and reduces duplication (since
`tonic` uses `tower::buffer` internally, its code is already compiled
into the proxy). It also reduces the amount of code I'm personally
responsible for maintaining in two separate places ;)

Since the `linkerd-buffer` crate erases the type of the buffered
service, while `tower::buffer` does not, I've changed the
`push_spawn_buffer`/`spawn_buffer` helpers to also include a
`BoxService` layer. This required adding a `BoxServiceLayer` type, since
`BoxService::layer` returns a `LayerFn` with an unnameable type.
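
A rough sketch of what such a nameable layer can look like (hypothetical code; the real `BoxServiceLayer` lives in linkerd2-proxy and may differ in its bounds and fields):

```rust
use std::marker::PhantomData;

use tower::{util::BoxService, Layer, Service};

// A `Layer` with a nameable type that erases the wrapped service behind
// `BoxService`, unlike `BoxService::layer()`, whose `LayerFn<_>` return
// type cannot be written out in struct fields or type aliases.
struct BoxServiceLayer<R> {
    _marker: PhantomData<fn(R)>,
}

impl<R> BoxServiceLayer<R> {
    fn new() -> Self {
        Self { _marker: PhantomData }
    }
}

impl<S, R> Layer<S> for BoxServiceLayer<R>
where
    S: Service<R> + Send + 'static,
    S::Future: Send + 'static,
{
    type Service = BoxService<R, S::Response, S::Error>;

    fn layer(&self, inner: S) -> Self::Service {
        BoxService::new(inner)
    }
}
```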

Also, this change ran into issues due to a compiler bug where generators
(async blocks) sometimes forget concrete lifetimes,
rust-lang/rust#64552. In order to resolve this, I had to remove the
outermost `async` blocks from the OpenCensus and identity daemon tasks.
These async blocks were used only for emitting a tracing event when the
task is started, so it wasn't a big deal to remove them; I moved the
trace events into the actual daemon task functions, and used a `tracing`
span to propagate the remote addresses which aren't known inside the
daemon task functions.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
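
The shape of that workaround, as a hedged sketch with hypothetical names (the real daemon tasks take more arguments):

```rust
use std::net::SocketAddr;

use tracing::Instrument;

// The startup event now lives inside the daemon function itself rather
// than in an outer `async { ... }` block wrapping it...
async fn identity_daemon() {
    tracing::info!("daemon task started");
    // ... actual daemon work ...
}

// ...and the remote address the outer block used to capture is attached
// via a `tracing` span instead.
fn spawn_identity_daemon(addr: SocketAddr) {
    tokio::spawn(identity_daemon().instrument(tracing::info_span!("identity", %addr)));
}
```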