Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent PeerSet service to be buffered #1718

Closed
wants to merge 5 commits into from

Conversation

oxarbitrage
Copy link
Contributor

Motivation

Multiple calls to poll_ready()to the PeerSet service can cause fill ups and hangs if they are buffered.

#1697

Solution

The solution of this PR is to get the name of the service type as a string and check if that contains the word "Buffer" at runtime.
This uses https://doc.rust-lang.org/std/any/fn.type_name.html and by the notes it can fail at some edge cases.

Current and accepted type for the service is:

&mut tower::load::peak_ewma::PeakEwma<zebra_network::peer::client::Client

I manually tested this by changing the string to search from Buffer to PeakEwma, then started zebra and got the panic:

...
The application panicked (crashed).
Message:  assertion failed: !format!("{:?}", type_of(& service)).contains("PeakEwma")
...

The code in this pull request has:

  • Manually tested
  • Unit Tests

Review

@teor2345 when they get the time.

Related Issues

If the solution is good enough to be merged this will close #1697

Follow Up Work

Maybe do a test case in a separated issue.


// Get the type of a variable as a string
use std::any::type_name;
fn type_of<T>(_: T) -> &'static str {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will probably want to put this somewhere else if we end up keeping it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn type_of<T>(_: T) -> &'static str {
// replace with type_name_of_val when it stabilises
// https://github.com/rust-lang/rust/issues/66359
fn type_of<T>(_: T) -> &'static str {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaahc might have an idea where we want to move this, but it's ok for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think we can delete it and just check Self, see: https://github.com/ZcashFoundation/zebra/pull/1718/files#r573338707

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good just a few minor tweaks


// `PeerSet` calls `poll_ready` multiple times on its `Client` services.
// `Buffer`ed polls can fill up their Buffer reservations, causing hangs.
// The next `assert` will make sure the service type is never `Buffer`. #1593
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move part of the explanation into an assert message:

Suggested change
// The next `assert` will make sure the service type is never `Buffer`. #1593
// See #1593 for details.

// `PeerSet` calls `poll_ready` multiple times on its `Client` services.
// `Buffer`ed polls can fill up their Buffer reservations, causing hangs.
// The next `assert` will make sure the service type is never `Buffer`. #1593
assert!(!format!("{:?}", type_of(&service)).contains("Buffer"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert!(!format!("{:?}", type_of(&service)).contains("Buffer"));
assert!(!format!("{:?}", type_of(&service)).contains("Buffer"),
"Clients must not use tower::Buffer, because PeerSet uses poll_ready multiple times before each call, which can cause hangs with buffers" );

@@ -423,6 +423,12 @@ where
.ready_services
.get_index_mut(index)
.expect("preselected index must be valid");

// `PeerSet` calls `poll_ready` multiple times on its `Client` services.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now in the assert message:

Suggested change
// `PeerSet` calls `poll_ready` multiple times on its `Client` services.


// Get the type of a variable as a string
use std::any::type_name;
fn type_of<T>(_: T) -> &'static str {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yaahc might have an idea where we want to move this, but it's ok for now.

Co-authored-by: teor <teor@riseup.net>
Comment on lines 426 to 431

// `Buffer`ed polls can fill up their Buffer reservations, causing hangs.
// See #1593 for details.
assert!(!format!("{:?}", type_of(&service)).contains("Buffer"),
"Clients must not use tower::Buffer, because PeerSet uses
poll_ready multiple times before each call, which can cause hangs with buffers");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The peer set type is static, so we only need to check it once.

Let's move it to the start of PeerSet::new, and do the checks on Self, so we don't have to use type_of:

Suggested change
// `Buffer`ed polls can fill up their Buffer reservations, causing hangs.
// See #1593 for details.
assert!(!format!("{:?}", type_of(&service)).contains("Buffer"),
"Clients must not use tower::Buffer, because PeerSet uses
poll_ready multiple times before each call, which can cause hangs with buffers");
// Using `poll_ready` without `call` fills up `Buffer` reservations, causing hangs.
// See #1593 for details.
assert!(!type_name::<Self>().contains("Buffer"),
"Clients must not use tower::Buffer, because PeerSet uses
`poll_ready` multiple times before each `call`, which causes buffer hangs");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self (title case) is the current type:
https://doc.rust-lang.org/std/keyword.Self.html

self (lower case) is the current object:
https://doc.rust-lang.org/std/keyword.self.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think this will work. I added dbg!(type_name::<Self>()); as the first line in new() and here is the type i am getting (it contains the "Buffer" string):

[zebra-network/src/peer_set/set.rs:126] type_name::<Self>() = "zebra_network::peer_set::set::PeerSet<tower::load::peak_ewma::PeakEwmaDiscover<futures_util::stream::stream::filter::Filter<futures_channel::mpsc::Receiver<core::result::Result<tower::discover::Change<std::net::addr::SocketAddr, zebra_network::peer::client::Client>, alloc::boxed::Box<dyn std::error::Error+core::marker::Sync+core::marker::Send>>>, futures_util::future::ready::Ready<bool>, zebra_network::peer_set::initialize::init<tower::load_shed::LoadShed<tower::buffer::service::Buffer<zebrad::components::inbound::Inbound, zebra_network::protocol::internal::request::Request>>>::{{closure}}::{{closure}}>>>"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, we don't need to check the outer PeerSet type, we need to check the inner D::Service type.

The previous code did by check using the service returned by:

                let (key, service) = self
                    .ready_services
                    .get_index_mut(index)
                    .expect("preselected index must be valid");

So we should be able to do:

Suggested change
// `Buffer`ed polls can fill up their Buffer reservations, causing hangs.
// See #1593 for details.
assert!(!format!("{:?}", type_of(&service)).contains("Buffer"),
"Clients must not use tower::Buffer, because PeerSet uses
poll_ready multiple times before each call, which can cause hangs with buffers");
// Using `poll_ready` without `call` fills up `Buffer` reservations, causing hangs.
// See #1593 for details.
assert!(!type_name::<D::Service>().contains("Buffer"),
"Clients must not use tower::Buffer, because PeerSet uses `poll_ready` multiple times before each `call`, which causes buffer hangs");

Copy link
Contributor

@teor2345 teor2345 Feb 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of ready_services is...

ready_services: IndexMap<D::Key, D::Service>,

Copy link
Contributor

@yaahc yaahc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really love the idea of using type_name comparisons for this assertion, it seems like an indirect way to enforce the invariant we care about, and I'm worried about it breaking because the language makes no guarantees about the stability of type names.

The returned string must not be considered to be a unique identifier of a type as multiple types may map to the same type name. Similarly, there is no guarantee that all parts of a type will appear in the returned string: for example, lifetime specifiers are currently not included. In addition, the output may change between versions of the compiler.

I am still thinking of alternatives but to start I think one option that may be better is to use Clone as a proxy for Buffer, since Buffer/Batch are the only types that should introduce "clone-ness" to Services. We'd then use static_assertions to assert that D::Service does not implement Clone. Now, I'm not super happy with this solution either, since it is similarly indirect, but I'm more confident in its reliability, and it doesn't introduce any runtime checks. edit: this won't work due to parametricity

@yaahc
Copy link
Contributor

yaahc commented Feb 10, 2021

Lets hold off on merging this, I'm still figuring out the details but I'm talking to @hawkw right now and I think we're going to fix the issue directly so its no longer possible to misuse Buffer in a way that exhausts all the reservations, then this assertion will be unnecessary.

@oxarbitrage
Copy link
Contributor Author

Sounds good

@teor2345 teor2345 marked this pull request as draft February 10, 2021 04:38
@teor2345 teor2345 added the S-needs-triage Status: A bug report needs triage label Feb 10, 2021
@teor2345
Copy link
Contributor

I've marked this PR as draft pending the tower::Buffer fix.

@mpguerra
Copy link
Contributor

I've marked this PR as draft pending the tower::Buffer fix.

Is the fix you are referring to the one for issue #1671 ?

@teor2345
Copy link
Contributor

I've marked this PR as draft pending the tower::Buffer fix.

Is the fix you are referring to the one for issue #1671 ?

No. #1671 fixes tower-batch, a zebra crate which is based on tower::Buffer, and shares some of its issues. The PeerSet service does not use tower-batch.

The fixes I am referring to were made in the tower crate a while ago (tower-rs/tower#476 and tower-rs/tower#480). But we didn't know these bugs were fixed until yesterday, because the tower::Buffer comments were outdated, and the tower::Buffer tests were incomplete.

Since tower is already fixed, this PR and its ticket are obsolete.

@teor2345 teor2345 closed this Feb 11, 2021
@teor2345 teor2345 added this to the 2021 Sprint 3 milestone Feb 15, 2021
@teor2345
Copy link
Contributor

This PR belongs in sprint 3 because we actually did the work.

@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Feb 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prevent Clients from being Buffered in zebra-network
4 participants