bootstrap triggers a debug_assert #4948

Closed
stormshield-frb opened this issue Nov 28, 2023 · 2 comments · Fixed by #4971

@stormshield-frb
Contributor

Summary

We are encountering a strange bug in queue_new_stream (protocols/kad/src/handler.rs): the following debug_assert is reached.

debug_assert!(
    result.is_ok(),
    "Expected to not create more streams than allowed"
);

After doing some heavy diagnostics, we have concluded there does not seem to be a bug with FuturesMap (as we thought at first).

Indeed, for some reason (that we do not fully understand, and we do not know whether it is normal), a bootstrap request seems to trigger multiple FindNodeReq on the same Connection, with the same QueryId and at the same time, resulting in:

  1. multiple FindNodeReq being handled in on_behaviour_event
  2. which calls self.pending_messages.push_back
  3. which is itself processed in the poll method of Handler
  4. which calls self.pending_messages.pop_front
  5. which calls self.queue_new_stream
  6. which calls self.outbound_substreams.try_push
  7. which returns an error
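
The chain above can be reproduced in miniature. The following is only a sketch under stated assumptions: `ToyStreamMap`, `PushFailed` and `drain` are hypothetical stand-ins written for illustration, not the real handler or the real FuturesMap, and the toy map is assumed to report a colliding key as an error.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical stand-in for the query identifier; not the real libp2p type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct QueryId(u64);

// Opaque failure: the caller cannot tell a full map from a key collision.
#[derive(Debug, PartialEq)]
struct PushFailed;

// Toy bounded map keyed by QueryId (assumption: a colliding key is an error).
struct ToyStreamMap {
    capacity: usize,
    inner: HashMap<QueryId, &'static str>,
}

impl ToyStreamMap {
    fn new(capacity: usize) -> Self {
        Self { capacity, inner: HashMap::new() }
    }

    fn try_push(&mut self, id: QueryId, request: &'static str) -> Result<(), PushFailed> {
        if self.inner.len() >= self.capacity || self.inner.contains_key(&id) {
            return Err(PushFailed);
        }
        self.inner.insert(id, request);
        Ok(())
    }
}

// Mirrors steps 3 to 7: pop queued messages while there is room and
// try to register one outbound stream per message.
fn drain(
    pending: &mut VecDeque<(QueryId, &'static str)>,
    streams: &mut ToyStreamMap,
) -> Vec<Result<(), PushFailed>> {
    let mut results = Vec::new();
    while streams.inner.len() < streams.capacity {
        let Some((id, request)) = pending.pop_front() else { break };
        results.push(streams.try_push(id, request));
    }
    results
}

fn main() {
    let id = QueryId(1);
    // Steps 1 and 2: two requests for the same query are queued back to back.
    let mut pending = VecDeque::from([(id, "first FindNodeReq"), (id, "second FindNodeReq")]);
    let mut streams = ToyStreamMap::new(32);
    let results = drain(&mut pending, &mut streams);
    println!("{results:?}");
}
```

In this toy model the map is nowhere near full, yet the second push still fails because the key is already present, which is the situation a size check before the push cannot rule out.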

Expected behavior

The debug_assert should not be reached, no matter the circumstances.

Actual behavior

The debug_assert is reached.

Relevant log output

No response

Possible Solution

Our question is: is it really "Expected to not create more streams than allowed"? From our perspective, there could be legitimate reasons to try to create a stream even if one already exists for the corresponding request.

Indeed, is there anything preventing libp2p from emitting a HandlerIn event on the same Handler and with the same QueryId while the previous one has not yet been handled? It does not seem to be the case when reading the on_behaviour_event implementation, with all its calls to self.pending_messages.push_back().

Honestly, we are not quite sure where this error is coming from. The peer we were trying to reach did trigger the timeout of the TimeoutFuture, so the two may be related, but we are not sure.

Version

master branch from yesterday (11/27/2023): dfce3cc

Would you like to work on fixing this bug?

Yes

@thomaseizinger
Contributor

Thank you for the detailed report, @stormshield-frb!

This is almost certainly related to #4901. I'll look at the code in more detail and see if I can find anything.

@thomaseizinger
Contributor

> Our thought is: is it really Expected to not create more streams than allowed ? From our perspective, there could be some reasons why we try to create a stream even if there is already one for the corresponding request.

When I wrote this debug_assert, I didn't think that the same query could actually be sent multiple times (i.e. QueryId not being unique). Thus, the only possible error path would be the FuturesMap being full, which shouldn't happen because we check the size beforehand.

I'll send a quick patch that differentiates these errors more precisely so we can confirm that.
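
As a rough illustration of what differentiating these errors could look like, here is a minimal sketch. The `PushError` enum and the `describe` helper below are hypothetical and defined only for this example; they are not claimed to match the types or the patch used in the actual codebase.

```rust
// Hypothetical error type with one variant per failure cause.
#[derive(Debug, PartialEq)]
enum PushError {
    // The bounded collection already holds as many entries as it allows.
    BeyondCapacity,
    // An entry with the same key (here: the same QueryId) already existed.
    KeyCollision,
}

// Maps each outcome to a distinct diagnostic instead of a single
// "result.is_ok()" assertion that hides which case occurred.
fn describe(result: &Result<(), PushError>) -> &'static str {
    match result {
        Ok(()) => "stream queued",
        Err(PushError::BeyondCapacity) => "map is full: the size check before the push was wrong",
        Err(PushError::KeyCollision) => "key collision: the same QueryId was pushed twice",
    }
}

fn main() {
    let outcomes = [
        Ok(()),
        Err(PushError::BeyondCapacity),
        Err(PushError::KeyCollision),
    ];
    for outcome in &outcomes {
        println!("{}", describe(outcome));
    }
}
```

With the two causes separated, a log line or assertion message can confirm whether the failure is a capacity problem or a repeated QueryId.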

cc @mxinden We might need your input here.

mergify bot closed this as completed in #4971 on Dec 5, 2023
mergify bot pushed a commit that referenced this issue Dec 5, 2023
We mistakenly assumed that `QueryId`s are unique, in that only a single request will be emitted per `QueryId`. This is wrong. A bootstrap, for example, will issue multiple requests as part of the same `QueryId`. Thus, we cannot use the `QueryId` as a key for the `FuturesMap`. Instead, we use a `FuturesTupleSet` to associate the `QueryId` with the in-flight request.
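
The idea of carrying the `QueryId` alongside each in-flight request, rather than using it as a key, can be sketched as follows. This is an illustration only: `BoundedTupleSet` is a hypothetical, synchronous stand-in written for this example and does not reproduce the real `FuturesTupleSet` API.

```rust
// Hypothetical stand-in for the query identifier; not the real libp2p type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct QueryId(u64);

// The only way a push can fail in this model: the set is full.
#[derive(Debug, PartialEq)]
struct BeyondCapacity;

// Bounded collection of (item, QueryId) pairs. The QueryId is attached data,
// not a key, so several entries may share the same QueryId.
struct BoundedTupleSet<T> {
    capacity: usize,
    inner: Vec<(T, QueryId)>,
}

impl<T> BoundedTupleSet<T> {
    fn new(capacity: usize) -> Self {
        Self { capacity, inner: Vec::new() }
    }

    fn try_push(&mut self, item: T, id: QueryId) -> Result<(), BeyondCapacity> {
        if self.inner.len() >= self.capacity {
            return Err(BeyondCapacity);
        }
        self.inner.push((item, id));
        Ok(())
    }

    // Stand-in for a request completing: yields the oldest entry together
    // with the QueryId it was pushed with.
    fn pop_oldest(&mut self) -> Option<(T, QueryId)> {
        if self.inner.is_empty() {
            None
        } else {
            Some(self.inner.remove(0))
        }
    }
}

fn main() {
    let id = QueryId(1);
    let mut in_flight = BoundedTupleSet::new(2);
    // Two requests belonging to the same query are both accepted.
    println!("{:?}", in_flight.try_push("request A", id));
    println!("{:?}", in_flight.try_push("request B", id));
    // Only the capacity limit can reject a push now.
    println!("{:?}", in_flight.try_push("request C", id));
    println!("{:?}", in_flight.pop_oldest());
}
```

Because nothing is keyed by `QueryId` in this model, a repeated `QueryId` is no longer an error case, and the size check before the push covers the one remaining failure.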

Related: #4901.
Resolves: #4948.

Pull-Request: #4971.
AgeManning pushed a commit to sigp/rust-libp2p that referenced this issue Dec 13, 2023
* ci: unset `RUSTFLAGS` value in semver job

Don't fail semver-checking if a dependency version has warnings, such as deprecation notices.

Related: libp2p#4932 (comment).
Related: obi1kenobi/cargo-semver-checks#589.

Pull-Request: libp2p#4942.

* deps(webrtc): bump alpha versions

Bumps versions of `libp2p-webrtc` and `libp2p-webrtc-websys` up one minor version.

Fixes: libp2p#4953.

Pull-Request: libp2p#4959.

* feat(request-response): derive `PartialOrd`,`Ord` for `{Out,In}RequestId`

Pull-Request: libp2p#4956.

* refactor(connection-limits): make `check_limit` a free-function

Pull-Request: libp2p#4958.

* chore(webrtc-utils): bump version to allow for new release

We didn't bump this crate's version despite it depending on `libp2p_noise`. As such, we can't release `libp2p-webrtc-websys` at the moment because it needs a new release of this crate.

Pull-Request: libp2p#4968.

* feat(webrtc-websys): hide `libp2p_noise` from the public API

Currently, `libp2p-webrtc-websys` exposes the `libp2p_noise` dependency in its public API. It should really be a private dependency of the crate. By wrapping it in a new-type, we can achieve this.

Pull-Request: libp2p#4969.

* fix(kad): iterator progress to be decided by any of new peers

Pull-Request: libp2p#4932.

* chore(quic): set `max_idle_timeout` to quinn default timeout

Resolves libp2p#4917.

Pull-Request: libp2p#4965.

* feat(core): impl Display on ListenerId

Fixes: libp2p#4935.

Pull-Request: libp2p#4936.

* feat(server): support websocket

Pull-Request: libp2p#4937.

* feat(swarm): implement `Copy` and `Clone` for `FromSwarm`

We can make `FromSwarm` implement `Copy` and `Clone` which makes it much easier to

a) generate code in `libp2p-swarm-derive`
b) manually wrap a `NetworkBehaviour`

Previously, we couldn't do this because `ConnectionClosed` would have a `handler` field that cannot be cloned / copied.

Related: libp2p#4076.
Related: libp2p#4581.

Pull-Request: libp2p#4825.

* deps: bump wasm-bindgen-futures from 0.4.38 to 0.4.39

Pull-Request: libp2p#4946.

* feat(connection-limit): add function to mutate `ConnectionLimits`

Resolves: libp2p#4826.

Pull-Request: libp2p#4964.

* deps: bump web-sys from 0.3.65 to 0.3.66

Pull-Request: libp2p#4976.

* deps: bump wasm-bindgen-test from 0.3.38 to 0.3.39

Pull-Request: libp2p#4975.

* fix(kad): don't assume `QueryId`s are unique

We mistakenly assumed that `QueryId`s are unique, in that only a single request will be emitted per `QueryId`. This is wrong. A bootstrap, for example, will issue multiple requests as part of the same `QueryId`. Thus, we cannot use the `QueryId` as a key for the `FuturesMap`. Instead, we use a `FuturesTupleSet` to associate the `QueryId` with the in-flight request.

Related: libp2p#4901.
Resolves: libp2p#4948.

Pull-Request: libp2p#4971.

* fix(webrtc example): clarify idle connection timeout

When I ran the `example/browser-webrtc` example I discovered it would break after a ping or two.
The `Ping` idle timeout needed to be extended, on both the server and the wasm client, which is what this PR fixes.
I also added a small note to the README about ensuring `wasm-pack` is installed, for users who are new to the ecosystem.

Fixes: libp2p#4950.

Pull-Request: libp2p#4966.

* docs(examples/readme): fix broken link

Related: libp2p#3536.

Pull-Request: libp2p#4984.

* feat(yamux): auto-tune (dynamic) stream receive window

libp2p/rust-yamux#176 enables auto-tuning for the Yamux stream receive window. While preserving small buffers on low-latency and/or low-bandwidth connections, this change allows for high-latency and/or high-bandwidth connections to exhaust the available bandwidth on a single stream.

Using the [libp2p perf](https://github.com/libp2p/test-plans/blob/master/perf/README.md) benchmark tools (60ms, 10Gbit/s) shows an **improvement from 33 Mbit/s to 1.3 Gbit/s** in single stream throughput.

See libp2p/rust-yamux#176 for details.

To ship the above Rust Yamux change in a libp2p patch release (non-breaking), this pull request uses `yamux` `v0.13` (new version) by default and falls back to `yamux` `v0.12` (old version) when setting any configuration options. Thus default users benefit from the increased performance, while power users with custom configurations maintain the old behavior.

Pull-Request: libp2p#4970.

* deps: bump actions/deploy-pages from 2 to 3

Pull-Request: libp2p#4978.

* deps: bump the axum group with 2 updates

Pull-Request: libp2p#4943.

* chore(webrtc-websys): remove unused dependencies

Pull-Request: libp2p#4973.

* chore(quic): fix link to PR in changelog

Pull-Request: libp2p#4993.

* deps: bump tokio from 1.34.0 to 1.35.0

Pull-Request: libp2p#4995.

* deps: bump syn from 2.0.39 to 2.0.40

Pull-Request: libp2p#4996.

* deps: bump once_cell from 1.18.0 to 1.19.0

Pull-Request: libp2p#4998.

---------

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
Co-authored-by: Doug A <douganderson444@gmail.com>
Co-authored-by: Darius Clark <dariusc93@users.noreply.github.com>
Co-authored-by: zhiqiangxu <652732310@qq.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: maqi <qi.ma@maidsafe.net>
Co-authored-by: stormshield-frb <144998884+stormshield-frb@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: NAHO <90870942+trueNAHO@users.noreply.github.com>
AgeManning pushed a commit to sigp/rust-libp2p that referenced this issue Jan 15, 2024
* ci: unset `RUSTFLAGS` value in semver job

Don't fail semver-checking if a dependency version has warnings, such as deprecation notices.

Related: libp2p#4932 (comment).
Related: obi1kenobi/cargo-semver-checks#589.

Pull-Request: libp2p#4942.

* deps(webrtc): bump alpha versions

Bumps versions of `libp2p-webrtc` and `libp2p-webrtc-websys` up one minor version.

Fixes: libp2p#4953.

Pull-Request: libp2p#4959.

* feat(request-response): derive `PartialOrd`,`Ord` for `{Out,In}RequestId`

Pull-Request: libp2p#4956.

* refactor(connection-limits): make `check_limit` a free-function

Pull-Request: libp2p#4958.

* chore(webrtc-utils): bump version to allow for new release

We didn't bump this crate's version despite it depending on `libp2p_noise`. As such, we can't release `libp2p-webrtc-websys` at the moment because it needs a new release of this crate.

Pull-Request: libp2p#4968.

* feat(webrtc-websys): hide `libp2p_noise` from the public API

Currently, `libp2p-webrtc-websys` exposes the `libp2p_noise` dependency in its public API. It should really be a private dependency of the crate. By wrapping it in a new-type, we can achieve this.

Pull-Request: libp2p#4969.

* fix(kad): iterator progress to be decided by any of new peers

Pull-Request: libp2p#4932.

* chore(quic): set `max_idle_timeout` to quinn default timeout

Resolves libp2p#4917.

Pull-Request: libp2p#4965.

* feat(core): impl Display on ListenerId

Fixes: libp2p#4935.

Pull-Request: libp2p#4936.

* feat(server): support websocket

Pull-Request: libp2p#4937.

* feat(swarm): implement `Copy` and `Clone` for `FromSwarm`

We can make `FromSwarm` implement `Copy` and `Clone` which makes it much easier to

a) generate code in `libp2p-swarm-derive`
b) manually wrap a `NetworkBehaviour`

Previously, we couldn't do this because `ConnectionClosed` would have a `handler` field that cannot be cloned / copied.

Related: libp2p#4076.
Related: libp2p#4581.

Pull-Request: libp2p#4825.

* deps: bump wasm-bindgen-futures from 0.4.38 to 0.4.39

Pull-Request: libp2p#4946.

* feat(connection-limit): add function to mutate `ConnectionLimits`

Resolves: libp2p#4826.

Pull-Request: libp2p#4964.

* deps: bump web-sys from 0.3.65 to 0.3.66

Pull-Request: libp2p#4976.

* deps: bump wasm-bindgen-test from 0.3.38 to 0.3.39

Pull-Request: libp2p#4975.

* fix(kad): don't assume `QueryId`s are unique

We mistakenly assumed that `QueryId`s are unique, in that only a single request will be emitted per `QueryId`. This is wrong. A bootstrap, for example, will issue multiple requests as part of the same `QueryId`. Thus, we cannot use the `QueryId` as a key for the `FuturesMap`. Instead, we use a `FuturesTupleSet` to associate the `QueryId` with the in-flight request.

Related: libp2p#4901.
Resolves: libp2p#4948.

Pull-Request: libp2p#4971.

* fix(webrtc example): clarify idle connection timeout

When I ran the `example/browser-webrtc` example I discovered it would break after a ping or two.
The `Ping` idle timeout needed to be extended, on both the server and the wasm client, which is what this PR fixes.
I also added a small note to the README about ensuring `wasm-pack` is installed, for users who are new to the ecosystem.

Fixes: libp2p#4950.

Pull-Request: libp2p#4966.

* docs(examples/readme): fix broken link

Related: libp2p#3536.

Pull-Request: libp2p#4984.

* feat(yamux): auto-tune (dynamic) stream receive window

libp2p/rust-yamux#176 enables auto-tuning for the Yamux stream receive window. While preserving small buffers on low-latency and/or low-bandwidth connections, this change allows for high-latency and/or high-bandwidth connections to exhaust the available bandwidth on a single stream.

Using the [libp2p perf](https://github.com/libp2p/test-plans/blob/master/perf/README.md) benchmark tools (60ms, 10Gbit/s) shows an **improvement from 33 Mbit/s to 1.3 Gbit/s** in single stream throughput.

See libp2p/rust-yamux#176 for details.

To ship the above Rust Yamux change in a libp2p patch release (non-breaking), this pull request uses `yamux` `v0.13` (new version) by default and falls back to `yamux` `v0.12` (old version) when setting any configuration options. Thus default users benefit from the increased performance, while power users with custom configurations maintain the old behavior.

Pull-Request: libp2p#4970.

* deps: bump actions/deploy-pages from 2 to 3

Pull-Request: libp2p#4978.

* deps: bump the axum group with 2 updates

Pull-Request: libp2p#4943.

* chore(webrtc-websys): remove unused dependencies

Pull-Request: libp2p#4973.

* chore(quic): fix link to PR in changelog

Pull-Request: libp2p#4993.

* deps: bump tokio from 1.34.0 to 1.35.0

Pull-Request: libp2p#4995.

* deps: bump syn from 2.0.39 to 2.0.40

Pull-Request: libp2p#4996.

* deps: bump once_cell from 1.18.0 to 1.19.0

Pull-Request: libp2p#4998.

* deps: bump hkdf from 0.12.3 to 0.12.4

Pull-Request: libp2p#5009.

* deps: bump clap from 4.4.10 to 4.4.11

Pull-Request: libp2p#4997.

* deps: bump thiserror from 1.0.50 to 1.0.51

Pull-Request: libp2p#5010.

* deps: bump syn from 2.0.40 to 2.0.41

Pull-Request: libp2p#5011.

* deps: bump async-io from 2.2.1 to 2.2.2

Pull-Request: libp2p#5012.

* deps: bump rust-embed from 8.0.0 to 8.1.0

Pull-Request: libp2p#5000.

* chore(deps): bump golang.org/x/crypto from 0.7.0 to 0.17.0

Pull-Request: libp2p#5019.

* deps: bump libc from 0.2.150 to 0.2.151

Pull-Request: libp2p#5002.

* docs: remove security@libp2p.io

I no longer have access to the mailing list. See libp2p#5007.

Pull-Request: libp2p#5020.

* chore: fix typos

Pull-Request: libp2p#5021.

* fix(derive): restore support for inline generic type constraints

Fixes the `#[NetworkBehaviour]` macro to support generic constraints on behaviours without a where clause, which was the case before v0.51.

Pull-Request: libp2p#5003.

* deps: bump actions/deploy-pages from 3 to 4

Pull-Request: libp2p#5022.

* chore: fix several typos in documentation

Pull-Request: libp2p#5008.

* deps: bump async-trait from 0.1.74 to 0.1.75

Pull-Request: libp2p#5029.

* deps: bump anyhow from 1.0.75 to 1.0.76

Pull-Request: libp2p#5030.

* deps: bump futures-util from 0.3.29 to 0.3.30

Pull-Request: libp2p#5031.

* deps: bump syn from 2.0.41 to 2.0.43

Pull-Request: libp2p#5033.

* deps: bump tokio from 1.35.0 to 1.35.1

Pull-Request: libp2p#5034.

* deps: bump reqwest from 0.11.22 to 0.11.23

Pull-Request: libp2p#5035.

* deps: bump futures from 0.3.29 to 0.3.30

Pull-Request: libp2p#5032.

* deps: bump trybuild from 1.0.85 to 1.0.86

Pull-Request: libp2p#5036.

* deps: bump proc-macro2 from 1.0.69 to 1.0.71

Pull-Request: libp2p#5041.

* deps: bump actions/upload-pages-artifact from 2.0.0 to 3.0.0

Pull-Request: libp2p#5023.

* deps: bump Rust to 1.75 and fix clippy lints

Pull-Request: libp2p#5043.

* deps: bump thiserror from 1.0.51 to 1.0.53

Pull-Request: libp2p#5044.

* deps: bump clap from 4.4.11 to 4.4.12

Pull-Request: libp2p#5046.

* deps: bump tempfile from 3.8.1 to 3.9.0

Pull-Request: libp2p#5047.

* deps: bump rust-embed from 8.1.0 to 8.2.0

Pull-Request: libp2p#5049.

* deps: bump serde_json from 1.0.108 to 1.0.109

Pull-Request: libp2p#5050.

* deps: bump anyhow from 1.0.76 to 1.0.78

Pull-Request: libp2p#5051.

* deps: bump proc-macro2 from 1.0.71 to 1.0.73

Pull-Request: libp2p#5054.

* deps: bump quote from 1.0.33 to 1.0.34

Pull-Request: libp2p#5055.

* deps: bump anyhow from 1.0.78 to 1.0.79

Pull-Request: libp2p#5062.

* deps: bump serde_json from 1.0.109 to 1.0.111

Pull-Request: libp2p#5063.

* deps: bump thiserror from 1.0.53 to 1.0.56

Pull-Request: libp2p#5064.

* deps: bump libc from 0.2.151 to 0.2.152

Pull-Request: libp2p#5065.

* deps: bump trybuild from 1.0.86 to 1.0.88

Pull-Request: libp2p#5068.

* deps: bump proc-macro2 from 1.0.73 to 1.0.76

Pull-Request: libp2p#5069.

* deps: bump clap from 4.4.12 to 4.4.13

Pull-Request: libp2p#5070.

* deps: bump Swatinem/rust-cache from 2.7.1 to 2.7.2

Pull-Request: libp2p#5076.

* deps: bump tj-actions/glob from 17 to 18

Pull-Request: libp2p#5058.

* deps: bump the axum group with 1 update

Pull-Request: libp2p#5045.

* deps: bump quote from 1.0.34 to 1.0.35

Pull-Request: libp2p#5071.

* deps: bump async-trait from 0.1.75 to 0.1.77

Pull-Request: libp2p#5081.

* ci: add dependabot group for webrtc

Pull-Request: libp2p#5082.

* deps: bump base64 from 0.21.5 to 0.21.7

Pull-Request: libp2p#5086.

* deps: bump trybuild from 1.0.88 to 1.0.89

Pull-Request: libp2p#5087.

* deps: bump js-sys from 0.3.66 to 0.3.67

Pull-Request: libp2p#5091.

* deps: bump wasm-bindgen from 0.2.89 to 0.2.90

Pull-Request: libp2p#5089.

* add PeerId to ListenFailure

---------

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
Co-authored-by: Doug A <douganderson444@gmail.com>
Co-authored-by: Darius Clark <dariusc93@users.noreply.github.com>
Co-authored-by: zhiqiangxu <652732310@qq.com>
Co-authored-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: maqi <qi.ma@maidsafe.net>
Co-authored-by: stormshield-frb <144998884+stormshield-frb@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: NAHO <90870942+trueNAHO@users.noreply.github.com>
Co-authored-by: alex <152680487+bodhi-crypo@users.noreply.github.com>
Co-authored-by: Akosh Farkash <aakoshh@gmail.com>
Co-authored-by: Frieren <153332328+Frierened@users.noreply.github.com>