fix(bin/client): don't close closing connection #1866

Merged
merged 2 commits into from
May 2, 2024

Conversation

mxinden
Collaborator

@mxinden mxinden commented May 2, 2024

The bin/src/client/mod.rs Runner::run function continuously checks whether there is more work. In case there is none, it initiates closing of the connection (self.client.close) and then continues to the top of the loop in order to send out a closing frame.

async fn run(mut self) -> Res<Option<ResumptionToken>> {
    loop {
        let handler_done = self.handler.handle(&mut self.client)?;
        self.process_output().await?;
        if self.client.has_events() {
            continue;
        }
        match (handler_done, self.client.is_closed()?) {
            // more work
            (false, _) => {}
            // no more work, closing connection
            (true, false) => {
                self.client.close(Instant::now(), 0, "kthxbye!");
                continue;
            }
            // no more work, connection closed, terminating
            (true, true) => break,
        }
        match ready(self.socket, self.timeout.as_mut()).await? {
            Ready::Socket => self.process_multiple_input().await?,
            Ready::Timeout => {
                self.timeout = None;
            }
        }
    }
    if self.args.stats {
        qinfo!("{:?}", self.client.stats());
    }
    Ok(self.handler.take_token())
}

There is a potential busy loop when closing an already closing connection. On every iteration, Runner::run calls self.client.close again and then continues to the top of the loop.

This commit distinguishes three connection states: NotClosing, Closing and Closed. It attempts to close only a NotClosing connection, and only in that case continues to the top of the loop.
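
As a rough illustration of the three-state idea described above, the decision made at the bottom of the loop could be sketched as follows. This is not the code from the pull request: the `Step` enum and the `next_step` function are hypothetical names introduced only for this example, and the `CloseState` definition is assumed from the state names mentioned in the description.

```rust
/// Three-way view of a connection's shutdown progress.
/// The variant names follow the PR description; the definition is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloseState {
    NotClosing,
    Closing,
    Closed,
}

/// Hypothetical helper type: what the event loop should do next.
#[derive(Debug, PartialEq, Eq)]
enum Step {
    /// Initiate the close, then go to the top of the loop to send the closing frame.
    CloseAndContinue,
    /// Fall through and wait for socket input or a timeout.
    Wait,
    /// Leave the loop.
    Terminate,
}

/// Hypothetical helper: maps (handler_done, close state) to the next step.
fn next_step(handler_done: bool, state: CloseState) -> Step {
    match (handler_done, state) {
        // more work
        (false, _) => Step::Wait,
        // no more work, not closing yet: close exactly once
        (true, CloseState::NotClosing) => Step::CloseAndContinue,
        // no more work, already closing: do not close again, wait instead of spinning
        (true, CloseState::Closing) => Step::Wait,
        // no more work, connection closed, terminating
        (true, CloseState::Closed) => Step::Terminate,
    }
}
```

The point of the sketch is the `Closing` arm: with only a boolean `is_closed`, a closing connection is indistinguishable from one that has not started closing, so the loop keeps taking the "close and continue" branch.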


Fixes #1864.

https://github.com/mozilla/neqo/blob/14cafbaa7fa88434def2c1d19e932c08e00173f8/neqo-bin/src/client/mod.rs#L376-L409
@larseggert
Collaborator

Any chance we can refactor this to avoid the duplication between h09 and h3?

@larseggert larseggert enabled auto-merge May 2, 2024 12:38
auto-merge was automatically disabled May 2, 2024 13:19

Head branch was pushed to by a user without write access

@mxinden
Collaborator Author

mxinden commented May 2, 2024

@larseggert @martinthomson the latest commit does the following:

  • Introduce ConnectionError::is_error. (Long term I think it would be great for ConnectionError not to contain ApplicationError(0) and TransportError::NoError, but I think that is a matter for another pull request.)
  • Leverage TryFrom, making the conversion a bit more idiomatic.

I think it is an improvement. I don't think it is great.

Let me know what you think.
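
For readers following along, the two points above might look roughly like the sketch below. It assumes simplified stand-in types: the `ConnectionError` and `State` enums here are hypothetical mirrors using bare numeric codes, not the definitions in neqo, and `classify` is a helper added only for this example.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the reason carried by a closing or closed connection.
/// Code 0 stands in for "no error" in both variants.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ConnectionError {
    Transport(u64),
    Application(u64),
}

impl ConnectionError {
    /// True unless the value is one of the two non-error cases.
    fn is_error(&self) -> bool {
        !matches!(self, Self::Transport(0) | Self::Application(0))
    }
}

/// Hypothetical connection state as reported by the protocol stack.
#[derive(Debug, Clone, PartialEq, Eq)]
enum State {
    Connected,
    Closing(ConnectionError),
    Closed(ConnectionError),
}

#[derive(Debug, PartialEq, Eq)]
enum CloseState {
    NotClosing,
    Closing,
    Closed,
}

impl TryFrom<State> for CloseState {
    type Error = ConnectionError;

    /// A clean close converts to a `CloseState`; a close caused by a real
    /// error is surfaced as `Err` so the caller can propagate it with `?`.
    fn try_from(value: State) -> Result<Self, Self::Error> {
        let (state, reason) = match value {
            State::Closing(reason) => (CloseState::Closing, reason),
            State::Closed(reason) => (CloseState::Closed, reason),
            State::Connected => return Ok(CloseState::NotClosing),
        };
        if reason.is_error() {
            Err(reason)
        } else {
            Ok(state)
        }
    }
}

/// Hypothetical convenience wrapper around the conversion.
fn classify(state: State) -> Result<CloseState, ConnectionError> {
    CloseState::try_from(state)
}
```

Using `TryFrom` keeps the "is this an error?" decision in one place, which is presumably what makes the conversion read as more idiomatic than a free function.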

Collaborator

@larseggert larseggert left a comment

Thanks for the changes, good enough I think.

We should refactor the h09 and h3 client functionality in a different PR, to reduce duplication. (And maybe even do so with the server, too.)

@larseggert larseggert added this pull request to the merge queue May 2, 2024
Merged via the queue into mozilla:main with commit 87bf852 May 2, 2024
47 checks passed

github-actions bot commented May 2, 2024

Benchmark results

Performance differences relative to 8c4411a.

  • drain a timer quickly time: [310.31 ns 317.70 ns 324.72 ns]
    change: [-2.3280% -0.3240% +1.7597%] (p = 0.78 > 0.05)
    No change in performance detected.

  • coalesce_acked_from_zero 1+1 entries
    time: [198.54 ns 199.00 ns 199.48 ns]
    change: [+1.1852% +1.9668% +2.5404%] (p = 0.00 < 0.05)
    💔 Performance has regressed.

  • coalesce_acked_from_zero 3+1 entries
    time: [240.97 ns 241.62 ns 242.34 ns]
    change: [+1.2777% +1.6570% +2.0655%] (p = 0.00 < 0.05)
    💔 Performance has regressed.

  • coalesce_acked_from_zero 10+1 entries
    time: [240.23 ns 240.91 ns 241.73 ns]
    change: [+1.0318% +1.5546% +2.0411%] (p = 0.00 < 0.05)
    💔 Performance has regressed.

  • coalesce_acked_from_zero 1000+1 entries
    time: [220.76 ns 220.95 ns 221.16 ns]
    change: [+0.6344% +1.2844% +2.0019%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • RxStreamOrderer::inbound_frame()
    time: [118.24 ms 118.34 ms 118.44 ms]
    change: [-1.0798% -0.9656% -0.8594%] (p = 0.00 < 0.05)
    Change within noise threshold.

  • transfer/Run multiple transfers with varying seeds
    time: [117.39 ms 117.64 ms 117.90 ms]
    thrpt: [33.928 MiB/s 34.001 MiB/s 34.076 MiB/s]
    change:
    time: [-1.4974% -1.1989% -0.9023%] (p = 0.00 < 0.05)
    thrpt: [+0.9105% +1.2135% +1.5202%]
    Change within noise threshold.

  • transfer/Run multiple transfers with the same seed
    time: [117.73 ms 117.95 ms 118.17 ms]
    thrpt: [33.850 MiB/s 33.912 MiB/s 33.976 MiB/s]
    change:
    time: [-0.9995% -0.7424% -0.4725%] (p = 0.00 < 0.05)
    thrpt: [+0.4748% +0.7480% +1.0096%]
    Change within noise threshold.

  • 1-conn/1-100mb-resp (aka. Download)/client
    time: [1.0987 s 1.1120 s 1.1289 s]
    thrpt: [88.582 MiB/s 89.926 MiB/s 91.015 MiB/s]
    change:
    time: [-3.4960% -1.9108% -0.2798%] (p = 0.05 < 0.05)
    thrpt: [+0.2806% +1.9481% +3.6226%]
    Change within noise threshold.

  • 1-conn/10_000-parallel-1b-resp (aka. RPS)/client
    time: [429.14 ms 431.47 ms 433.80 ms]
    thrpt: [23.052 Kelem/s 23.177 Kelem/s 23.302 Kelem/s]
    change:
    time: [-1.0116% -0.2803% +0.4576%] (p = 0.46 > 0.05)
    thrpt: [-0.4555% +0.2811% +1.0219%]
    No change in performance detected.

  • 1-conn/1-1b-resp (aka. HPS)/client
    time: [50.255 ms 50.462 ms 50.675 ms]
    thrpt: [19.733 elem/s 19.817 elem/s 19.899 elem/s]
    change:
    time: [+2.1707% +3.7633% +5.2029%] (p = 0.00 < 0.05)
    thrpt: [-4.9456% -3.6268% -2.1246%]
    💔 Performance has regressed.

Client/server transfer results

Transfer of 134217728 bytes over loopback.

| Client | Server | CC | Pacing | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|---|---|---|
| msquic | msquic | | | 762.9 ± 231.0 | 492.3 | 1133.7 | 1.00 |
| neqo | msquic | reno | on | 1038.5 ± 199.3 | 781.3 | 1373.3 | 1.00 |
| neqo | msquic | reno | | 1015.1 ± 256.9 | 754.3 | 1388.6 | 1.00 |
| neqo | msquic | cubic | on | 868.5 ± 109.8 | 742.7 | 1080.1 | 1.00 |
| neqo | msquic | cubic | | 904.3 ± 215.8 | 743.1 | 1318.0 | 1.00 |
| msquic | neqo | reno | on | 4316.9 ± 213.6 | 4071.4 | 4762.3 | 1.00 |
| msquic | neqo | reno | | 4322.8 ± 152.9 | 4072.3 | 4568.6 | 1.00 |
| msquic | neqo | cubic | on | 4422.5 ± 148.4 | 4233.2 | 4663.9 | 1.00 |
| msquic | neqo | cubic | | 4339.6 ± 157.4 | 4091.4 | 4597.8 | 1.00 |
| neqo | neqo | reno | on | 3574.7 ± 223.6 | 3251.6 | 3915.6 | 1.00 |
| neqo | neqo | reno | | 3501.0 ± 296.1 | 2750.1 | 3743.5 | 1.00 |
| neqo | neqo | cubic | on | 3983.6 ± 561.2 | 2844.5 | 4721.5 | 1.00 |
| neqo | neqo | cubic | | 4029.1 ± 498.2 | 2977.3 | 4874.8 | 1.00 |

@martinthomson
Member

@mxinden

> Long term I think it would be great for ConnectionError not to contain ApplicationError(0) and TransportError::NoError, but I think that is a matter for another pull request.

That might be because ConnectionError is not a good name for the type. A better name might be CloseReason.

mxinden added a commit to mxinden/neqo that referenced this pull request May 3, 2024
The `neqo_transport::ConnectionError` enum contains the two non-error variants
`Error::NoError` and `CloseReason::Application(0)`. In other words,
`ConnectionError` contains variants that are not errors.

This commit renames `ConnectionError` to the more descriptive name
`CloseReason`.

See suggestion in mozilla#1866 (comment).

To ease the upgrade for downstream users, this commit adds a deprecated
`ConnectionError`, guiding users to rename to `CloseReason` via a deprecation warning.

``` rust
pub type ConnectionError = CloseReason;
```
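
The commit message shows only the alias line. A minimal sketch of how such an alias could carry the deprecation warning is given below; the `note` text, the simplified `CloseReason` variants, and the `legacy_clean_close` function are assumptions made for illustration, not taken from the commit.

```rust
/// Stand-in for the renamed type; the variants are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CloseReason {
    Transport(u64),
    Application(u64),
}

/// Old name kept as an alias so downstream code keeps compiling,
/// while the compiler warning points users at the new name.
#[deprecated(note = "use `CloseReason` instead")]
pub type ConnectionError = CloseReason;

/// Hypothetical downstream function that has not migrated yet.
/// Without the `allow`, each use of the old name produces a deprecation warning.
#[allow(deprecated)]
pub fn legacy_clean_close() -> ConnectionError {
    ConnectionError::Application(0)
}
```

Because the alias names the same type, values built through the old name compare equal to values built through the new one, so callers can migrate one use at a time.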
github-merge-queue bot pushed a commit that referenced this pull request May 3, 2024
mxinden added a commit to mxinden/neqo that referenced this pull request May 4, 2024
There are two server implementations based on neqo:

1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server
  - http3 and http09 implementation
  - used for manual testing and QUIC Interop

2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs
  - used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O,
event loop, etc. Since then, the two implementations have diverged significantly.
(1) in particular has seen many improvements in recent months:

- mozilla#1564
- mozilla#1569
- mozilla#1578
- mozilla#1581
- mozilla#1604
- mozilla#1612
- mozilla#1676
- mozilla#1692
- mozilla#1707
- mozilla#1708
- mozilla#1727
- mozilla#1753
- mozilla#1756
- mozilla#1766
- mozilla#1772
- mozilla#1786
- mozilla#1787
- mozilla#1788
- mozilla#1794
- mozilla#1806
- mozilla#1808
- mozilla#1848
- mozilla#1866

At this point, bugs in (2) are hard to fix, see e.g.
mozilla#1801.

This commit merges (2) into (1), thus removing all duplicate logic and
having (2) benefit from all the recent improvements to (1).
KershawChang pushed a commit to KershawChang/neqo that referenced this pull request May 7, 2024
github-merge-queue bot pushed a commit that referenced this pull request May 8, 2024
* refactor(bin): introduce server/http3.rs and server/http09.rs

The QUIC Interop Runner requires an http3 and http09 implementation for both
client and server. The client code is already structured into an http3 and an
http09 implementation since #1727.

This commit does the same for the server side, i.e. splits the http3 and http09
implementation into separate Rust modules.

* refactor: merge mozilla-central http3 server into neqo-bin

There are two server implementations based on neqo:

1. https://github.com/mozilla/neqo/tree/main/neqo-bin/src/server
  - http3 and http09 implementation
  - used for manual testing and QUIC Interop

2. https://searchfox.org/mozilla-central/source/netwerk/test/http3server/src/main.rs
  - used to test Firefox

I assume one was once an exact copy of the other. Both implement their own I/O,
event loop, etc. Since then, the two implementations have diverged significantly.
(1) in particular has seen many improvements in recent months:

- #1564
- #1569
- #1578
- #1581
- #1604
- #1612
- #1676
- #1692
- #1707
- #1708
- #1727
- #1753
- #1756
- #1766
- #1772
- #1786
- #1787
- #1788
- #1794
- #1806
- #1808
- #1848
- #1866

At this point, bugs in (2) are hard to fix, see e.g.
#1801.

This commit merges (2) into (1), thus removing all duplicate logic and
having (2) benefit from all the recent improvements to (1).

* Move firefox.rs to mozilla-central

* Reduce HttpServer trait functions

* Extract constructor

* Remove unused deps

* Remove clap color feature

Nice to have. Adds multiple dependencies. Hard to justify for mozilla-central.
Development

Successfully merging this pull request may close these issues.

Client spews "Setting timeout of" messages
3 participants