Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send connection WINDOW_UPDATE before RTT PING #97881

Merged
merged 11 commits into from
Feb 13, 2024

Conversation

antonfirsov
Copy link
Member

@antonfirsov antonfirsov commented Feb 2, 2024

Fixes #97131.

Some servers in GCP have a non-standard ping accounting mechanism, meaning they reset their unsolicited PING counter when they receive DATA, HEADERS or WINDOW_UPDATE. To comply with this behavior, this PR adjusts the RTT logic ensuring that a connection WINDOW_UPDATE is being sent out before we send an RTT PING by applying two changes:

  • Swap the order of the normal connection WINDOW_UPDATE we attempt after every DATA frame received and the sending of the RTT PING so WINDOW_UPDATE goes first.
  • In case we need to send an RTT PING, and we didn't send a normal connection WINDOW_UPDATE (threshold not reached), send a WINDOW_UPDATE anyways with current _pendingWindowUpdate credits. Do not send a PING if this is not possible (_pendingWindowUpdate == 0). Just like RTT PINGs, such WINDOW_UPDATEs are relatively rare and should not hurt performance.

Ideally, this fix should be backported to all supported versions.

@ghost
Copy link

ghost commented Feb 2, 2024

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #97131.

Some servers in GCP have a non-standard ping accounting mechanism, meaning they reset their unsolicited PING counter when receiving DATA, HEADERS or WINDOW_UPDATE. To comply with this behavior, this PR adjusts the RTT logic ensuring that a connection WINDOW_UPDATE is being sent out before we send an RTT PING by doing two changes:

  • Swap the order of the normal connection WINDOW_UPDATE we attempt after every DATA frame received and the sending of the RTT PING
  • In case we need to send an RTT PING, and we didn't send a normal connection WINDOW_UPDATE (because of reaching the threshold), send a WINDOW_UPDATE anyways by sending out all the _pendingWindowUpdate credits. Do not send a PING if this is not possible (_pendingWindowUpdate == 0).

Ideally, this fix should be backported to all supported versions.

Author: antonfirsov
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

Comment on lines 1797 to 1800
if (_pendingWindowUpdate == 0) return false;
LogExceptions(SendWindowUpdateAsync(0, _pendingWindowUpdate));
_pendingWindowUpdate = 0;
return true;
Copy link
Member Author

@antonfirsov antonfirsov Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExtendWindow has unnecessary synchronization around _pendingWindowUpdate manipulation which is didn't add to this method. See #97878.

Copy link
Member

@JamesNK JamesNK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is lite on unit tests. What about tests for scenarios that will and won't trigger a ping? And a test that fails if WINDOWS_UPDATE isn't present.

// When sendWindowUpdateBeforePing is true, try to send a WINDOW_UPDATE to make Google backends happy.
// Unless we are doing the initial burst, do not send PING if we were not able to send the WINDOW_UPDATE.
// See point 3. in the comments above the class definition for more info.
if (sendWindowUpdateBeforePing && !connection.ForceSendConnectionWindowUpdate() && !initial)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are places that can send a windows update? Can a connection be in the situation where a window update is always being sent somewhere else and when it comes time to send a ping, the pending window update is always empty?

Copy link
Member Author

@antonfirsov antonfirsov Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are places that can send a windows update?

For connection window the standard method is ExtendWindow the new one introduced in this PR is ForceSendConnectionWindowUpdate. There are no other places sending WINDOW_UPDATE for the connection.

Can a connection be in the situation where a window update is always being sent somewhere else, the pending window update is always empty?

ExtendWindow will send a "standard" connection WINDOW_UPDATE only if receiving DATA will make _pendingWindowUpdate pass ConnectionWindowThreshold. If that's the case, sendWindowUpdateBeforePing will be false at this line, so we won't send a WINDOW_UPDATE in DataOrHeadersReceived.

Note that receiving DATA triggers ExtendWindow and DataOrHeadersReceived sequentially (with the change in the PR WINDOW_UPDATE goes first):

if (frameData.Length > 0)
{
bool windowUpdateSent = ExtendWindow(frameData.Length);
if (!endStream)
{
_rttEstimator.OnDataOrHeadersReceived(this, sendWindowUpdateBeforePing: !windowUpdateSent);
}
}

@antonfirsov
Copy link
Member Author

Pushed a test in e93920a.

Copy link
Member

@wfurt wfurt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I don't fully understand the window & ping management but I don't have any concerns. Hopefully testing was done to ensure correctness.
And I guess excessive updates is the biggest risk of this change, right?

@antonfirsov
Copy link
Member Author

excessive updates is the biggest risk of this change, right?

The amount of WINDOW_UPDATE-s we send did not get significantly higher. Now we essentially do the same as the grpc-go client. Their code has been around for quite a while, and I'm not aware of any servers having an issue with this logic.

https://github.com/grpc/grpc-go/blob/d41b01db97ca2e3627b2c9949fffe8f152a4255d/internal/transport/http2_client.go#L1128-L1140

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antonfirsov

This comment was marked as resolved.

This comment was marked as resolved.

Copy link
Member

@MihaZupan MihaZupan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Side question: as far as I understand our current logic, we'll keep sending pings once every ~2 seconds, but only update the RTT if it gets lower.
Are the updates after the initial burst meaningful? That is, we expect the RTT might decrease, but we don't care if it increases?

@antonfirsov
Copy link
Member Author

antonfirsov commented Feb 10, 2024

[Deleted my previous comment because it did not make sense.]

Adjusting RTT upwards would risk growing the window "accidentally" in case of an occasional high delay measurement:

// The condition to extend the window is:
// (_deliveredBytes / dt) * rtt > _streamWindowSize * _windowScaleThresholdMultiplier

With an unstable connection the minimum delay is hopefully constant in long run for most practical SocketsHttpHandler use cases, or at least this is the assumption behind the current strategy.

@antonfirsov
Copy link
Member Author

/azp run runtime-libraries-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@antonfirsov
Copy link
Member Author

CI failures are unrelated, the massive outerloop failures are caused by #96035.

@antonfirsov antonfirsov merged commit fae6720 into dotnet:main Feb 13, 2024
109 of 119 checks passed
@antonfirsov
Copy link
Member Author

/backport to release/8.0-staging

Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/7892640557

@antonfirsov
Copy link
Member Author

/backport to release/7.0-staging

Copy link
Contributor

Started backporting to release/7.0-staging: https://github.com/dotnet/runtime/actions/runs/7892642858

@antonfirsov
Copy link
Member Author

/backport to release/6.0-staging

Copy link
Contributor

Started backporting to release/6.0-staging: https://github.com/dotnet/runtime/actions/runs/7892646182

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[8.0] HTTP/2 request to cloud hosted services is aborted by server with ENHANCE_YOUR_CALM
6 participants