router: fixing a watermark bug for streaming retries #10866
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Buffer::OwnedImpl data(std::string(640000, 'a'));
codec_client_->sendData(encoder_decoder.first, data, false);
if (paused(test_server_)) {
  usleep(5000);
Instead of sleeping, are there any metrics we can watch (rx bytes on downstream, tx bytes on upstream, maybe?) to detect when the desired state has been reached?
Unfortunately I think it's a similar race there. I think if I stopped doing this for both H1 and H2 I could probably get something hard-coded that worked at least on a given platform. Given this is testing in-Envoy retry logic and is pretty protocol agnostic I think that'd be sufficient for an integration test. Would you prefer that?
I'm always a fan of not using sleep in tests whenever possible. If it looks viable, and it'll be less racy than sleep, I'd prefer that. If it ends up being just as racy, don't bother.
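The alternative to a fixed sleep discussed above can be sketched as a bounded polling loop. This is a minimal illustration only: `waitForCondition` is a hypothetical helper, not an Envoy test API, and the predicate is assumed to read whatever metric the test cares about. As noted in the reply, polling a metric does not by itself remove the underlying race.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not an Envoy API): poll `condition` until it holds or
// `timeout` elapses. Returns true if the condition was observed in time.
bool waitForCondition(const std::function<bool()>& condition,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds poll_interval = std::chrono::milliseconds(1)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!condition()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false; // Gave up: the desired state was never observed.
    }
    std::this_thread::sleep_for(poll_interval);
  }
  return true;
}
```

A test would pass a lambda that checks the chosen counter and fail if the helper returns false, so a slow machine waits longer instead of silently proceeding in the wrong state.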
Also, you have a real failure in the coverage build.
/wait
Yeah, after spending over an hour I can't get a non-flaky version that's not terrible. I'll just stick with the unit test :-/
Fixes an issue where, if a retry was attempted when the upstream connection was watermark-overrun, data might spool upstream but reading from downstream would not resume. This is a preexisting design flaw that only manifests now that we have streaming retries: if the upstream buffer limit is smaller than the downstream buffer limit and the upstream backs up due to slowness, the retry times out rather than succeeding. When the whole request has already been read, the loop of unwinding pause between requests takes care of it, which is why the flaw stayed hidden before.
Risk Level: Medium (watermarks)
Testing: new unit test
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: pengg <pengg@google.com>
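The symptom described here can be illustrated with a toy model. This is a sketch under stated assumptions, not Envoy's router code and not necessarily how the PR fixes the problem: `DownstreamReader`, `UpstreamAttempt`, and the `release_pause_on_reset` flag are all hypothetical names. An upstream attempt pauses downstream reads when its buffer passes a high watermark, and the question is whether that pause is released when the attempt is abandoned for a retry.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Toy stand-in for the downstream connection: reads are enabled only while
// nobody holds a pause on it.
class DownstreamReader {
public:
  void pause() { ++pause_count_; }
  void resume() {
    if (pause_count_ > 0) {
      --pause_count_;
    }
  }
  bool readEnabled() const { return pause_count_ == 0; }

private:
  uint32_t pause_count_{0};
};

// Toy stand-in for a single upstream attempt with a bounded send buffer.
// The low watermark is assumed to be half the high watermark.
class UpstreamAttempt {
public:
  UpstreamAttempt(DownstreamReader& downstream, size_t high_watermark,
                  bool release_pause_on_reset)
      : downstream_(downstream), high_watermark_(high_watermark),
        low_watermark_(high_watermark / 2),
        release_pause_on_reset_(release_pause_on_reset) {}

  // Buffer data for upstream; pause downstream once the buffer overruns.
  void write(size_t bytes) {
    buffered_ += bytes;
    if (!above_high_ && buffered_ > high_watermark_) {
      above_high_ = true;
      downstream_.pause();
    }
  }

  // Upstream consumed data; resume downstream once the buffer has drained.
  void drain(size_t bytes) {
    buffered_ -= std::min(bytes, buffered_);
    if (above_high_ && buffered_ <= low_watermark_) {
      above_high_ = false;
      downstream_.resume();
    }
  }

  // Abandon this attempt so a retry can start. If the pause it holds is not
  // released here, nothing will ever resume downstream reads.
  void reset() {
    buffered_ = 0;
    if (release_pause_on_reset_ && above_high_) {
      downstream_.resume();
    }
    above_high_ = false;
  }

private:
  DownstreamReader& downstream_;
  const size_t high_watermark_;
  const size_t low_watermark_;
  const bool release_pause_on_reset_;
  size_t buffered_{0};
  bool above_high_{false};
};
```

With `release_pause_on_reset` set to false, resetting an overrun attempt leaves `readEnabled()` false indefinitely, which mirrors the timeout described above; with it set to true, the retry starts with downstream reads enabled again.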