Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

access logging: gRPC logger retry to establish underlying stream connection #17469

Merged
merged 18 commits into from
Nov 5, 2021

Conversation

Shikugawa
Copy link
Member

Signed-off-by: Shikugawa rei@tetrate.io

Commit Message: Add a retry mechanism to the grpc access logger. This retry mechanism currently only supports a simple retry count. Also, at the moment, retries are only fired when the gRPC stream fails to be established, and nothing happens if the stream is successfully established once and a reset is issued.
Additional Description:
Risk Level: Low
Testing: Unit
Docs Changes: Required
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

Signed-off-by: Shikugawa <rei@tetrate.io>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #17469 was opened by Shikugawa.

see: more, trace.

@Shikugawa Shikugawa changed the title access logging: gRPC logger retry stream access logging: gRPC logger retry to establish underlying stream connection Jul 24, 2021
Signed-off-by: Shikugawa <rei@tetrate.io>
Signed-off-by: Shikugawa <rei@tetrate.io>
Signed-off-by: Shikugawa <rei@tetrate.io>
@Shikugawa Shikugawa requested a review from lizan July 28, 2021 07:56
Signed-off-by: Shikugawa <rei@tetrate.io>
Signed-off-by: Shikugawa <rei@tetrate.io>
@dmitri-d
Copy link
Contributor

lgtm!

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/wait

Signed-off-by: Shikugawa <rei@tetrate.io>
Signed-off-by: Shikugawa <rei@tetrate.io>
Signed-off-by: Shikugawa <rei@tetrate.io>
@zuercher
Copy link
Member

zuercher commented Sep 8, 2021

/wait

…er-retry

Signed-off-by: Shikugawa <rei@tetrate.io>
@htuch
Copy link
Member

htuch commented Oct 20, 2021

@lizan friendly ping.

@Shikugawa
Copy link
Member Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17469 (comment) was created by @Shikugawa.

see: more, trace.

lizan
lizan previously approved these changes Oct 27, 2021
Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch can you do a non-tetrands review?

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one Q: how does this behave when we are using the Google gRPC async client vs. the in-built Envoy async gRPC client?

@Shikugawa
Copy link
Member Author

Looks good, just one Q: how does this behave when we are using the Google gRPC async client vs. the in-built Envoy async gRPC client?

It seems opaque behavior on Google gRPC with this API. Maybe we should configure channel args for retry and buffer on google_grpc configuration.

// TODO(htuch): figure out how to propagate "this request should be buffered for
// retry" bit to Google gRPC library.
void GoogleAsyncStreamImpl::initialize(bool /*buffer_body_for_retry*/) {
parent_.stats_.streams_total_->inc();
gpr_timespec abs_deadline =
options_.timeout
? gpr_time_add(gpr_now(GPR_CLOCK_REALTIME),
gpr_time_from_millis(options_.timeout.value().count(), GPR_TIMESPAN))
: gpr_inf_future(GPR_CLOCK_REALTIME);
ctxt_.set_deadline(abs_deadline);
// Fill service-wide initial metadata.
auto initial_metadata = Http::RequestHeaderMapImpl::create();
parent_.metadata_parser_->evaluateHeaders(*initial_metadata, options_.parent_context.stream_info);
callbacks_.onCreateInitialMetadata(*initial_metadata);
initial_metadata->iterate([this](const Http::HeaderEntry& header) {
ctxt_.AddMetadata(std::string(header.key().getStringView()),
std::string(header.value().getStringView()));
return Http::HeaderMap::Iterate::Continue;
});
// Invoke stub call.
rw_ = parent_.stub_->PrepareCall(&ctxt_, "/" + service_full_name_ + "/" + method_name_,
&parent_.tls_.completionQueue());
if (rw_ == nullptr) {
notifyRemoteClose(Status::WellKnownGrpcStatus::Unavailable, nullptr, EMPTY_STRING);
call_failed_ = true;
return;
}
parent_.tls_.registerStream(this);
rw_->StartCall(&init_tag_);
++inflight_tags_;
}

If we'd like to integrate between Envoy gRPC and Google gRPC for retry and buffer, we should have API migration for envoy's retry policy and Google gRPC's channel args. Is there any plan to work on this?

@htuch
Copy link
Member

htuch commented Oct 29, 2021

I don't think there is a solid story for how to get the retry/buffering semantics aligned across the gRPC implementations. This really touches on the fact that we have not made any progress on unifying the gRPC implementations more broadly.

I think for this PR, maybe just clarify in the release notes and comments that this only works with the Envoy gRPC client and we can merge.

…er-retry

Signed-off-by: Shikugawa <rei@tetrate.io>
Signed-off-by: Shikugawa <rei@tetrate.io>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #17469 was synchronize by Shikugawa.

see: more, trace.

@Shikugawa
Copy link
Member Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17469 (comment) was created by @Shikugawa.

see: more, trace.

@Shikugawa
Copy link
Member Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17469 (comment) was created by @Shikugawa.

see: more, trace.

@lizan
Copy link
Member

lizan commented Nov 4, 2021

@htuch ping?

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@repokitteh-read-only repokitteh-read-only bot removed the api label Nov 5, 2021
@htuch htuch merged commit 3a5f795 into envoyproxy:main Nov 5, 2021
@durd07
Copy link
Member

durd07 commented Nov 5, 2021

@htuch @Shikugawa the merge bring back the generated_api_shadow into main repo.
can we add a hook to check whether generated_api_shadow added by mistake?
I have seen serval times generated_api_shadow back.

@Shikugawa
Copy link
Member Author

Shikugawa commented Nov 5, 2021

@htuch @Shikugawa the merge bring back the generated_api_shadow into main repo. can we add a hook to check whether generated_api_shadow added by mistake? I have seen serval times generated_api_shadow back.

@durd07 Thank you for the notification. This is no longer needed and just my mistake. #18911

@Shikugawa Shikugawa deleted the cluster-retry branch November 5, 2021 07:37
@htuch
Copy link
Member

htuch commented Nov 7, 2021

@durd07 what other examples are there? If this is common I can add a hook, otherwise I think we will soon forget about generated_api_shadow/ and this won't sneak back in.

@durd07
Copy link
Member

durd07 commented Nov 8, 2021

@htuch , yes, you're right. maybe we will forget about it soon. I met this situation at least twice. but I think it is not necessary to emphasis the alreay not exist codes. please forget it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants