Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[utility]: wrap try in an assertion to ensure that exception are only caught on main thread #15251

Merged
merged 50 commits into from
Mar 30, 2021

Conversation

chaoqin-li1123
Copy link
Member

@chaoqin-li1123 chaoqin-li1123 commented Mar 2, 2021

Commit Message: Define a macros that wrap try with an assertion that exceptions are only caught in main thread. Currently, envoy use c++ exception for error propagation. This PR is one of the steps to address #14320. The long term goal is to disallow raw try in envoy core code and eliminate c++ exception from data plane, which can improve exception safety. The try in the PR happen on main thread and can be wrapped in main thread assertion without breaking any existing test. In the following PR, raw try that can not be replaced by TRY_ASSERT_MAIN_THREAD will be removed from core codebase with other error propagation.
Additional Description:
Risk Level:
Testing: none
Docs Changes: none
Release Notes: none
Platform Specific Features: none
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]

chaoqin-li1123 added 11 commits February 23, 2021 16:42
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@alyssawilk
Copy link
Contributor

cc @mattklein123 for thoughts.

Also please ignore the mac CI timeout - we're looking into it!

@mattklein123 mattklein123 self-assigned this Mar 2, 2021
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level this makes sense and looks great to me. Are you going to add a linter also?

/wait

@chaoqin-li1123
Copy link
Member Author

At a high level this makes sense and looks great to me. Are you going to add a linter also?

/wait

I plan to add format checking later, which disallow "try {" in /source repo but whitelist the /source/extensions repo. But that will be after all the raw try has been removed from core codebase of envoy.

@mattklein123
Copy link
Member

mattklein123 commented Mar 2, 2021

I plan to add format checking later, which disallow "try {" in /source repo but whitelist the /source/extensions repo. But that will be after all the raw try has been removed from core codebase of envoy.

My concern is that we may play whackamole until then. What about a TRY_NEEDS_AUDIT macro and then we can lint for try now?

@chaoqin-li1123
Copy link
Member Author

I plan to add format checking later, which disallow "try {" in /source repo but whitelist the /source/extensions repo. But that will be after all the raw try has been removed from core codebase of envoy.

My concern is that we may play whackamole until then. What about a TRY_NEEDS_AUDIT macro and then we can lint for try now?

Yes, that makes sense. I will add format checking now with this workaround.

@htuch
Copy link
Member

htuch commented Mar 3, 2021

+1 on linting in CI. Thanks @chaoqin-li1123 this looks great.

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
chaoqin-li1123 added 5 commits March 3, 2021 15:39
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM with some small comments.

/wait

source/common/common/regex.cc Outdated Show resolved Hide resolved
source/common/common/thread.h Show resolved Hide resolved
try {
MessageUtil::jsonConvertValue(*proto, val);
} catch (EnvoyException& ex) {
TRY_ASSERT_MAIN_THREAD { MessageUtil::jsonConvertValue(*proto, val); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this is on workers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't break any test when I add the main thread assertion. maybe because of missing test coverage, I will take a look at the test coverage later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping please audit this. I'm 100% positive this happens on workers given the function params. (This feature should probably have an integration test anyway so maybe add that in a different PR?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will see whether some integration test can be added.

source/common/http/utility.cc Show resolved Hide resolved
source/common/network/dns_impl.cc Show resolved Hide resolved
source/common/protobuf/utility.cc Outdated Show resolved Hide resolved
source/server/admin/utils.cc Outdated Show resolved Hide resolved
tools/code_format/check_format.py Outdated Show resolved Hide resolved
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just 2 remaining nits from my perspective.

thanks for pushing this through!

test/common/formatter/substitution_formatter_test.cc Outdated Show resolved Hide resolved
chaoqin-li1123 added 3 commits March 25, 2021 05:23
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM with small comments, thanks!

/wait

source/common/http/utility.cc Show resolved Hide resolved
@@ -168,6 +169,7 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i

if (completed_) {
if (!cancelled_) {
// TODO(chaoqin-li1123): remove this exception catching since the process is dying anyway.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not accurate. This can happen during normal operation for some of these I think. Doesn't this need some AUDIT statement or CI should fail? Same below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will update it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is in RAW_TRY_ALLOWLIST in check_format.py. However I'm not sure if this is justified and think maybe it should be a NEEDS_AUDIT.

Do we need to allow the callback to throw? Or can we just have the callbacks return an error?

I assume it isn't c_ares that forces us to handle exceptions here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is executed in main thread and in dns filter, I think it is fine to allow a filter to throw and catch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, comment updated.

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
mattklein123
mattklein123 previously approved these changes Mar 26, 2021
@chaoqin-li1123
Copy link
Member Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15251 (comment) was created by @chaoqin-li1123.

see: more, trace.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 this PR will not close the Issue -- it's just the first step. Then we have to go through and fix all the audits. But getting the mechanism in place is a great stepping stone.

Matt -- you approved, but I wanted to double check you are you OK to merge? There were a couple of unanswered comment threads I think.

source/common/http/utility.cc Outdated Show resolved Hide resolved
@mattklein123
Copy link
Member

I'm fine to merge as long as we have a good plan for all the follow ups. @jmarantz I will defer to you for final review/merge, thanks!

@chaoqin-li1123
Copy link
Member Author

There are only 4 TRY_NEEDS_AUDIT left, I think I can clean them up in the followup PRs.

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of unresolved comments and then we can merge.

@@ -168,6 +169,7 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i

if (completed_) {
if (!cancelled_) {
// TODO(chaoqin-li1123): remove this exception catching since the process is dying anyway.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is in RAW_TRY_ALLOWLIST in check_format.py. However I'm not sure if this is justified and think maybe it should be a NEEDS_AUDIT.

Do we need to allow the callback to throw? Or can we just have the callbacks return an error?

I assume it isn't c_ares that forces us to handle exceptions here?

Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
@chaoqin-li1123
Copy link
Member Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #15251 (comment) was created by @chaoqin-li1123.

see: more, trace.

@jmarantz jmarantz merged commit cfbb1a8 into envoyproxy:main Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants