-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[utility]: wrap try in an assertion to ensure that exception are only caught on main thread #15251
[utility]: wrap try in an assertion to ensure that exception are only caught on main thread #15251
Conversation
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
cc @mattklein123 for thoughts. Also please ignore the mac CI timeout - we're looking into it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level this makes sense and looks great to me. Are you going to add a linter also?
/wait
I plan to add format checking later, which disallow "try {" in /source repo but whitelist the /source/extensions repo. But that will be after all the raw try has been removed from core codebase of envoy. |
My concern is that we may play whackamole until then. What about a TRY_NEEDS_AUDIT macro and then we can lint for |
Yes, that makes sense. I will add format checking now with this workaround. |
+1 on linting in CI. Thanks @chaoqin-li1123 this looks great. |
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
…ith_assert_macros
…ith_assert_macros
…ith_assert_macros
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM with some small comments.
/wait
try { | ||
MessageUtil::jsonConvertValue(*proto, val); | ||
} catch (EnvoyException& ex) { | ||
TRY_ASSERT_MAIN_THREAD { MessageUtil::jsonConvertValue(*proto, val); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure this is on workers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't break any test when I add the main thread assertion. maybe because of missing test coverage, I will take a look at the test coverage later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping please audit this. I'm 100% positive this happens on workers given the function params. (This feature should probably have an integration test anyway so maybe add that in a different PR?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will see whether some integration test can be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just 2 remaining nits from my perspective.
thanks for pushing this through!
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks LGTM with small comments, thanks!
/wait
source/common/network/dns_impl.cc
Outdated
@@ -168,6 +169,7 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |||
|
|||
if (completed_) { | |||
if (!cancelled_) { | |||
// TODO(chaoqin-li1123): remove this exception catching since the process is dying anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not accurate. This can happen during normal operation for some of these I think. Doesn't this need some AUDIT statement or CI should fail? Same below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is in RAW_TRY_ALLOWLIST
in check_format.py. However I'm not sure if this is justified and think maybe it should be a NEEDS_AUDIT.
Do we need to allow the callback to throw? Or can we just have the callbacks return an error?
I assume it isn't c_ares that forces us to handle exceptions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is executed in main thread and in dns filter, I think it is fine to allow a filter to throw and catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, comment updated.
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
/retest |
Retrying Azure Pipelines: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattklein123 this PR will not close the Issue -- it's just the first step. Then we have to go through and fix all the audits. But getting the mechanism in place is a great stepping stone.
Matt -- you approved, but I wanted to double check you are you OK to merge? There were a couple of unanswered comment threads I think.
I'm fine to merge as long as we have a good plan for all the follow ups. @jmarantz I will defer to you for final review/merge, thanks! |
There are only 4 TRY_NEEDS_AUDIT left, I think I can clean them up in the followup PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of unresolved comments and then we can merge.
source/common/network/dns_impl.cc
Outdated
@@ -168,6 +169,7 @@ void DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback(int status, i | |||
|
|||
if (completed_) { | |||
if (!cancelled_) { | |||
// TODO(chaoqin-li1123): remove this exception catching since the process is dying anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is in RAW_TRY_ALLOWLIST
in check_format.py. However I'm not sure if this is justified and think maybe it should be a NEEDS_AUDIT.
Do we need to allow the callback to throw? Or can we just have the callbacks return an error?
I assume it isn't c_ares that forces us to handle exceptions here?
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
Signed-off-by: chaoqin-li1123 <chaoqinli@google.com>
/retest |
Retrying Azure Pipelines: |
Commit Message: Define a macros that wrap try with an assertion that exceptions are only caught in main thread. Currently, envoy use c++ exception for error propagation. This PR is one of the steps to address #14320. The long term goal is to disallow raw try in envoy core code and eliminate c++ exception from data plane, which can improve exception safety. The try in the PR happen on main thread and can be wrapped in main thread assertion without breaking any existing test. In the following PR, raw try that can not be replaced by TRY_ASSERT_MAIN_THREAD will be removed from core codebase with other error propagation.
Additional Description:
Risk Level:
Testing: none
Docs Changes: none
Release Notes: none
Platform Specific Features: none
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
[Optional API Considerations:]