-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert ext_proc: send attributes #30781 #31017
Revert ext_proc: send attributes #30781 #31017
Conversation
Signed-off-by: Yanjun Xiang <yanjunxiang@google.com>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
/assign @jbohanon @tyxia @yanavlasov |
@jbohanon cannot be assigned to this issue. |
This is the ASAN crash traceback decode: ================================================================= |
Is this not a part of a required check? Is there a way to diagnose the test failure without reverting the feature immediately? What is the blast radius of the issue that the fuzz test is revealing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks
@jbohanon I left comment here https://github.com/envoyproxy/envoy/pull/30781/files#r1402254604.
It is probably not related to the specific crash here but I think it is wrong pattern of using CEL . Also, given it is close to holiday, it is a good idea that we revert it first, investigate the issue in the meantime.
We are a little bit concerned this may cause Envoy crash if someone accidentally configured request_attributes. So reverting it is safer. Once we have a good root cause and fix, we can commit them back in. |
#30781 was merged at 2 days ago. So, it is ok to revert it. /lgtm api |
/assign @yanavlasov for the code change. |
🙀 Error while processing event:
|
The reason to revert is that invalid config can cause data plane crash. This is to risky to keep this PR. It seems like validation of CEL expression is missing, but I'm not sure. It is better to investigate the crash without having added risk in the codebase. |
Waiting for the merge conflict to be resolved. /wait |
AFAIU, it is not invalid config but the fuzzer issue instead? cced @adisuissa who figured out this. Back to original CL, my 2 cents is that we should address the issue i mentioned before merging it. |
Signed-off-by: Yanjun Xiang <yanjunxiang@google.com>
The fuzz-issue was fixed in google/oss-fuzz#11258 by enabling instrumentation in the CEL code. |
So, with the fix of google/oss-fuzz#11258, the ext_proc fuzz crash is still observed with the change of #30781, right? |
Which ext_proc fuzz crash are you referring to? Can you paste the link? |
I updated my workspace to latest, and in the process of running the ext_proc unit test fuzzer and see whether there is any crash in ext_proc code. |
This is not an oss-fuzz crash, and I believe it has to do something with the build flags that are used for |
With the latest workspace which includes google/oss-fuzz#11258, but does not include this revert PR , I saw the crash below Envoy::Extensions::HttpFilters::ExternalProcessing::FilterConfig::initExpressions(). Note, this function is added by #30781 , which is the reason why we are trying to revert it. ==12==ERROR: AddressSanitizer: heap-use-after-free on address 0x61100029ce00 at pc 0x00000341b53f bp 0x7fff612a6310 sp 0x7fff612a6308
|
I suspect there is some variable life-time issues in this function: FilterConfig::initExpressions, as @tyxia pointed out earlier. But I could be wrong. We should debug this after reverting this PR first. |
This appears to be an issue with the upstream CEL parser. The fuzzer ASAN |
I think it is an issue with Envoy's building flags when using |
@yanavlasov @yanjunxiang-google @tyxia since this failure is unrelated to net-new code would you mind looking at #31090 |
ext_proc unit test fuzzer is the main one protect ext_proc code. If we commit this back in now, it will leave the fuzzer in broken state. Can we spend a little bit more time on the root cause, or at least skip the asan-fuzzer test if request_attributes is configured in the fuzzer input? envoy/test/extensions/filters/http/ext_proc/unit_test_fuzz/ext_proc_unit_test_fuzz.cc Line 52 in d4ed8d4
|
@yanjunxiang-google in that PR I have added the conditional to skip the fuzzing of I have confirmed the behavior change with the command listed above now timing out instead of failing with the previous error. |
Description:
After the commit of #30781 , ext_proc filter unit test fuzzer is reporting crashes.
We suspect there is lifetime issues in the cel expression initialization code during filter config parsing. The crash can be simply reproduced by below command with the two crash input files committed with this PR:
bazel test --config asan-fuzzer //test/extensions/filters/http/ext_proc/unit_test_fuzz:ext_proc_unit_test_fuzz --test_timeout=300
Revert the PR for now.
Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
Fixes commit #30781
[Optional Deprecated:]
[Optional API Considerations:]