x-pack/filebeat/processors/decode_cef/cef: allow (*Event).Unpack to attempt to recover extensions #30938

efd6 · 2022-03-21T22:59:57Z

What does this PR do?

This adds a second pass at parsing messages, with relaxed requirements on header
syntax, which runs if the original parse operation fails and no extensions are found.
Because header values may include syntax that looks like the extension key/value
pair syntax, the parsers are separated and under control of the host language.
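
As a rough illustration of the two-pass idea described above, here is a minimal Go sketch. It is not the ragel-generated parser from this PR: the `event` type, `unpack`, `parseExtensions` and `errIncompleteHeader` are hypothetical names, and the toy grammar ignores escapes and values containing spaces. It also assumes that the salvage pass returns both the recovered extensions and an error. What it shows is that the header and the extensions are parsed by separate steps driven from Go, so a header value that looks like `key=value` is not mistaken for an extension.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errIncompleteHeader is a hypothetical sentinel used only in this sketch.
var errIncompleteHeader = errors.New("incomplete CEF header")

// event is a toy stand-in for the package's Event type.
type event struct {
	Header     []string
	Extensions map[string]string
}

// parseExtensions reads space-separated key=value pairs. Real CEF values may
// contain spaces and escapes; this toy version handles neither.
func parseExtensions(s string) map[string]string {
	ext := make(map[string]string)
	for _, f := range strings.Fields(s) {
		kv := strings.SplitN(f, "=", 2)
		if len(kv) == 2 && kv[0] != "" {
			ext[kv[0]] = kv[1]
		}
	}
	return ext
}

// unpack runs a strict pass first and falls back to a salvage pass.
func unpack(msg string) (event, error) {
	var e event
	// Strict pass: seven pipe-delimited header fields, then the extensions.
	parts := strings.SplitN(msg, "|", 8)
	if len(parts) == 8 && strings.HasPrefix(parts[0], "CEF:") {
		e.Header = parts[:7]
		e.Extensions = parseExtensions(parts[7])
		return e, nil
	}
	// Salvage pass: the header is malformed, so relax the header rules and
	// try to recover extensions from whatever follows the last pipe.
	if i := strings.LastIndex(msg, "|"); i >= 0 {
		e.Extensions = parseExtensions(msg[i+1:])
	}
	return e, errIncompleteHeader
}

func main() {
	// The name field "user=root login" looks like an extension but is header text.
	good, err := unpack("CEF:0|V|P|1.0|100|user=root login|5|src=10.0.0.1")
	fmt.Println(len(good.Extensions), err)

	// Truncated header: the strict pass fails, the salvage pass recovers two pairs.
	bad, err := unpack("CEF:0|V|P|1.0|100|src=10.0.0.1 dst=10.0.0.2")
	fmt.Println(len(bad.Extensions), err)
}
```

Returning the partial event together with the error lets a caller decide whether to keep the recovered fields.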

Why is it important?

Fixes a bug.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~- [ ] I have made corresponding changes to the documentation~~
~~- [ ] I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

[ ]

How to test this PR locally

Related issues

Closes [Filebeat] decode_cef - recover from errors in the CEF header #30757.

Use cases

Screenshots

Logs

elasticmachine · 2022-03-21T22:59:59Z

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

elasticmachine · 2022-03-22T00:13:28Z

💚 Build Succeeded

Build stats

Start Time: 2022-03-29T07:00:45.233+0000
Duration: 67 min 17 sec

Test stats 🧪

Test	Results
Failed	0
Passed	2035
Skipped	159
Total	2194

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

andrewkroh

I like the reorganization. Had one question.

andrewkroh · 2022-03-22T01:52:31Z

x-pack/filebeat/processors/decode_cef/cef/cef_test.go

+	t.Run("truncatedHeader", func(t *testing.T) {
+		var e Event
+		err := e.Unpack(truncatedHeader)
+		assert.Equal(t, errUnexpectedEndOfEvent, err)


Would it be possible to make this error more descriptive based on the machine state? Like have the error indicate that the header is incomplete.

~~I have added a second error in the case that there is a failure on the first round and there are no extensions.~~

Added an action into the first pass since it gives greater control. Also added a test in the decode_cef package to show what the events end up looking like.
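
One way to make such an error more descriptive while keeping it comparable is to wrap the sentinel with context. The sketch below is only an assumption about how that could look, not the code added in this PR: `checkHeader`, `isTruncated`, the message text and the field count constant are hypothetical, and the toy check counts pipes without handling escaped `\|`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnexpectedEndOfEvent mirrors the name used in the test above; the
// message text here is made up.
var errUnexpectedEndOfEvent = errors.New("unexpected end of CEF event")

// headerSeparators is the number of pipes that delimit a complete header.
const headerSeparators = 7

// checkHeader is a hypothetical stand-in for the state machine: it reports
// how far the header got before the input ran out.
func checkHeader(msg string) error {
	n := strings.Count(msg, "|")
	if n < headerSeparators {
		return fmt.Errorf("%w: incomplete header, found %d of %d field separators",
			errUnexpectedEndOfEvent, n, headerSeparators)
	}
	return nil
}

// isTruncated shows that callers can still match the sentinel after wrapping.
func isTruncated(err error) bool {
	return errors.Is(err, errUnexpectedEndOfEvent)
}

func main() {
	err := checkHeader("CEF:0|Vendor|Product")
	fmt.Println(err)
	fmt.Println(isTruncated(err))
}
```

With a wrapped error, a test would compare using `errors.Is` (or `assert.ErrorIs` in testify) rather than `assert.Equal`.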

andrewkroh

LGTM. Before merging, can you please run the decode_cef/cef/fuzz on the changes for a few minutes if you haven't already.

mergify · 2022-03-24T17:52:17Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b cefrecover upstream/cefrecover
git merge upstream/main
git push upstream cefrecover

…ttempt to recover extensions This adds a second pass at parsing messages if the original parse operation fails and there are no extensions found by relaxing the requirements on header syntax. Because header values may include syntax that looks like the extension key/value pair syntax, the parsers are separated and under control of the host language.

efd6 · 2022-03-24T22:40:02Z

This will need some additional logic; there are a bunch of crashers.

efd6 · 2022-03-25T01:16:06Z

@andrewkroh It's worth taking another look at the ragel since I've needed to make some reasonably large changes (by ragel standards) to the actions, but it's been fuzzing after the fixes for 30 minutes without a crasher.

andrewkroh · 2022-03-27T23:31:43Z

x-pack/filebeat/processors/decode_cef/cef/cef_actions.rl

+        state.key = data[mark:p]
+    }
+    action extension_value_start {
+        if len(state.escapes) != 0 {


Why is this condition based on the number of escapes? Can't we have extensions without escapes?

The issue here is that if we have got to this point, we have a syntax that is not expected and so may have escapes that have not been consumed. So if there are some to consume, this makes sure we do.

The original was

action extension_value_start { state.valueStart = p; state.valueEnd = p }

Without this, we can end up with negative indices into the data handed to the unescaper: we have moved on from the fragment being handled without being signaled by the syntax to unescape it, so our offset is greater than the span described by each of the escapes. This happens with the fuzz test cases added here.

My point is that there may be an "extension" that needs to be consumed which does not contain any escapes. That's why I am questioning the condition being based on len(escapes) rather than something like len(key) && something with the value start/end. Does that make sense, or do I have a misunderstanding?

Those won't be affected by this. This is purely to cope with the cases where things have been set up and not used. In the first pass of the parse, this condition is never triggered because the header state machine prevents it from ever happening. In the salvage pass those guarantees don't hold, but we also can't guarantee sensible results; we are just trying to capture as much as possible.

Put another way, if the escape were well formed, it would have been handled by extension_eof, which covers the cases that you describe, guarded by len(state.key) != 0 && state.valueStart < state.valueEnd. So this just picks up cases that extension_eof failed to identify because they were syntactically broken and did not satisfy the machine before the next extension start.

Does that clarify?

Yes, that’s clear. Thanks. Can you add a short comment there?
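
The failure mode discussed in this thread can be sketched in isolation. The following Go program is a toy model under stated assumptions, not the generated parser: `unescape`, `parser` and `startValue` are hypothetical, escapes are recorded as absolute offsets of backslashes, and the guard simply discards leftover escapes, whereas the real action may do something else with the pending value. It shows how an escape recorded for an earlier fragment yields a negative relative offset once the value start has moved on, and how consuming leftovers at value start avoids that.

```go
package main

import (
	"fmt"
)

// unescape returns data[start:end] with the backslash at each absolute offset
// in escapes removed. It returns an error where unchecked code would index
// out of range.
func unescape(data string, start, end int, escapes []int) (string, error) {
	frag := data[start:end]
	out := make([]byte, 0, len(frag))
	prev := 0
	for _, abs := range escapes {
		rel := abs - start // negative when the escape predates this fragment
		if rel < prev || rel >= len(frag) {
			return "", fmt.Errorf("escape at %d is outside fragment [%d,%d)", abs, start, end)
		}
		out = append(out, frag[prev:rel]...)
		prev = rel + 1 // drop the backslash itself
	}
	return string(append(out, frag[prev:]...)), nil
}

// parser holds a toy version of the state discussed in the thread.
type parser struct {
	valueStart int
	escapes    []int
}

// startValue is a hypothetical analogue of the guarded action: leftover
// escapes are consumed (here, discarded) before the value start moves on.
func (p *parser) startValue(at int) {
	if len(p.escapes) != 0 {
		p.escapes = p.escapes[:0]
	}
	p.valueStart = at
}

func main() {
	data := `a=x\=y b=z` // the backslash is at offset 3

	// Well-formed case: the escape lies inside the fragment being unescaped.
	v, err := unescape(data, 2, 6, []int{3})
	fmt.Println(v, err)

	// Broken case: the escape was never consumed and leaks into the next value.
	_, err = unescape(data, 9, 10, []int{3})
	fmt.Println(err)

	// Guarded case: leftovers are cleared when the next value starts.
	p := parser{valueStart: 2, escapes: []int{3}}
	p.startValue(9)
	v, err = unescape(data, p.valueStart, 10, p.escapes)
	fmt.Println(v, err)
}
```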

efd6 · 2022-03-29T00:25:31Z

/test

efd6 · 2022-03-29T00:57:06Z

/test

efd6 · 2022-03-29T00:58:05Z

The coverage tool appears to be choking on this. It shouldn't.

efd6 · 2022-03-29T01:44:06Z

The issue looks like golang/go#35781; we had just not hit it with the exact number of lines that we had before. The options are to:

- remove the comment and hope that changes in generation or cover behaviour don't resurface the issue
- turn off coverage for this package (is this possible per-package?)
- inhibit line directives being emitted by ragel (not available for Go)

(I think the issue is that the line directive allows the coverage tool to believe that there are identical lines that have different coverage, because the same ragel line ends up in more than one place. Why this only presents itself when the comment is in that action block, I don't know.) I have moved the comment into a section of the file that won't end up in the Go code, with an explanation and instructions to move it back when the issue in cover is resolved.
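
A small Go sketch of the suspected mechanism, not taken from the generated parser: when a generator pastes the same source line into two places and tags each copy with a `//line` directive, the toolchain records the same file and line for both. The file name and line number below are made up for illustration, and `where`, `first` and `second` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"runtime"
)

// where reports the file:line that the compiler recorded for its caller.
func where() string {
	_, file, line, ok := runtime.Caller(1)
	if !ok {
		return "unknown"
	}
	return fmt.Sprintf("%s:%d", file, line)
}

// first and second stand in for two places where a code generator pasted the
// same action body. Each directive must start in column 1 to take effect.
func first() string {
//line cef_actions.rl:10
	return where()
}

func second() string {
//line cef_actions.rl:10
	return where()
}

func main() {
	fmt.Println(first())
	fmt.Println(second())
	fmt.Println("same position:", first() == second())
}
```

Any tool that keys its records on file and line, as a coverage tool might, would then see two distinct statements claiming one position.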

…ttempt to recover extensions (#30938) This adds a second pass at parsing messages if the original parse operation fails and there are no extensions found by relaxing the requirements on header syntax. Because header values may include syntax that looks like the extension key/value pair syntax, the parsers are separated and under control of the host language. (cherry picked from commit c193dde) # Conflicts: # x-pack/filebeat/processors/decode_cef/cef/cef_test.go

…ow (*Event).Unpack to attempt to recover extensions (#31036) * x-pack/filebeat/processors/decode_cef/cef: allow (*Event).Unpack to attempt to recover extensions (#30938) This adds a second pass at parsing messages if the original parse operation fails and there are no extensions found by relaxing the requirements on header syntax. Because header values may include syntax that looks like the extension key/value pair syntax, the parsers are separated and under control of the host language. (cherry picked from commit c193dde) # Conflicts: # x-pack/filebeat/processors/decode_cef/cef/cef_test.go * fix conflict Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com> Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>

efd6 added bug Team:Security-External Integrations 8.2-candidate 8.1-candidate backport-v8.1.0 Automated backport with mergify labels Mar 21, 2022

botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 21, 2022

mergify bot assigned efd6 Mar 21, 2022

efd6 requested review from andrewkroh and a team March 21, 2022 23:00

efd6 force-pushed the cefrecover branch from eed952d to a1f4ae7 Compare March 21, 2022 23:01

andrewkroh reviewed Mar 22, 2022

efd6 force-pushed the cefrecover branch from a1f4ae7 to 0dc1b29 Compare March 24, 2022 04:38

efd6 requested a review from a team March 24, 2022 08:10

andrewkroh approved these changes Mar 24, 2022

efd6 added 6 commits March 25, 2022 09:02

extend error reporting

9380de3

satisfy linter

0b2b70d

revise approach

f39f005

silence linter

439bd7f

silence linter

7c9acf5

fix fuzzing crashers

120460f

efd6 force-pushed the cefrecover branch from 91de447 to 120460f Compare March 25, 2022 01:14

Revert "silence linter"

6e7fba5

This reverts commit 7c9acf5.

efd6 requested a review from andrewkroh March 27, 2022 21:53

andrewkroh reviewed Mar 27, 2022

andrewkroh approved these changes Mar 28, 2022

add comment on escape consumption

9d9fcdc

avoid issue

b7a8c78

efd6 merged commit c193dde into elastic:main Mar 29, 2022

efd6 deleted the cefrecover branch March 29, 2022 08:56

mergify bot mentioned this pull request Mar 29, 2022

[8.1](backport #30938) x-pack/filebeat/processors/decode_cef/cef: allow (*Event).Unpack to attempt to recover extensions #31036

Merged
