Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x-pack/filebeat/processors/decode_cef/cef: allow (*Event).Unpack to attempt to recover extensions #30938

Merged
merged 10 commits into from
Mar 29, 2022

Conversation

efd6
Copy link
Contributor

@efd6 efd6 commented Mar 21, 2022

What does this PR do?

This adds a second pass at parsing messages if the original parse operation fails
and there are no extensions found by relaxing the requirements on header syntax.
Because header values may include syntax that looks like the extension key/value
pair syntax, the parsers are separated and under control of the host language.

Why is it important?

Fixes a bug.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 21, 2022
@mergify mergify bot assigned efd6 Mar 21, 2022
@efd6 efd6 requested review from andrewkroh and a team March 21, 2022 23:00
@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 22, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-03-29T07:00:45.233+0000

  • Duration: 67 min 17 sec

Test stats 🧪

Test Results
Failed 0
Passed 2035
Skipped 159
Total 2194

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the reorganization. Had one question.

t.Run("truncatedHeader", func(t *testing.T) {
var e Event
err := e.Unpack(truncatedHeader)
assert.Equal(t, errUnexpectedEndOfEvent, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to make this error more descriptive based on the machine state? Like have the error indicate that the header is incomplete.

Copy link
Contributor Author

@efd6 efd6 Mar 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a second error in the case that there is a failure on the first round and there are no extensions.

Added an action into the first pass since it gives greater control. Also added a test in the decode_cef package to show what the events end up looking like.

@efd6 efd6 requested a review from a team March 24, 2022 08:10
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Before merging, can you please run the decode_cef/cef/fuzz on the changes for a few minutes if you haven't already.

@mergify
Copy link
Contributor

mergify bot commented Mar 24, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b cefrecover upstream/cefrecover
git merge upstream/main
git push upstream cefrecover

efd6 added 6 commits March 25, 2022 09:02
…ttempt to recover extensions

This adds a second pass at parsing messages if the original parse operation fails
and there are no extensions found by relaxing the requirements on header syntax.
Because header values may include syntax that looks like the extension key/value
pair syntax, the parsers are separated and under control of the host language.
@efd6
Copy link
Contributor Author

efd6 commented Mar 24, 2022

This will need some addition logic there are a bunch of crashers.

@efd6
Copy link
Contributor Author

efd6 commented Mar 25, 2022

@andrewkroh It's worth taking another look at the ragel since I've needed to make some reasonably large changes (by ragel standards) to the actions, but it's been fuzzing after the fixes for 30 minutes without a crasher.

This reverts commit 7c9acf5.
@efd6 efd6 requested a review from andrewkroh March 27, 2022 21:53
state.key = data[mark:p]
}
action extension_value_start {
if len(state.escapes) != 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this condition based on the number of escapes? Can't we have extensions without escapes?

Copy link
Contributor Author

@efd6 efd6 Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here that if we have got to this point we have a syntax that is not expected and so may have escapes that have not been consumed. So if there are some to consume, this makes sure we do.

The original was

        action extension_value_start {
            state.valueStart = p;
            state.valueEnd = p
        }

Without this, what happens is that we can end up with negative indices into the data handed to the unescaper because we have moved on from the fragment being handled without being signaled by the syntax to unescape it so our offset is greater than the span described by each of the escapes. This happens with the fuzz test cases added here.

Copy link
Member

@andrewkroh andrewkroh Mar 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that there may be an "extension" that needs to be consumed which does not contain any escapes. That's why I am questioning the condition being based on len(escapes) rather than something like len(key) && something with the value start/end. Does that make sense or do I have a misunderstanding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those won't be affected by this. This is purely to cope with the cases where things have been set up and not used. I the first pass of the parse, this condition is never triggered because the header state machine prevents this from ever happening, but in the salvage pass, the guarantees don't hold, but also we can't guarantee sensible results, we are just trying to capture as much as possible.

Put another way, if the escape were well formed, it would have been handled by extension_eof which is those cases that you describe, guarded by len(state.key) != 0 && state.valueStart < state.valueEnd. So this just picks up cases that that failed to identify because they were syntactically broken and did not satisfy the machine before the next extension start.

Does that clarify?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that’s clear. Thanks. Can you add a short comment there.

@efd6
Copy link
Contributor Author

efd6 commented Mar 29, 2022

/test

1 similar comment
@efd6
Copy link
Contributor Author

efd6 commented Mar 29, 2022

/test

@efd6
Copy link
Contributor Author

efd6 commented Mar 29, 2022

The coverage tool appears to be choking on this. It shouldn't.

@efd6
Copy link
Contributor Author

efd6 commented Mar 29, 2022

The issue looks like golang/go#35781 and that we just did not hit it with the exact number of lines that we had. The options are to:

  1. remove the comment and hope that changes in generation or cover behaviour don't resurface the issue.
  2. turn off coverage for this package (it this possible per-package?)
  3. inhibit line directives being emitted by ragel (not available for Go)

(I think the issue is that the line directive allows the coverage tool to believe that there are identical lines that have different coverage because the same ragel line ends up in more than one place. Why this only presents itself when the comment is in that action block, I don't know). I have moved the comment into a section of the file that won't end up in the Go code with an explanation and instructions to move it back when the issue in cover is resolved).

@efd6 efd6 merged commit c193dde into elastic:main Mar 29, 2022
@efd6 efd6 deleted the cefrecover branch March 29, 2022 08:56
mergify bot pushed a commit that referenced this pull request Mar 29, 2022
…ttempt to recover extensions (#30938)

This adds a second pass at parsing messages if the original parse operation fails
and there are no extensions found by relaxing the requirements on header syntax.
Because header values may include syntax that looks like the extension key/value
pair syntax, the parsers are separated and under control of the host language.

(cherry picked from commit c193dde)

# Conflicts:
#	x-pack/filebeat/processors/decode_cef/cef/cef_test.go
efd6 added a commit that referenced this pull request Mar 29, 2022
…ow (*Event).Unpack to attempt to recover extensions (#31036)

* x-pack/filebeat/processors/decode_cef/cef: allow (*Event).Unpack to attempt to recover extensions (#30938)

This adds a second pass at parsing messages if the original parse operation fails
and there are no extensions found by relaxing the requirements on header syntax.
Because header values may include syntax that looks like the extension key/value
pair syntax, the parsers are separated and under control of the host language.

(cherry picked from commit c193dde)

# Conflicts:
#	x-pack/filebeat/processors/decode_cef/cef/cef_test.go

* fix conflict

Co-authored-by: Dan Kortschak <90160302+efd6@users.noreply.github.com>
Co-authored-by: Dan Kortschak <dan.kortschak@elastic.co>
emilioalvap pushed a commit to emilioalvap/beats that referenced this pull request Apr 6, 2022
…ttempt to recover extensions (elastic#30938)

This adds a second pass at parsing messages if the original parse operation fails
and there are no extensions found by relaxing the requirements on header syntax.
Because header values may include syntax that looks like the extension key/value
pair syntax, the parsers are separated and under control of the host language.
kush-elastic pushed a commit to kush-elastic/beats that referenced this pull request May 2, 2022
…ttempt to recover extensions (elastic#30938)

This adds a second pass at parsing messages if the original parse operation fails
and there are no extensions found by relaxing the requirements on header syntax.
Because header values may include syntax that looks like the extension key/value
pair syntax, the parsers are separated and under control of the host language.
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
…ttempt to recover extensions (#30938)

This adds a second pass at parsing messages if the original parse operation fails
and there are no extensions found by relaxing the requirements on header syntax.
Because header values may include syntax that looks like the extension key/value
pair syntax, the parsers are separated and under control of the host language.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Filebeat] decode_cef - recover from errors in the CEF header
3 participants