[Processors] Mime-Type Detection #22940

Merged (10 commits) on Dec 8, 2020

@andrewstucki andrewstucki commented Dec 4, 2020

What does this PR do?

Adds a basic MIME type sniffer as a Beats processor and uses it in Packetbeat. This allows us to implement the new ECS 1.7 http.*.mime_type fields.

Basically we do the following:

  1. Run a portion of the data we want to inspect through h2non/filetype detection
  2. If that fails, run it through the net/http sniffer
  3. If the net/http sniffer reports plain text (no binary encoding or HTML detected), try to determine whether it is some sort of "specially encoded" text (e.g. JSON, XML)
  4. If all else fails and we get back the generic MIME type (application/octet-stream), return without filling in the field
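
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: it uses only the Go standard library, so step 1 (h2non/filetype) is left out, and the names detectMimeType, looksLikeJSON, looksLikeXML and maxSniffLen are hypothetical. The XML check here walks tokens with xml.Decoder rather than calling xml.Unmarshal, which is a choice made for the sketch.

```go
package main

import (
	"bytes"
	"encoding/json"
	"encoding/xml"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// maxSniffLen is a hypothetical cap on how many bytes are inspected.
// Payloads longer than this are truncated, so a large JSON or XML body
// would fail the validity checks below; a real implementation would
// need to handle that case.
const maxSniffLen = 1024

// looksLikeJSON reports whether data is a valid JSON object or array.
func looksLikeJSON(data []byte) bool {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 || (trimmed[0] != '{' && trimmed[0] != '[') {
		return false
	}
	return json.Valid(trimmed)
}

// looksLikeXML reports whether data tokenizes without error as XML
// and contains at least one element.
func looksLikeXML(data []byte) bool {
	dec := xml.NewDecoder(bytes.NewReader(data))
	sawElement := false
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sawElement
		}
		if err != nil {
			return false
		}
		if _, ok := tok.(xml.StartElement); ok {
			sawElement = true
		}
	}
}

// detectMimeType returns a MIME type for data, or "" when nothing more
// specific than application/octet-stream could be determined.
func detectMimeType(data []byte) string {
	if len(data) > maxSniffLen {
		data = data[:maxSniffLen]
	}
	// Step 1 (h2non/filetype) is omitted in this stdlib-only sketch.

	// Step 2: fall back to the net/http sniffer and drop any
	// parameters such as "; charset=utf-8".
	detected := http.DetectContentType(data)
	if i := strings.IndexByte(detected, ';'); i >= 0 {
		detected = detected[:i]
	}

	switch detected {
	case "text/plain":
		// Step 3: look for structured text hiding behind text/plain.
		if looksLikeJSON(data) {
			return "application/json"
		}
		if looksLikeXML(data) {
			return "application/xml"
		}
	case "application/octet-stream":
		// Step 4: a generic result means the field is left unset.
		return ""
	}
	return detected
}

func main() {
	samples := [][]byte{
		[]byte(`{"a": 1}`),
		[]byte("<note><to>x</to></note>"),
		[]byte("hello world"),
		{0x00, 0x01, 0x02},
	}
	for _, s := range samples {
		fmt.Printf("%q -> %q\n", s, detectMimeType(s))
	}
}
```

Returning an empty string for the generic case lets a caller skip writing the field entirely, which mirrors step 4.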

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic bot added and then removed the needs_team label (indicates that the issue/PR needs a Team:* label) Dec 4, 2020
@elasticmachine
Collaborator

elasticmachine commented Dec 4, 2020

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #22940 updated

  • Start Time: 2020-12-07T21:44:52.668+0000

  • Duration: 113 min 32 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 17359
  • Skipped: 1373
  • Total: 18732

💚 Flaky test report

Tests succeeded.

@andrewstucki andrewstucki requested a review from a team December 5, 2020 02:55
@andrewstucki
Author

The E2E tests appear to have been triggered by the packaging job, which was itself triggered by the go.mod update. I'm not entirely sure what's behind the failures, but they appear to be unrelated to these changes; all of them are related to Fleet.

Contributor

@leehinman leehinman left a comment

I'm a little concerned about the performance implications of xml.Unmarshal.

The others are minor things.

libbeat/mime/byte.go (outdated, resolved)
libbeat/mime/byte.go (resolved)
libbeat/mime/byte.go (resolved)
Contributor

@leehinman leehinman left a comment

LGTM

@andrewstucki andrewstucki merged commit 5f52979 into elastic:master Dec 8, 2020
@andrewstucki andrewstucki deleted the mimetype-detection branch December 8, 2020 01:02
andrewstucki pushed a commit to andrewstucki/beats that referenced this pull request Dec 8, 2020
* Add mimetype processor

* Add mimetype detection for packetbeat

* Update changelog

* Rev go.sum

* Refactor for reusability and rename to detect_mime_type

* reformat imports

* update docs

* Update maxHeaderSize name and add comment on the fallback behavior

(cherry picked from commit 5f52979)
andrewstucki pushed a commit that referenced this pull request Dec 8, 2020
* [Processors] Mime-Type Detection (#22940)

* Add mimetype processor

* Add mimetype detection for packetbeat

* Update changelog

* Rev go.sum

* Refactor for reusability and rename to detect_mime_type

* reformat imports

* update docs

* Update maxHeaderSize name and add comment on the fallback behavior

(cherry picked from commit 5f52979)

* Fix up changelog

3 participants