Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filebeat] Add URI Parts Processor to multiple modules #24699

Merged
merged 16 commits into from
Apr 27, 2021

Conversation

legoguy1000
Copy link
Contributor

@legoguy1000 legoguy1000 commented Mar 23, 2021

What does this PR do?

Updates Ingest Pipelines for the below modules:

Apache, Nginx, IIS, Traefik, S3Access, Cisco, F5, Fortinet, Google Workspace, Imperva, Microsoft, Netscout, O365, Sophos, Squid, Suricata, Zeek, Zia, Zoom, ZScaler

With the below changes

  • Add uri_parts processor to parse URIs (includes URL decoding) to add url.path, url.extension, url.query....
  • URL Decodes http.request.referrer (when applicable) to make them human readable

Why is it important?

Parses URLs to break up the URL into the different parts and URL decodes them.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

How to test this PR locally

cd beats/filebeat
GENERATE=true TESTING_FILEBEAT_MODULES=apache,nginx,iis,traefik mage -v pythonIntegTest
cd beats/x-pack/filebeat
GENERATE=true TESTING_FILEBEAT_MODULES=s3access,cisco,f5,fortinet,google_workspace,imperva,microsoft,netscout,o365,sophos,squid,suricata,zeek,zia,zoom,zscaler mage -v pythonIntegTest

Related issues

Use cases

Screenshots

Logs

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 23, 2021
@legoguy1000 legoguy1000 marked this pull request as ready for review March 23, 2021 05:04
@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 23, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: andrewstucki commented: jenkins run tests

  • Start Time: 2021-04-27T16:37:55.516+0000

  • Duration: 140 min 57 sec

  • Commit: b01d893

Test stats 🧪

Test Results
Failed 0
Passed 13707
Skipped 2285
Total 15992

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 13707
Skipped 2285
Total 15992

@legoguy1000 legoguy1000 marked this pull request as draft March 23, 2021 13:28
@legoguy1000 legoguy1000 changed the title [Filebeat] Update Nginx pipelines [Filebeat] Update Nginx & Apache pipelines Mar 23, 2021
@legoguy1000 legoguy1000 changed the title [Filebeat] Update Nginx & Apache pipelines [Filebeat] Update URI decoding and parsing across multiple modules Mar 25, 2021
@legoguy1000 legoguy1000 force-pushed the 19088-nginx-pipelines branch 4 times, most recently from 5c94728 to 50cc14b Compare March 26, 2021 15:23
@legoguy1000 legoguy1000 marked this pull request as ready for review March 27, 2021 21:36
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 29, 2021
@legoguy1000
Copy link
Contributor Author

@andrewstucki This should be ready review and CI tests

@andrewstucki
Copy link
Contributor

jenkins run tests

@andrewstucki
Copy link
Contributor

@legoguy1000 so it looks to me like the new log entries that were generated for the nginx module are missing user_agent.device.type which was added recently to Elasticsearch in elastic/elasticsearch#69322 -- can you make sure your development Elasticsearch instance is up-to-date with the latest snapshot before regeneration? I believe you should be able to grab the latest docker container with:

docker pull docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT

and then run it with:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "logger.org.elasticsearch.transport=trace" docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT

prior to re-running the regeneration for nginx.

@legoguy1000
Copy link
Contributor Author

@legoguy1000 so it looks to me like the new log entries that were generated for the nginx module are missing user_agent.device.type which was added recently to Elasticsearch in elastic/elasticsearch#69322 -- can you make sure your development Elasticsearch instance is up-to-date with the latest snapshot before regeneration? I believe you should be able to grab the latest docker container with:

docker pull docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT

and then run it with:

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "logger.org.elasticsearch.transport=trace" docker.elastic.co/elasticsearch/elasticsearch:8.0.0-SNAPSHOT

prior to re-running the regeneration for nginx.

Not a problem

Copy link
Contributor

@andrewstucki andrewstucki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about the backslash behavior. Not entirely sure if this is behavior based off of a combination of using urldecode + uri_parts and ordering or if this is just some strange behavior/potentially a bug in the processor itself. I'll try and check this out locally and play around with it to see what's going on.

@legoguy1000
Copy link
Contributor Author

Ran the pipelines to update the user_agent.device.type taking a look at the extra escapes.

@legoguy1000
Copy link
Contributor Author

@andrewstucki @andrewkroh If you guys are good, can u run the pipeline?

@andrewstucki
Copy link
Contributor

jenkins run tests

Copy link
Contributor

@andrewstucki andrewstucki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, based off of #19088 (comment) I want to see if @ebeahan can chime in about whether the best course is to keep the url.original field in url-encoded form or if it should be decoded.

@andrewstucki
Copy link
Contributor

jenkins run tests

@andrewstucki
Copy link
Contributor

@legoguy1000 In general, this looks fine to me, I'll try and get it merged assuming that the tests pass

@legoguy1000
Copy link
Contributor Author

legoguy1000 commented Apr 26, 2021

@legoguy1000 In general, this looks fine to me, I'll try and get it merged assuming that the tests pass

I suspect its going to fail since they made the changes to the geo IP database, i'm re-generating the data now. I'll hold off pushing to see if it passes.

@legoguy1000
Copy link
Contributor Author

@andrewstucki I update the generated data files to account for the changes in the pipeline. If you rerun the tests, we should be good.

@andrewstucki
Copy link
Contributor

jenkins run tests

@legoguy1000
Copy link
Contributor Author

@andrewstucki all passed

@andrewstucki andrewstucki merged commit f1fea95 into elastic:master Apr 27, 2021
@andrewstucki
Copy link
Contributor

@legoguy1000 thanks for the additions, I'll backport this for the 7.14 release

@andrewstucki andrewstucki added the backport-v7.14.0 Automated backport with mergify label Apr 27, 2021
@legoguy1000 legoguy1000 deleted the 19088-nginx-pipelines branch April 27, 2021 19:34
@legoguy1000
Copy link
Contributor Author

@legoguy1000 thanks for the additions, I'll backport this for the 7.14 release

Ya, that was definitely a long one. Definitely good conversations with the ECS team clarifying the fields.

andrewstucki pushed a commit to andrewstucki/beats that referenced this pull request Apr 27, 2021
* Update Nginx pipelines

* Update Apache, Nginx, IIS, Traefik pipelines

* Update AWS S3

* Update Cisco

* Update F5

* Update Fortinet

* Update Imperva, Netscout, O365, Sophos, Squid, Suricata, Zscaler

* additional fixes

* update pipelines

* unescape \

* remove urldecodes for url.original

* updates after rebase

* update zeek SIP

* update changelog as requested by @andrewstucki

* remove `url_decode` for `http.request.referrer`

* update generated data

(cherry picked from commit f1fea95)
@andrewstucki
Copy link
Contributor

@legoguy1000 backport opened at #25353

andrewstucki pushed a commit that referenced this pull request Apr 27, 2021
* Update Nginx pipelines

* Update Apache, Nginx, IIS, Traefik pipelines

* Update AWS S3

* Update Cisco

* Update F5

* Update Fortinet

* Update Imperva, Netscout, O365, Sophos, Squid, Suricata, Zscaler

* additional fixes

* update pipelines

* unescape \

* remove urldecodes for url.original

* updates after rebase

* update zeek SIP

* update changelog as requested by @andrewstucki

* remove `url_decode` for `http.request.referrer`

* update generated data

(cherry picked from commit f1fea95)

Co-authored-by: Alex Resnick <adr8292@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.14.0 Automated backport with mergify Filebeat Filebeat v7.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default nginx pipelines should decode url.original field
5 participants