Cherry-pick #21258 to 7.9: o365input: Restart after fatal error #21386

Merged Sep 30, 2020 (2 commits)

Conversation

adriansr
Contributor

@adriansr adriansr commented Sep 29, 2020

Cherry-pick of PR #21258 to 7.9 branch. Original message:

What does this PR do?

Updates o365input to restart the input after a fatal error is encountered, for example an authentication token refresh error or a parsing error.

This makes the input more resilient against transient errors.

Before this patch, the input would index an error document and terminate. Now it indexes an error document and restarts after a fixed delay of 5 minutes.

Why is it important?

Some users report that the o365 module stops ingesting events after a few days. In every case observed so far, the input had terminated due to an error contacting the Azure authentication server to refresh a token.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • Make sure that there's no better way, i.e. some way to make an input v2 restart automatically when it returns an error.

How to test this PR locally

Testing token refresh errors directly is difficult because tokens are refreshed only once every ~12h. The restart behavior can still be tested by starting Filebeat without an internet connection.

Update the o365input to restart the input after a fatal error is
encountered, for example an authentication token refresh error or a parsing
error.

This enables the input to be more resilient against transient errors.

Before this patch, the input would index an error document and terminate.
Now it will index an error and restart after a fixed timeout of 5 minutes.

(cherry picked from commit 8716d98)
@botelastic botelastic bot added the needs_team label Sep 29, 2020
@elasticmachine
Collaborator

Pinging @elastic/siem (Team:SIEM)

@botelastic botelastic bot removed the needs_team label Sep 29, 2020
@elasticmachine
Collaborator

💔 Tests Failed

Build stats

  • Build Cause: [Pull request #21386 opened]

  • Start Time: 2020-09-29T15:03:35.181+0000

  • Duration: 86 min 26 sec

Test stats 🧪

Test Results
Failed 1
Passed 19414
Skipped 1859
Total 21274

Test errors

  • Name: Build&Test / x-pack/metricbeat-build / TestFetch – ec2

    • Age: 1
    • Duration: 19.25
    • Error Details: Failed

Steps errors

  • Name: make -C generator/_templates/beat test test-package

    • Description: make -C generator/_templates/beat test test-package

    • Duration: 3 min 43 sec

    • Start Time: 2020-09-29T15:30:01.900+0000

  • Name: Notifies GitHub of the status of a Pull Request

    • Description: script returned exit code 2

    • Duration: 0 min 1 sec

    • Start Time: 2020-09-29T15:32:56.634+0000

  • Name: make -C generator/_templates/beat test

    • Description: make -C generator/_templates/beat test

    • Duration: 8 min 41 sec

    • Start Time: 2020-09-29T15:59:24.771+0000

  • Name: Notifies GitHub of the status of a Pull Request

    • Description: script returned exit code 2

    • Duration: 0 min 1 sec

    • Start Time: 2020-09-29T16:07:16.446+0000

  • Name: mage build test

    • Description: mage build test

    • Duration: 24 min 16 sec

    • Start Time: 2020-09-29T15:37:13.125+0000

  • Name: Notifies GitHub of the status of a Pull Request

    • Description: script returned exit code 1

    • Duration: 0 min 1 sec

    • Start Time: 2020-09-29T16:01:54.850+0000

Log output

[2020-09-29T16:29:17.678Z] + tar -xpf source.tgz
[2020-09-29T16:29:30.291Z] + rm source.tgz
[2020-09-29T16:29:31.013Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats
[2020-09-29T16:29:31.058Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/uncategorized-1601393131873
[2020-09-29T16:29:31.172Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/libbeat-stress-tests-1601393405952
[2020-09-29T16:29:31.293Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/winlogbeat-crosscompile-1601393489906
[2020-09-29T16:29:31.416Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/auditbeat-crosscompile-1601393544607
[2020-09-29T16:29:31.538Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-elastic-agent-build-1601393552954
[2020-09-29T16:29:31.661Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/generator-beat-test-1601393571336
[2020-09-29T16:29:31.787Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-dockerlogbeat-build-1601393585658
[2020-09-29T16:29:31.905Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/journalbeat-unitTest-1601393606028
[2020-09-29T16:29:32.028Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/libbeat-crosscompile-1601393691975
[2020-09-29T16:29:32.159Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-functionbeat-build-1601393705066
[2020-09-29T16:29:32.290Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/packetbeat-build-1601393768669
[2020-09-29T16:29:32.414Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-elastic-agent-windows-windows-2019-1601393799657
[2020-09-29T16:29:32.539Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/metricbeat-unitTest-1601393887864
[2020-09-29T16:29:32.664Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/auditbeat-windows-windows-2019-1601393983549
[2020-09-29T16:29:32.782Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/heartbeat-build-1601394010341
[2020-09-29T16:29:32.910Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/heartbeat-windows-windows-2019-1601394058428
[2020-09-29T16:29:33.030Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-auditbeat-windows-windows-2019-1601394075720
[2020-09-29T16:29:33.151Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/generator-metricbeat-test-1601394081791
[2020-09-29T16:29:33.269Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/metricbeat-crosscompile-1601394088347
[2020-09-29T16:29:33.390Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-winlogbeat-build-windows-2019-1601394110595
[2020-09-29T16:29:33.512Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-auditbeat-build-1601394129507
[2020-09-29T16:29:33.639Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/auditbeat-build-1601394154817
[2020-09-29T16:29:33.769Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-functionbeat-windows-windows-2019-1601394160111
[2020-09-29T16:29:33.897Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-libbeat-build-1601394165484
[2020-09-29T16:29:34.111Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-filebeat-windows-windows-2019-1601394183961
[2020-09-29T16:29:34.233Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/packetbeat-windows-windows-2019-1601394245365
[2020-09-29T16:29:34.348Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/filebeat-windows-windows-2019-1601394263076
[2020-09-29T16:29:34.469Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/winlogbeat-windows-windows-2019-1601394303326
[2020-09-29T16:29:34.585Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-metricbeat-windows-windows-2019-1601394428113
[2020-09-29T16:29:34.729Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/metricbeat-windows-windows-2019-1601394494418
[2020-09-29T16:29:34.877Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/auditbeat-macos-macosx-1601394777703
[2020-09-29T16:29:34.998Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/generator-macos-metricbeat-macosx-1601394888636
[2020-09-29T16:29:35.123Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/heartbeat-macos-macosx-1601395027123
[2020-09-29T16:29:35.238Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-filebeat-build-1601395075358
[2020-09-29T16:29:35.355Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/filebeat-macos-macosx-1601395102639
[2020-09-29T16:29:35.467Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/filebeat-build-1601395107492
[2020-09-29T16:29:35.573Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/metricbeat-macos-macosx-1601395256788
[2020-09-29T16:29:35.677Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-auditbeat-macos-macosx-1601395271035
[2020-09-29T16:29:35.794Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-elastic-agent-macos-macosx-1601395282416
[2020-09-29T16:29:35.907Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/packetbeat-macos-macosx-1601395290632
[2020-09-29T16:29:36.016Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-metricbeat-build-1601395302415
[2020-09-29T16:29:36.119Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/libbeat-build-1601395446340
[2020-09-29T16:29:36.223Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/metricbeat-pythonIntegTest-1601395568576
[2020-09-29T16:29:36.332Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/generator-macos-beat-macosx-1601395628844
[2020-09-29T16:29:36.439Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/metricbeat-goIntegTest-1601395652016
[2020-09-29T16:29:36.540Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-metricbeat-macos-macosx-1601395655201
[2020-09-29T16:29:36.643Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-functionbeat-macos-macosx-1601396226325
[2020-09-29T16:29:36.746Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-filebeat-macos-macosx-1601396895331
[2020-09-29T16:29:37.185Z] + cat
[2020-09-29T16:29:37.185Z] + /usr/local/bin/runbld ./runbld-test-reports --job-name elastic+beats+pull-request
[2020-09-29T16:29:37.185Z] Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
[2020-09-29T16:29:43.803Z] runbld>>> runbld started
[2020-09-29T16:29:43.803Z] runbld>>> 1.6.12/f45d832f2ba0aa2722ab4ec1fda8ad140f027f8b
[2020-09-29T16:29:45.754Z] runbld>>> The following profiles matched the job 'elastic+beats+pull-request' in order of occurrence in the config (last value wins).
[2020-09-29T16:29:45.754Z] runbld>>> Matches in the system config:
[2020-09-29T16:29:45.754Z] runbld>>> - Matched ^elastic\+beats
[2020-09-29T16:29:45.754Z] runbld>>> - Matched ^elastic\+beats\+pull-request
[2020-09-29T16:29:47.141Z] runbld>>> Debug logging enabled.
[2020-09-29T16:29:47.141Z] runbld>>> Storing result
[2020-09-29T16:29:47.404Z] runbld>>> Store result: created {:total 2, :successful 2, :failed 0} 1
[2020-09-29T16:29:47.404Z] runbld>>> BUILD: https://c150076387b5421f9154dfbf536e5c60.us-west1.gcp.cloud.es.io:9243/build-1597739501209/t/20200929162947-DC279D68
[2020-09-29T16:29:47.404Z] runbld>>> Adding system facts.
[2020-09-29T16:29:48.802Z] runbld>>> Adding vcs info for the latest commit:  d26a50f54cdcb78b0db471a3ef0dd5d372c5a10d
[2020-09-29T16:29:48.803Z] runbld>>> >>>>>>>>>>>> SCRIPT EXECUTION BEGIN >>>>>>>>>>>>
[2020-09-29T16:29:48.803Z] runbld>>> Adding /usr/lib/jvm/java-8-openjdk-amd64/bin to the path.
[2020-09-29T16:29:48.803Z] Processing JUnit reports with runbld...
[2020-09-29T16:29:48.803Z] + echo 'Processing JUnit reports with runbld...'
[2020-09-29T16:29:49.065Z] runbld>>> <<<<<<<<<<<< SCRIPT EXECUTION END <<<<<<<<<<<<
[2020-09-29T16:29:49.065Z] runbld>>> DURATION: 31ms
[2020-09-29T16:29:49.065Z] runbld>>> STDOUT: 40 bytes
[2020-09-29T16:29:49.065Z] runbld>>> STDERR: 49 bytes
[2020-09-29T16:29:49.065Z] runbld>>> WRAPPED PROCESS: SUCCESS (0)
[2020-09-29T16:29:49.065Z] runbld>>> Searching for build metadata in /var/lib/jenkins/workspace/Beats_beats_PR-21386
[2020-09-29T16:29:50.010Z] runbld>>> Storing build metadata: 
[2020-09-29T16:29:50.011Z] runbld>>> Adding test report.
[2020-09-29T16:29:50.011Z] runbld>>> Searching for junit test output files with the pattern: TEST-.*\.xml$ in: /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats
[2020-09-29T16:29:50.585Z] runbld>>> Found 141 test output files
[2020-09-29T16:29:51.159Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-metricbeat-build-1601395302415/x-pack/metricbeat/build/TEST-go-integration-openmetrics.xml
[2020-09-29T16:29:51.159Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-metricbeat-build-1601395302415/x-pack/metricbeat/build/TEST-go-integration-istio.xml
[2020-09-29T16:29:51.159Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-metricbeat-build-1601395302415/x-pack/metricbeat/build/TEST-go-integration-activemq.xml
[2020-09-29T16:29:51.159Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-metricbeat-build-1601395302415/x-pack/metricbeat/build/TEST-go-integration-iis.xml
[2020-09-29T16:29:51.159Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/x-pack-metricbeat-build-1601395302415/x-pack/metricbeat/build/TEST-go-integration-tomcat.xml
[2020-09-29T16:29:53.079Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/metricbeat-goIntegTest-1601395652016/metricbeat/build/TEST-go-integration-graphite.xml
[2020-09-29T16:29:53.079Z] runbld>>> No testsuite node found in /var/lib/jenkins/workspace/Beats_beats_PR-21386/src/github.com/elastic/beats/metricbeat-goIntegTest-1601395652016/metricbeat/build/TEST-go-integration-windows.xml
[2020-09-29T16:29:54.025Z] runbld>>> Test output logs contained: Errors: 0 Failures: 1 Tests: 21274 Skipped: 1561
[2020-09-29T16:29:54.025Z] runbld>>> Storing result
[2020-09-29T16:29:54.025Z] runbld>>> FAILURES: 1
[2020-09-29T16:29:54.598Z] runbld>>> Store result: updated {:total 2, :successful 2, :failed 0} 2
[2020-09-29T16:29:54.598Z] runbld>>> BUILD: https://c150076387b5421f9154dfbf536e5c60.us-west1.gcp.cloud.es.io:9243/build-1597739501209/t/20200929162947-DC279D68
[2020-09-29T16:29:54.598Z] runbld>>> Email notification disabled by environment variable.
[2020-09-29T16:29:54.598Z] runbld>>> Slack notification disabled by environment variable.
[2020-09-29T16:30:00.258Z] Running on Jenkins in /var/lib/jenkins/workspace/Beats_beats_PR-21386
[2020-09-29T16:30:00.350Z] [INFO] getVaultSecret: Getting secrets
[2020-09-29T16:30:00.437Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-09-29T16:30:01.287Z] + chmod 755 generate-build-data.sh
[2020-09-29T16:30:01.287Z] + ./generate-build-data.sh https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-21386/ https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-21386/runs/1 FAILURE 5185837
[2020-09-29T16:30:01.287Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-21386/runs/1/steps/?limit=10000 -o steps-info.json

@adriansr
Contributor Author

The CI failures are unrelated to this change.

@adriansr adriansr merged commit 498d440 into elastic:7.9 Sep 30, 2020
@zube zube bot removed the [zube]: Done label Dec 30, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
elastic#21386)

Update the o365input to restart the input after a fatal error is
encountered, for example an authentication token refresh error or a parsing
error.

This enables the input to be more resilient against transient errors.

Before this patch, the input would index an error document and terminate.
Now it will index an error and restart after a fixed timeout of 5 minutes.

(cherry picked from commit c723c1e)