Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable cleanup_timeout by default in docker and kubernetes autodiscover #24681

Merged
merged 2 commits into from
Mar 24, 2021

Conversation

jsoriano
Copy link
Member

What does this PR do?

Disable cleanup_timeout by default in docker and kubernetes autodiscover for all beats except Filebeat.

It is kept to 60 seconds in Filebeat, to give a time to collect logs.

Why is it important?

Keeping configurations running for some time after containers have stopped is needed in some cases to complete the collection of logs. But in the rest of cases it is not usually needed, and leads to errors when querying endpoints known to be down.
It can also lead to query IPs that are being reused in newer containers, what can be misleading if the newer pod answers because these events will still have the metadata of the old container.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • Start metricbeat and filebeat in Kubernetes with autodiscover and some configuration.
  • Create a pod in Kubernetes that matches the existing configurations.
  • Check that metricbeat and filebat start collecting metrics and logs.
  • Delete the pod.
  • Check that metricbeat stops collecting metrics.
  • Check that filebeat stops collecting metrics about 60 seconds later.

Related issues

@jsoriano jsoriano added review needs_backport PR is waiting to be backported to other branches. Team:Integrations Label for the Integrations team breaking change test-plan Add this PR to be manual test plan v7.13.0 labels Mar 22, 2021
@jsoriano jsoriano self-assigned this Mar 22, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 22, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 22, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #24681 updated

  • Start Time: 2021-03-24T13:00:35.295+0000

  • Duration: 68 min 29 sec

  • Commit: 224623d

Test stats 🧪

Test Results
Failed 0
Passed 46419
Skipped 5104
Total 51523

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 46419
Skipped 5104
Total 51523

@jsoriano
Copy link
Member Author

/test

@jsoriano
Copy link
Member Author

/package

@jsoriano
Copy link
Member Author

/test

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@jsoriano
Copy link
Member Author

/test

1 similar comment
@jsoriano
Copy link
Member Author

/test

@jsoriano jsoriano merged commit 439b808 into elastic:master Mar 24, 2021
@jsoriano jsoriano deleted the disable-cleanup-timeout branch March 24, 2021 14:26
jsoriano added a commit to jsoriano/beats that referenced this pull request Mar 24, 2021
…ver (elastic#24681)

It is kept to 60 seconds in Filebeat, to give a time to collect logs.

Keeping configurations running for some time after containers have stopped
is needed in some cases to complete the collection of logs. But in the rest of
cases it is not usually needed, and leads to errors when querying endpoints
known to be down.
It can also lead to query IPs that are being reused in newer containers, what
can be misleading if the newer pod answers because these events will still
have the metadata of the old container.

(cherry picked from commit 439b808)
@jsoriano jsoriano removed the needs_backport PR is waiting to be backported to other branches. label Mar 24, 2021
jsoriano added a commit that referenced this pull request Mar 24, 2021
…ver (#24681) (#24730)

It is kept to 60 seconds in Filebeat, to give a time to collect logs.

Keeping configurations running for some time after containers have stopped
is needed in some cases to complete the collection of logs. But in the rest of
cases it is not usually needed, and leads to errors when querying endpoints
known to be down.
It can also lead to query IPs that are being reused in newer containers, what
can be misleading if the newer pod answers because these events will still
have the metadata of the old container.

(cherry picked from commit 439b808)
@andresrc andresrc added the test-plan-added This PR has been added to the test plan label Apr 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change review Team:Integrations Label for the Integrations team test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan v7.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Set a different autodiscover cleanup_timeout per beat
4 participants