Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log input: remove states more eagerly #25756

Merged
merged 3 commits into from
May 27, 2021

Conversation

urso
Copy link

@urso urso commented May 18, 2021

What does this PR do?

The change introduced here more eagerly removes the state if
clean_removed is set, reduing the time window of detection and state
removal to scan_frequency in the log input.

Why is it important?

The log input removes states on clean_removed by setting the TTL to 0, and wait for another scan to finally remove the state for the from the state registry. The total time window for a state removal thusly was 2*scan_frequency. The disk scan always is subject to race conditions with the actual on disk state. In case of inode reuse an increased time window might might lead to filebeat detecting a new file as a rename of an old file. Next Filebeat truncate detection would try to handle the case, but if logs are written and rotated very fast the new file might already be bigger then the old file.

By removing the state more eagerly we reduce the risk of running into a race condition reopening a new file as old.

Checklist

  • My code follows the style guidelines of this project
    - [ ] I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
    - [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

The log input removes states on clean_removed by setting the TTL to 0,
and wait for another scan to finally remove the state for the from the
state registry. The total time window for a state removal thusly was
`2*scan_frequency`. The disk scan always is subject to race conditions
with the actual on disk state. In case of inode reuse an increased time
window might might lead to filebeat detecting a new file as a rename of
an old file. Next Filebeat truncate detection would try to handle the
case, but if logs are written and rotated very fast the new file might
already be bigger then the old file.

The change introduced here more eagerly removes the state if
clean_removed is set, reduing the time window of detection and state
removal to `scan_frequency` in the log input.
@urso urso added enhancement Team:Elastic-Agent Label for the Agent team labels May 18, 2021
@urso urso requested a review from kvch May 18, 2021 10:44
@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels May 18, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented May 18, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #25756 updated

  • Start Time: 2021-05-25T16:32:14.534+0000

  • Duration: 105 min 28 sec

  • Commit: 5d1b9fa

Test stats 🧪

Test Results
Failed 0
Passed 13886
Skipped 2292
Total 16178

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 13886
Skipped 2292
Total 16178

Copy link
Contributor

@kvch kvch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a changelog entry is added, it is good to go.

@urso urso added backport-v7.13.0 Automated backport with mergify backport-v7.14.0 Automated backport with mergify labels May 25, 2021
@mergify
Copy link
Contributor

mergify bot commented May 25, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fb-state-clean-removed-now upstream/fb-state-clean-removed-now
git merge upstream/master
git push upstream fb-state-clean-removed-now

@urso urso merged commit 2ee21d9 into elastic:master May 27, 2021
@urso urso deleted the fb-state-clean-removed-now branch May 27, 2021 10:52
mergify bot pushed a commit that referenced this pull request May 27, 2021
## What does this PR do?

The change introduced here more eagerly removes the state if
clean_removed is set, reduing the time window of detection and state
removal to `scan_frequency` in the log input.

## Why is it important?

The log input removes states on clean_removed by setting the TTL to 0, and wait for another scan to finally remove the state for the from the state registry. The total time window for a state removal thusly was `2*scan_frequency`. The disk scan always is subject to race conditions with the actual on disk state. In case of inode reuse an increased time window might might lead to filebeat detecting a new file as a rename of an old file. Next Filebeat truncate detection would try to handle the case, but if logs are written and rotated very fast the new file might already be bigger then the old file.

By removing the state more eagerly we reduce the risk of running into a race condition reopening a new file as old.

(cherry picked from commit 2ee21d9)
mergify bot pushed a commit that referenced this pull request May 27, 2021
## What does this PR do?

The change introduced here more eagerly removes the state if
clean_removed is set, reduing the time window of detection and state
removal to `scan_frequency` in the log input.

## Why is it important?

The log input removes states on clean_removed by setting the TTL to 0, and wait for another scan to finally remove the state for the from the state registry. The total time window for a state removal thusly was `2*scan_frequency`. The disk scan always is subject to race conditions with the actual on disk state. In case of inode reuse an increased time window might might lead to filebeat detecting a new file as a rename of an old file. Next Filebeat truncate detection would try to handle the case, but if logs are written and rotated very fast the new file might already be bigger then the old file.

By removing the state more eagerly we reduce the risk of running into a race condition reopening a new file as old.

(cherry picked from commit 2ee21d9)

# Conflicts:
#	filebeat/input/log/input.go
cachedout pushed a commit that referenced this pull request May 28, 2021
## What does this PR do?

The change introduced here more eagerly removes the state if
clean_removed is set, reduing the time window of detection and state
removal to `scan_frequency` in the log input.

## Why is it important?

The log input removes states on clean_removed by setting the TTL to 0, and wait for another scan to finally remove the state for the from the state registry. The total time window for a state removal thusly was `2*scan_frequency`. The disk scan always is subject to race conditions with the actual on disk state. In case of inode reuse an increased time window might might lead to filebeat detecting a new file as a rename of an old file. Next Filebeat truncate detection would try to handle the case, but if logs are written and rotated very fast the new file might already be bigger then the old file.

By removing the state more eagerly we reduce the risk of running into a race condition reopening a new file as old.

(cherry picked from commit 2ee21d9)
urso pushed a commit that referenced this pull request Jun 1, 2021
Co-authored-by: Steffen Siering <steffen.siering@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.13.0 Automated backport with mergify backport-v7.14.0 Automated backport with mergify enhancement Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants