Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aws] feat: aws-s3 input registry cleanup for untracked s3 objects #41694

Merged
merged 1 commit into from
Dec 6, 2024

Conversation

Kavindu-Dodan
Copy link
Contributor

@Kavindu-Dodan Kavindu-Dodan commented Nov 19, 2024

Proposed commit message

This PR partially addresses #39116 by introducing a registry cleanup strategy for aws-s3 input.

The cleanup implemented here removes registry entries if the s3 object is no longer available (aka tracked) when listing inside the polling lookup. The cleanup removes objects that are not tracked from both the local state and internal store (backed by the registry) to reduce the memory usage.

Note that, this only benefits when s3 objects get removed (ex:- using lifecycle policy) and are no longer available. There should be a follow-up for instances where such removal is not done at the bucket. For example, this could be done by,

  • Defining a time-based sliding window (ex:- only consider s3 objects of the past 3 days)
  • Only accept objects of a known storage class (ex:- S3 Standard ) so that users can archive objects 1

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • Build or run filebeat with aws-s3 input
  • Add s3 objects and observe memory growth with pprof (regex filter for awss3)
  • Remove s3 objects and observe memory reduction

Screenshots

Given below are pprof analyss comparisons for ~4K objects in registry and once they were clean up by removing S3 objects (emptying the bucket)

  • With ~4K objects processed

image

  • When S3 objects were removed

image

Related issues

Footnotes

  1. https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html

@Kavindu-Dodan Kavindu-Dodan added enhancement backport-8.x Automated backport to the 8.x branch with mergify labels Nov 19, 2024
@Kavindu-Dodan Kavindu-Dodan requested a review from a team as a code owner November 19, 2024 21:59
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 19, 2024
@Kavindu-Dodan Kavindu-Dodan added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Nov 19, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 19, 2024
@Kavindu-Dodan Kavindu-Dodan force-pushed the feat/s3-registry-cleanup branch from c129de6 to f394447 Compare November 19, 2024 21:59
Copy link
Contributor

@kaiyan-sheng kaiyan-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @Kavindu-Dodan ! One quick question first, should we also introduce clean_removed parameter just to be matching the filestream input?

@Kavindu-Dodan
Copy link
Contributor Author

Nice work @Kavindu-Dodan ! One quick question first, should we also introduce clean_removed parameter just to be matching the filestream input?

For s3 input, if clean_removed is set to false, then we will avoid cleanup of registry even for s3 objects that are no longer visible when listing through ListObjectsV2 API. This again can overload the in-memory state store. Besides, there's a core difference between filestream input and s3. In filestream input, we can continuously get new events as file get updated. Whereas in s3, we mark object processed as soon as we process and get ack for all events generated after processing it. So, personally I do not think we need to introduce clean_removed for s3 input unless we see another usage for this. Let me know your view.

@Kavindu-Dodan Kavindu-Dodan force-pushed the feat/s3-registry-cleanup branch from fc03402 to 9b29500 Compare November 22, 2024 16:15
Copy link
Contributor

@kaiyan-sheng kaiyan-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 25, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@lucabelluccini
Copy link
Contributor

Looking at the change, if I've correctly understood, we're removing the IDs from the registry once the files have been read.
Shouldn't be removing the entries only when we've acknowledged the shipping of the logs downstream (to Elasticsearch/Outputs)?

@Kavindu-Dodan
Copy link
Contributor Author

Looking at the change, if I've correctly understood, we're removing the IDs from the registry once the files have been read. Shouldn't be removing the entries only when we've acknowledged the shipping of the logs downstream (to Elasticsearch/Outputs)?

The cleanup happens for objects we no longer see in the configured bucket and they won't be processed so there won't be any elastic acknowledgment. Any known bucket entry will be kept in the registry and will be tracked for their success or failure. See this logic [1] which is unchanged.

[1] -

acks.Add(publishCount, func() {
err := in.states.AddState(state)
if err != nil {
in.log.Errorf("saving completed object state: %v", err.Error())
}
// Metrics
in.metrics.s3ObjectsAckedTotal.Inc()
})

@Kavindu-Dodan Kavindu-Dodan requested a review from faec November 26, 2024 15:33
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
@Kavindu-Dodan Kavindu-Dodan force-pushed the feat/s3-registry-cleanup branch from b17324a to 3c84e63 Compare December 3, 2024 23:00
@elastic elastic deleted a comment from mergify bot Dec 3, 2024
@Kavindu-Dodan Kavindu-Dodan added backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify labels Dec 5, 2024
@Kavindu-Dodan Kavindu-Dodan merged commit 583d345 into elastic:main Dec 6, 2024
22 checks passed
mergify bot pushed a commit that referenced this pull request Dec 6, 2024
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
(cherry picked from commit 583d345)
mergify bot pushed a commit that referenced this pull request Dec 6, 2024
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
(cherry picked from commit 583d345)
mergify bot pushed a commit that referenced this pull request Dec 6, 2024
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
(cherry picked from commit 583d345)
Kavindu-Dodan added a commit that referenced this pull request Dec 6, 2024
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
(cherry picked from commit 583d345)

Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Kavindu-Dodan added a commit that referenced this pull request Dec 6, 2024
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
(cherry picked from commit 583d345)

Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
Kavindu-Dodan added a commit that referenced this pull request Dec 6, 2024
Signed-off-by: Kavindu Dodanduwa <kavindu.dodanduwa@elastic.co>

# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
(cherry picked from commit 583d345)

Co-authored-by: Kavindu Dodanduwa <Kavindu-Dodan@users.noreply.github.com>
@Kavindu-Dodan Kavindu-Dodan deleted the feat/s3-registry-cleanup branch December 6, 2024 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify enhancement Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:obs-ds-hosted-services Label for the Observability Hosted Services team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants