Enable pipeline to discard data older than XX #4667
Comments
We do have the I think what we are missing is a Data Prepper expression and/or function for comparing time. Something like this could work:
Where However, we do not have a
This could be done a little more easily by adding just a
@marfago, are you interested in working on adding the
@dlvenable, thank you for your comment. For the solution that you propose, how is the
@marfago, do your events have an existing timestamp field that you could use? The Are you using Amazon S3 as a source? If you also need a timestamp, we could include the value of the S3 object header
Is your feature request related to a problem? Please describe.
I have a data pipeline built from an AOSS pipeline and an AOSS collection. It serves as a real-time log monitor.
We recently had an outage, so the source did not deliver logs for a few days. When we finally unblocked the pipeline and restarted ingestion, all of those days of logs arrived at once, and the AOSS pipeline began ingesting them from oldest to newest. This behavior does not work for us: because we want a real-time monitor, we prioritize fresher data over older data.
Describe the solution you'd like
I propose a new behavior in which the pipeline can discard data in the queue that is older than XX (days, hours, or minutes). This would let users prioritize fresher data over older data without the queue growing indefinitely. In my case, I might simply set this option to 1h and ingest only fresh data (at least for a while), forgetting about the past.
For example:
- `max_retention: 1h`
- `max_retention: 1d`
- `max_retention: 1w`
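The proposed behavior can be illustrated with a short sketch. This is a minimal Python illustration, not Data Prepper code: it assumes each event is a dict carrying a timezone-aware `timestamp` field, and `parse_retention` and `discard_stale` are hypothetical names chosen for this example. It parses values such as `1h`, `1d`, or `1w` into a duration and keeps only events newer than `now - max_retention`.

```python
import re
from datetime import datetime, timedelta, timezone

# Units accepted by the hypothetical max_retention option:
# m = minutes, h = hours, d = days, w = weeks.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_retention(value: str) -> timedelta:
    """Parse a string such as '1h', '1d' or '1w' (case-insensitive) into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([mhdw])\s*", value.lower())
    if match is None:
        raise ValueError(f"invalid max_retention value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def discard_stale(events, max_retention, now=None, timestamp_key="timestamp"):
    """Return only the events whose timestamp falls within max_retention of now.

    Events without a timestamp are kept, because their age cannot be judged.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - parse_retention(max_retention)
    return [
        event for event in events
        if event.get(timestamp_key) is None or event[timestamp_key] >= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2024, 6, 20, 12, 0, tzinfo=timezone.utc)
    events = [
        {"message": "fresh", "timestamp": now - timedelta(minutes=30)},
        {"message": "stale", "timestamp": now - timedelta(days=3)},
    ]
    # With max_retention = 1h, only the event from 30 minutes ago survives.
    print([e["message"] for e in discard_stale(events, "1h", now=now)])  # ['fresh']
```

Keeping events that have no timestamp is one possible design choice; an implementation could instead fall back to another time source for such events, along the lines discussed in the comments.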
Describe alternatives you've considered (Optional)
I don't have any.
Additional context
Related to #4666