Is your feature request related to a problem? Please describe.
Currently, the `s3` and `opensearch` sources have an internal acknowledgment timeout of 1 or 2 hours. This is because they are mainly used for historical migrations of data, so they are less time sensitive. However, because DynamoDB streams are a streaming use case where ordering and latency both matter, it is not as simple to provide a high default timeout as it is for these other sources.
Describe the solution you'd like
The simple approach would be to make the timeout configurable with an `acknowledgment_timeout` parameter. On its own this is not ideal either: the right value is hard to gauge for streams (especially with OpenSearch backpressure), and it would still require a fairly large default to make sure that timeouts don't fire right before the data reaches OpenSearch. We need a way to lower the `acknowledgment_timeout` so streams can recover quickly, but to do that we need to lower the amount of data covered by each acknowledgment: a shard can take a long time to process, which means a long time to detect a failure and time out, and we want to save progress instead of restarting the entire shard when only the last part of it hits an acknowledgment timeout.
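As a rough illustration, assuming a Jackson-style configuration class similar to other source configs (the class name, field name, and default below are hypothetical, not the existing implementation), the option could look something like this:

```java
import com.fasterxml.jackson.annotation.JsonProperty;

import java.time.Duration;

// Hypothetical configuration sketch for the dynamodb source; names and the
// default value are illustrative only.
public class DynamoDBSourceAckConfigSketch {

    // A generous default is still needed so that acknowledgments do not time
    // out just before data reaches the sink under backpressure.
    @JsonProperty("acknowledgment_timeout")
    private Duration acknowledgmentTimeout = Duration.ofHours(2);

    public Duration getAcknowledgmentTimeout() {
        return acknowledgmentTimeout;
    }
}
```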
To handle these complications, I would propose that we chunk the `AcknowledgmentSet`s for the dynamo source to a target amount of data (in bytes). We can do this by grouping records by sequence number within a DDB stream shard: for example, a configurable `acknowledgment_checkpoint_size` would tell us to keep reading from the shard until we hit a sequence number that reaches (or passes) that size; a slight overshoot does not matter.
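A minimal sketch of this chunking, assuming a hypothetical `StreamRecord` type that carries a sequence number and an approximate size in bytes (these are illustrative placeholders, not existing Data Prepper classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of grouping a shard's records into byte-sized acknowledgment chunks.
// StreamRecord and the size accounting are placeholders, not existing types.
class AcknowledgmentChunker {

    // Placeholder for a DynamoDB stream record with its sequence number and
    // approximate serialized size.
    record StreamRecord(String sequenceNumber, long sizeBytes) {}

    private final long checkpointSizeBytes; // acknowledgment_checkpoint_size

    AcknowledgmentChunker(long checkpointSizeBytes) {
        this.checkpointSizeBytes = checkpointSizeBytes;
    }

    // Close a chunk at the first sequence number that reaches or passes the
    // configured size; slight overshoot is acceptable.
    List<List<StreamRecord>> chunk(List<StreamRecord> shardRecords) {
        List<List<StreamRecord>> chunks = new ArrayList<>();
        List<StreamRecord> current = new ArrayList<>();
        long bytesInCurrent = 0;
        for (StreamRecord streamRecord : shardRecords) {
            current.add(streamRecord);
            bytesInCurrent += streamRecord.sizeBytes();
            if (bytesInCurrent >= checkpointSizeBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                bytesInCurrent = 0;
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```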
When we receive an acknowledgment for one of these chunks (we should receive them in order), we can update the partition ownership timeout for that partition to something small, potentially as low as 30 seconds, or however long an `acknowledgment_checkpoint_size` worth of data takes to travel through the pipeline (#3494 could be used to tune these values). It is wise to make this configurable as well, since there is always sink backpressure. For a well-scaled sink, low values of `acknowledgment_timeout` may be possible, which would allow nodes to pick up on crashes very quickly and continue processing the shard.
When an acknowledgment callback is received, we not only extend the ownership timeout by `acknowledgment_timeout`, we also update the partition progress state with the last sequence number that was acknowledged. This allows us to pick up right where we left off after a failure instead of restarting the processing of the entire shard.
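A sketch of that per-chunk callback, where the coordination interface and its method names are hypothetical stand-ins for whatever the source coordination layer actually exposes:

```java
import java.time.Duration;

// Sketch of the per-chunk acknowledgment callback: extend the shard
// partition's ownership timeout and checkpoint the last acknowledged sequence
// number so a restart resumes there. ShardCoordinator is a stand-in.
class ChunkAcknowledgmentHandler {

    interface ShardCoordinator {
        void extendOwnership(String shardId, Duration extension);
        void saveProgress(String shardId, String lastSequenceNumber);
        void giveUpOwnership(String shardId);
    }

    private final ShardCoordinator coordinator;
    private final Duration acknowledgmentTimeout;

    ChunkAcknowledgmentHandler(ShardCoordinator coordinator, Duration acknowledgmentTimeout) {
        this.coordinator = coordinator;
        this.acknowledgmentTimeout = acknowledgmentTimeout;
    }

    void onChunkAcknowledged(String shardId, String lastSequenceNumber, boolean positive) {
        if (positive) {
            // Keep ownership only a little longer than one chunk should take.
            coordinator.extendOwnership(shardId, acknowledgmentTimeout);
            // Persist progress so a failure does not restart the whole shard.
            coordinator.saveProgress(shardId, lastSequenceNumber);
        } else {
            // On a negative or timed-out acknowledgment, release the shard so
            // another node can resume from the last saved sequence number.
            coordinator.giveUpOwnership(shardId);
        }
    }
}
```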
Describe alternatives you've considered (Optional)
Additional context
For now, for simplicity, it may make sense to have a single `AcknowledgmentSet` per shard, with a one-hour timeout for each. We can optimize toward the proposal above in the future if needed.
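A minimal sketch of that interim shape, using simplified stand-in interfaces rather than the project's actual acknowledgment API:

```java
import java.time.Duration;
import java.util.function.Consumer;

// Sketch of the interim approach: one acknowledgment set per shard with a
// one-hour timeout. AckSet and AckSetManager are simplified stand-ins.
class PerShardAcknowledgmentSketch {

    interface AckSet {
        void add(Object event);
        void complete();
    }

    interface AckSetManager {
        AckSet create(Consumer<Boolean> callback, Duration timeout);
    }

    AckSet createForShard(AckSetManager manager, Runnable onShardComplete) {
        // One set covers the whole shard, so the timeout must be generous.
        return manager.create(success -> {
            if (success) {
                onShardComplete.run(); // e.g. mark the shard as fully processed
            }
        }, Duration.ofHours(1));
    }
}
```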