Is your feature request related to a problem? Please describe.
Currently, the `s3` and `opensearch` sources have an internal acknowledgment timeout of 1 or 2 hours. This is because they are mainly used for historical migrations of data, so they are less time sensitive. However, because DynamoDB streams are a streaming use case where ordering and latency both matter, it is not as simple to provide a high default timeout as it is for these other sources.
Describe the solution you'd like
The simple approach would be to make the timeout configurable with an `acknowledgment_timeout` parameter. On its own this is not ideal either: the right value is hard to gauge for streams (especially with OpenSearch backpressure), and it would still require a fairly large default to make sure that timeouts don't fire right before the data reaches OpenSearch. We need a way to lower the `acknowledgment_timeout` so streams can recover quickly, but to do that we need to lower the amount of data covered by each acknowledgment: a shard can take a long time to process, which means a long time to detect a failure and time out, and we want to save progress instead of restarting the entire shard when only the last part of it hits an acknowledgment timeout.
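As a rough illustration, assuming a Jackson-style configuration class similar to other source configs (the class name, field name, and default below are hypothetical, not the existing implementation), the option could look something like this:

```java
import com.fasterxml.jackson.annotation.JsonProperty;

import java.time.Duration;

// Hypothetical configuration sketch for the dynamodb source; names and the
// default value are illustrative only.
public class DynamoDBSourceAckConfigSketch {

    // A generous default is still needed so that acknowledgments do not time
    // out just before data reaches the sink under backpressure.
    @JsonProperty("acknowledgment_timeout")
    private Duration acknowledgmentTimeout = Duration.ofHours(2);

    public Duration getAcknowledgmentTimeout() {
        return acknowledgmentTimeout;
    }
}
```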
To handle these complications, I would propose that we chunk the `AcknowledgmentSet`s for the dynamo source to a target amount of data (in bytes). We can do this by grouping records by sequence number within a DDB stream shard: for example, a configurable `acknowledgment_checkpoint_size` would tell us to keep reading from the shard until we hit a sequence number that reaches (or passes) that size; a slight overshoot does not matter.
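A minimal sketch of this chunking, assuming a hypothetical `StreamRecord` type that carries a sequence number and an approximate size in bytes (these are illustrative placeholders, not existing Data Prepper classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of grouping a shard's records into byte-sized acknowledgment chunks.
// StreamRecord and the size accounting are placeholders, not existing types.
class AcknowledgmentChunker {

    // Placeholder for a DynamoDB stream record with its sequence number and
    // approximate serialized size.
    record StreamRecord(String sequenceNumber, long sizeBytes) {}

    private final long checkpointSizeBytes; // acknowledgment_checkpoint_size

    AcknowledgmentChunker(long checkpointSizeBytes) {
        this.checkpointSizeBytes = checkpointSizeBytes;
    }

    // Close a chunk at the first sequence number that reaches or passes the
    // configured size; slight overshoot is acceptable.
    List<List<StreamRecord>> chunk(List<StreamRecord> shardRecords) {
        List<List<StreamRecord>> chunks = new ArrayList<>();
        List<StreamRecord> current = new ArrayList<>();
        long bytesInCurrent = 0;
        for (StreamRecord streamRecord : shardRecords) {
            current.add(streamRecord);
            bytesInCurrent += streamRecord.sizeBytes();
            if (bytesInCurrent >= checkpointSizeBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                bytesInCurrent = 0;
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```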
When we receive an acknowledgment for one of these chunks (we should receive them in order), we can update the partition ownership timeout for that partition to something small, potentially as low as 30 seconds, or however long an `acknowledgment_checkpoint_size` worth of data takes to travel through the pipeline (#3494 could be used to tune these values). It is wise to make this configurable as well, since there is always sink backpressure. For a well-scaled sink, low values of `acknowledgment_timeout` may be possible, which would allow nodes to pick up on crashes very quickly and continue processing the shard.
When an acknowledgment callback is received, we not only extend the ownership timeout by `acknowledgment_timeout`, we also update the partition progress state with the last sequence number that was acknowledged. This allows us to pick up right where we left off after a failure instead of restarting the processing of the entire shard.
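A sketch of that per-chunk callback, where the coordination interface and its method names are hypothetical stand-ins for whatever the source coordination layer actually exposes:

```java
import java.time.Duration;

// Sketch of the per-chunk acknowledgment callback: extend the shard
// partition's ownership timeout and checkpoint the last acknowledged sequence
// number so a restart resumes there. ShardCoordinator is a stand-in.
class ChunkAcknowledgmentHandler {

    interface ShardCoordinator {
        void extendOwnership(String shardId, Duration extension);
        void saveProgress(String shardId, String lastSequenceNumber);
        void giveUpOwnership(String shardId);
    }

    private final ShardCoordinator coordinator;
    private final Duration acknowledgmentTimeout;

    ChunkAcknowledgmentHandler(ShardCoordinator coordinator, Duration acknowledgmentTimeout) {
        this.coordinator = coordinator;
        this.acknowledgmentTimeout = acknowledgmentTimeout;
    }

    void onChunkAcknowledged(String shardId, String lastSequenceNumber, boolean positive) {
        if (positive) {
            // Keep ownership only a little longer than one chunk should take.
            coordinator.extendOwnership(shardId, acknowledgmentTimeout);
            // Persist progress so a failure does not restart the whole shard.
            coordinator.saveProgress(shardId, lastSequenceNumber);
        } else {
            // On a negative or timed-out acknowledgment, release the shard so
            // another node can resume from the last saved sequence number.
            coordinator.giveUpOwnership(shardId);
        }
    }
}
```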
Describe alternatives you've considered (Optional)
Additional context
For now, for simplicity, it may make sense to have a single `AcknowledgmentSet` per shard, with a one-hour timeout for each. We can optimize toward the proposal above in the future if needed.
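A minimal sketch of that interim shape, using simplified stand-in interfaces rather than the project's actual acknowledgment API:

```java
import java.time.Duration;
import java.util.function.Consumer;

// Sketch of the interim approach: one acknowledgment set per shard with a
// one-hour timeout. AckSet and AckSetManager are simplified stand-ins.
class PerShardAcknowledgmentSketch {

    interface AckSet {
        void add(Object event);
        void complete();
    }

    interface AckSetManager {
        AckSet create(Consumer<Boolean> callback, Duration timeout);
    }

    AckSet createForShard(AckSetManager manager, Runnable onShardComplete) {
        // One set covers the whole shard, so the timeout must be generous.
        return manager.create(success -> {
            if (success) {
                onShardComplete.run(); // e.g. mark the shard as fully processed
            }
        }, Duration.ofHours(1));
    }
}
```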