
feat(kinesis sinks): implement full retry of partial failures in firehose/streams #16771

Conversation

jasongoodwin
Contributor

@jasongoodwin jasongoodwin commented Mar 11, 2023

  • Adds configuration, but not per sink for firehose/streams (it should probably be per sink, not just in the base config).
  • Passes the failed count up for each of firehose/streams, then uses the config to decide whether the request should be retried.
    I still have to test this, but it looks okay. It's a little different from the ES implementation, but the change is fairly minimal.
    The only wart is the way the request size is "augmented" into the KinesisResponse after the call to `call`.

Hopefully this looks alright - feel free to drop any feedback and I'll fix it up.
I think per-sink config for streams/firehose is likely necessary before releasing this.
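To illustrate the per-sink config idea, here is a minimal sketch of moving the flag out of the shared base and into each sink's own config struct. All struct and field names here are illustrative stand-ins, not Vector's actual definitions.

```rust
// Hypothetical sketch: a `retry_partial` flag per sink instead of one in the
// shared base config. Names are illustrative, not Vector's real types.

#[derive(Clone, Debug, Default)]
pub struct KinesisSinkBaseConfig; // shared fields (auth, region, ...) elided

#[derive(Clone, Debug, Default)]
pub struct KinesisStreamsSinkConfig {
    pub base: KinesisSinkBaseConfig,
    /// Whether to retry successful requests that contain partial failures.
    pub retry_partial: bool,
}

#[derive(Clone, Debug, Default)]
pub struct KinesisFirehoseSinkConfig {
    pub base: KinesisSinkBaseConfig,
    /// Firehose cannot de-duplicate downstream, so this defaults to off.
    pub retry_partial: bool,
}
```

This would let firehose default the flag to off (or omit it entirely) while streams exposes it.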

I reviewed/marked up the PR to help draw attention to these items and clarify.

--------- Further discussion/tickets -------

Handling of partial failures for firehose is open here: #359
Note that this issue covers only a full retry of the whole request, similar to what ES does (#140).
I'd happily talk to someone about how I might improve this to handle only the partial failures (@decklyndubs on Discord - I'm in your server). I have to look a little deeper, but my fear in doing that is that a record could get indefinitely stuck when it should be dropped, so there may need to be a separate retry policy in the sink. This design issue is broad and relates to multiple sinks with partial failures, such as ES.
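The "separate retry policy" idea could be as simple as bounding the number of partial-failure retries so a poison record is eventually dropped. Here is a minimal sketch under that assumption; `PartialRetryPolicy` and its fields are hypothetical names, not anything in Vector.

```rust
// Hypothetical sketch of a bounded partial-failure retry policy: retry up to
// `max_partial_retries` attempts, then give up, so one persistently failing
// record cannot keep the request in the retry loop forever.

#[derive(Debug, PartialEq)]
pub enum RetryAction {
    Retry(String),
    DontRetry(String),
}

pub struct PartialRetryPolicy {
    pub max_partial_retries: u32,
}

impl PartialRetryPolicy {
    pub fn decide(&self, failure_count: usize, attempt: u32) -> RetryAction {
        if failure_count == 0 {
            RetryAction::DontRetry("ok".into())
        } else if attempt < self.max_partial_retries {
            RetryAction::Retry(format!("partial error count {failure_count}"))
        } else {
            // Budget exhausted: drop rather than retry indefinitely.
            RetryAction::DontRetry(format!(
                "dropping {failure_count} records after {attempt} attempts"
            ))
        }
    }
}
```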

Some other related tickets.
#7659
#9451
#9861

@jasongoodwin jasongoodwin requested a review from a team March 11, 2023 20:49
@bits-bot

bits-bot commented Mar 11, 2023

CLA assistant check
All committers have signed the CLA.

@netlify

netlify bot commented Mar 11, 2023

Deploy Preview for vrl-playground canceled.

Name Link
🔨 Latest commit 1d2caea
🔍 Latest deploy log https://app.netlify.com/sites/vrl-playground/deploys/640d1f376a1ee20008cf32fe

@netlify

netlify bot commented Mar 11, 2023

Deploy Preview for vector-project canceled.

Name Link
🔨 Latest commit 1d2caea
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/640d1f37068f200008638a20

@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Mar 11, 2023
@@ -58,6 +58,13 @@ pub struct KinesisSinkBaseConfig {
#[serde(default)]
pub auth: AwsAuthentication,

/// Whether or not to retry successful requests containing partial failures.
Contributor Author

probably want per-sink config. This is in the "base" only

Contributor

👍 especially if we're only supporting it for streams.

let msg = format!("partial error count {}", response.failure_count);
return RetryAction::Retry(msg.into());
} else {
RetryAction::DontRetry("ok".into())
Contributor Author

Should these contain anything different? New to the project.

fn should_retry_response(&self, response: &Self::Response) -> RetryAction {
if self.retry_partial && response.failure_count > 0 {
let msg = format!("partial error count {}", response.failure_count);
return RetryAction::Retry(msg.into());
Contributor Author

Fixme: doesn't need a return
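The fixme above can be addressed by making both branches arms of a single `if`/`else` expression, so no early `return` is needed. A sketch, with the surrounding trait impl elided and `RetryAction` defined locally to keep it self-contained:

```rust
// Sketch of the same check written as one expression (no early `return`).
// `RetryAction` mirrors the excerpt above; the trait impl wrapper is elided.

#[derive(Debug, PartialEq)]
pub enum RetryAction {
    Retry(String),
    DontRetry(String),
}

pub fn should_retry_response(retry_partial: bool, failure_count: usize) -> RetryAction {
    if retry_partial && failure_count > 0 {
        RetryAction::Retry(format!("partial error count {failure_count}"))
    } else {
        RetryAction::DontRetry("ok".into())
    }
}
```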

.map(|output: PutRecordBatchOutput| KinesisResponse {
count: rec_count,
failure_count: output.failed_put_count().unwrap_or(0) as usize,
events_byte_size: 0,
Contributor Author

@jasongoodwin jasongoodwin Mar 11, 2023

the events size isn't available here. Wasn't sure the best way to modify this - will think about it. I may just return the failure count for now, and build the KinesisResponse in the Service
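One way to avoid "augmenting" the response afterwards, per the comment above, is to have the client return only the failed count and let the Service, which already knows the request's serialized size, assemble the full response. A sketch with simplified stand-in types (not the actual Vector or SDK definitions):

```rust
// Sketch: the Service builds KinesisResponse itself, combining the byte size
// it tracked before the call with the failed count from the AWS output.
// Types and names are simplified stand-ins.

pub struct KinesisResponse {
    pub count: usize,
    pub failure_count: usize,
    pub events_byte_size: usize,
}

pub fn build_response(rec_count: usize, failed: Option<i32>, request_bytes: usize) -> KinesisResponse {
    KinesisResponse {
        count: rec_count,
        failure_count: failed.unwrap_or(0) as usize,
        // Known to the Service before dispatch, so no post-hoc mutation needed.
        events_byte_size: request_bytes,
    }
}
```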

@fuchsnj fuchsnj added the sink: aws_kinesis_streams Anything `aws_kinesis_streams` sink related label Mar 13, 2023
@spencergilbert spencergilbert requested a review from a team March 13, 2023 19:38
@spencergilbert
Contributor

Thanks @jasongoodwin! I'm hoping to have this reviewed by tomorrow, appreciate the pre-review you already provided 😄

@jasongoodwin
Contributor Author

> Thanks @jasongoodwin! I'm hoping to have this reviewed by tomorrow, appreciate the pre-review you already provided 😄

Yeah, it likely needs a few things - but what I'm really hoping to accomplish is partial retry. For firehose, this PR could potentially create a lot of duplication, which is a risk; streams is okay, as it can de-duplicate.

If you can give some insight into how I might implement partial retry, I would definitely appreciate it.
E.g. I can get the records that failed, no problem - but how would I pass them up so that the sink can batch just the failed records on the retry while using the existing retry logic? I'm having a hard time understanding some of those pieces.
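For the "get the records that failed" half of this, the PutRecords response reports a per-record error code for entries that failed, so the failing indices can be collected and used to rebuild a smaller retry batch. A sketch with a simplified stand-in for the SDK's result-entry type:

```rust
// Hypothetical sketch of collecting only the failed records for a partial
// retry. The real PutRecords response carries per-record ErrorCode values;
// this struct is a simplified stand-in, not the SDK type.

pub struct PutRecordsResultEntry {
    pub error_code: Option<String>,
}

/// Indices of the request's records that failed, in order.
pub fn failed_indices(results: &[PutRecordsResultEntry]) -> Vec<usize> {
    results
        .iter()
        .enumerate()
        .filter(|(_, r)| r.error_code.is_some())
        .map(|(i, _)| i)
        .collect()
}
```

The retry request would then be rebuilt from just those indices of the original batch before being handed back to the retry logic.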

@jasongoodwin
Contributor Author

jasongoodwin commented Mar 15, 2023

I'll try to clean this up over the weekend after reviewing/thinking about it a bit.

I closed the related PR (#16703).
I'm interested in taking a stab at the partial retry feature - thanks for the feedback on that one; it gives me some leads for analyzing the code a bit more.

@spencergilbert
Contributor

I failed to have this reviewed, but I'll do my best to leave my thoughts before you take another look this weekend

@jasongoodwin
Contributor Author

jasongoodwin commented Mar 16, 2023

It's okay - I can see some things to clean up after sitting on it. The only open questions at this point are:

  1. what do you think about the configuration?
  2. should this even be implemented for firehose? What's the risk of duplication and egregious expense on a lot of retries?
  3. the KinesisResponse is built and then aggregated as it's passed out, which I think isn't great.

@spencergilbert
Contributor

Sorry for the delay - I'm planning on dedicating a chunk of time Wednesday on this.

Contributor

@spencergilbert spencergilbert left a comment

> 1. what do you think about the configuration?

I think the configuration is fine. If we don't implement it for firehose we can drop the note, and if there is a suggestion for how to de-duplicate for streams - we could include that suggestion (as we do for Elasticsearch).

> 2. should this even be implemented for firehose? What's the risk of duplication and egregious expense on a lot of retries?

I think if there's no way for firehose to de-dupe these we should only implement it for streams (which can de-dupe?).

@@ -58,6 +58,13 @@ pub struct KinesisSinkBaseConfig {
#[serde(default)]
pub auth: AwsAuthentication,

/// Whether or not to retry successful requests containing partial failures.
Contributor

👍 especially if we're only supporting it for streams.


fn should_retry_response(&self, response: &Self::Response) -> RetryAction {
if self.retry_partial && response.failure_count > 0 {
let msg = format!("partial error count {}", response.failure_count);
Contributor

It would be nice to include the error type and reason if we can pull that out of the response reasonably.
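The per-record result entries in the PutRecords response carry an error code and message for failed records, so the retry message could surface the first one alongside the count. A sketch with simplified stand-ins for the SDK types:

```rust
// Sketch of including the error type and reason in the retry message, per the
// suggestion above. `ResultEntry` is a simplified stand-in for the SDK's
// per-record result type, which exposes ErrorCode/ErrorMessage.

pub struct ResultEntry {
    pub error_code: Option<String>,
    pub error_message: Option<String>,
}

pub fn retry_message(failure_count: usize, entries: &[ResultEntry]) -> String {
    // Report the first failure's code and message alongside the count.
    let detail = entries
        .iter()
        .find(|e| e.error_code.is_some())
        .map(|e| {
            format!(
                " (first error: {} - {})",
                e.error_code.as_deref().unwrap_or("unknown"),
                e.error_message.as_deref().unwrap_or("no message")
            )
        })
        .unwrap_or_default();
    format!("partial error count {failure_count}{detail}")
}
```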

@@ -1,3 +1,5 @@
use crate::sinks::aws_kinesis::KinesisResponse;
Contributor

This could be moved into the use super:: line below.

Comment on lines +80 to +84
.map(|output: PutRecordsOutput| KinesisResponse {
count: rec_count,
failure_count: output.failed_record_count().unwrap_or(0) as usize,
events_byte_size: 0,
})
Contributor

It definitely feels better to me to do this in the service.rs

@jasongoodwin
Contributor Author

Great, thanks for the review! I'll have to rebuild some context to fix this up, but I think we can get this over the line.

@spencergilbert
Contributor

Hey @jasongoodwin - just noticed this was still hanging around, wanted to check in and see how things were going.

spencergilbert added a commit that referenced this pull request Jun 16, 2023
…hose/streams (#17535)

This PR is based on #16771.
Refactor some action checking.

closes: #17424

---------

Signed-off-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>
Co-authored-by: Jason Goodwin <jgoodwin@bluecatnetworks.com>
Co-authored-by: Jason Goodwin <jay.michael.goodwin@gmail.com>
Co-authored-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>
@jszwedko
Member

Superseded by #17535

@jszwedko jszwedko closed this Jun 28, 2023