Vector cannot handle the throttle exception from Kinesis data stream #17424

Closed
dengmingtong opened this issue May 17, 2023 · 27 comments · Fixed by #17535
Labels
sink: aws_kinesis_streams Anything `aws_kinesis_streams` sink related type: bug A code related bug.

Comments

@dengmingtong
Contributor

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I use Vector to sink data to a Kinesis data stream. Some requests are throttled, but Vector cannot catch these exceptions and still returns a successful response to the front-end.

Is this a Vector issue, or is there any config I can set to handle these exceptions?

Thanks,
Mingtong

Configuration

[sinks.kinesis_sink]
type = "aws_kinesis_streams"
inputs = ["json_parser"]
partition_key_field = "timestamp"
compression = "none"
region = "%%AWS_REGION%%"
stream_name = "%%AWS_KINESIS_STREAM_NAME%%"

acknowledgements.enabled = true
  [sinks.kinesis_sink.encoding]
  codec = "json"

Version

0.25.1-debian

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@dengmingtong added the type: bug label on May 17, 2023
@jszwedko
Member

Hi @dengmingtong ! Thanks for opening this. I think it is the same as #12835 so I'll close this one as a duplicate, but feel free to follow that other issue for updates (or to add any additional thoughts).

@jszwedko closed this as not planned (won't fix, can't repro, duplicate, stale) on May 17, 2023
@dengmingtong
Contributor Author

Thank you very much for your response.
Actually, I have another question: if a request to the Kinesis data stream is throttled, I think Vector should return an error code to the front-end, but currently Vector still returns a successful response to the front-end.
Is this an issue, or how should I configure the Vector .toml?

Thanks,
Mingtong

@jszwedko
Member

The expected behavior is that Vector retries these requests so it would wait to respond to the front-end until they were successful. This isn't implemented yet though (see #12835).
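
For context, a minimal sketch of the knobs involved, assuming the generic sink-level request options also apply to the aws_kinesis_streams sink (the option names and values below are illustrative, not taken from this issue). These settings govern how long Vector keeps retrying a request it considers retriable; as noted above, they do not help with throttling until throttle errors are actually classified as retriable.

# Illustrative sketch only; assumes the standard sink `request` options
# apply to aws_kinesis_streams, with arbitrary example values.
[sinks.kinesis_sink.request]
retry_attempts = 10               # retries allowed for requests Vector deems retriable
retry_initial_backoff_secs = 1    # backoff before the first retry
retry_max_duration_secs = 30      # upper bound on the backoff between retries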

@dengmingtong
Contributor Author

So suppose Kinesis has some problem and the request can never be delivered to Kinesis, is Vector able to return an error response to the front-end?

@jszwedko
Member

Ah yes, if acknowledgements.enabled = true is used, then yes, that feedback should propagate to the client. The exact behavior depends on the specific source implementation.
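
To illustrate the source side of that feedback loop: the reporter's source is not shown in this issue, so the sketch below is a hypothetical example. The http source type, the address, and the component name are assumptions; with acknowledgements enabled on the sink, an HTTP-style source would hold its response to the client until the sink confirms delivery and should surface an error status if delivery fails, subject to the source-implementation caveat above.

# Hypothetical sketch; the actual source is not part of this issue.
# The `http` source type, address, and component name are assumptions.
[sources.frontend_http]
type = "http"
address = "0.0.0.0:8080"
# The existing json_parser component (referenced by the sink's `inputs`)
# would list this source in its own `inputs`.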

@dengmingtong
Contributor Author

I already set acknowledgements.enabled = true, but when a request is throttled by Kinesis, Vector still returns a successful response to the front-end. Is this a known issue, or is there any other config I need to set?

@jszwedko
Member

Ah, yeah, that sounds the same as #12835 but for aws_kinesis_streams instead. I confused your initial issue with aws_kinesis_firehose. I'll re-open it for tracking.

@jszwedko reopened this on May 19, 2023
@jszwedko added the sink: aws_kinesis_streams label on May 19, 2023
@dengmingtong
Contributor Author

It seems a bit confusing, so let me summarize my two questions:

  1. Currently, when Vector sends requests to a Kinesis Data Stream, some messages are being throttled, but there are no retries.

  2. If a request is throttled, and I have set acknowledgements.enabled = true, Vector still returns a success response to the frontend.

Could you help confirm whether these two issues are problems with Vector?
Is there any plan to fix them?
The second question in particular is important for us.

Thanks,
Mingtong

@jszwedko
Member

Sorry, yeah, it is a bit confusing. I think both questions have the same solution: Vector should be retrying throttled requests rather than dropping them.

There is a contributor PR that is attempting to address them: https://github.com/vectordotdev/vector/pull/16771/files. It's currently waiting on the author.

@dengmingtong
Contributor Author

Is there any estimate of when these issues can be fixed?

@jszwedko
Member

Unfortunately not, since we are awaiting the contributor there.

@dengmingtong
Contributor Author

Hello @jszwedko, because we are using Vector to sink data to KDS, this issue has caused us to lose some data.
I have verified the PR https://github.com/vectordotdev/vector/pull/16771/files and it works for our case. May I copy this PR's changes and submit another PR?

@jszwedko
Member

That would be great! I think there were just a few comments on that PR that needed to be followed up on.

@neuronull
Contributor

👋 Hello, does #17535 close this issue?

@dengmingtong
Contributor Author

Yes, it is PR #17535.

@dengmingtong
Contributor Author

dengmingtong commented Jun 1, 2023

Hello @jszwedko, is there any progress? It seems the PR is still blocked there.

@jszwedko
Member

jszwedko commented Jun 1, 2023

Thanks for opening the new PR! Did you make the changes requested on the original PR? If so, we can give it a review.

@dengmingtong
Contributor Author

@jszwedko Yes, I merged the commits from the original PR and changed the code to fix some action checks.
Please help review it. We need this fix for our solution. :)

@jszwedko
Member

jszwedko commented Jun 2, 2023

Thanks @dengmingtong ! We'll get this reviewed again then.

@dengmingtong
Contributor Author

@jszwedko The PR is still not reviewed and we really need this change. Could you please help speed it up? :)

@spencergilbert
Contributor

Apologies @dengmingtong - it's been assigned to me, but I've been stuck on some other work and traveling. I'll be reviewing it later this afternoon; the earliest this would be released (outside of nightly) would be around July 5.

@dengmingtong
Contributor Author

@spencergilbert Thanks for your review. I have rebased and refactored the code; could you please review it again? :)

@dengmingtong
Contributor Author

@spencergilbert Got your comments yesterday. I have rebased and refactored the code; could you please review it again? :)

@dengmingtong
Contributor Author

@jszwedko The PR is still in open status, waiting for another reviewer to approve.
Could you help remind the reviewer to take the next step?
We really need this fix.

@spencergilbert
Contributor

@dengmingtong apologies - the second reviewer has been out on PTO; they should be back today and I'll have them take a look.

@dengmingtong
Contributor Author

@spencergilbert I have updated the PR again; could you please review it and also remind the second reviewer to take the next step?
We really need this fix. Because of the time zone difference, if there are small change comments, could you help refactor the code and get this PR merged as soon as possible? :)

@dengmingtong
Contributor Author

@spencergilbert I saw the PR has been removed from the merge queue because the test-integration check failed. I do not have experience with this; could you help resolve the test-integration failure?
Thank you so much. :)
