🐛 Source Kafka: fix missing data #19587
Conversation
Hello 👋, first of all, thank you for this amazing contribution. We really appreciate the effort you've made to improve the project. If you have any questions, feel free to send me a message in Slack!
@itaseskii are you going to review this?
/test connector=connectors/source-kafka
Build Passed. Test summary info:
@marcosmarxm don't merge it yet, I'll review it today.
@alexnikitchuk The fix looks good to me. It makes sense to avoid reading additional records once the max_records count is exceeded, which the previous implementation was doing. The additional read shouldn't have been that problematic, since the offset wasn't committed and the dropped records would have been consumed in the next replication, but it is still better to improve efficiency whenever possible 👍 With that in mind, I'm unsure whether the current implementation is correct with respect to its constraint of not transmitting more records than specified in max_records. If a poll reads 20 records on the initial read while max_records is set to 10, all 20 records would still be committed/transmitted, ignoring the constraint. I'm aware that this was not the point of this task, but it is something we could address in future PRs.
Consumer close commits all consumed messages. Why do you say it is not committed?
because I wasn't aware that consumer close commits the consumed messages.
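For readers following this exchange: committing on close is standard KafkaConsumer behaviour when auto-commit is enabled. Below is a minimal, self-contained sketch of that behaviour, not code taken from this PR; the broker address, group id, topic name, and the enable.auto.commit setting are all illustrative assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommitOnCloseExample {

  public static void main(String[] args) {
    final Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // illustrative
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");             // illustrative
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");            // assumption
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(List.of("example-topic"));
    consumer.poll(Duration.ofSeconds(1))
        .forEach(record -> System.out.println(record.value()));

    // With auto-commit enabled, close() commits the offsets of the records
    // returned by the last poll(), so everything already polled counts as consumed.
    consumer.close();
  }
}
```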
/publish connector=connectors/source-kafka
If you have connectors that successfully published but failed definition generation, follow step 4 here.
Thanks @alexnikitchuk
What
Kafka Source does not guarantee "at-least-once" delivery because it drops records when max_records_process is exceeded.

How
Consume all data returned from each call to poll(long) before closing the consumer.
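For illustration, here is a rough sketch of the pattern described above, not the connector's actual implementation: every record in the batch returned by poll is kept, and the record limit only decides whether another poll is issued. The names maxRecords and DrainingReadSketch, the String key/value types, and the one-second poll timeout are assumptions for the example.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DrainingReadSketch {

  public static List<ConsumerRecord<String, String>> read(Properties props,
                                                          List<String> topics,
                                                          int maxRecords) {
    final List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(topics);
      while (buffer.size() < maxRecords) {
        final ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
        if (batch.isEmpty()) {
          break; // nothing new; stop polling
        }
        // Keep the *entire* batch returned by poll() before deciding whether to
        // stop; dropping the tail of a batch is what caused the missing data.
        for (final ConsumerRecord<String, String> record : batch) {
          buffer.add(record);
        }
      }
    } // close() commits the consumed offsets when auto-commit is enabled
    return buffer;
  }
}
```

Note that, as raised in the review above, the final batch can still push the total past the configured limit; enforcing a hard cap would require explicit offset management and is left for a future PR.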
Recommended reading order
x.java
y.python
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector

Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests added and passing: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Documentation updated:
  - Connector's README.md
  - Connector's bootstrap.md. See description and examples
  - docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  - docs/integrations/README.md
  - airbyte-integrations/builds.md

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- New connector version released by running the /publish command described here

Updating a connector

Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests added and passing: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Documentation updated:
  - Connector's README.md
  - Connector's bootstrap.md. See description and examples
  - docs/integrations/<source or destination>/<name>.md including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- New connector version released by running the /publish command described here

Connector Generator
- Generated connectors (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates, then checking in your changes

Tests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.