
Duplicated records with low http.timer.interval.millis #263

Open
tomasz-sadura opened this issue Feb 17, 2023 · 1 comment

@tomasz-sadura

Describe the bug
When the http.timer.interval.millis value is lower than the offset.flush.interval.ms Kafka Connect worker setting, the records in the topic are duplicated every http.timer.interval.millis until the offset is committed.
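The following toy model illustrates the mechanism described above. It is written in Python for illustration only and is not the connector's code. It assumes that every poll re-requests all records after the task's current offset, and that this offset only advances when the worker flushes offsets. The function count_writes and its parameters are hypothetical.

```python
from __future__ import annotations


def count_writes(arrivals_ms: list[int], timer_ms: int, flush_ms: int, horizon_ms: int) -> list[int]:
    """Toy model: how many times each record is written to the topic.

    Hypothetical assumptions: records are indexed in arrival order, every poll
    re-requests all records after the task's offset, and that offset only moves
    when the worker flushes offsets (every flush_ms).
    """
    offset = -1          # index of the last record covered by a flushed offset
    emitted = -1         # highest record index written to the topic so far
    writes = [0] * len(arrivals_ms)
    next_flush = flush_ms
    for now in range(0, horizon_ms + 1, timer_ms):       # one poll per timer tick
        while now >= next_flush:                          # worker offset flush (runs first on a tie)
            offset = emitted
            next_flush += flush_ms
        for index, arrived in enumerate(arrivals_ms):
            if index > offset and arrived <= now:         # not yet covered by a flushed offset
                writes[index] += 1
                emitted = max(emitted, index)
    return writes


print(count_writes([0], timer_ms=1000, flush_ms=60000, horizon_ms=120000))   # [60]
print(count_writes([0], timer_ms=60000, flush_ms=60000, horizon_ms=120000))  # [1]
```

With a 1000 ms timer and the 60000 ms default flush interval, the model writes a single record 60 times before the first flush. With both intervals set to 60000 ms, it writes the record once. The model ignores clock jitter and assumes that a flush runs before any poll scheduled for the same instant.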

To Reproduce
Set the http.timer.interval.millis connector config to 1000 ms.
Keep the default offset.flush.interval.ms worker setting of 60000 ms.

Expected behavior
Records should appear once in the target topic.
Each poll should use the offset of the last record committed to the topic, not the last offset flushed by the worker.
The offset should be updated at the start of each poll instead of during the worker's offset commit.
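A minimal sketch of the proposed behavior follows, again as a hypothetical Python model rather than a patch. It rests on two assumptions. First, "committed record" is taken to mean a record that Kafka has acknowledged, which is the point at which Kafka Connect calls SourceTask.commitRecord(). Second, offsets are plain increasing integers. The class SketchTask and its fields are invented for illustration. Its poll(), commit_record() and commit() methods only loosely mirror the SourceTask callbacks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchTask:
    """Hypothetical model of the proposal: the polling offset moves at the start of
    poll() based on acknowledged records, not in commit()."""

    source: list[int]                                   # offsets the HTTP endpoint would return
    offset: int = -1                                    # offset used to build the next request
    in_flight: list[int] = field(default_factory=list)  # records returned by the previous poll
    acked: set[int] = field(default_factory=set)        # records Kafka has acknowledged

    def poll(self) -> list[int]:
        # Advance over the longest acknowledged prefix of the previous batch,
        # so an unacknowledged record is requested again.
        for record_offset in self.in_flight:
            if record_offset not in self.acked:
                break
            self.offset = record_offset
        self.in_flight = [o for o in self.source if o > self.offset]
        return list(self.in_flight)

    def commit_record(self, record_offset: int) -> None:
        # Models SourceTask.commitRecord(): called once Kafka has acknowledged the write.
        self.acked.add(record_offset)

    def commit(self) -> None:
        # Models SourceTask.commit() on the worker's flush schedule; under this
        # proposal it no longer needs to move the polling offset.
        pass


task = SketchTask(source=[10, 20])
print(task.poll())   # [10, 20]
task.commit_record(10)
print(task.poll())   # [20]  (20 was not acknowledged yet, so it is requested again)
task.commit_record(20)
print(task.poll())   # []
```

Advancing only over the longest acknowledged prefix preserves at-least-once delivery. Any record that has not been acknowledged by the next poll is requested again, but acknowledged records no longer repeat until the next offset flush.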

Kafka Connect:

  • Version 3.4.0

Plugin:

  • Version 0.8.11

@steam0

steam0 commented Jan 18, 2024

Yeah, I am having trouble with this as well. I would really like to poll more often than every minute, but every poll within the 60000 ms offset flush window returns duplicates.

Example from logs:

[2024-01-18 15:32:26,334] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:36,393] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:46,457] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:32:56,510] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:33:05,610] INFO WorkerSourceTask{id=buypass-playercards-connect-0} Either no records were produced by the task since the last offset commit, or every record has been filtered out by a transformation or dropped due to transformation or conversion errors. (org.apache.kafka.connect.runtime.WorkerSourceTask)
[2024-01-18 15:33:06,567] INFO Request for offset {key=tUszHY0BfTtFkKhrnQU-, timestamp=2024-01-18T15:32:18.026652563Z} yields 0/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
[2024-01-18 15:33:16,621] INFO Request for offset {key=tUszHY0BfTtFkKhrnQU-, timestamp=2024-01-18T15:32:18.026652563Z} yields 0/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)
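A worker log can be checked for this symptom with a small helper that counts how many polls reused the same offset while still reporting new records. The Python helper below is hypothetical. It assumes that HttpSourceTask log lines follow the format quoted above, and its sample input reuses three of those lines.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches HttpSourceTask INFO lines in the format quoted in this issue.
REQUEST_LINE = re.compile(r"Request for offset (\{[^}]*\}) yields (\d+)/(\d+) new records")


def repeated_offsets(log_lines: list[str]) -> Counter:
    """Count polls per offset that still reported new records (hypothetical helper).

    A count above 1 means the same offset was reused while new records were
    reported each time, which is the duplication symptom described above.
    """
    counts: Counter = Counter()
    for line in log_lines:
        match = REQUEST_LINE.search(line)
        if match and int(match.group(2)) > 0:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[2024-01-18 15:32:26,334] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)",
    "[2024-01-18 15:32:36,393] INFO Request for offset {key=ReEqHY0BN-He1Zqe92Zd, timestamp=2024-01-18T15:22:51.495794932Z} yields 1/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)",
    "[2024-01-18 15:33:06,567] INFO Request for offset {key=tUszHY0BfTtFkKhrnQU-, timestamp=2024-01-18T15:32:18.026652563Z} yields 0/24 new records (com.github.castorm.kafka.connect.http.HttpSourceTask)",
]
for offset, polls in repeated_offsets(sample).items():
    if polls > 1:
        print(f"{polls} polls reused offset {offset}")
```

Given all four ReEqHY0BN-He1Zqe92Zd lines from the log above, the helper would report 4 polls for that offset. The tUszHY0BfTtFkKhrnQU- polls are ignored because they yield 0 new records.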

Is there any information regarding this issue?
