New records are processed and written to BigQuery table, but offset is unchanged #74

Closed
byreshb opened this issue Jan 26, 2021 · 1 comment


byreshb commented Jan 26, 2021

Connector Plugin Version:

Issue Description:

  • New records processed after a failure (an exception thrown when writing to the BigQuery table because of invalid data) are written to the table, but the committed offset is not updated.
  • Found this issue while testing GH:68.

Test Precondition:

max.poll.records = 5
Number of topic partitions = 1
Number of connect tasks: 1

Test Steps:

Publish Message 1 = Correct Data, Lag = 0, Current Offset = 1, End Offset = 1 (1 message in BQ table)
Publish Message 2 = Correct Data, Lag = 0, Current Offset = 2, End Offset = 2 (2 messages in BQ table)
Publish Message 3 = Correct Data, Lag = 0, Current Offset = 3, End Offset = 3 (3 messages in BQ table)
Publish Message 4 = Correct Data, Lag = 0, Current Offset = 4, End Offset = 4 (4 messages in BQ table)
Publish Message 5 = Correct Data, Lag = 0, Current Offset = 5, End Offset = 5 (5 messages in BQ table)

Publish Message 6 = Invalid Data, Lag = 1, Current Offset = 5, End Offset = 6 (5 messages in BQ table)
(Invalid Data: the Kafka topic accepts the record, but BigQuery cannot insert it)

Connector: Running, Connect Task: Failed

Publish Message 7 = Correct Data, Lag = 2, Current Offset = 5, End Offset = 7 (5 messages in BQ table)
Publish Message 8 = Correct Data, Lag = 3, Current Offset = 5, End Offset = 8 (5 messages in BQ table)
Publish Message 9 = Correct Data, Lag = 4, Current Offset = 5, End Offset = 9 (5 messages in BQ table)
Publish Message 10 = Correct Data, Lag = 5, Current Offset = 5, End Offset = 10 (5 messages in BQ table)
Publish Message 11 = Correct Data, Lag = 6, Current Offset = 5, End Offset = 11 (5 messages in BQ table)
Publish Message 12 = Correct Data, Lag = 7, Current Offset = 5, End Offset = 12 (5 messages in BQ table)
Publish Message 13 = Correct Data, Lag = 8, Current Offset = 5, End Offset = 13 (5 messages in BQ table)

Restart Connect Task:
curl -X POST localhost:8083/connectors/edw-bigquery-sink-connector-test/tasks/0/restart

Lag = 8, Current Offset = 5, End Offset = 13 (8 messages in BQ table)

Connector: Running, Connect Task: Failed

Publish Message 14 = Correct Data, Lag = 9, Current Offset = 5, End Offset = 14 (8 messages in BQ table)
Publish Message 15 = Correct Data, Lag = 10, Current Offset = 5, End Offset = 15 (8 messages in BQ table)
Publish Message 16 = Correct Data, Lag = 11, Current Offset = 5, End Offset = 16 (8 messages in BQ table)

Restart Connect Task:
curl -X POST localhost:8083/connectors/edw-bigquery-sink-connector-test/tasks/0/restart

Lag = 11, Current Offset = 5, End Offset = 16 (11 messages in BQ table)
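
The numbers in the steps above follow the usual relation Lag = End Offset - Current Offset. As a rough sketch (the function and variable names here are illustrative only, and the values are copied from the steps above, not measured independently), the gap between what was written to BigQuery and what was committed can be tabulated like this:

```python
def consumer_lag(end_offset: int, current_offset: int) -> int:
    """Lag as reported in the test steps: end offset minus committed offset."""
    return end_offset - current_offset


# (label, current offset, end offset, rows in BQ table), taken from the steps above.
observations = [
    ("after message 5", 5, 5, 5),
    ("after message 6 (invalid)", 5, 6, 5),
    ("after message 13", 5, 13, 5),
    ("after first task restart", 5, 13, 8),
    ("after second task restart", 5, 16, 11),
]

# For each observation: the lag, and how many rows exist in BigQuery
# beyond the committed offset (rows written without an offset commit).
summary = [
    (label, consumer_lag(end, current), rows - current)
    for label, current, end, rows in observations
]

for label, lag, uncommitted_rows in summary:
    print(f"{label}: lag={lag}, rows beyond committed offset={uncommitted_rows}")
```

After each restart the row count grows while the committed offset stays at 5, which is the behaviour this issue reports.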

Expected Result:
The offset should advance when records are written to BigQuery. The following screenshots show the end state of the test.

[Screenshot: Offset]
[Screenshot: Number_Of_Records_BigQuery]

@C0urante

@byreshb as stated on #68, it's an unfortunate reality of the Connect framework today that it's impossible to avoid this scenario without breaking on SMTs that transform the topic name.

Until/unless a KIP is passed that changes this, there's nothing that can be done on the connector side, so I'll close this. If that changes, we can reopen or file a new issue.
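
One possible reading of the SMT constraint mentioned above, as a toy model only: if a transformation renames the topic before the sink sees a record, offsets tracked by the sink are keyed by the renamed topic and no longer match the topic-partitions the consumer is actually assigned. Everything below (`Record`, `rename_topic`, `committable_offsets`, the topic names) is hypothetical and is not the Connect API or this connector's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    topic: str
    partition: int
    offset: int


def rename_topic(record: Record, suffix: str = "_renamed") -> Record:
    """Hypothetical stand-in for an SMT that rewrites the topic name."""
    return Record(record.topic + suffix, record.partition, record.offset)


def committable_offsets(written: list[Record]) -> dict[tuple[str, int], int]:
    """Next offset to consume per (topic, partition), as seen by the sink."""
    result: dict[tuple[str, int], int] = {}
    for r in written:
        key = (r.topic, r.partition)
        result[key] = max(result.get(key, 0), r.offset + 1)
    return result


# The consumer is assigned the original topic-partition...
assigned = {("orders", 0)}
consumed = [Record("orders", 0, o) for o in range(3)]

# ...but the sink only ever sees the records after the rename.
seen_by_sink = [rename_topic(r) for r in consumed]
offsets = committable_offsets(seen_by_sink)

# None of the sink-side keys correspond to an assigned topic-partition.
usable = {k: v for k, v in offsets.items() if k in assigned}
print(offsets)
print(usable)
```

In this model the sink knows it wrote three records but has no key it can use to report that progress for the assigned partition, which is the kind of gap a framework-level change would have to close.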
