Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 Source MSSQL: implementation for CDC #4543

Closed
wants to merge 21 commits into from
Closed

Conversation

Phlair
Copy link
Contributor

@Phlair Phlair commented Jul 5, 2021

What

Related issue: #2848

How

MSSQL CDC follows conventions laid out by Postgres and MySql sources, therefore is built on the debezium SQL Server connector. The CDC works using MSSQL's CDC tables and the LSN value as an offset.

Some things worth pointing out:

  1. Like MySql, the debezium connector for SQL Server monitors the database schema evolution over time and stores the data in database history file. The way this is being handled in this PR is entirely similar to the MySQL implementation (🎉 source: implementation for mysql cdc #3505). This PR duplicates code for this functionality but abstraction is currently being worked on (Investigate creating a CDC abstraction for re-use across CDC sources #3934) so it should only be temporary.
  2. The handling of LSN is purposely as similar to Postgres CDC as possible. A difference to note is that SQL Server emits two LSN values: Commit and Change LSN. We're using Commit here as that specifies the LSN of the commit the change was part of. Another difference is rather than defining our own MSSQL LSN class, we're using Debezium's.
  3. Setup on the source database requires db_owner or sysadmin roles. The details for CDC setup are all laid out in the docs in this PR. One requirement being enforced is SNAPSHOT isolation mode for the initial snapshot of data in order to avoid locking tables from concurrent writes. If we find users are adverse to enabling this we could make this configurable (e.g. a boolean option in spec for 'enforce snapshot isolation').
  4. Some further datatype issues are introduced by Debezium. I've made an issue for this here https://github.com/airbytehq/airbyte-internal-issues/issues/162.

Pre-merge Checklist

Expand the checklist which is relevant for this PR.

Connector checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • Unit & integration tests added as appropriate (and are passing)
    • Community members: please provide proof of this succeeding locally e.g: screenshot or copy-paste acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • /test connector=connectors/<name> command as documented here is passing.
    • Community members can skip this, Airbyters will run this for you.
  • Code reviews completed
  • Documentation updated
    • Created or updated reference docs in docs/integrations/<source or destination>/<name>.
    • Changelog in the appropriate page in docs/integrations/.... See changelog example
    • Build status added to build page
  • Build is successful
  • Connector version bumped like described here
  • New Connector version released on Dockerhub by running the /publish command described here
  • No major blockers
  • PR merged into master branch
  • Follow up tickets have been created
  • Associated tickets have been closed & stakeholders notified

@Phlair Phlair requested a review from subodh1810 July 5, 2021 10:04
@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Jul 5, 2021
@Phlair
Copy link
Contributor Author

Phlair commented Jul 5, 2021

/test connector=source-mssql

🕑 source-mssql https://github.com/airbytehq/airbyte/actions/runs/1000745276
❌ source-mssql https://github.com/airbytehq/airbyte/actions/runs/1000745276

@Phlair Phlair self-assigned this Jul 5, 2021
@Phlair
Copy link
Contributor Author

Phlair commented Jul 5, 2021

/test connector=source-mssql

🕑 source-mssql https://github.com/airbytehq/airbyte/actions/runs/1000827100
✅ source-mssql https://github.com/airbytehq/airbyte/actions/runs/1000827100

@sherifnada sherifnada self-requested a review July 6, 2021 16:03
Copy link
Contributor

@subodh1810 subodh1810 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull in the latest changes from master to fix compilation error.
Also I was playing around with the property decimal.handling.mode and it worked without upgrading the debezium version. so perhaps we can stick to this version right? What do you think

}

@Override
public List<AutoCloseableIterator<AirbyteMessage>> getIncrementalIterators(JsonNode config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would cause compilation error, in master the signature of the method has been changed

* {@link io.debezium.config.CommonConnectorConfig#DEFAULT_MAX_BATCH_SIZE} is 2048 (so 20,480)
* {@link io.debezium.config.CommonConnectorConfig#DEFAULT_MAX_QUEUE_SIZE} is 8192 (so 81,920)
*/
final LinkedBlockingQueue<ChangeEvent<String, String>> queue = new LinkedBlockingQueue<>(100000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed we should probably stick with the default numbers so queue should be of 10k size

@Phlair
Copy link
Contributor Author

Phlair commented Jul 12, 2021

This PR is superseded by #4689

@Phlair
Copy link
Contributor Author

Phlair commented Jul 14, 2021

Closed completed by #4689

@Phlair Phlair closed this Jul 14, 2021
@swyxio swyxio deleted the george/sql-server-cdc branch October 12, 2022 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants