Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 fix mysql-cdc OOM: use capped queue #4203

Merged
merged 6 commits into from
Jun 23, 2021
Merged

Conversation

subodh1810
Copy link
Contributor

Issue : #3969
Read #3969 (comment) for detailed understanding of the issue

By using a queue with capped capacity we avoid inserting new elements in the queue until there is space for new elements. This avoids heap getting bigger

Pre-merge Checklist

Expand the checklist which is relevant for this PR.

Connector checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • Secrets are annotated with airbyte_secret in output spec
  • Unit & integration tests added as appropriate (and are passing)
    • Community members: please provide proof of this succeeding locally e.g: screenshot or copy-paste acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • /test connector=connectors/<name> command as documented here is passing.
    • Community members can skip this, Airbyters will run this for you.
  • Code reviews completed
  • Credentials added to Github CI if needed and not already present. instructions for injecting secrets into CI.
  • Documentation updated
    • README
    • CHANGELOG.md
    • Reference docs in the docs/integrations/ directory.
    • Build status added to build page
  • Build is successful
  • Connector version bumped like described here
  • New Connector version released on Dockerhub by running the /publish command described here
  • No major blockers
  • PR merged into master branch
  • Follow up tickets have been created
  • Associated tickets have been closed & stakeholders notified

Connector Generator checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@subodh1810 subodh1810 self-assigned this Jun 18, 2021
@subodh1810 subodh1810 requested review from davinchia and cgardens and removed request for davinchia June 18, 2021 14:33
@github-actions github-actions bot added the area/connectors Connector related issues label Jun 18, 2021
@cgardens
Copy link
Contributor

DebeziumEngine follows a consumer / producer model (like Kafka). My understanding of this change is that you are slowing down the consumer (which takes a record and puts in the queue), but not the producer (thing actually get logs from db). Does slowing down the consumer actually slow down the producer in this case?

@davinchia
Copy link
Contributor

Huh interesting. Should we make a similar change on the Postgres source?

Copy link
Contributor

@davinchia davinchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to wait for @cgardens 's comment to be resolved - though it sounds like slowing down the producer is noop since the producer is bottlenecked on consumption.

We should have a follow up ticket to double check this for Postgres CDC.

@subodh1810
Copy link
Contributor Author

@cgardens debezium also maintains an internal queue of records but it is also bounded by capacity and has a fixed size. So if we slow down the consumer, producer would also be slowed down

@subodh1810
Copy link
Contributor Author

subodh1810 commented Jun 21, 2021

/test connector=source-mysql

🕑 source-mysql https://github.com/airbytehq/airbyte/actions/runs/958304621
✅ source-mysql https://github.com/airbytehq/airbyte/actions/runs/958304621

@subodh1810
Copy link
Contributor Author

created issue for postgres #4253

queue.add(e);
boolean inserted = false;
while (!inserted) {
inserted = queue.offer(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do a small sleep here to ensure we don't busy-loop inside this thread?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada What do you think would be a right sleep value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally expo backoff for 1/2/4/8/16 ms maybe capping at half a second. But barring that a magic constant (erring on the smaller side) should do. For example 10ms.

@@ -233,7 +233,7 @@ private static boolean shouldUseCDC(ConfiguredAirbyteCatalog catalog) {
final AirbyteFileOffsetBackingStore offsetManager = initializeState(stateManager);
AirbyteSchemaHistoryStorage schemaHistoryManager = initializeDBHistory(stateManager);
FilteredFileDatabaseHistory.setDatabaseName(config.get("database").asText());
final LinkedBlockingQueue<ChangeEvent<String, String>> queue = new LinkedBlockingQueue<>();
final LinkedBlockingQueue<ChangeEvent<String, String>> queue = new LinkedBlockingQueue<>(10000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any context behind this magic number? (can we add a comment(

@cgardens
Copy link
Contributor

@cgardens debezium also maintains an internal queue of records but it is also bounded by capacity and has a fixed size. So if we slow down the consumer, producer would also be slowed down

awesome! thanks for digging into this. can you add a comment explaining this?

while (!inserted) {
inserted = queue.offer(e);
try {
Thread.sleep(10);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw this should only trigger if inserted == false

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch! sorry about that!

@subodh1810
Copy link
Contributor Author

subodh1810 commented Jun 23, 2021

/test connector=source-mysql

🕑 source-mysql https://github.com/airbytehq/airbyte/actions/runs/963618347
✅ source-mysql https://github.com/airbytehq/airbyte/actions/runs/963618347

@subodh1810
Copy link
Contributor Author

subodh1810 commented Jun 23, 2021

/publish connector=connectors/source-mysql

🕑 connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/963726765
✅ connectors/source-mysql https://github.com/airbytehq/airbyte/actions/runs/963726765

@subodh1810
Copy link
Contributor Author

Merging this PR cause the integration tests pass and the other tests pass locally, we are having issues with Acceptance test

@subodh1810 subodh1810 merged commit 2e75ba6 into master Jun 23, 2021
@subodh1810 subodh1810 deleted the mysql-cdc-oom-fix branch June 23, 2021 10:47
@marcosmarxm marcosmarxm changed the title fix mysql-cdc OOM: use capped queue 🐛 fix mysql-cdc OOM: use capped queue Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants