100% CPU utilization on CDC replication from Postgres CDC #6003
This is potentially a duplicate of #5754
The bug was reproduced with source connector version airbyte/source-mysql:0.4.3, but with the latest source-mysql:0.4.6 connector the bug is not reproduced with more than 60 million rows. It seems the changes from #5600 fixed it. [65m_rows_success_finish.txt](https://github.com/airbytehq/airbyte/files/7330477/65m_rows_success_finish.txt)
This sounds pretty good. I am asking Daniel, who reported this issue, to try the newer version.
Attempted to retry this after recent developments in Airbyte's source + destination connectors. However, no luck.
@danieldiamond thanks for giving it a shot. We'll have another go at this. For the record: DB stability is our #1 goal for this quarter. There'll be more details coming soon, but we're going to focus on creating robust benchmarks that reproduce the scenarios you've described as causing these issues, and we will apply a big push to get them resolved ASAP.
In the scope of this issue I worked with the standard insert loading method, as mentioned in the related Slack thread, but it seems Daniel used the S3 staging loading method in his last attempt. @sherifnada, @tuliren, can we ask Daniel to reproduce the initial issue with the standard insert loading method, to check whether CDC with the insert method works correctly in his environment and to understand whether a bug matching this description actually exists? Based on that, we can update the issue's description, because right now it doesn't correspond properly to the real issue.
I can confirm that. You can probably find more in the Slack comments on other threads (I've used this table in various attempts for various issues).
Since the initial bug with the STANDARD loading method is not reproduced, I will close the issue if there are no objections.
Clarifying problem scope: I thought this issue was about the source connector utilizing 100% CPU and getting stuck. If that is the case, why would the destination connector's loading method make a difference? Was the issue actually with the destination connector?
@sashkalife, even though the standard mode works, the issue still exists for the staging mode. We should not close the issue before the root cause is figured out and fixed.
@sherifnada, I totally agree with you that we have moved away from the root issue. I reproduced the bug about getting stuck with the airbyte/source-mysql:0.4.3 connector, but with the airbyte/source-mysql:0.4.6 version it works correctly. I assumed that the changes from #5600 fixed the issue. But while checking this assumption, another bug was found with Snowflake staging CDC that prevents verifying the root issue.
@sashaNeshcheret, got it. Thanks for the explanation. To summarize all the related issues:
Based on Daniel's latest update, the 100% CPU usage issue still exists for the latest MySQL source ( |
Following up here. After the job was hanging, I stopped the However, even though the job synced successfully in the end, it decided to retry (from the start, which was painful; I didn't realize it would retry and jumped in after a few hours to cancel it). I have since updated the environment variable for retries. I then cancelled this retry and performed another sync, which appears to kick off at the last read position, which seems great!
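For anyone else hitting the unwanted automatic retry, here is a minimal sketch of capping sync retries on a docker-compose Airbyte deployment, assuming the relevant variable is `SYNC_JOB_MAX_ATTEMPTS` in the platform `.env` (the exact variable name may differ between Airbyte versions, so check the `.env` shipped with your release):

```sh
# Hypothetical example: cap how many attempts a sync job gets before it is
# marked failed, so a long CDC sync is not silently restarted from scratch.
# Run from the directory containing docker-compose.yaml and .env.
sed -i 's/^SYNC_JOB_MAX_ATTEMPTS=.*/SYNC_JOB_MAX_ATTEMPTS=1/' .env

# Restart the platform so the new value is picked up.
docker-compose down
docker-compose up -d
```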
Thank you @danieldiamond, I will use your scenario with a huge DB volume; hopefully I will catch the job hanging.
@danieldiamond can you please provide the logs for the last attempts?
The files are too large, so I will send them through Slack. Are you in the community?
Sure, please share the logs and tag me (oleksandrNeshcheret) in Slack.
Confirming this is still occurring with:
@tuliren cc since you're managing the DB rehaul efforts
Got it. Please note that we may not have bandwidth to fix all the CDC issues in Q4. They are likely our major focus in Q1 2022.
Although we originally planned to work on database CDC this quarter, our priorities have changed, and unfortunately we won't have the bandwidth to work on CDC this quarter anymore. We recommend that anyone watching this issue use the non-CDC mode for incremental sync for now.
I appreciate the limited resources and existing priorities, but suggesting deferring to non-CDC mode for large tables isn't a great alternative, particularly if users already had existing internal batch migration jobs. Part of Airbyte's draw is "commoditising DB replication by open-sourcing CDC" - I see this issue as relating to core functionality and expectations of using Airbyte, so here's hoping for something next quarter 🤞
It's been a while since there was activity on this issue. @danieldiamond, have you tried this recently? We've made many improvements to CDC.
@bleonard appreciate you reaching out on this.
OK @danieldiamond, sounds good. Feel free to ping me on any current issues. We want to make it the best it can be.
Environment
Current Behavior
100% CPU utilization on the instance and the sync hangs. See the logs:
60m_row_table_success_but_hangs.txt
In addition, the user reports 100% CPU utilization on the node.
Slack thread
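For narrowing down which process is responsible, here is a quick sketch (assuming a docker-compose deployment; container names vary by version) of checking per-container CPU while the sync is hanging:

```sh
# Host-level view: confirm the node is actually pegged at ~100% CPU.
top -b -n 1 | head -n 20

# Per-container view: identify whether the source, destination, or worker
# container is the one consuming the CPU.
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```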
Expected Behavior
I expect the sync to complete and CPU utilization not to be quite so drastic.
Logs
See above