dump command dramatically slower than insert #51116
Hello, I am Blathers. I am here to help you get the issue triaged. Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here. I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is otan.
I am considering adding e.g. https://github.com/proullon/ramsql as a cache in the dump code, doing a fetch-all of the crdb_internal tables into it, and then letting the normal dump logic work against that cache.
@donbowman did the above-linked work related to cleaning up temp tables/sequences mitigate this issue? In which case I can close this issue 🙂
i don't know. the attached sql, on a clean db, shows the issue. was there a change made i should try?
Apologies, I meant the issue referencing this one - #51219. I was going to wait for those changes to merge in before I revisited this issue to see if it had any positive impact on the performance. Let me reproduce and take a look.
For more information, we run
We deprecated the text-based dump in 20.2, made the more performant and reliable binary BACKUP/RESTORE available to all non-enterprise users, and have since removed it.
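For readers landing here, the replacement workflow looks roughly like the following. This is an illustrative sketch only: the `nodelocal` URI, the `INTO`/`FROM LATEST IN` syntax, and the database name are assumptions, and the exact BACKUP/RESTORE grammar varies between CockroachDB versions.

```shell
# Sketch: back up a database with the built-in BACKUP statement.
# The nodelocal path and exact syntax vary by CockroachDB version.
cockroach sql --insecure -e "BACKUP DATABASE defaultdb INTO 'nodelocal://1/backups';"

# Later, restore from the most recent backup in that collection.
cockroach sql --insecure -e "RESTORE DATABASE defaultdb FROM LATEST IN 'nodelocal://1/backups';"
```

Unlike the text dump, the backup is a binary format tied to CockroachDB, which is the crux of the disagreement in the comments below.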
although I understand why you have done this, i think it makes your product much worse. text-based sql backups are very important: sometimes one needs to import into e.g. an older version, or make modifications. the binary backup is not a replacement for long-term data retention purposes.
@donbowman When we talked to some users about those kinds of long-term retention/archival requirements, what we heard was that for those use-cases, they often actually preferred exports to CSV or a similar version- (and even vendor-) neutral format. So along with making BACKUP free, when we removed dump we also made our parallel and distributed
no. also, for long-term retention for disaster recovery, nothing beats pg_dump or mysqldump. removing that is a big blow; it gives me more reason to start migrating to tidb. i view the sql output as vendor-neutral (even though it's not strictly so, it's mostly so when needed).
Interesting. fwiw, there's also still
i used minor sed or awk to fix any problems. we commonly use the current dump for:
we had a couple of other cases that were challenging w/o it. one was a customer who had inadvertently deleted some things, which were inside a larger backup. we had not considered this case. i could not restore the older backup, since that would revert other customers' info. instead we used 'grep' to fetch all the rows with their UUID from the text dump, and re-inserted those. the alembic case is troubling: since cockroachdb is not 100% feature-compatible with other databases, we sometimes end up in a situation where alembic cannot update the schema to the desired state (e.g. we cannot always alter schema). so although i agree that for the limited case of full backup/restore to the exact same cluster, setup, and version the binary is better, i do not think it is a replacement; it is much more limited. the text-mode dump of schema + data is something that every other db technology has, and it is often used for more than the narrow backup/restore case.
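The grep-based recovery described above can be sketched roughly as follows. The table name, UUIDs, and the sed fix-up are invented for illustration; the point is only that a text dump is amenable to line-oriented tools.

```shell
# Hypothetical text dump: INSERT statements spanning several customers.
cat > /tmp/full_dump.sql <<'EOF'
INSERT INTO orders (id, customer_id, total) VALUES (1, 'aaaa-1111', 10);
INSERT INTO orders (id, customer_id, total) VALUES (2, 'bbbb-2222', 20);
INSERT INTO orders (id, customer_id, total) VALUES (3, 'aaaa-1111', 30);
EOF

# Fetch only the affected customer's rows; the result can then be
# replayed, e.g. with `cockroach sql < /tmp/restore_one.sql`.
grep "'aaaa-1111'" /tmp/full_dump.sql > /tmp/restore_one.sql

# The "minor sed" style of fix-up: e.g. retargeting the inserts at a
# staging table before re-inserting, to avoid clobbering live rows.
sed -i 's/INSERT INTO orders/INSERT INTO orders_recovered/' /tmp/restore_one.sql

cat /tmp/restore_one.sql
```

Nothing comparable is possible against an opaque binary backup, which is why the commenter treats the two as complements rather than substitutes.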
One more thing to add: for the schema use case with alembic, v21.1 will have a
Describe the problem
attached is a sql file to load. I load it into an in-memory (store type=mem) cockroachdb. This takes approximately 35s.
I then run dump. This takes approximately 46 minutes.
If I use --echo-sql, nearly all of the time is spent trying to understand the schema. The actual data dump appears to take <1 min; the other ~45 min is spent loading the schema of each table one by one.
To Reproduce
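The steps boil down to loading the attached file and dumping it back out. A sketch with v20-era CLI flags, from memory, so treat the exact flags and database name as assumptions:

```shell
# Start a throwaway single-node server backed by an in-memory store.
cockroach start-single-node --insecure --store=type=mem,size=4GiB --background

# Load the attached SQL file (reported above: ~35s).
time cockroach sql --insecure < dump.sql

# Dump the data back out (reported above: ~46 minutes).
time cockroach dump --insecure defaultdb > roundtrip.sql
```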
Expected behavior
I expect dump performance in this case (a single-node, in-memory, non-contended server) to be on par with the load performance.
Environment:
Additional context
This is blocking the ability to back up.
dump.sql.gz