Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dump command dramatically slower than insert #51116

Closed
donbowman opened this issue Jul 8, 2020 · 13 comments
Closed

dump command dramatically slower than insert #51116

donbowman opened this issue Jul 8, 2020 · 13 comments
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-disaster-recovery X-blathers-triaged blathers was able to find an owner

Comments

@donbowman
Copy link

donbowman commented Jul 8, 2020

Describe the problem

attached is a sql file to load. I load it into an in-memory (store type=mem) cockroachdb. This takes approximately 35s.
I then run dump. This takes approximately 46minutes.

If I use --echo-sql, nearly all of the time is spent trying to understand the schema. It appears to take <1min for the actual data dump, the other ~45 min is spent loading the schema of each table one by one.

To Reproduce

start-single-node --background --insecure --store=type=mem,size=10% --listen-addr=localhost:26257
cockroach sql --insecure < dump.sql
cockroach dump --insecure defaultdb > /tmp/dump.sql

Expected behavior

I expect in this case, a single-node, in memory, non-contended server, the dump performance to be on par w/ the load performance.

Environment:

cockroach version
Build Tag:    v20.1.0
Build Time:   2020/05/05 00:07:18
Distribution: CCL
Platform:     linux amd64 (x86_64-unknown-linux-gnu)
Go Version:   go1.13.9
C Compiler:   gcc 6.3.0
Build SHA-1:  9d456b9ec82cbf9a740a092c0d9f56da48779689
Build Type:   release

Additional context

This is blocking ability to backup.

dump.sql.gz

@blathers-crl
Copy link

blathers-crl bot commented Jul 8, 2020

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/bulk-io (found keywords: backup,cockroach dump)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jul 8, 2020
@donbowman
Copy link
Author

I am considering adding e.g. https://github.com/proullon/ramsql as a cache in the dump code, doing a fetch all of the crdb_internal into it, and then letting the normal dump logic work against that table.

@adityamaru
Copy link
Contributor

@donbowman did the above-linked work related to cleaning up of temp tables/sequences mitigate this issue? In which case I can close this issue 🙂

@donbowman
Copy link
Author

i don't know what the above-linked means?

the attached sql, on a clean db, shows the issue, was there a change made i should try?

@adityamaru
Copy link
Contributor

Apologies, I meant the issue referencing this one - #51219. I was going to wait for those changes to merge in before I revisited this issue to see if it had any positive impact on the performance. Let me reproduce and take a look.

@artificiosus
Copy link

For more information, we run cockroach dump mydatabase --dump-mode=both > dump-file.sql nightly on our database.
When running v19.2.1 the runtime was always <1min.
Having just upgraded to v20.1.7 the runtime has increased to >30min.
Database size, or any other property, has not changed since the upgrade.

@dt
Copy link
Member

dt commented Mar 9, 2021

We deprecated text-based dump in 20.2 and the more performant and reliable binary BACKUP/RESTORE available to all non-enterprise users, and have removed dump of anything other than metadata in the upcoming 21.1, so closing this as wontfix for now.

@dt dt closed this as completed Mar 9, 2021
@donbowman
Copy link
Author

although I undertand why you have done this, i think it makes your product much worse.

text-based sql backups are very important, sometimes one needs to import to e.g. an older version, or make modifications.

the binary backup is not a replacement for long term data retention purposes.

@dt
Copy link
Member

dt commented Mar 10, 2021

@donbowman When we talked to some users about those kinds of long-term retention/archival requirements, what we heard was that for those use-cases, they often actually preferred text-based exports to CSV or similar version (and even vendor) neutral format. So along with making BACKUP free, when we removed dump we also made our parallel and distributed EXPORT to CSV command free as well. Does that sound like it might be at least partially useful for the situations you were considering?

@donbowman
Copy link
Author

no.
e.g. if i want to test last nights backup on an older version of cockroach, or a new version, or import to postgresql on mydesktop for atest.
also, the csv doesn't handle e.g. blob or json fields.

also, for long term retention of disaster recovery, nothing beats pg_dump or mysqldump. Removing that... its a big blow, it gives me more reason to start migrating to tidb.

i view the sql output as vendor neutral (even tho its not strictly so, its mostly so when needed).

@dt
Copy link
Member

dt commented Mar 10, 2021

Interesting. Our dump files were specifically not always compatible with older versions than dumped them (e.g. if they used new features and syntax) and were not compatible with non-cockroach SQL like say postgresql due to including cockroach-specific features, so we didn't believe they often a lot of value for x-version or x-vendor migrations, compared to going to something version and platform agnostic like CSV, Avro, etc and then using that version/system's data importing support.

fwiw, there's also still cockroach sql -e "select * from mytbl' --format=inserts > mydata.sql` for the rows themselves.

@donbowman
Copy link
Author

i used minor sed or awk to fix any problems.

we commonly use the current dump for:

  • restore current prod backup onto a single node cockroach desktop in memory for testing something
  • archival, format-independent backup
  • rescuing alembic when it cannot move from current ot desired schema format

we had a couple of other cases that were challenging w/o it. once was a customer had inadvertently deleted some things, which was inside a larger backup. we had not considered this case,. I could not restore the older backup, since tht would revert other customer info. Instead we used 'grep' to fetch all the info w/ their GUUID from the text dump, and re-inserted those.

the alembic is troubling. since cockroachdb is not 100% feature compatible w/ other db, we sometimes end up in a case where alembic cannot update the schema to desired (e.g. we cannot always alter schema).

so although i agree for the limited case of full backup/restore to the exact same cluster and setup and version that the binary is better, i do not think it is a replacement, its much more limited.

the text mode dump of schema + data is something that every other db technology has, and it is often used for other than the narrow backup/restore.

@rafiss
Copy link
Collaborator

rafiss commented Mar 10, 2021

One more thing to add: for the schema use case with alembic, v21.1 will have a SHOW CREATE ALL TABLES SQL command that outputs the same info as cockroach dump --dump-mode=schema. #53488

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-disaster-recovery X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

No branches or pull requests

6 participants