dump command dramatically slower than insert #51116

donbowman · 2020-07-08T02:51:16Z

Describe the problem

attached is a sql file to load. I load it into an in-memory (store type=mem) cockroachdb. This takes approximately 35s.
I then run dump. This takes approximately 46minutes.

If I use --echo-sql, nearly all of the time is spent trying to understand the schema. It appears to take <1min for the actual data dump, the other ~45 min is spent loading the schema of each table one by one.

To Reproduce

start-single-node --background --insecure --store=type=mem,size=10% --listen-addr=localhost:26257
cockroach sql --insecure < dump.sql
cockroach dump --insecure defaultdb > /tmp/dump.sql

Expected behavior

I expect in this case, a single-node, in memory, non-contended server, the dump performance to be on par w/ the load performance.

Environment:

cockroach version
Build Tag:    v20.1.0
Build Time:   2020/05/05 00:07:18
Distribution: CCL
Platform:     linux amd64 (x86_64-unknown-linux-gnu)
Go Version:   go1.13.9
C Compiler:   gcc 6.3.0
Build SHA-1:  9d456b9ec82cbf9a740a092c0d9f56da48779689
Build Type:   release

Additional context

This is blocking ability to backup.

dump.sql.gz

The text was updated successfully, but these errors were encountered:

blathers-crl · 2020-07-08T02:51:19Z

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

@cockroachdb/bulk-io (found keywords: backup,cockroach dump)

If we have not gotten back to your issue within a few business days, you can try the following:

Join our community slack channel and ask on #cockroachdb.
Try find someone from here if you know they worked closely on the area and CC them.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.}

donbowman · 2020-07-08T21:19:06Z

I am considering adding e.g. https://github.com/proullon/ramsql as a cache in the dump code, doing a fetch all of the crdb_internal into it, and then letting the normal dump logic work against that table.

adityamaru · 2020-08-18T16:57:29Z

@donbowman did the above-linked work related to cleaning up of temp tables/sequences mitigate this issue? In which case I can close this issue 🙂

donbowman · 2020-08-18T20:29:23Z

i don't know what the above-linked means?

the attached sql, on a clean db, shows the issue, was there a change made i should try?

adityamaru · 2020-08-18T22:52:16Z

Apologies, I meant the issue referencing this one - #51219. I was going to wait for those changes to merge in before I revisited this issue to see if it had any positive impact on the performance. Let me reproduce and take a look.

artificiosus · 2020-10-16T12:47:42Z

For more information, we run cockroach dump mydatabase --dump-mode=both > dump-file.sql nightly on our database.
When running v19.2.1 the runtime was always <1min.
Having just upgraded to v20.1.7 the runtime has increased to >30min.
Database size, or any other property, has not changed since the upgrade.

dt · 2021-03-09T16:17:20Z

We deprecated text-based dump in 20.2 and the more performant and reliable binary BACKUP/RESTORE available to all non-enterprise users, and have removed dump of anything other than metadata in the upcoming 21.1, so closing this as wontfix for now.

donbowman · 2021-03-09T17:53:53Z

although I undertand why you have done this, i think it makes your product much worse.

text-based sql backups are very important, sometimes one needs to import to e.g. an older version, or make modifications.

the binary backup is not a replacement for long term data retention purposes.

dt · 2021-03-10T14:26:56Z

@donbowman When we talked to some users about those kinds of long-term retention/archival requirements, what we heard was that for those use-cases, they often actually preferred text-based exports to CSV or similar version (and even vendor) neutral format. So along with making BACKUP free, when we removed dump we also made our parallel and distributed EXPORT to CSV command free as well. Does that sound like it might be at least partially useful for the situations you were considering?

donbowman · 2021-03-10T14:54:48Z

no.
e.g. if i want to test last nights backup on an older version of cockroach, or a new version, or import to postgresql on mydesktop for atest.
also, the csv doesn't handle e.g. blob or json fields.

also, for long term retention of disaster recovery, nothing beats pg_dump or mysqldump. Removing that... its a big blow, it gives me more reason to start migrating to tidb.

i view the sql output as vendor neutral (even tho its not strictly so, its mostly so when needed).

dt · 2021-03-10T15:54:40Z

Interesting. Our dump files were specifically not always compatible with older versions than dumped them (e.g. if they used new features and syntax) and were not compatible with non-cockroach SQL like say postgresql due to including cockroach-specific features, so we didn't believe they often a lot of value for x-version or x-vendor migrations, compared to going to something version and platform agnostic like CSV, Avro, etc and then using that version/system's data importing support.

fwiw, there's also still cockroach sql -e "select * from mytbl' --format=inserts > mydata.sql` for the rows themselves.

donbowman · 2021-03-10T19:58:07Z

i used minor sed or awk to fix any problems.

we commonly use the current dump for:

restore current prod backup onto a single node cockroach desktop in memory for testing something
archival, format-independent backup
rescuing alembic when it cannot move from current ot desired schema format

we had a couple of other cases that were challenging w/o it. once was a customer had inadvertently deleted some things, which was inside a larger backup. we had not considered this case,. I could not restore the older backup, since tht would revert other customer info. Instead we used 'grep' to fetch all the info w/ their GUUID from the text dump, and re-inserted those.

the alembic is troubling. since cockroachdb is not 100% feature compatible w/ other db, we sometimes end up in a case where alembic cannot update the schema to desired (e.g. we cannot always alter schema).

so although i agree for the limited case of full backup/restore to the exact same cluster and setup and version that the binary is better, i do not think it is a replacement, its much more limited.

the text mode dump of schema + data is something that every other db technology has, and it is often used for other than the narrow backup/restore.

rafiss · 2021-03-10T22:17:19Z

One more thing to add: for the schema use case with alembic, v21.1 will have a SHOW CREATE ALL TABLES SQL command that outputs the same info as cockroach dump --dump-mode=schema. #53488

blathers-crl bot added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jul 8, 2020

donbowman mentioned this issue Jul 8, 2020

dump: getBasicMetadata: programming error -- expected just 1 row of result, got more with TEMP SEQUENCE #51166

Closed

rohany assigned adityamaru Jul 9, 2020

donbowman mentioned this issue Jul 9, 2020

temp tables/sequences not cleaned up on session end #51219

Closed

kenliu added the T-disaster-recovery label Dec 5, 2020

dt closed this as completed Mar 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dump command dramatically slower than insert #51116

dump command dramatically slower than insert #51116

donbowman commented Jul 8, 2020 •

edited

Loading

blathers-crl bot commented Jul 8, 2020

donbowman commented Jul 8, 2020

adityamaru commented Aug 18, 2020

donbowman commented Aug 18, 2020

adityamaru commented Aug 18, 2020

artificiosus commented Oct 16, 2020

dt commented Mar 9, 2021

donbowman commented Mar 9, 2021

dt commented Mar 10, 2021 •

edited

Loading

donbowman commented Mar 10, 2021

dt commented Mar 10, 2021

donbowman commented Mar 10, 2021

rafiss commented Mar 10, 2021

dump command dramatically slower than insert #51116

dump command dramatically slower than insert #51116

Comments

donbowman commented Jul 8, 2020 • edited Loading

blathers-crl bot commented Jul 8, 2020

donbowman commented Jul 8, 2020

adityamaru commented Aug 18, 2020

donbowman commented Aug 18, 2020

adityamaru commented Aug 18, 2020

artificiosus commented Oct 16, 2020

dt commented Mar 9, 2021

donbowman commented Mar 9, 2021

dt commented Mar 10, 2021 • edited Loading

donbowman commented Mar 10, 2021

dt commented Mar 10, 2021

donbowman commented Mar 10, 2021

rafiss commented Mar 10, 2021

donbowman commented Jul 8, 2020 •

edited

Loading

dt commented Mar 10, 2021 •

edited

Loading