
cli: deprecate cockroach dump #54040

Closed
dt opened this issue Sep 8, 2020 · 17 comments · Fixed by #54044
Assignees: adityamaru
Labels: A-disaster-recovery, C-cleanup, release-blocker

Comments

dt (Member) commented Sep 8, 2020

This functionality is duplicative of BACKUP and RESTORE, which use more reliable, native representations for data and metadata. Ensuring that every new SQL feature (types, schemas, etc.) works correctly in dump adds significant overhead to that feature work, and in the past it has often been missed, leaving dump incomplete or broken. Previously we maintained both BACKUP and dump because BACKUP was enterprise-only, but in 20.2+ a basic backup that can do at least as much as dump is free, so this is no longer a reason to keep dump.

Interoperability with other databases is also not a reason to keep dump around -- dump already does not produce output that can be directly loaded into another database, so a more reliable migration path would be to export to a vendor-neutral format like CSV and use the native loading facilities of the destination DB. If maintaining dump against just our own feature set has proven too expensive, keeping it up to date against a foreign DB would be doubly so.

dt added the C-cleanup, A-disaster-recovery, and release-blocker labels Sep 8, 2020
adityamaru self-assigned this Sep 8, 2020
craig bot pushed a commit that referenced this issue Sep 8, 2020
54044: cli: add deprecation notice to dump CLI command r=dt,knz a=adityamaru

Fixes: #54040

Release justification: low risk, high benefit changes to existing functionality

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
craig bot closed this as completed in 127bae5 Sep 8, 2020
adityamaru added a commit to adityamaru/cockroach that referenced this issue Sep 8, 2020
Fixes: cockroachdb#54040

Release justification: low risk, high benefit changes to existing functionality
CyborgMaster commented
I currently use dump --dump-mode=schema to export our database schema at every commit so we can get a diff for PRs. After the upgrade to 20.2, I'm getting a deprecation warning with a link to this issue. Do you have a new recommended way of accomplishing this?

adityamaru (Contributor) commented
Hi @CyborgMaster, thanks for reaching out!

We're aware of this use case and are in the process of outlining an alternative solution in time for the next release. You can track progress at #53488 and if you have any specific requirements about what you'd like to see in that new output dump, I'd encourage you to comment on the issue 🙂

adityamaru added a commit to adityamaru/cockroach that referenced this issue Nov 20, 2020
Previously we maintained both BACKUP and dump as BACKUP was
enterprise-only but in 20.2+ basic backup that can do at least as much
as dump can is free so this is no longer a reason to keep dump.

More detailed explanation at cockroachdb#54040.

Fixes: cockroachdb#56405
adityamaru added a commit to adityamaru/cockroach that referenced this issue Nov 23, 2020
Previously we maintained both BACKUP and dump as BACKUP was
enterprise-only but in 20.2+ basic backup that can do at least as much
as dump can is free so this is no longer a reason to keep dump.

More detailed explanation at cockroachdb#54040.

Fixes: cockroachdb#56405
craig bot pushed a commit that referenced this issue Nov 23, 2020
56964: cli: remove cockroach dump from CRDB r=knz a=adityamaru

Previously we maintained both BACKUP and dump as BACKUP was
enterprise-only but in 20.2+ basic backup that can do at least as much
as dump can is free so this is no longer a reason to keep dump.

More detailed explanation at #54040.

Fixes: #56405
Fixes: #28948

Co-authored-by: Aditya Maru <adityamaru@gmail.com>
tomholub commented Jun 7, 2021

What is currently the recommended method of backing up a whole database to a client machine? (the device I'm running the command from)

If the above is correct, it seems the only way to export all data to a local device is to write SQL statements that iterate over all existing tables, backing up the data table by table with EXPORT.
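For instance, one could generate those per-table statements from the catalog; a rough sketch (the destination path is just a placeholder):

SELECT 'EXPORT INTO CSV ''nodelocal://1/export/' || table_name || ''' FROM TABLE ' || quote_ident(table_name) || ';'
  FROM information_schema.tables
 WHERE table_schema = 'public' AND table_type = 'BASE TABLE';
-- running each generated EXPORT statement dumps every table, one file set per table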

Is there a more streamlined method / am I overlooking something? Thanks

tomholub commented Jun 8, 2021

On a second look, EXPORT also only supports uploading files to a server. Is there any option available to simply export to stdout or a local file?

dt (Member, Author) commented Jun 8, 2021

Hi @tomholub!

Indeed, EXPORT writes the results of running a given query directly from the nodes running the query to cloud storage, and to do so with maximum throughput, it intentionally avoids sending the results back through a single client in the middle.

If you want to write the output of a given query to a local file where you're running your client, you can use the client's --format=csv flag and pipe its output, e.g. cockroach sql -e "select * from mytbl" --format=csv > mytbl.csv (other output formats include TSV, runnable SQL INSERTs, etc.).

To your original question: BACKUP is indeed only able to back up to a "storage location", which is typically cloud storage, but there are a couple of options if you just want to run a backup and then download the files it wrote locally, without a cloud storage bucket in the middle. One option is to use the file system on one of your nodes, e.g. BACKUP TO 'nodelocal://1/mybackup', and fetch the mybackup directory from that node's extern directory using scp or something.
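For example, a minimal sketch of that flow (node address, extern path, and backup name are just placeholders):

-- on a SQL connection: write the backup to node 1's extern directory
BACKUP TO 'nodelocal://1/mybackup';

# then, from the client machine, copy the directory down over SSH
scp -r user@node1:/path/to/cockroach-data/extern/mybackup ./mybackup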

Another option, introduced in 21.1 for when you're working with small data sizes and don't have direct access to the node file system (e.g. via SCP), is the ability to store "userfiles", including backup files, as bytes in regular SQL tables in the cluster itself, read and written by the client over a SQL connection. This means you can BACKUP TO 'userfile:///mybackup', which writes into another table in your cluster, then use cockroach userfile get to fetch those "files" from that table to local files, and finally userfile delete to remove them from the cluster's table once fetched.
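A sketch of that userfile flow (backup name and connection flags are placeholders):

-- on a SQL connection: write the backup into the cluster's userfile tables
BACKUP TO 'userfile:///mybackup';

# from the client machine: fetch the backup "files" locally, then clean them up
cockroach userfile get 'mybackup*' --insecure
cockroach userfile delete 'mybackup*' --insecure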

Do either of those, or the piped CSV format, help?

tomholub commented Jun 8, 2021

Thank you for the response - between these, I should be able to choose an option that's applicable to us.

carr123 commented Jun 10, 2021

"cockroach dump", i think, is very convenient for dumping schema and records. and the output is human readable.
so, i recommend keep it in future verison.

message commented Aug 12, 2021

As a new user, how can I back up my data into plain SQL? The backup and restore process should be easy; I want to make sure that my data is safe and that I can validate it. On an import, CockroachDB processed bigint and integer as int8. Then I spent a few hours trying to make a backup without a cloud.

message commented Aug 13, 2021

Yeah, so I gave up trying to restore a database dump from the file system; I couldn't figure out how to upload a database backup to nodelocal or userfile from the file system.

So the next best solution for me was to use a custom S3-compatible hosted service. I stumbled across this post and used MinIO.

Launch a Docker container (or download an executable from a website):

# Replace /app/minio/data with your path to storage. You can use ${PWD}
docker run -d -p 29190:9000 -p 29191:9001 --name minio \
  -v /app/minio/data:/data \
  -e "MINIO_ROOT_USER=AKIAIOSFODNN7EXAMPLE" \
  -e "MINIO_ROOT_PASSWORD=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  minio/minio server /data --console-address ":9001"

Launch the MinIO console at http://localhost:29191/ and log in with the MINIO_ROOT_USER and MINIO_ROOT_PASSWORD credentials. Create a cockroachdb bucket and play with backup/restore.

root@:26257/defaultdb> BACKUP INTO 's3://cockroachdb/backups?AWS_ENDPOINT=http://172.17.0.1:29190&AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE&AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY' AS OF SYSTEM TIME '-10s';
        job_id       |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+---------
  684221608140079105 | succeeded |                  1 |  499 |           860 | 449242
(1 row)

Time: 187ms total (execution 187ms / network 0ms)

root@:26257/defaultdb> SHOW BACKUPS IN 's3://cockroachdb/backups?AWS_ENDPOINT=http://172.17.0.1:29190&AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE&AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY';
          path
-------------------------
  /2021/08/13-175839.92
  /2021/08/13-175841.24
  /2021/08/13-175842.15
  /2021/08/13-175843.09
  /2021/08/13-175843.88
  /2021/08/13-181033.38
  /2021/08/13-181034.67
  /2021/08/13-181053.79
(8 rows)

Time: 7ms total (execution 6ms / network 0ms)

root@:26257/defaultdb> DROP DATABASE beep CASCADE;
DROP DATABASE

Time: 154ms total (execution 154ms / network 0ms)

root@:26257/defaultdb> RESTORE DATABASE beep FROM 's3://cockroachdb/backups/2021/08/13-181053.79?AWS_ENDPOINT=http://172.17.0.1:29190&AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE&AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
                    -> ;
        job_id       |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+--------
  684221870179385345 | succeeded |                  1 |   51 |             0 |  1511
(1 row)

Time: 191ms total (execution 191ms / network 0ms)


root@:26257/defaultdb> select * from beep.countries;
  id | code_alpha_2 | code_alpha_3 | is_country
-----+--------------+--------------+-------------
   1 | AF           | AFG          |    true
   2 | AX           | ALA          |    true
   3 | AL           | ALB          |    true
(3 rows)

Time: 14ms total (execution 14ms / network 0ms)


adityamaru (Contributor) commented

I couldn't figure out how to upload a database backup to nodelocal or userfile from the file system.

Hey! I can help you figure out how to back up to nodelocal or userfile. Is there any specific error you were running into?

message commented Aug 13, 2021

Hey @adityamaru, can you please help me upload the folder 13-181053.79 to the cluster and then restore a backup from it?

[~] $ tree  /app/backups/data/cockroachdb/backups/2021/08/13-181053.79/
/app/backups/data/cockroachdb/backups/2021/08/13-181053.79/
├── BACKUP-CHECKPOINT-684221608140079105-CHECKSUM
├── BACKUP-CHECKPOINT-CHECKSUM
├── BACKUP_MANIFEST
├── BACKUP_MANIFEST-CHECKSUM
├── BACKUP-STATISTICS
└── data
    ├── 684221608442167299.sst
    ├── 684221608442396674.sst
    ├── 684221608442429442.sst
    ├── 684221608442527745.sst
    ├── 684221608442691585.sst
    ├── 684221608443117569.sst
    ├── 684221608474443778.sst
    ├── 684221608484700162.sst
    └── 684221608484831233.sst

1 directory, 14 files

Also, can you please help me with a PLAIN SQL backup? I want to make a database backup to a file, like pg_dump does. I can somehow salvage data from plain text if something goes wrong; if something goes wrong with binary data, I will waste much more time and I'm not sure I will be able to recover it.

adityamaru (Contributor) commented

What version of cockroach are you running?

message commented Aug 13, 2021

[root@roach1 cockroach]# ./cockroach version
Build Tag: v21.1.7

adityamaru (Contributor) commented

There are two options when it comes to restoring from a "local" storage option, namely userfile and nodelocal. I would recommend using userfile, but in 21.1.7 we do not have recursive file upload for userfile, so it might be a little tedious to upload the files in your tree one by one. We will have recursive upload in our next major release, following this PR: #65307

Since you do not have too many files in your backup, I'm going to recommend uploading each file individually while maintaining the directory structure. This would look like:

./cockroach userfile upload <local_file_path> backup/BACKUP-CHECKPOINT-684221608140079105-CHECKSUM --insecure
./cockroach userfile upload <local_file_path> backup/BACKUP-CHECKPOINT-CHECKSUM --insecure
./cockroach userfile upload <local_file_path> backup/BACKUP_MANIFEST --insecure
./cockroach userfile upload <local_file_path> backup/BACKUP_MANIFEST-CHECKSUM --insecure
./cockroach userfile upload <local_file_path> backup/BACKUP-STATISTICS --insecure

And then for each sst file:
./cockroach userfile upload <local_file_path> backup/data/684221608442167299.sst --insecure
...

Note, I have used --insecure, but if you have certs set up then you can follow the instructions here: https://www.cockroachlabs.com/docs/stable/cockroach-userfile-upload.html#flags

Once you have done this you should be able to run the RESTORE query with userfile://defaultdb.public.userfiles_root/backup as your destination.
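That is, something like (database name taken from your transcript above):

RESTORE DATABASE beep FROM 'userfile://defaultdb.public.userfiles_root/backup';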

You should be able to script the above upload statements. One of our Solution Engineers has put together this blog that might be useful https://blog.ervits.com/2021/07/recover-from-disaster-using-userfile.html.
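For example, a rough bash sketch of that scripted upload, using the local path from your tree (certs/insecure flags as appropriate for your setup):

BK=/app/backups/data/cockroachdb/backups/2021/08/13-181053.79
cd "$BK"
# upload every file, preserving the relative directory layout under backup/
for f in $(find . -type f); do
  ./cockroach userfile upload "$f" "backup/${f#./}" --insecure
done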

We realize this UX is not great at the moment, but with recursive upload coming soon this should become much more pleasant!

adityamaru (Contributor) commented Aug 13, 2021

Also, can you please help me with a PLAIN SQL backup? I want to make a database backup to a file, like pg_dump does. I can somehow salvage data from plain text if something goes wrong; if something goes wrong with binary data, I will waste much more time and I'm not sure I will be able to recover it.

We no longer support generating a single PGDUMP-like file (DDL + DML) and recommend that users rely on BACKUP/RESTORE, as this is a battle-tested and well-maintained feature of CockroachDB. There are, however, a couple of other tools that might be of interest to you.

Do either BACKUP/RESTORE or the piped CSV solution outlined above help?

estebanbouza commented
@adityamaru EXPORT seems to require knowing all the table names in advance. Is there a better way of doing this? How should one go about migrating from CockroachDB to Postgres or any other DB after the removal of dump?

marcofeltmann commented
The thing is: this strips away the possibility of encrypted backups, e.g. via Restic. Yes, we could do BACKUP INTO 's3://bucket/dir' WITH ENCRYPTION_PASSPHRASE='SECRET'; but that requires an Enterprise License. Allowing data security and encryption only for paying customers clashes with my understanding of Open Source.

Don't get me wrong: getting paid for convenience features is totally fine, but completely taking away any encryption capability is not.
