
Remote restore not working - "DB metadata not changed. database may already exist" #9593

Closed
pragmaticivan opened this issue Mar 16, 2018 · 15 comments

@pragmaticivan

pragmaticivan commented Mar 16, 2018

InfluxDB 1.5 (Alpine Docker version)

**Backup**

```shell
influxd backup -portable -database "metrics-cadvisor" -host $DATABASE_HOST:$DATABASE_PORT "$BACKUP_PATH"
influxd backup -portable -database "metrics-health" -host $DATABASE_HOST:$DATABASE_PORT "$BACKUP_PATH"
```

**Restore**

```shell
influxd restore -portable -database "metrics-cadvisor" -host $DATABASE_HOST:$DATABASE_PORT "$BACKUP_PATH"
influxd restore -portable -database "metrics-health" -host $DATABASE_HOST:$DATABASE_PORT "$BACKUP_PATH"
```

**Error**

```
2018/03/16 17:18:44 error updating meta: DB metadata not changed. database may already exist
restore: DB metadata not changed. database may already exist
Restore failed
```

I tried deleting the 2 databases before restoring, and it didn't work.

@dgnorton
Contributor

@pragmaticivan what version of InfluxDB are you running?

@pragmaticivan
Author

Just updated to InfluxDB 1.5 (Alpine Docker version).

@aanthony1243
Contributor

When restoring, the flag for database selection is `-db`, not `-database`, as in:

```shell
influxd restore -portable -db "metrics-health" -host $DATABASE_HOST:$DATABASE_PORT "$BACKUP_PATH"
```

We'll propose improvements to the flags and track them in issue #9608.

@entone

entone commented Apr 11, 2018

I'm still getting this error regardless of the `-db` flag. I have also tried `-newdb`, and no db flag at all. Currently running 1.5.1.

**Error**

```
2018/04/11 09:56:20 error updating meta: DB metadata not changed. database may already exist
restore: DB metadata not changed. database may already exist
```

**Backup Command**

```shell
influxd backup -portable -database "brood" -host localhost:8088 ~/Desktop/influx-backup-test-2
```

**Restore Command**

```shell
influxd restore -portable -db "brood" -host localhost:8088 ~/Desktop/influx-backup-test-2
```

**Restore -newdb**

```shell
influxd restore -portable -newdb "brood1" -host localhost:8088 ~/Desktop/influx-backup-test-2
```

@entone

entone commented Apr 11, 2018

The legacy backup and restore appear to work fine.

@aanthony1243
Contributor

Hi @entone, try:

```shell
influxd restore -portable -db "brood" -newdb "brood1" -host localhost:8088 ~/Desktop/influx-backup-test-2
```

To clarify:

- `-db` identifies the database in the backup file that you want to restore.
- `-newdb` specifies the name to give the imported database. If not given, it defaults to the original name. However, you must restore to a database name that does not already exist in the system. If the original database already exists, the restore will fail, which is why you need both `-db` and `-newdb` in this case.

@bmailhe

bmailhe commented Apr 12, 2018

Hi @aanthony1243,

> If the original db already exists in the system, then the restore will fail, which is why you need both -db and -newdb in this case.

Then can you explain how to restore an incremental backup? E.g. every week I make a full backup, and every day an incremental one. If I have to restore a full backup plus some increments, how do I do it?

@aanthony1243
Contributor

Your commands above are not incremental; they always take a full backup. You can use either `-since` or `-start` and `-end` to extract only the more recent data (see https://docs.influxdata.com/influxdb/v1.5/administration/backup_and_restore/). Then, for each incremental backup, you do as above and restore the incremental part to brood1. Once the incremental data is in brood1, you can side-load it into the original database and drop brood1:

```
> USE brood1
> SELECT * INTO brood..:MEASUREMENT FROM /.*/ GROUP BY *
> DROP DATABASE brood1
```

You can then repeat this process for each increment.
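The repeated restore-then-sideload process can be sketched as a small shell script. This is a hypothetical sketch, not an official tool: the backup paths, `TARGET_DB`, `HOST`, and the `DRY_RUN` switch are assumptions, while the `influxd restore` and `influx -execute` invocations use the flags discussed above.

```shell
# Sketch (untested against a live server): restore each incremental backup
# directory into a temporary database, side-load it into TARGET_DB with
# SELECT * INTO, then drop the temporary database. TARGET_DB, HOST, and
# the backup paths are assumptions; set DRY_RUN=echo to preview the
# commands instead of running them.
DRY_RUN=${DRY_RUN:-}
TARGET_DB=${TARGET_DB:-brood}
TMP_DB="${TARGET_DB}_restore_tmp"
HOST=${HOST:-localhost:8088}

restore_increment() {
    backup_dir=$1
    $DRY_RUN influxd restore -portable -db "$TARGET_DB" -newdb "$TMP_DB" \
        -host "$HOST" "$backup_dir"
    $DRY_RUN influx -database "$TMP_DB" \
        -execute "SELECT * INTO \"$TARGET_DB\"..:MEASUREMENT FROM /.*/ GROUP BY *"
    $DRY_RUN influx -execute "DROP DATABASE \"$TMP_DB\""
}

# Process each backup directory given on the command line, oldest first.
for backup_dir in "$@"; do
    restore_increment "$backup_dir"
done
```

Run it after restoring the weekly full backup, passing the incremental backup directories oldest first.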

@mcappadonna

Sorry @aanthony1243, does this mean there isn't any way to restore incremental data without proceeding database by database?

@aanthony1243
Contributor

@mcappadonna yes, that's correct.

@petetnt

petetnt commented May 17, 2018

I don't want to complain about open source software, and I greatly appreciate the work that you are doing, but is incremental restore planned? Incremental backups that can only be restored through intensive manual work seem rather cumbersome, and this feels like a huge oversight.

@petetnt

petetnt commented Jun 13, 2018

We wrote this small script for restoring incremental backups so we won't be SOL when the worst happens:

https://github.com/motleyagency/influxdb-incremental-restore

It's written in Node, so it might not be optimal for every case but has worked just fine for us.

@kaiterramike

kaiterramike commented Aug 29, 2018

Also, to add onto @petetnt's great work: before you do the `SELECT * INTO` operation suggested by @aanthony1243, you'll want to:

  1. disable `query-timeout`
  2. set some sort of CPU limit on InfluxDB so it doesn't render the machine completely unusable during the restore
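For step 1, a sketch of the relevant setting, assuming the stock InfluxDB 1.x `influxdb.conf` layout (a value of `"0s"` disables the per-query timeout):

```toml
[coordinator]
  # "0s" disables the per-query timeout entirely
  query-timeout = "0s"
```

For step 2, one option on systemd hosts is a `CPUQuota=` drop-in for the influxdb service; treat both of these as assumptions to verify against your own configuration.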

I just tried an incremental restore of about 1.6M data points on a 2 core machine, and it took about 2 minutes. This operation is slower than the initial /usr/bin/influxd restore command by one or two orders of magnitude, which makes sense.

Edit: We're solidly in the "low" category on this page -- 18k series with appropriately sized hardware -- and doing this SELECT * INTO query on a single day's incremental backup takes down the machine by eating all the RAM. I'm afraid I'm going to have to invest a couple more days and write some custom code to stream rows between databases in a "nice" way -- chunking by series and by shard. I've looked into Kapacitor and export/import (data becomes too large on disk) and neither of them solve the problem.

Edit2: I found a way to merge everything from one database into another: pipe the output of an export command directly to an import command. Uses an extra ~500MB of RAM on my machine.

```shell
DB_FROM=r_air
DB_TO=r_air2
fifo_name=fifo-${DB_FROM}-to-${DB_TO}
mkfifo "$fifo_name"
influx_inspect export -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal \
    -database "$DB_FROM" -out "$fifo_name" &
cat "$fifo_name" \
    | sed -e "s/^CREATE DATABASE ${DB_FROM} WITH/CREATE DATABASE ${DB_TO} WITH/" \
    | sed -e "s/^# CONTEXT-DATABASE:${DB_FROM}\$/# CONTEXT-DATABASE:${DB_TO}/" \
    | influx -import -path /dev/stdin
rm "$fifo_name"
```

Note that you can't use `-out /dev/stdout` on the `influx_inspect` command, because it already writes other output to stdout, which messes up `influx -import`. A named pipe is required.

The only downside: it's extremely slow. The number of lines to process is somewhere around 3/4 the number of bytes in the gzipped backup, so a 1GB backup will require processing about 750M lines, and at 100k lines/sec on a modest machine, that's about 2 hours.
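As a sanity check before running the full pipeline, the two `sed` substitutions can be exercised on a synthetic export header. This is a minimal sketch: the sample lines are invented stand-ins for an `influx_inspect export` header, and no live database is touched.

```shell
DB_FROM=r_air
DB_TO=r_air2

# Invented stand-in for the first lines of an export, so the
# substitutions can be verified without running a real export.
sample='CREATE DATABASE r_air WITH DURATION 168h0m0s
# CONTEXT-DATABASE:r_air
cpu,host=a value=1 1520000000000000000'

# Apply the same two rewrites used in the pipeline above.
rewritten=$(printf '%s\n' "$sample" \
    | sed -e "s/^CREATE DATABASE ${DB_FROM} WITH/CREATE DATABASE ${DB_TO} WITH/" \
    | sed -e "s/^# CONTEXT-DATABASE:${DB_FROM}\$/# CONTEXT-DATABASE:${DB_TO}/")

printf '%s\n' "$rewritten"
```

Only the `CREATE DATABASE` and `# CONTEXT-DATABASE:` lines should change; the data points pass through untouched.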

@lafrech

lafrech commented Dec 17, 2020

Currently struggling with restore/backup, I'm also surprised that I can't restore a backup over an existing database. I expected new data to be added and existing data to be overwritten.

I thought this was what the docs meant by:

> If duplicate data points are included in the backup files, the points will be written again, overwriting any existing data.

Perhaps the docs refer to duplicate data points within the backup itself (although I wonder how that could happen in the first place).

Please consider this a feature request, then. The lack of an ability to restore a partial DB backup into an existing database is an important shortcoming.

Use case: I recently backed up / restored databases to migrate an InfluxDB instance and something went wrong, I guess, because new databases lack chunks of data. After a few days, I made new backups and would like to complete the new databases. I can't afford to replace from scratch because new databases have been recording new data since the migration.


I've been trying the workaround proposed by @aanthony1243 in #9593 (comment):

```
> USE brood1
> SELECT * INTO brood..:MEASUREMENT FROM /.*/ GROUP BY *
> DROP DATABASE brood1
```

Unfortunately, it seems only measurements of numerical type are copied (#18132) so it does not provide a full backup/restore solution.

At this point, I have data in two different databases and I can't find a way to merge data from one into another.

@kaiterramike

@lafrech the InfluxDB team has moved on to version 2.0, so you won't see a feature like this added to 1.x.

Consider using the hack I mentioned a few comments ago to merge data from one database to another.
