-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backupccl: incremental backups can miss non-transactional writes #62564
Comments
As discussed offline (har, har) we also need to exclude |
Chatted with @dt about the approaches on handing OFFLINE tables. I'm summarizing our conversation here and outlining plans on how to move forward here. Today, database backups do not include OFFLINE tables. This means that they don't run into this particular issue, but an incremental backup that would have missed data would have failed will run into #61571 instead. So, this ticket will address solving the problem for OFFLINE tables include in cluster backups. The solution is similar for database backups with OFFLINE tables once those are included. In general, the fix that we need (until ExportRequest supports non-MVCC operations) is for the first incremental backup after the offline table returns to public to capture all changes since the table was brought offline (ie all of the AddSSTable traffic). What We Do TodayBackup/restore manages the covering of span/time space that it needs to include in incremental backups. This can be visualized by plotting the span/time space that each incremental backup covers. Today, each incremental backup of an offline table will just cover the difference from the last backup: However inc2 and inc3 will be missing AddSSTable requests that have written at time t(import_start) after inc1 was taken. This means that restores at t > t(import_end) will be missing data. Fix 1The first step to fixing this is for the first incremental backup after the table comes back online (inc3 in this case), to take another full backup of this table (rather than incrementally capture the difference). This is the easiest option to implement since we already have a mechanism to do this (introduced spans). This would yield in: The drawback is that inc3 will need to re-backup all of the data that was in the table pre-import as well as all of the imported data. Fix 2To resolve the drawback, further changes can be made so that instead of covering all the way back to t=0, inc3 just covers back until the import started: This optimization will only be implemented after fix 1 since it requires a more disruptive code change, which may not be backportable. This solution would perform an incremental read from t(import_start) until t(inc3). Today, this won't work because we check that the start time of ExportRequests are above the GC threshold and the GC threshold may be at any point below t(inc2) - including times above t(import_start). However, today IMPORT, IMPORT INTO, and RESTORE do not overwrite keys that previously existed in the table. And since we know that t(import_end) is above the GC threshold, we can conclude that it is safe to read the KVs from this table all the way back to t(import_start). ExportRequest is not aware of this though, so we would need to convince it it is safe (possibly by passing in a flag overriding it's check of the start time against the GC threshold). |
62572: backupccl: only backup public indexes r=pbardea a=pbardea This commit ensures that backup only backs up PUBLIC indexes. This means that it will not back up ADDING indexes. This is because AddSSTable requests may write in the past, so incremental backups may miss writes when performing a time-based scan from the previous incremental backup. Imforms #62564. Test test previously failed with: ``` --- FAIL: TestBackupOnlyPublicSpans (1.92s) backup_test.go:7802: Error Trace: backup_test.go:7802 Error: Received unexpected error: expected incremental backup to not contain any data, found spans [/Table/53/{2-3}] ``` Release note (bug fix): Fixes a bug where index backfill data may have been missed by BACKUP in incremental backups. Co-authored-by: Paul Bardea <pbardea@gmail.com>
63121: backupccl: re-backup spans that come online during incremental backups r=pbardea a=pbardea This commit fixes a bug where backup would miss non-transactional writes (via AddSSTable) during incremental backups. These backups were missed because AddSSTable can write to a timestamp that is before the previous incremental backup. So, if a table was written to while OFFLINE (e.g. by a RESTORE or an IMPORT), during a backup, the following incremental backup may miss some data. To resolve this, BACKUP now re-backs up all of the data of OFFLINE tables on incremental backups that put this table back online. This comes with the drawback of some incremental backups (when a restore or import completes) will be much slower since it has to recapture all of the data. There is planned future work so that these incrementals only the new data written by the RESTORE or IMPORT, rather than resorting to backing up the entire table again. This will be addressed in a later PR. Implements fix 1 as described in #62564, which addresses the correctness concerns at the expense of incremental backup performance. Release note (bug fix): Incremental cluster backups may have missed data written to tables while they were OFFLINE. In practice this can happen if a RESTORE or IMPORT was running across incremental backups. Co-authored-by: Paul Bardea <pbardea@gmail.com>
Closing as done by #63121 |
The backup fixtures in this test were generated on 20.2 where cockroachdb#62564 existed. Given this was >1 major version in the past we do not expect to see a reoccurrence of the bug in 22.2+ backups that are restoreable in this release. Release note: None
The backup fixtures in this test were generated on 20.2 where cockroachdb#62564 existed. Given this was >1 major version in the past we do not expect to see a reoccurrence of the bug in 22.2+ backups that are restoreable in this release. Release note: None
Non-transactional writes, such as AddSSTable requests may write at a timestamp arbitrarily far in the past. As such, the incremental iterator used by backup will miss these writes if they write to a timestamp prior to that of the previous backup.
Indexes that are being added to tables should only be backed up when they appear online. Additionally, these spans should be considered newly introduced so that backup performs a full scan of the index, rather than an incremental read.
Offline tables should be included in the backup to account for the case where a table that was previously online was taken offline. The writes to the table while it was OFFLINE will need to be captured in a subsequent incremental backup once the table returns online.
The text was updated successfully, but these errors were encountered: