Ability to backup/restore metadata files for DiskS3 #18377
Conversation
…ore-metadata

# Conflicts:
#	src/Disks/DiskCacheWrapper.cpp
#	src/Disks/S3/DiskS3.cpp
Thank you for your work!
I'll put some details about the implementation here; correct me if I've misunderstood something:
- All new data files on S3 get a name of the form /r<REV>-file-<RANDOM_NAME>, and a `path` key in their object metadata pointing to the metadata file on the local disk.
- If we want to restore a file, we create a metadata file for the corresponding data file on the local disk at the path given in `path`. Metadata files themselves are not stored on S3; each one is created from scratch for its data file, so we only need the data-file name and the local path of its metadata file.
- Under the S3 paths /operations/r<REV>-rename and /operations/r<REV>-hardlink we store information about operations performed on metadata files, and we replay it for the restored metadata files.

I also left some general questions in the code.
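For illustration, that naming scheme could be sketched as follows. This is a hypothetical sketch: `makeDataFileKey`, `makeObjectMetadata`, and the zero-padding width are assumptions for the example, not the PR's actual code. Zero-padding the revision keeps the lexicographic order of keys consistent with numeric revision order.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Pad the revision so lexicographic key order matches numeric order.
// The width (20 digits covers the full uint64_t range) is an assumption.
std::string revisionToString(uint64_t revision)
{
    std::string s = std::to_string(revision);
    return std::string(20 - s.size(), '0') + s;
}

// Data-file key of the form r<REV>-file-<RANDOM_NAME> (hypothetical helper).
std::string makeDataFileKey(uint64_t revision, const std::string & random_name)
{
    return "r" + revisionToString(revision) + "-file-" + random_name;
}

// The S3 object metadata carries the local metadata-file path under "path".
std::map<std::string, std::string> makeObjectMetadata(const std::string & local_metadata_path)
{
    return {{"path", local_metadata_path}};
}
```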
```cpp
LOG_INFO(&Poco::Logger::get("DiskS3"), "Starting up disk {}", name);

/// Find last revision.
```
Why can't we keep this information in a file on the local disk?
Because we can lose all the metadata information on the local disk, including a last-revision file kept separately.
We can rely only on the data stored in S3.
What about storing the latest revision in an S3 object called latest_revision? Or is searching for the revision this way expected to be cheap?
Storing latest_revision in a separate object would add overhead to every file write.
Searching for the revision at start-up is cheap: just ~64 head-object requests (a binary search over the 64-bit revision space).
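The start-up search could look roughly like this. It is a minimal sketch, not the actual DiskS3 code: `exists_at_or_above(r)` stands in for a HEAD/list request answering "does any object with revision >= r exist?". Since that predicate is monotone (true up to the last revision, false after), a binary search over the 64-bit range needs at most ~64 probes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Find the highest revision r for which exists_at_or_above(r) is true.
// Illustrative sketch; the function name and signature are assumptions.
uint64_t findLastRevision(const std::function<bool(uint64_t)> & exists_at_or_above)
{
    if (!exists_at_or_above(1))
        return 0;  // no revisioned objects at all
    uint64_t lo = 1, hi = UINT64_MAX;
    while (lo < hi)
    {
        // Round up so the loop makes progress when hi == lo + 1.
        uint64_t mid = lo + (hi - lo + 1) / 2;
        if (exists_at_or_above(mid))
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}
```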
```cpp
information.source_path = s3_root_path;

readRestoreInformation(information);
if (information.revision == 0)
```
What happens if we try to restore from a revision, but some of its data files have been deleted?
Do we actually need to restore from an arbitrary revision, rather than from the last one?
This is actually needed if we want to have backups and restore metadata to an arbitrary backup state.
To ensure that data is not deleted, we should perform 'ALTER TABLE ... FREEZE PARTITION' queries and remember the revision counter that is dumped into revision.txt.
Got it. Are there plans to write some documentation about how to use this feature?
Yes, of course.
```cpp
object_metadata = {
    {"path", path}
};
s3_path = "r" + revisionToString(revision) + "-file-" + s3_path;
```
What happens if we enable this feature for a disk that was created earlier without it? Will restoring fail, or will it restore information only from newly created files that carry this metadata?
At the moment we can restore the disk state only for newly created files; other files that don't have 'path' in their metadata will be ignored.
To make it possible to restore old files, we should introduce a migration utility.
This utility should update the metadata of existing files, storing the 'path' key.
This is very important, but I'd prefer to do it in a separate PR with separate tests.
At the moment we can enable/disable the restore capability using the 'send_metadata' option for DiskS3, so we can easily test this utility.
Should we reject restoring if there are objects without path? That would ensure we don't restore an inconsistent state.
UPD: this feature cannot be changed for existing disks, so we can only enable it for newly created disks, and a partial state is impossible. Am I right?
Yes, that makes sense. I'll change this piece of code to throw an exception during restore.
I think the behaviour can be changed for an existing disk if we migrate existing files so they have path in their metadata.
```cpp
void DiskS3::restore()
{
    if (!exists(restore_file_name))
```
Can we put this information into the config? If some metadata files exist, do nothing; otherwise restore using the information from the config. Is that suitable, or are there other reasons why we need to put it in a file?
If we have an existing cluster with backups and we want to restore a backup onto another cluster with a differently configured bucket/path, we need to transfer this information somehow.
This is the usual case for ClickHouse clusters in Yandex.Cloud.
I haven't thought of anything better than just putting this information into a separate file.
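For illustration, such a restore file could take a shape like the following. The format and all names here are assumptions for the example; the real file format in the PR may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical restore-file contents: whitespace-separated key/value pairs
// that tell the disk which bucket/path to restore from and up to which
// revision.
struct RestoreInformation
{
    uint64_t revision = 0;      // 0 could mean "restore to the latest state"
    std::string source_bucket;
    std::string source_path;
};

RestoreInformation readRestoreInformation(std::istream & in)
{
    RestoreInformation info;
    std::string key;
    while (in >> key)
    {
        if (key == "revision")
            in >> info.revision;
        else if (key == "source_bucket")
            in >> info.source_bucket;
        else if (key == "source_path")
            in >> info.source_path;
    }
    return info;
}
```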
```cpp
{
    if (send_metadata)
        return (DiskS3::ObjectMetadata){{"path", path}};

const String key = "operations/r" + revisionToString(revision) + "-" + operation_name;
```
Should it support remove* operations? Right now it can only be a rename or a hardlink. Why can we omit the other operations?
Yes, it might seem that we need to support remove operations as well, but we can actually skip them.
Before every remove operation, CH renames the part directory to '_delete_tmp_XXX'.
So we only need to restore this rename to '_delete_tmp_XXX'.
When CH starts, it will simply remove all these '_delete_tmp_XXX' directories by itself.
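A toy model of that argument (illustrative code, not the PR's implementation): because a remove is always preceded by a rename to a delete_tmp-prefixed name, replaying only the logged rename/hardlink operations in revision order reproduces the state; leftover delete_tmp directories are cleaned up on server start anyway.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Operation
{
    uint64_t revision;
    std::string type;  // "rename" or "hardlink"
    std::string from;
    std::string to;
};

// Local path -> data-file key (a stand-in for the on-disk metadata files).
using MetadataTree = std::map<std::string, std::string>;

void replayOperations(MetadataTree & tree, std::vector<Operation> ops)
{
    // Revisions act as logical clocks: sorting by revision linearizes the
    // write/rename/hardlink history.
    std::sort(ops.begin(), ops.end(),
              [](const Operation & a, const Operation & b) { return a.revision < b.revision; });
    for (const auto & op : ops)
    {
        auto it = tree.find(op.from);
        if (it == tree.end())
            continue;
        tree[op.to] = it->second;
        if (op.type == "rename")
            tree.erase(op.from);
    }
}
```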
Could you also comment on this from my previous message?
Your thoughts about how it should work generally are right.
…ore-metadata

# Conflicts:
#	src/Disks/DiskDecorator.h
#	src/Disks/IDisk.h
#	src/Disks/S3/DiskS3.cpp
Looks like the failed tests have also been failing in master recently. CI Info
Internal documentation ticket: DOCSUP-18488
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add the ability to backup/restore metadata files for DiskS3
Detailed description / Documentation draft:
We want to add the possibility to restore metadata files on the local FS for DiskS3 using only S3. What do we need to implement this feature?
To restore metadata files, we need to list all S3 objects that have 'path' in their metadata and create a metadata file on the local FS at the given path.
Then we need to apply all renames and hard-link creations. Renames and hard links must be time-ordered, so we need logical clocks to linearize all operations (file write, rename, hardlink).
We can introduce a simple implementation of such clocks, called a 'revision': an atomic counter incremented by every write/rename/hardlink operation. Each object carries its revision in its name.
Revisions can also be used to make a snapshot of the current state of S3 (a backup). To make a backup, we need to freeze all tables and remember the current revision counter. Part files won't be deleted, as they are backed up in the shadow folder, and the revision counter cuts out the set of files/operations needed to restore a particular state.
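Putting the pieces together, the restore flow described here (list data-file objects carrying a 'path' metadata key, recreate local metadata, then replay rename/hardlink operations in revision order) can be sketched over an in-memory stand-in for S3. Every type and name below is an illustrative assumption, not the PR's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct S3Object
{
    uint64_t revision;
    std::string key;                              // r<REV>-file-<RANDOM_NAME>
    std::map<std::string, std::string> metadata;  // may carry a "path" key
};

struct LoggedOp
{
    uint64_t revision;
    std::string type;  // "rename" or "hardlink"
    std::string from;
    std::string to;
};

// Rebuild local metadata (path -> data-file key) from S3 alone, up to
// `target_revision`. Objects without a "path" key are skipped here; the PR
// discussion suggests throwing instead, to avoid a partially restored state.
std::map<std::string, std::string> restoreMetadata(
    const std::vector<S3Object> & objects,
    std::vector<LoggedOp> ops,
    uint64_t target_revision)
{
    std::map<std::string, std::string> tree;

    // Step 1: recreate a metadata entry for every data file in the snapshot.
    for (const auto & obj : objects)
    {
        auto it = obj.metadata.find("path");
        if (obj.revision <= target_revision && it != obj.metadata.end())
            tree[it->second] = obj.key;
    }

    // Step 2: replay logged operations in logical-clock (revision) order.
    std::sort(ops.begin(), ops.end(),
              [](const LoggedOp & a, const LoggedOp & b) { return a.revision < b.revision; });
    for (const auto & op : ops)
    {
        if (op.revision > target_revision)
            break;
        auto it = tree.find(op.from);
        if (it == tree.end())
            continue;
        tree[op.to] = it->second;
        if (op.type == "rename")
            tree.erase(op.from);
    }
    return tree;
}
```

Restoring to an earlier revision simply ignores later objects and operations, which is what makes backup-by-revision possible.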