Repository Cleanup Endpoint #43900
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -332,6 +332,42 @@ POST /_snapshot/my_unverified_backup/_verify | |
|
||
It returns a list of nodes where the repository was successfully verified, or an error message if the verification process failed. | ||
|
||
[float] | ||
===== Repository Cleanup | ||
Over time, repositories can accumulate data that is not referenced by any existing snapshot. This is a result of the data safety guarantees | |
that the snapshot functionality provides in failure scenarios during snapshot creation, and of the decentralized nature of the snapshot creation | |
process. This unreferenced data in no way degrades the performance or safety of a snapshot repository, but it leads to higher | |
than necessary storage use. To clean up this unreferenced data, users can call the cleanup endpoint for a repository, which | |
triggers a complete accounting of the repository's contents and then deletes all unreferenced data that was found. | |
|
||
[source,js] | ||
----------------------------------- | ||
POST /_snapshot/my_repository/_cleanup | ||
----------------------------------- | ||
// CONSOLE | ||
// TEST[continued] | ||
|
||
The response to a cleanup request looks as follows: | ||
|
||
[source,js] | ||
-------------------------------------------------- | ||
{ | ||
"results": { | ||
Why is it called "results" and not "result"? I would expect to have an array if the field is named "result"
I think "results" is more natural language wise here in English (I guess because we're listing multiple non-binary values but I tbh. I don't have a good explanation as to why that is :D).
It is the result of the clean up operation, so I think we could just drop the
I mainly did this to stay a little consistent in style with other snapshot APIs. We never really return a flat object, we always wrap under a descriptive key don't we? (e.g. getting a list of snapshots in a repo, creating a snapshot, ...) Aesthetically I always kind of like an object like that but I don't have a good argument for it otherwise :) I suppose it makes it a little easier to extend if we ever want to in the future, but that isn't so relevant for this endpoint I guess. |
||
"deleted_bytes": 20, | ||
"deleted_blobs": 5 | ||
} | ||
} | ||
-------------------------------------------------- | ||
// TESTRESPONSE | ||
|
||
Depending on the concrete repository implementation, the number of bytes freed and the number of blobs removed will each be either | |
an approximation or an exact result. Any non-zero value for the number of blobs removed implies that unreferenced blobs were found and | |
subsequently cleaned up. | |
|
||
Please note that most of the cleanup operations executed by this endpoint also run automatically whenever a snapshot is deleted from a | |
repository. If you regularly delete snapshots, you will in most cases see little or no space savings from this functionality | |
and should invoke it less frequently accordingly. | |
|
||
[float] | ||
[[snapshots-take-snapshot]] | ||
=== Snapshot | ||
|
Mention that the cleanup functionality is also automatically run on snapshot deletion?
Hmm I'm not sure. It would make sense now, but I was gonna add the logic to clean up all the individual shard folders in a follow-up which we can't run on delete (it's just super slow), not sure if it's worth explaining the difference here?
Most folks shouldn't bother about this API, given that all relevant clean-up will happen on snapshot deletion. I do not want to raise the impression that you will have to run this API to sensibly operate an ES cluster.
Fair point, I added a note on that now in 25fb827
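For readers who want to see how a client might consume the endpoint discussed in this thread, here is a minimal Python sketch. It assumes the response shape shown in the docs diff under review (a `results` object with `deleted_bytes` and `deleted_blobs`), which was still being debated at this point. The base URL `http://localhost:9200` and the helper names `build_cleanup_request` and `summarize_cleanup` are hypothetical and only serve the illustration. The request is built but never sent.

```python
import json
from urllib.request import Request


def build_cleanup_request(base_url: str, repository: str) -> Request:
    """Build (but do not send) the POST request for the cleanup endpoint."""
    url = f"{base_url.rstrip('/')}/_snapshot/{repository}/_cleanup"
    return Request(url, method="POST")


def summarize_cleanup(body: str) -> str:
    """Summarize a cleanup response, assuming the shape from the docs diff."""
    results = json.loads(body)["results"]
    blobs = results["deleted_blobs"]
    size = results["deleted_bytes"]
    if blobs == 0:
        # Zero blobs removed means no unreferenced data was found.
        return "no unreferenced blobs found"
    # Depending on the repository implementation, the numbers may be approximate.
    return f"removed {blobs} unreferenced blobs ({size} bytes, possibly approximate)"


# Sample response copied from the docs diff; the URL is an assumed local node.
sample = '{"results": {"deleted_bytes": 20, "deleted_blobs": 5}}'
request = build_cleanup_request("http://localhost:9200", "my_repository")
print(request.get_method(), request.full_url)
print(summarize_cleanup(sample))
```

Because snapshot deletion already performs most of this cleanup, a caller like this would normally run rarely, if at all.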