Feature chunk trash bin #215
base: dev
Nice change; please check my suggestions.
Also, I see very nice unit tests but no integration tests. Please consider adding one to see how the feature behaves in a real scenario.
The test could:
- Create some files.
- Set a short trash time.
- Wait for them to be deleted.
- Check that the files were moved to the expected directory.
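The test plan above can be sketched as a standalone Bash script. This is only an illustration under stated assumptions: it does not start SaunaFS, a temporary directory stands in for a chunkserver data folder, the `trash` subfolder and `chunk_*.dat` names are hypothetical, and "unlinking" is simulated with `mv`. The helpers `find_all_chunks` and `find_all_trashed_chunks` reuse the names from the PR's test but are local stand-ins, not the test framework's implementation.

```shell
#!/usr/bin/env bash
# Standalone sketch of the suggested integration-test flow (hypothetical layout):
# a temporary directory plays the role of a chunkserver data folder and
# "unlinking" a chunk is simulated by moving it into a trash subfolder.
set -euo pipefail

data_dir=$(mktemp -d)
trash_dir="${data_dir}/trash"
mkdir -p "${trash_dir}"
trap 'rm -rf "${data_dir}"' EXIT

# Local stand-ins for the helpers used by the PR's test (not the real ones).
find_all_chunks() {
	# List chunk files outside the trash folder.
	find "${data_dir}" -path "${trash_dir}" -prune -o -type f -name 'chunk_*' -print
}
find_all_trashed_chunks() {
	find "${trash_dir}" -type f -name 'chunk_*'
}
count_lines() {
	# wc -l pads its output on some platforms; strip the whitespace.
	wc -l | tr -d '[:space:]'
}

# 1. Create some files.
for i in 1 2 3; do
	echo "data ${i}" > "${data_dir}/chunk_${i}.dat"
done
expected_chunks_count=$(find_all_chunks | count_lines)

# 2-3. In the real test the files would get a short trash time and the
#      chunkserver would move their chunks to the trash once they are deleted;
#      here the move is done directly.
mv "${data_dir}"/chunk_*.dat "${trash_dir}/"

# 4. Compare exact counts instead of only checking that the trash is non-empty.
remaining_chunks_count=$(find_all_chunks | count_lines)
trashed_chunks_count=$(find_all_trashed_chunks | count_lines)
```

In the real integration test, the final comparison would use the framework's `assert_equals`, as requested later in the review, rather than a bare `[ ... ]` check.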
f06ea7c to bdeab93
@lgsilva3087 Following your suggestion, `test_unlink` was updated to check for the trashed files.
LGTM. Please consider my minor suggestion for the integration test.
# Ensure the "unlinked" files are trashed
trashed_chunk_files=$(find_all_trashed_chunks | wc -l)
if [ "${trashed_chunk_files}" -eq 0 ]; then
Please consider using assert_equals (or a similar approach) in this test to check that the number of chunks in the trash is the expected one, instead of only checking whether the trash is non-empty.
In this test, the removal is requested before replication of the files finishes. I added a check on the initial state, a sample in the middle, and a fuzzy check at the end. Would this suffice for the integration test?
Yes, I consider it good now. Just note that, when comparing expected values with a variable or function output, the other tests follow the convention of using assert_equals or assert_not_equal. This is a minor change, mostly to keep the same coding style across all integration tests, but it improves readability.
Very nice change, though I think it can be improved.
From my understanding, the chunks are never really unlinked, just moved to the trash folder. We could set a trash time for those chunks to make sure they do not fill the drives forever.
Maybe this could be an idea for the next iteration of the chunk trash bin feature.
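A trash time like this could be enforced by purging trashed chunk files older than a given age. The following is a minimal Bash sketch, not the feature's implementation: the function name is hypothetical, it assumes the trash folder path is passed in, and it assumes a file's modification time marks when it was trashed. `-mmin` and `-delete` are supported by GNU and BSD `find`, though they are not POSIX.

```shell
#!/usr/bin/env bash
# Sketch: permanently delete trashed chunk files older than a trash time.
# Assumes the file's mtime marks when it was trashed (see the note below).
set -euo pipefail

purge_expired_trash() {
	local trash_dir=$1 trash_time_minutes=$2
	# -mmin +N matches files last modified more than N minutes ago;
	# -print lists each matching file before -delete removes it.
	find "${trash_dir}" -type f -mmin +"${trash_time_minutes}" -print -delete
}

trash_dir=$(mktemp -d)
trap 'rm -rf "${trash_dir}"' EXIT

echo old > "${trash_dir}/chunk_old.dat"
echo new > "${trash_dir}/chunk_new.dat"
# Backdate one file to 2000-01-01 00:00 (POSIX touch -t format).
touch -t 200001010000 "${trash_dir}/chunk_old.dat"

purged=$(purge_expired_trash "${trash_dir}" 60)
```

Note that `mv` keeps a file's original modification time, so with this approach the trash bin would have to `touch` each chunk when moving it (or record the trash time elsewhere); otherwise chunks written long ago would be purged right after being trashed.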
LGTM! 👍 🔥 🚀
I’ve added a few minor suggestions for your consideration, but they’re not blockers for merging.
src/chunkserver/chunkserver-common/chunk_trash_manager_unittest.cc
@dmga44 Yes, this PR only changes the removal behavior. Trash time management is the responsibility of the garbage collector.
What garbage collector?
Currently WIP.
d436112 to 44f25fc
Great job @ictus4u 👍 🔥 🚀
> Currently WIP
In that case, this PR should be marked as a draft, or the feature should be put behind a feature flag (see my other comments in the code). I would prefer a feature flag, because the feature is already very useful in high-risk operations (upgrading, migrating from LizardFS, etc.) and we should avoid long-lived feature branches that take months.
I already love this feature (you could already achieve the deletion with a cron job), but merging into dev an unfinished feature that significantly changes how the chunkserver behaves (it doesn't delete chunks, so the chunkserver can fill up over time unless the trash is emptied) and that is on by default should NOT be accepted at ALL, in my opinion.
It's fine to have unfinished features in dev and even in releases, but they should be disabled by default (for example, Prometheus support is woefully unfinished, but it needs to be enabled explicitly) so they don't affect anyone.
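The opt-in behavior requested here can be sketched as follows: chunk removal keeps deleting files unless an option explicitly enables the trash bin. This is a Bash illustration only; the option name `CHUNK_TRASH_ENABLED`, the config file name, and the helper names are hypothetical, and the actual chunkserver code is C++.

```shell
#!/usr/bin/env bash
# Sketch of an opt-in trash bin: chunks are deleted as before unless the
# (hypothetical) CHUNK_TRASH_ENABLED option is explicitly set to 1.
set -euo pipefail

# Print the last "KEY = VALUE" setting for KEY in a config file, or a default.
read_option() {
	local cfg=$1 key=$2 default=$3 value
	value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p" "${cfg}" | tail -n 1)
	echo "${value:-${default}}"
}

remove_chunk() {
	local cfg=$1 chunk=$2 trash_dir=$3
	if [ "$(read_option "${cfg}" CHUNK_TRASH_ENABLED 0)" = "1" ]; then
		mv "${chunk}" "${trash_dir}/"   # opt-in: keep the chunk recoverable
	else
		rm -f "${chunk}"                # default: delete, as before the feature
	fi
}

work_dir=$(mktemp -d)
trap 'rm -rf "${work_dir}"' EXIT
mkdir -p "${work_dir}/trash"
cfg="${work_dir}/chunkserver.cfg"

# Option absent: the default (0) keeps the old deleting behavior.
: > "${cfg}"
echo a > "${work_dir}/chunk_a.dat"
remove_chunk "${cfg}" "${work_dir}/chunk_a.dat" "${work_dir}/trash"

# Option enabled: the chunk is moved to the trash folder instead.
echo "CHUNK_TRASH_ENABLED = 1" > "${cfg}"
echo b > "${work_dir}/chunk_b.dat"
remove_chunk "${cfg}" "${work_dir}/chunk_b.dat" "${work_dir}/trash"
```

Defaulting to `0` means existing deployments keep deleting chunks after an upgrade, so the disk-filling risk only applies to users who opt in.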
LGTM. Please see my comments.
Also, consider Urmas' suggestion to make it an option.
src/chunkserver/chunkserver-common/chunk_trash_manager_unittest.cc
fi
# The value below is non-deterministic, but it should be enough to test the
# feature. Avoid using a fixed value to prevent false positives.
random_files_count_before_removal=$(find_all_chunks | wc -l)
Suggested change:
-random_files_count_before_removal=$(find_all_chunks | wc -l)
+expected_chunks_count_before_removal=$(find_all_chunks | wc -l)
rm -f "$file" "$xorfile"

# Wait for all the chunks to be removed
timeout="3 minutes"
if ! wait_for '[[ $(find_all_chunks | wc -l) == 0 ]]' "$timeout"; then
test_add_failure $'The following chunks were not removed:\n'"$(find_all_chunks)"
Suggested change:
-test_add_failure $'The following chunks were not removed:\n'"$(find_all_chunks)"
+test_add_failure $'The following chunks are still pending for removal:\n'"$(find_all_chunks)"
Consider a more descriptive message, as removal could take more than 3 minutes in some cases. For instance, consider setting this master parameter to a low value so that chunk operations start much sooner: OPERATIONS_DELAY_INIT = 2
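The `wait_for` call in the test polls a condition until a timeout expires, which is why a slow start of chunk operations can make it fail. A rough standalone equivalent is sketched below to show the polling pattern; `poll_until` is a hypothetical helper, not the test framework's `wait_for`, and it takes the timeout in seconds rather than a string such as "3 minutes".

```shell
#!/usr/bin/env bash
# Hypothetical polling helper: re-evaluate a shell condition until it holds
# or the timeout (in seconds) expires. Returns 0 on success, 1 on timeout.
set -euo pipefail

poll_until() {
	local condition=$1 timeout_seconds=$2 interval=${3:-1}
	local deadline=$(( SECONDS + timeout_seconds ))
	until eval "${condition}"; do
		if [ "${SECONDS}" -ge "${deadline}" ]; then
			return 1
		fi
		sleep "${interval}"
	done
}

work_dir=$(mktemp -d)
trap 'rm -rf "${work_dir}"' EXIT
marker="${work_dir}/done"

# Simulate an operation that completes after about one second.
( sleep 1; touch "${marker}" ) &

if ! poll_until '[ -e "${marker}" ]' 5; then
	echo "timed out waiting for ${marker}" >&2
fi
wait
```

Because the condition is only re-checked every `interval` seconds, the timeout has to cover the whole delay before the operation even starts; lowering that delay (as with the suggested master setting) shortens the wait rather than the timeout.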
LGTM
c4da37e to 2d0e144
72b2204 to a46c8d8
This commit provides an additional safety measure for the system, temporarily allowing easier recovery of deleted chunks.
For splitting responsibilities and enabling better tests.
Co-authored-by: Antuan <antuan@leil.io>
Co-authored-by: Guillex <guillex@leil.io>
Co-authored-by: Antuan <antuan@leil.io>
Implement Chunk Trash garbage collection with configuration-based trash management, including methods for moving files to trash, deleting expired files, and freeing space based on disk thresholds.
Co-authored-by: Antuan <antuan@leil.io>
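The idea of freeing space based on disk thresholds can be illustrated with a simplified Bash sketch. To keep the example deterministic, it swaps actual disk usage for a byte budget on the trash folder and deletes the oldest trashed files first until the trash fits. The function name, the budget parameter, and the oldest-first policy are assumptions for illustration, not necessarily what the implementation does.

```shell
#!/usr/bin/env bash
# Sketch: delete the oldest trashed files until the trash fits a byte budget.
# Assumes file names without whitespace (ls output is split on it).
set -euo pipefail

free_trash_to_budget() {
	local trash_dir=$1 budget_bytes=$2 total=0 f size
	# Sum the sizes of the regular files currently in the trash.
	for f in "${trash_dir}"/*; do
		[ -f "${f}" ] || continue
		total=$(( total + $(wc -c < "${f}") ))
	done
	# ls -1tr lists names by modification time, oldest first.
	for f in $(ls -1tr "${trash_dir}"); do
		if [ "${total}" -le "${budget_bytes}" ]; then
			break
		fi
		size=$(wc -c < "${trash_dir}/${f}")
		rm -f "${trash_dir}/${f}"
		total=$(( total - size ))
	done
	echo "${total}"   # bytes left in the trash
}

trash_dir=$(mktemp -d)
trap 'rm -rf "${trash_dir}"' EXIT

# Three 10-byte files with increasing modification times.
printf '%010d' 1 > "${trash_dir}/chunk_old.dat"
printf '%010d' 2 > "${trash_dir}/chunk_mid.dat"
printf '%010d' 3 > "${trash_dir}/chunk_new.dat"
touch -t 200001010000 "${trash_dir}/chunk_old.dat"
touch -t 201001010000 "${trash_dir}/chunk_mid.dat"
touch -t 202001010000 "${trash_dir}/chunk_new.dat"

remaining_bytes=$(free_trash_to_budget "${trash_dir}" 15)
```

With a 15-byte budget, the two oldest files are removed and 10 bytes remain. A threshold-based version would instead compare the filesystem's usage (for example, as reported by `df`) against a configured limit before each deletion.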
a46c8d8 to 16c6a2d
if [ "${trashed_chunks_count}" -eq 0 ]; then
test_add_failure $'The removed chunks were not moved to the trash folder'
fi
Consider handling this kind of comparison in the tests as follows, to align with the conventions of the other tests:
# Ensure the "unlinked" files are trashed
assert_not_equal "0" "${trashed_chunks_count}"
Add a safety layer for chunk removal in SaunaFS: instead of being deleted, chunk files are moved to a temporary trash bin folder.