Refactoring Large File Upload Continuation Logic #286
Conversation
@@ -17,6 +17,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Infrastructure
* Replaced `pyflakes` with `ruff` for linting
* Automatically set copyright date when generating the docs
I think you accidentally started from #282 branch
btw, was that ever submitted upstream?
yes, you are right, I did that by mistake. It is actually my branch and should already be merged; it was reviewed. I think we can merge this commit with this PR, ok?
I don't think that we can. Basically the workflow is: you create a PR in reef, get it reviewed, then recreate it for upstream (the backblaze repo).
Also, we don't want stuff to get stuck behind some other changes.
I did that a lot recently and it was no fun ;p
ok, I can rebase (I guess this is the correct term) and resubmit my branch. Would that be the reasonable approach?
A little too late for a rebase if it was already submitted for review (the golden rule of rebasing ;) )
Thankfully it is not that big a change that got bundled in here, so we can just ignore it for now.
Whatever we do, please create a PR for the date change to upstream.
This commit refactors the logic used to search for unfinished file uploads in the bucket, primarily focusing on the `_find_unfinished_file_by_plan_id`, `_match_unfinished_file_if_possible`, and `_find_matching_unfinished_file` functions. Key changes include:

- Extracting shared logic into `_find_matching_unfinished_file`. This increased code reuse and improved code maintainability and readability.
- Refactoring `_find_unfinished_file_by_plan_id` and `_match_unfinished_file_if_possible` to use the shared logic, enhancing their readability and reliability.
- Enhancing function documentation. Docstrings were updated to be more informative and adhere to a standard template.
- Adding a note indicating `listFiles` access is required for the operations.
This PR addresses issue Backblaze#381, "Unify large file continuation preventors". The common logic from `_find_unfinished_file_by_plan_id` and `_match_unfinished_file_if_possible` is extracted into a new function named `_find_matching_unfinished_file`, which now handles their shared behavior. The new function takes three parameters, `log_rejections`, `check_file_info_without_large_file_sha1`, and `eager_mode`, to allow flexible control over its behavior. Additionally, docstrings for all functions have been enhanced to better express their purpose and functionality.

Key changes and observations:

- `_find_unfinished_file_by_plan_id` had significantly less logging than `_match_unfinished_file_if_possible`.
- Handling of `None` encryption appeared inconsistent between the two functions and was identified as a potential bug. This refactor standardizes it with the condition `if encryption is not None and encryption != file_.encryption:`, i.e. a candidate file is rejected only when an encryption setting was requested and it does not match the file's encryption, giving a more consistent and correct treatment of this edge case.
- The `sha1_sum` check was inconsistent. The new approach is less penalizing: only the part with a mismatching `sha1` is re-uploaded, similar to `_find_unfinished_file_by_plan_id`.
- `_match_unfinished_file_if_possible` was checking `file_info_without_large_file_sha1`, while `_find_unfinished_file_by_plan_id` was not. In the refactored `_find_matching_unfinished_file`, this check is controlled by the `check_file_info_without_large_file_sha1` parameter, giving each caller the flexibility to decide whether the check is needed.
- The part `sha` was previously only checked by `_match_unfinished_file_if_possible`. The updated approach re-uploads parts with a non-matching `sha` or size, rather than failing completely.
- The `plan_id` logic is not explicit. It has not been changed in this PR, but is noted for potential future improvement.

Potential next steps (not addressed in this PR to maintain simplicity):

- Make the `plan_id` logic more explicit in the code.
- Remove `_match_unfinished_file_if_possible` and `_find_unfinished_file_by_plan_id` entirely. Each is only called in a single place, so perhaps those calls could be replaced with calls to `_find_matching_unfinished_file`.
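The standardized `None`-encryption condition from this PR is easy to check in isolation. A minimal sketch, with the helper name `encryption_rejects` made up purely for illustration:

```python
def encryption_rejects(requested, actual):
    # The PR's standardized condition: reject a candidate unfinished file
    # only when an encryption setting was requested AND it differs from
    # the file's encryption.
    return requested is not None and requested != actual

# No requested encryption: never a reason to reject the unfinished file.
print(encryption_rejects(None, "SSE-B2"))      # False
# Matching settings are accepted.
print(encryption_rejects("SSE-B2", "SSE-B2"))  # False
# A requested setting the unfinished file lacks leads to rejection.
print(encryption_rejects("SSE-B2", None))      # True
```

Writing the condition this way makes the previously inconsistent edge case explicit: `None` means "no constraint", not "must equal `None`".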