Refactoring Large File Upload Continuation Logic #286

przadka · 2023-06-17T08:11:39Z

This PR addresses the issue Backblaze#381, "Unify large file continuation preventors". The logic shared by _find_unfinished_file_by_plan_id and _match_unfinished_file_if_possible is extracted into a new function, _find_matching_unfinished_file. The new function takes three parameters that control its behavior: log_rejections, check_file_info_without_large_file_sha1, and eager_mode. Additionally, the docstrings of all affected functions have been expanded to better describe their purpose and functionality.

Key changes and observations:

- Most of the code in the two original functions was duplicated, except that _find_unfinished_file_by_plan_id had significantly less logging.
- The original handling of None encryption was inconsistent between the two functions, which looked like a potential bug. This refactor standardizes it with the condition `if encryption is not None and encryption != file_.encryption:`, which is true only when an encryption setting is given and it does not match the file's encryption.
- The original behavior of the sha1_sum check was inconsistent. The new approach is more lenient: only the part with a mismatching sha1 is re-uploaded, as in _find_unfinished_file_by_plan_id.
- Previously, _match_unfinished_file_if_possible checked file_info_without_large_file_sha1, while _find_unfinished_file_by_plan_id did not. In _find_matching_unfinished_file this check is controlled by a parameter (check_file_info_without_large_file_sha1), so each caller can enable or disable it as needed.
- The new function also checks part size and part sha, which previously only _match_unfinished_file_if_possible did. Parts with a non-matching sha or size are re-uploaded rather than causing the whole match to fail.
- The original plan_id logic was not explicit. It has not been changed in this PR, but it is noted as a candidate for future improvement.
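To make the encryption condition and the part-level check concrete, here is a minimal sketch. It is not the b2sdk implementation: `UnfinishedFile`, `encryption_differs`, and `mismatched_parts` are hypothetical names, and representing each part as a `(size, sha1)` tuple is an assumption made for illustration. Only the `encryption is not None and encryption != file_.encryption` condition is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (content_length, content_sha1) for a single part; an assumed representation.
Part = Tuple[int, str]


@dataclass
class UnfinishedFile:
    """Hypothetical stand-in for an unfinished large file record."""
    file_id: str
    encryption: Optional[str] = None
    # part_number -> Part for parts that were already uploaded
    parts: Dict[int, Part] = field(default_factory=dict)


def encryption_differs(encryption: Optional[str], file_: UnfinishedFile) -> bool:
    """The condition quoted in the PR: None means 'do not compare'."""
    return encryption is not None and encryption != file_.encryption


def mismatched_parts(planned: Dict[int, Part], file_: UnfinishedFile) -> List[int]:
    """Part numbers already uploaded whose size or sha1 differs from the plan.

    Only these parts need re-uploading; the file as a whole is still usable.
    Planned parts that were never uploaded are not reported here.
    """
    result = []
    for part_number in sorted(planned):
        uploaded = file_.parts.get(part_number)
        if uploaded is not None and uploaded != planned[part_number]:
            result.append(part_number)
    return result
```

With a file whose part 2 has a different sha1 than planned, `mismatched_parts` reports only part 2, which is the "re-upload the mismatching part instead of failing" behavior described above.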

Potential next steps (not addressed in this PR to maintain simplicity):

- Make plan_id logic more explicit in the code.
- Unify the approach to logging. Currently, the refactored functions have different approaches, which seems unnecessary.
- Consider removing _match_unfinished_file_if_possible and _find_unfinished_file_by_plan_id entirely. Each is called from only one place in the code, so those calls could perhaps be replaced with direct calls to _find_matching_unfinished_file.
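The last point can be pictured as two thin wrappers around one shared finder. The sketch below is an assumption-heavy illustration, not the code in executor.py: the function names, the dict-based candidate records, and storing `plan_id` inside `file_info` are all hypothetical choices. `eager_mode` is left out because the description does not say what it does.

```python
import logging
from typing import Iterable, Optional

logger = logging.getLogger(__name__)


def _without_large_file_sha1(info: dict) -> dict:
    """Drop the large_file_sha1 key so it does not affect the comparison."""
    return {k: v for k, v in info.items() if k != "large_file_sha1"}


def find_matching_unfinished_file(
    candidates: Iterable[dict],
    file_info: dict,
    plan_id: Optional[str] = None,
    *,
    check_file_info: bool = True,
    log_rejections: bool = False,
) -> Optional[dict]:
    """Return the first candidate that passes the enabled checks, else None."""
    for candidate in candidates:
        info = candidate.get("file_info", {})
        if plan_id is not None and info.get("plan_id") != plan_id:
            reason = "plan_id mismatch"
        elif check_file_info and (
            _without_large_file_sha1(info) != _without_large_file_sha1(file_info)
        ):
            reason = "file_info mismatch"
        else:
            return candidate
        if log_rejections:
            logger.debug("rejecting %s: %s", candidate.get("file_id"), reason)
    return None


# Hypothetical wrappers: each original entry point becomes a one-line call.
def find_by_plan_id(candidates: Iterable[dict], plan_id: str) -> Optional[dict]:
    return find_matching_unfinished_file(
        candidates, {}, plan_id, check_file_info=False
    )


def match_if_possible(candidates: Iterable[dict], file_info: dict) -> Optional[dict]:
    return find_matching_unfinished_file(candidates, file_info, log_rejections=True)
```

Once the wrappers shrink to a single call like this, inlining them at their only call sites becomes a small, mechanical change.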

rooterkyberian · 2023-06-17T08:58:32Z

CHANGELOG.md

@@ -17,6 +17,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Infrastructure
 * Replaced `pyflakes` with `ruff` for linting
+* Automatically set copyright date when generating the docs


I think you accidentally started from #282 branch
btw, was that ever submitted upstream?

yes, you are right. i did by mistake. it is actually my branch and should be already merged, it was reviewed. I think we can merge this commit with this PR. ok?

I don't think that we can - basically the workflow is: you create a PR in reef, get it reviewed, then recreate it for upstream (backblaze repo).
Also we don't want stuff to get stuck on some other changes.

I did that a lot recently and it was no fun;p

ok, i can rebase (i guess this is the correct term) and resubmit my branch. that would be the reasonable approach?

a little too late for a rebase if it was already submitted for review (the golden rule of rebasing ;) )
thankfully it is not that big change that got bundled in here - so we can just ignore it I guess for now

whatever we do - please create PR for date change to upstream

b2sdk/transfer/emerge/executor.py

This commit refactors the logic used to search for unfinished file uploads in the bucket, primarily focusing on the `_find_unfinished_file_by_plan_id`, `_match_unfinished_file_if_possible`, and `_find_matching_unfinished_file` functions. Key changes include:
- Extracting shared logic into `_find_matching_unfinished_file`. This increases code reuse and improves maintainability and readability.
- Refactoring `_find_unfinished_file_by_plan_id` and `_match_unfinished_file_if_possible` to use the shared logic, enhancing their readability and reliability.
- Enhancing function documentation. Docstrings were updated to be more informative and adhere to a standard template.
- Adding a note indicating 'listFiles' access is required for the operations.

rooterkyberian reviewed Jun 17, 2023

mpnowacki-reef reviewed Jun 17, 2023

przadka added 10 commits June 20, 2023 06:49

Improved docs for the functions to be refactored

3d5777b

Logging improvements, eager mode added

2863d88

Fixed typo in a comment.

a51417a

Fixed formatting.

a78ba72

Changelog update

c1ab961

Removed . from changelog entry.

d077efd

PR feedback implemented

370043e

Always log rejections

a56bd07

Replace assert with exception

3965e2c

przadka force-pushed the large-file-cont branch from e9320df to 3965e2c June 20, 2023 04:52

Merge branch 'master' into large-file-cont

c9752b0

mjurbanski-reef merged commit 0ad8ae8 into master Jul 3, 2023
21 checks passed

mjurbanski-reef deleted the large-file-cont branch July 3, 2023 09:12

przadka commented Jun 17, 2023

rooterkyberian Jun 17, 2023

przadka Jun 17, 2023

mjurbanski-reef Jun 17, 2023

przadka Jun 17, 2023

mjurbanski-reef Jun 17, 2023
