Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update zarr checksums #120

Merged
merged 6 commits into from
Mar 16, 2022
Merged

Update zarr checksums #120

merged 6 commits into from
Mar 16, 2022

Conversation

dchiquito
Copy link
Contributor

@dchiquito dchiquito commented Mar 2, 2022

Fixes dandi/dandi-archive#937
Fixes dandi/dandi-archive#925
Fixes dandi/dandi-archive#931

attn @jwodder note that the method signature for get_checksum has changed. Files/directores are now represented by a tuple (checksum, size) instead of just a string checksum.

@codecov
Copy link

codecov bot commented Mar 2, 2022

Codecov Report

Merging #120 (02972c5) into master (0d267ca) will decrease coverage by 0.01%.
The diff coverage is 97.82%.

@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
- Coverage   96.67%   96.66%   -0.02%     
==========================================
  Files          18       18              
  Lines        1625     1649      +24     
==========================================
+ Hits         1571     1594      +23     
- Misses         54       55       +1     
Flag Coverage Δ
unittests 96.66% <97.82%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandischema/digests/zarr.py 98.73% <96.29%> (-1.27%) ⬇️
dandischema/digests/tests/test_zarr.py 100.00% <100.00%> (ø)
dandischema/models.py 94.25% <100.00%> (+0.06%) ⬆️
dandischema/tests/test_models.py 96.39% <100.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d267ca...02972c5. Read the comment docs.

@jwodder
Copy link
Member

jwodder commented Mar 2, 2022

@dchiquito How are you definining the size of a directory?

@dchiquito
Copy link
Contributor Author

@jwodder The sum of all files contained in the directory.

@jwodder jwodder added the minor Increment the minor version when merged label Mar 2, 2022
"""

md5: str
path: str
name: str
size: int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if to simplify transition there could be a proper deprecation cycle which would support path, and decompose it into name and size as needed. Should be simple right? Wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's necessary. As soon as this is released and incorporated in dandi-archive and dandi-cli, we can just recompute checksums on all zarrs and path will be updated to name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, since there is no way to announce that dandischema (next release) breaks dandi-cli << (whatever would become compatible with new interface) people would end up with a dandi-cli which might puke an exception if they manage to upgrade dandischema but not dandi-cli. unlikely but possible. deprecation cycle allows to amortize perturbation and avoid such problems...
oh well -- since python is dynamic and zarr isn't primary target yet for users, I am ok to proceed without deprecation cycle at hopefully low or none count of affected users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make the next dandischema release a new minor version, it would be incompatible with dandi-cli's dandischema ~= 0.5.1 requirement and thus help to prevent that from happening (but there would still be problems with checksum failures if people use the old client to upload Zarrs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, right, nice.
@jwodder Could you prep a PR for dandi-cli to accompany this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dchiquito
Copy link
Contributor Author

@jwodder In 38efd13 I changed the md5 field to digest, since it no longer represents just an md5. You will need to calculate directory checksums with the new metadata appended. There are two new methods, generate_directory_digest and parse_directory_digest that might help.

@jwodder
Copy link
Member

jwodder commented Mar 2, 2022

@dchiquito Does get_checksum() return the new directory checksums? If so, I don't think I need to change my code.

@dchiquito
Copy link
Contributor Author

@jwodder It does, yes. Just wanted to make you aware of the change.

Copy link
Member

@satra satra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to update the schema checks for the digest as well. also any reason to use one - followed by two -- in the digest?

@dchiquito
Copy link
Contributor Author

@satra one - then two -- was Yarick's original suggestion in dandi/dandi-archive#937 (comment). I am ambivalent.

@dchiquito
Copy link
Contributor Author

@satra can you approve if you have no more qualms?

@satra
Copy link
Member

satra commented Mar 11, 2022

@dchiquito - sorry i wasn't clear in my previous comment.

when i meant schema checks, i meant changes in models.py and test_models.py to reflect the new format of the checksum, so that the metadata validates.

Copy link
Member

@jjnesbitt jjnesbitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me, approval pending Satra's comments being addressed.

@dchiquito
Copy link
Contributor Author

Ah I see, I'll fix that up

@dchiquito dchiquito requested a review from satra March 16, 2022 14:49
Copy link
Member

@satra satra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @dchiquito - looks good to me.

@dchiquito dchiquito merged commit 7ed2bab into master Mar 16, 2022
@dchiquito dchiquito deleted the update-zarr-checksums branch March 16, 2022 20:30
@dchiquito
Copy link
Contributor Author

@satra This PR was tagged minor but not release so the release was not cut. What's the procedure for releasing 0.6.0?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
minor Increment the minor version when merged
Projects
None yet
5 participants