BF+RF validation #1209
…if no schema version provided

Otherwise (if no schema provided) -- we were completely skipping validation, and that IMHO is not right. Such a change already breaks some tests, e.g. dandi/tests/test_files.py::test_validate_deep_zarr

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB set_trace >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /home/yoh/proj/dandi/dandi-cli-master/dandi/files/bases.py(183)get_validation_errors()
-> if devel_debug:
(Pdb) l
178         try:
179             asset = self.get_metadata(digest=DUMMY_DIGEST)
180             BareAsset(**asset.dict())
181         except ValidationError as e:
182             import pdb; pdb.set_trace()
183  ->         if devel_debug:
184                 raise
185             # TODO: how do we get **all** errors from validation - there must be a way
186             return [
187                 ValidationResult(
188                     origin=ValidationOrigin(
(Pdb) p e
ValidationError(model='BareAsset', errors=[{'loc': ('digest',), 'msg': 'A zarr asset must have a zarr checksum.', 'type': 'value_error'}])
We have quite duplicated logic in a number of places; similar exception handling was already RFed in DandisetMetadataFile but was left "old" in LocalAsset.
…WBAsset validation behavior
…per usage, fixed type annotation Also getting proper `msg` field, not `message` from the dict although allowing for both since why not.
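A minimal sketch of the `msg`/`message` fallback mentioned in this commit message, assuming error dicts shaped like the one printed in the PDB session above. The helper name `error_message` and the second sample dict are hypothetical and are not taken from dandi-cli.

```python
from typing import Any


def error_message(err: dict[str, Any]) -> str:
    """Return the human-readable text of one pydantic-style error dict.

    Prefers the ``msg`` key and falls back to ``message``.
    """
    for key in ("msg", "message"):
        if key in err:
            return str(err[key])
    # Neither key is present: fall back to the repr so nothing is silently lost.
    return repr(err)


errors = [
    # Shape taken from the PDB output quoted earlier in this conversation.
    {
        "loc": ("digest",),
        "msg": "A zarr asset must have a zarr checksum.",
        "type": "value_error",
    },
    # Hypothetical dict using the alternative key, for illustration only.
    {"loc": ("name",), "message": "field required"},
]
messages = [error_message(e) for e in errors]
```

Checking `msg` first keeps the common case cheap while still tolerating dicts that only carry `message`.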
We can easily have too deep and some other schema based errors at the same time.
… Path among types Note: for some reason our type testing was not triggering an error, but PyCharm did highlight this issue for me. @jwodder might know more.
Codecov Report
Base: 89.19% // Head: 89.24% // Increases project coverage by
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1209      +/-   ##
==========================================
+ Coverage   89.19%   89.24%   +0.04%
==========================================
  Files          76       76
  Lines        9487     9492       +5
==========================================
+ Hits         8462     8471       +9
+ Misses       1025     1021       -4
dandi/files/bases.py
Outdated
@@ -743,12 +739,15 @@ def _get_nwb_inspector_version():


 def _pydantic_errors_to_validation_results(
-    errors: Any[list[dict], Exception],
+    errors: list[dict | Exception] | ValidationError,
     file_path: str,
Since the only use of `file_path` in this function converts it to a `Path`, it would be better to make this argument a `Path` and remove the `str()` call when passing it.
indeed! thanks, done in b6f8180
dandi/files/bases.py
Outdated
@@ -488,6 +476,12 @@ def get_validation_errors(
         schema_version: Optional[str] = None,
         devel_debug: bool = False,
     ) -> list[ValidationResult]:
+        """Validate NWB asset
+
+        If `schema_version` was provided, we only validate basic metadata,
-        If `schema_version` was provided, we only validate basic metadata,
+        If ``schema_version`` was provided, we only validate basic metadata,
One backtick for things that can be linked to (like classes and functions), two backticks for other code.
Thanks @jwodder for the review. Before I embark on further RF journey -- do you remember/have an idea why we had that different behavior depending on …
dandi/files/zarr.py
Outdated
@@ -180,6 +182,9 @@ def get_metadata(
     ) -> BareAsset:
         metadata = get_default_metadata(self.filepath, digest=digest)
         metadata.encodingFormat = ZARR_MIME_TYPE
+        # TODO: .size obtained via get_default_metadata would be the one
+        # from os.stat and thus not reflective of actual size of the zarr
+        # folder.
Actually, for Zarrs, the size is extracted from the digest:
Lines 993 to 998 in 4920831
    if digest is not None and digest.algorithm is models.DigestType.dandi_zarr_checksum:
        m = re.fullmatch(
            r"(?P<hash>[0-9a-f]{32})-(?P<files>[0-9]+)--(?P<size>[0-9]+)", digest.value
        )
        if m:
            metadata["contentSize"] = int(m["size"])
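To make the snippet above concrete, here is a self-contained sketch of the same size extraction. It reuses the regular expression quoted in the comment; the function name `size_from_zarr_checksum` and the sample digest string are made up for illustration and are not part of dandi-cli.

```python
import re
from typing import Optional

# Pattern quoted in the review comment: <32 hex digits>-<file count>--<total size>
ZARR_CHECKSUM_RE = re.compile(
    r"(?P<hash>[0-9a-f]{32})-(?P<files>[0-9]+)--(?P<size>[0-9]+)"
)


def size_from_zarr_checksum(value: str) -> Optional[int]:
    """Return the size encoded in a zarr checksum string, or None if malformed."""
    m = ZARR_CHECKSUM_RE.fullmatch(value)
    return int(m["size"]) if m else None


# Made-up digest value: 32 hex digits, 5 files, 1024 bytes in total.
example = "0123456789abcdef0123456789abcdef-5--1024"
size = size_from_zarr_checksum(example)
malformed = size_from_zarr_checksum("not-a-checksum")
```

Returning `None` for a malformed value mirrors the `if m:` guard in the quoted code, which leaves `contentSize` untouched when the digest does not match.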
oh, missed that, thanks! I will remove the TODO but might RF to move this zarr specific code into zarr specific place ;)
@yarikoptic I think the goal of …
Thanks! that aligns with my observation on behavior of metadata extraction on dandiset to use …
…rrors_to_validation_results Thanks @jwodder for the review
7d3a0d0 to 9890f00
Looked through the diff, nothing I can see wrong with it. However, the schema specification dichotomy wasn't my design decision, and I'm unsure why it's done this way, so with respect to that, my not seeing anything wrong might be of limited value.
One thing which I do see is a few extra steps for string/Path object conversion. We have a bunch of gotchas like that where some things take strings and some others Path objects. Probably not worth dealing with in this PR, just putting it out there. Maybe paths should always be strings, as they always are in GNU utils, and the objects should be created only if we ever need the advanced features like `.relative_to()` which can't be easily done with `os.path`.
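As a small illustration of the difference being discussed, the sketch below compares the string-based `relpath` with `Path.relative_to()`. It uses `posixpath` and `PurePosixPath` so that the behavior does not depend on the platform, and the directory names are invented for the example.

```python
import posixpath
from pathlib import PurePosixPath

root = "/data/dandiset"
inside = "/data/dandiset/sub-01/file.nwb"
outside = "/data/other/file.nwb"

# String-based: always returns a result, climbing with ".." when needed.
rel_str_inside = posixpath.relpath(inside, root)
rel_str_outside = posixpath.relpath(outside, root)

# Path-based: raises ValueError for a path that is not located under root.
rel_path_inside = PurePosixPath(inside).relative_to(root)
try:
    PurePosixPath(outside).relative_to(root)
    outside_is_under_root = True
except ValueError:
    outside_is_under_root = False
```

The string version needs a separate containment check to reject `outside`, whereas `relative_to()` combines the check and the computation in one call.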
I disagree. Paths should be …
I tend to agree with @jwodder that …
with above, let's proceed and I hope to push more work here soon.
🚀 PR was released in …
Might still be incomplete since I left in place the odd dichotomy of validation results depending on whether schema_version is provided or not, as emphasized in the tests at https://github.com/dandi/dandi-cli/blob/HEAD/dandi/tests/test_files.py#L313 . @jwodder @TheChymera do you remember a reason why we did it this way?

Meanwhile I made `LocalAsset` always validate against the schema even if a `schema_version` is not provided, and that might go against some logic pointed to above. But I think we should simplify/harmonize it here one way or another: we might want to always validate against everything possible and then just filter the results, or we would need to pass some filtering option (e.g. listing validation `id` prefixes, thus enabling only some) to select which validators to use. So if no objections/concerns are stated, I will proceed to remove the dichotomy in NWBAsset validation and always include super's validation (thus validation against the schema).

Then I fixed up validation of zarrs in that their metadata records should be populated with a zarr-specific dummy checksum (previously tests would not even get to that case, I believe).
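The "validate everything, then filter" option could look roughly like the sketch below. It is only an illustration of the idea: the `Result` dataclass is a hypothetical stand-in for dandi's `ValidationResult`, and the `SCHEMA.` and `NWBI.` id prefixes are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Result:  # hypothetical stand-in, not dandi's ValidationResult
    id: str
    message: str


def select_by_prefix(
    results: list[Result], prefixes: tuple[str, ...]
) -> list[Result]:
    """Keep only results whose id starts with one of the given prefixes."""
    # str.startswith accepts a tuple and matches any of its members.
    return [r for r in results if r.id.startswith(prefixes)]


# Invented ids and messages, standing in for the output of all validators.
all_results = [
    Result("SCHEMA.digest", "A zarr asset must have a zarr checksum."),
    Result("NWBI.check_x", "hypothetical inspector finding"),
]
schema_only = select_by_prefix(all_results, ("SCHEMA.",))
```

With this shape every validator always runs, and callers that only care about schema errors narrow the list afterwards instead of changing which validators are invoked.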