BF+RF validation #1209

yarikoptic · 2023-02-08T18:09:48Z

Might still be incomplete since I left in place the odd dichotomy of results in validation depending on either schema_version is provided or not, as emphasized in the tests at https://github.com/dandi/dandi-cli/blob/HEAD/dandi/tests/test_files.py#L313 . @jwodder @TheChymera do you remember a reason why we did it this way?

meanwhile I made LocalAsset to always validate against schema even if a schema_version is not provided and that might be going against some logic which is pointed to above. But I think we should simplify/harmonize it here one way or another: we might want to always validate against everything possible and then results just filtered, or would need to pass some filtering option (e.g. listing validation id prefixes thus to enable only some) to select which validators to use. So if no objections/concerns would be stated, I would proceed to remove dichotomy in NWBAsset validation and always include super's validation (thus validation against schema).

Then I fixed up validation of zarrs in that their metadata records should be populated with zarr specific dummy checksum (previously tests would not even get to that case I believe).

…if no schema version provided Otherwise (if no schema provided) -- we were completely skipping validation, and that is IMHO is not right. Such change already breakes some tests, e.g. dandi/tests/test_files.py::test_validate_deep_zarr >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB set_trace >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> > /home/yoh/proj/dandi/dandi-cli-master/dandi/files/bases.py(183)get_validation_errors() -> if devel_debug: (Pdb) l 178 try: 179 asset = self.get_metadata(digest=DUMMY_DIGEST) 180 BareAsset(**asset.dict()) 181 except ValidationError as e: 182 import pdb; pdb.set_trace() 183 -> if devel_debug: 184 raise 185 # TODO: how do we get **all** errors from validation - there must be a way 186 return [ 187 ValidationResult( 188 origin=ValidationOrigin( (Pdb) p e ValidationError(model='BareAsset', errors=[{'loc': ('digest',), 'msg': 'A zarr asset must have a zarr checksum.', 'type': 'value_error'}])

we have logic quite duplicated in a number of places and similar exception handling was already RFed in DandisetMetadataFile but was left "old" in the LocalAsset .

…WBAsset validation behavior

…per usage, fixed type annotation Also getting proper `msg` field, not `message` from the dict although allowing for both since why not.

We can easily have too deep and some other schema based errors at the same time.

@jwodder

… Path among types Note: for some reason our type testing was not triggerring error but pycharm did highlight this issue for me. @jwodder might know more.

codecov · 2023-02-08T18:16:01Z

Codecov Report

Base: 89.19% // Head: 89.24% // Increases project coverage by +0.04% 🎉

Coverage data is based on head (9890f00) compared to base (4ccbe12).
Patch coverage: 59.45% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1209      +/-   ##
==========================================
+ Coverage   89.19%   89.24%   +0.04%     
==========================================
  Files          76       76              
  Lines        9487     9492       +5     
==========================================
+ Hits         8462     8471       +9     
+ Misses       1025     1021       -4

Flag	Coverage Δ
unittests	`89.24% <59.45%> (+0.04%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
dandi/files/bases.py	`79.07% <52.00%> (+0.35%)`	⬆️
dandi/files/zarr.py	`87.95% <66.66%> (-0.22%)`	⬇️
dandi/misctypes.py	`62.96% <100.00%> (+0.34%)`	⬆️
dandi/pynwb_utils.py	`82.68% <100.00%> (ø)`
dandi/metadata.py	`87.79% <0.00%> (+0.23%)`	⬆️
dandi/dandiapi.py	`87.67% <0.00%> (+0.44%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

jwodder · 2023-02-08T19:37:25Z

dandi/files/bases.py

@@ -743,12 +739,15 @@ def _get_nwb_inspector_version():


 def _pydantic_errors_to_validation_results(
-    errors: Any[list[dict], Exception],
+    errors: list[dict | Exception] | ValidationError,
    file_path: str,


Since the only use of file_path in this function converts it to a Path, it would be better to make this argument a Path and remove the str() call when passing it.

indeed! thanks, done in b6f8180

jwodder · 2023-02-08T19:38:17Z

dandi/files/bases.py

@@ -488,6 +476,12 @@ def get_validation_errors(
        schema_version: Optional[str] = None,
        devel_debug: bool = False,
    ) -> list[ValidationResult]:
+        """Validate NWB asset
+
+        If `schema_version` was provided, we only validate basic metadata,


Suggested change

If `schema_version` was provided, we only validate basic metadata,

If ``schema_version`` was provided, we only validate basic metadata,

One backtick for things that can be linked to (like classes and functions), two backticks for other code.

yarikoptic · 2023-02-08T20:59:14Z

Thanks @jwodder for the review. Before I embark on further RF journey -- do you remember/have an idea why we had that different behavior depending on schema_version I mentioned/changed here for the LocalAsset?

jwodder · 2023-02-08T20:59:32Z

dandi/files/zarr.py

@@ -180,6 +182,9 @@ def get_metadata(
    ) -> BareAsset:
        metadata = get_default_metadata(self.filepath, digest=digest)
        metadata.encodingFormat = ZARR_MIME_TYPE
+        # TODO: .size obtained via get_default_metadata would be the one
+        # from os.stat and thus not reflective of actual size of the zarr
+        # folder.


Actually, for Zarrs, the size is extracted from the digest:

dandi-cli/dandi/metadata.py

Lines 993 to 998 in 4920831

if digest is not None and digest.algorithm is models.DigestType.dandi_zarr_checksum:

m = re.fullmatch(

r"(?P<hash>[0-9a-f]{32})-(?P<files>[0-9]+)--(?P<size>[0-9]+)", digest.value

)

if m:

metadata["contentSize"] = int(m["size"])

oh, missed that, thanks! I will remove the TODO but might RF to move this zarr specific code into zarr specific place ;)

jwodder · 2023-02-08T21:01:43Z

@yarikoptic I think the goal of schema_version was to use the old, "flat" metadata if no schema_version was set, and to use the dandischema classes if it was set. It was added during the transition to the dandischema classes.

yarikoptic · 2023-02-08T21:34:04Z

Thanks! that aligns with my observation on behavior of metadata extraction on dandiset to use _check_required_fields... I think I will then continue RFing in the follow up PR to make it always work with dandischema classes, and will merge this one meanwhile unless you would recommend to grow this single PR @jwodder .

@jwodder

…rrors_to_validation_results Thanks @jwodder for the review

TheChymera

Looked through the diff, nothing I can see wrong with it. However, the schema specification dichotomy wasn't my design decision, and I'm unsure why it's done this way, so with respect to that me not seeing anything wrong might be of limited value.

One thing which I do see, is a bit of extra steps for string/Path object conversion. We have a bunch of gotchas like that where some things take strings and some others Path objects. Probably not worth dealing with in this PR, just putting it out there. Maybe paths should always be strings, as they always are in GNU utils, and the objects should be created only if we ever need the advanced features like .relative_to() which can't be easily done with os.path.

jwodder · 2023-02-09T13:55:50Z

@TheChymera

Maybe paths should always be strings

I disagree. Paths should be Paths, in order to make it clear that they're paths and in order to take advantage of the simpler & more powerful pathlib API.

yarikoptic · 2023-02-09T18:38:33Z

I tend to agree with @jwodder that Path is more versatile and easier to use construct, and indeed we can then benefit from type annotation. I wish even if there was convention to encode "RelativePath" and ensuring that proper conversion is done where needed etc.

yarikoptic · 2023-02-09T18:39:03Z

with above, let's proceed and I hope to push more work here soon.

github-actions · 2023-02-10T21:46:16Z

🚀 PR was released in 0.49.0 🚀

yarikoptic added 7 commits February 8, 2023 11:48

RF/BF: use _pydantic_errors_to_validation_results in LocalAsset

0ff8fca

we have logic quite duplicated in a number of places and similar exception handling was already RFed in DandisetMetadataFile but was left "old" in the LocalAsset .

DOC: just a docstring describing specific (probably to be fixed) of N…

9e35b08

…WBAsset validation behavior

BF: custom to zarr dummy checksum, proper scoping of pydantic errors …

848dfeb

…per usage, fixed type annotation Also getting proper `msg` field, not `message` from the dict although allowing for both since why not.

RF: accumulate errors for Zarr, do not just return right away

038109a

We can easily have too deep and some other schema based errors at the same time.

BF(typing): cast filepath to str explicitly since zarr.open does list…

aba76a5

… Path among types Note: for some reason our type testing was not triggerring error but pycharm did highlight this issue for me. @jwodder might know more.

stylistic minor

02ba9c5

yarikoptic added minor Increment the minor version when merged cmd-validate labels Feb 8, 2023

yarikoptic requested review from jwodder and TheChymera February 8, 2023 18:09

minor: remove stale implemented TODO

95fe3ef

jwodder requested changes Feb 8, 2023

View reviewed changes

jwodder reviewed Feb 8, 2023

View reviewed changes

yarikoptic added 3 commits February 8, 2023 16:36

RF: remove dance of str/Path while passing file_path into _pydantic_e…

2c8d1d0

…rrors_to_validation_results Thanks @jwodder for the review

Fix ReSt annotation of "verbatim" name

5c5d2a1

Give it a better dummy zarr checksum, aligned with dummy ETAG

9890f00

yarikoptic force-pushed the bfrf-validation branch from 7d3a0d0 to 9890f00 Compare February 8, 2023 21:36

TheChymera approved these changes Feb 9, 2023

View reviewed changes

yarikoptic merged commit 662f8ab into master Feb 9, 2023

yarikoptic deleted the bfrf-validation branch February 9, 2023 18:39

github-actions bot added the released label Feb 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BF+RF validation #1209

BF+RF validation #1209

yarikoptic commented Feb 8, 2023

codecov bot commented Feb 8, 2023 •

edited

Loading

jwodder Feb 8, 2023

yarikoptic Feb 8, 2023

jwodder Feb 8, 2023

yarikoptic commented Feb 8, 2023

jwodder Feb 8, 2023

yarikoptic Feb 8, 2023

jwodder commented Feb 8, 2023

yarikoptic commented Feb 8, 2023

TheChymera left a comment

jwodder commented Feb 9, 2023

yarikoptic commented Feb 9, 2023

yarikoptic commented Feb 9, 2023

github-actions bot commented Feb 10, 2023

	If `schema_version` was provided, we only validate basic metadata,
	If ``schema_version`` was provided, we only validate basic metadata,

	if digest is not None and digest.algorithm is models.DigestType.dandi_zarr_checksum:
	m = re.fullmatch(
	r"(?P<hash>[0-9a-f]{32})-(?P<files>[0-9]+)--(?P<size>[0-9]+)", digest.value
	)
	if m:
	metadata["contentSize"] = int(m["size"])

BF+RF validation #1209

BF+RF validation #1209

Conversation

yarikoptic commented Feb 8, 2023

codecov bot commented Feb 8, 2023 • edited Loading

Codecov Report

jwodder Feb 8, 2023

Choose a reason for hiding this comment

yarikoptic Feb 8, 2023

Choose a reason for hiding this comment

jwodder Feb 8, 2023

Choose a reason for hiding this comment

yarikoptic commented Feb 8, 2023

jwodder Feb 8, 2023

Choose a reason for hiding this comment

yarikoptic Feb 8, 2023

Choose a reason for hiding this comment

jwodder commented Feb 8, 2023

yarikoptic commented Feb 8, 2023

TheChymera left a comment

Choose a reason for hiding this comment

jwodder commented Feb 9, 2023

yarikoptic commented Feb 9, 2023

yarikoptic commented Feb 9, 2023

github-actions bot commented Feb 10, 2023

codecov bot commented Feb 8, 2023 •

edited

Loading