Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broken zarr? #14

Open
yarikoptic opened this issue Feb 20, 2024 · 9 comments
Open

broken zarr? #14

yarikoptic opened this issue Feb 20, 2024 · 9 comments

Comments

@yarikoptic
Copy link

https://kitware.github.io/itk-vtk-viewer/app/?gradientOpacity=0.3&image=https://dandiarchive.s3.amazonaws.com/zarr/0d5b9be5-e626-4f6a-96da-b6b602954899 didn't show anything while browser console logged:

image

that is for the sub-U01hm15x_ses-20220731h17m24s47_sample-mEhmAD031x15R2_ABETA_stain-ABETA_run-1_chunk-1_SPIM.ome.zarr

from
https://dandiarchive.org/dandiset/000108/draft/files?location=sub-U01hm15x%2Fses-20220731h17m24s47%2Fmicr%2F

I chose it "randomly" but looking at "smallest 'real' zarrs" in the archive:

~/proj/dandi/zarr-manifests
❯ ls -d zarr-manifests-v2-sorted/*/*/*/*json | sort -n -k 10 -t- | tail
zarr-manifests-v2-sorted/022/d78/022d7840-f696-4455-b67a-759dd6ae6341/24f6f0a556618fae6413051711d534cc-73477--41149245.versionid.json@
zarr-manifests-v2-sorted/924/410/924410ea-f204-4559-930c-d77677b82a92/a38e33822837186fabe465cb3ec956f2-68859--38584194.json@
zarr-manifests-v2-sorted/fd6/ab3/fd6ab3ea-cff6-4006-a9bf-acfa5d983985/af1b4b5849cd2b79a29762d716863379-37480--38379520.json@
zarr-manifests-v2-sorted/e6d/ea1/e6dea1d0-37c0-482e-89a0-8e245d2a3f42/6d56fe20d0c69a0f99cfe7dd643ac421-52731--28709020.json@
zarr-manifests-v2-sorted/0d5/b9b/0d5b9be5-e626-4f6a-96da-b6b602954899/0395d0a3767524377b58da3945b3c063-48379--27115470.json@
zarr-manifests-v2-sorted/0d5/b9b/0d5b9be5-e626-4f6a-96da-b6b602954899/0395d0a3767524377b58da3945b3c063-48379--27115470.versionid.json@
zarr-manifests-v2-sorted/9d8/c50/9d8c50c9-ced7-4a6c-839e-2ae745509eed/91b3a4a288022ee92a5d965498790160-37115--20807594.json@
zarr-manifests-v2-sorted/46c/828/46c828f3-84d0-410f-941b-a13ab460dfd4/89361499590e3efa79ce26e049f63bab-4--9223.json@
zarr-manifests-v2-sorted/46c/828/46c828f3-84d0-410f-941b-a13ab460dfd4/89361499590e3efa79ce26e049f63bab-4--9223.versionid.json@
zarr-manifests-v2-sorted/d67/c12/d67c1246-1565-44e8-a299-1289139b8890/3e105d25c18895df96d616bb7dd8b5ef-1--24.json@
@yarikoptic
Copy link
Author

same for https://kitware.github.io/itk-vtk-viewer/app/?gradientOpacity=0.3&image=https://dandiarchive.s3.amazonaws.com/zarr/924410ea-f204-4559-930c-d77677b82a92 for sub-MITU01/ses-20220321h14m52s06/micr/sub-MITU01_ses-20220321h14m52s06_sample-23_stain-NN_run-1_chunk-1_SPIM.ome.zarr
... I guess there is a good number of broken ones. Do you have them locally somewhere to compare based on checksums across all zarrs?

@yarikoptic
Copy link
Author

could also look differently, e.g. on sub-MITU01h3/ses-20211219h13m23s45/micr/sub-MITU01h3_ses-20211219h13m23s45_sample-16_stain-YO_run-1_chunk-1_SPIM.ome.zarr https://kitware.github.io/itk-vtk-viewer/app/?gradientOpacity=0.3&image=https://dandiarchive.s3.amazonaws.com/zarr/022d7840-f696-4455-b67a-759dd6ae6341

image

@satra
Copy link

satra commented Feb 21, 2024

@yarikoptic - the only way to check in dandi is a match between cli and server side checks. we should upload the zarr checksum with the zarr and for the server side checksum should check if they match, and mark it invalid if it doesn't. zarr allows not having to store every cell of an array. for example, it will simply treat as 0 any chunk that's missing. the browser 404s are not always an issue as far as i know.

let's ping @slaytonmarx. at least he should be able to verify checksum locally. is there a list you can provide of paths and zarr checksums that you thought were broken?

@yarikoptic
Copy link
Author

Zarrs might have been broken to start with (incomplete/interrupted generation etc) where checksum checking wouldn't help. I do not have a complete list but later will post a list of some worth checking.

@satra
Copy link

satra commented Feb 21, 2024

indeed, but that's a local check to verify the relationship between whatever data format the conversions were made from. and i leave that in the hands of the data producers to check. once given to our toolkits, checksum should verify that.

in addition for dandiset 108, we did put in a secondary check, although a basic one, that checks for errors across the pyramid. this is viewable in the dashboard. we will be doing more work in an upcoming bican project, but indeed this dandiset should be checked.

@satra
Copy link

satra commented Feb 21, 2024

@yarikoptic
Copy link
Author

that checks for errors across the pyramid

could you point to the code? is it fast/sensible to do on remote (already on s3) zarrs?

I see no errors reported in the dashboard -- all green in "errors", so I guess it is not effective either at detecting errors at hand?

@satra
Copy link

satra commented Feb 21, 2024

the code is in the example notebooks of the 000108. it again depends on what you mean by errors. missing chunks is not necessarily an error. and yes, the analysis can be run remotely. and it's not all green! and yes, this is a simple check.

@yarikoptic
Copy link
Author

I just got lost in my navigation of it and thought that it was the last one which only depicted the errors and it is green . Indeed some intermediate ones have errors e.g.
image

  • so then are those the first candidates to be checked locally?
  • do we (or anyone else so we do not reinvent the wheel) have a "list of errors" or to say "validations" to perform on zarrs?
  • when is missing chunk an error and when not?

on my end I saw lots of KeyError: '.zmetadata' for zarr.open_consolidated

as e.g. in this example
Traceback (most recent call last):
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/mapping.py", line 151, in __getitem__
    result = self.fs.cat(k)
             ^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/implementations/cached.py", line 436, in <lambda>
    return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/implementations/cached.py", line 638, in cat
    self.fs.get(getpaths, storepaths)
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 650, in _get
    return await _run_coros_in_chunks(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 254, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/callbacks.py", line 81, in func
    return await fn(path1, path2, callback=child, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/implementations/http.py", line 250, in _get_file
    self._raise_not_found_for_status(r, rpath)
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/implementations/http.py", line 215, in _raise_not_found_for_status
    raise FileNotFoundError(url)
FileNotFoundError: https://dandiarchive.s3.amazonaws.com/zarr/00291b95-6495-499c-9181-8224ce82ab3f/.zmetadata

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/zarr/storage.py", line 1425, in __getitem__
    return self.map[key]
           ~~~~~~~~^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/mapping.py", line 155, in __getitem__
    raise KeyError(key)
KeyError: '.zmetadata'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yoh/proj/dandi/zarr-manifests/./validate_zarr.py", line 98, in <module>
    dataset = zarr.open_consolidated(store=store)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/zarr/convenience.py", line 1342, in open_consolidated
    meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/zarr/storage.py", line 2944, in __init__
    meta = json_loads(self.store[metadata_key])
                      ~~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/zarr/storage.py", line 1427, in __getitem__
    raise KeyError(key) from e
KeyError: '.zmetadata'
url: https://dandiarchive.s3.amazonaws.com/zarr/00291b95-6495-499c-9181-8224ce82ab3f #errors: failed-to-open time: -100.00 seconds  num_files: N/A  total_size est: N/A  time: -100.00

and of those kind we have 784 hits out of 4554. (note that the zarrs here are as present on S3, so not necessarily even associated with any asset or specifically with this dandiset)

i think the best would be to look into those few I originally reported about

  • are they in the same broken state locally?
    • if yes -- verify/fix processing/saving pipelines
    • if no -- investigate their upload history (possibly looking through .log files still present): did they finish uploading correctly) and go from there

@slaytonmarx also already confirmed that there is validation issues picked up by validating against ngff schema:

and that is the errors we (@waxlamp also mentioned them IIRC) observe using ome-ngff validator online which we integrated with the archive, e.g. on

https://ome.github.io/ome-ngff-validator/?source=https://dandiarchive.s3.amazonaws.com/zarr/56509720-870c-4f43-ae41-7b75f9590722

image

which suggests that 000108 zarrs are not "clean" .ome.zarr and need fixing. Hopefully it could be a minor fix so we do not full zarr reuploads and see if we can now benefit from the "multifile" nature of zarrs.

Meanwhile I think we should push on adding more validation to dandi-cli for zarr files, so I will follow up on dandi/dandi-cli#1281

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants