Provide zarrs/ top level folder with support for experimental zarr manifests #43
Comments
@yarikoptic To be clear, you want For the record, I don't like that the manifests declare the entry fields via a {
"entries": {
".zattrs": {
"type": "entry", // or "file" or "blob" or whatever
"versionId": "...",
"lastModified": "...",
"size": 123456789,
"etag": "..."
},
"0": {
"type": "folder",
"children": {
".zarray": {
...
}
}
}
}
}
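For illustration, a nested layout like the one proposed above could be walked as follows. This is only a sketch under assumptions: the `iter_entries` helper and the sample values are hypothetical, and the `"type"`/`"children"` keys are taken from the proposal in this comment, not from the manifests that currently exist.

```python
from typing import Iterator


def iter_entries(entries: dict, prefix: str = "") -> Iterator[tuple[str, dict]]:
    """Yield (path, metadata) for every non-folder node of a nested manifest."""
    for name, node in entries.items():
        path = f"{prefix}{name}"
        if node.get("type") == "folder":
            # Recurse into the folder, extending the path prefix.
            yield from iter_entries(node.get("children", {}), prefix=path + "/")
        else:
            yield path, node


# Hypothetical sample following the proposed schema (values are made up).
manifest = {
    "entries": {
        ".zattrs": {"type": "entry", "versionId": "v1", "size": 10, "etag": "abc"},
        "0": {
            "type": "folder",
            "children": {
                ".zarray": {"type": "entry", "versionId": "v2", "size": 20, "etag": "def"},
            },
        },
    }
}

paths = [path for path, _ in iter_entries(manifest["entries"])]
```

The explicit `"type"` key is what lets the walker tell folders from entries without guessing from the shape of the value.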
eventually -- yes, but later, whenever the archive is ready and produces manifests on S3. For now it is more of a "demo feature" so we can demonstrate versioning etc. Re path -- since we already have thousands of zarrs, listing re alternative schema of the manifest -- I am afraid it would result in significant growth in the size and the load/parsing time of such a manifest. I guess we could compress them, e.g. with xz, to speed up transfer, but then they would also be slower to parse. That is why I made it into such a "compressed" form: to facilitate transfer and parsing/loading. IMHO it should be easy if not trivial (in Rust or Python) to cast the records into any alternative schema at run/load time, while avoiding more expensive parsing etc. FWIW, if it would make it easier, we could "hardcode" the
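To make the "cast at load time" idea concrete, here is a minimal Python sketch. It assumes a hypothetical compact layout in which the field names are listed once and each entry is a positional list in that order, with nested dicts standing for folders; the real manifests may differ, and `expand_records` is a made-up helper name.

```python
def expand_records(fields: list, entries: dict) -> dict:
    """Turn positional records into dicts keyed by field name.

    Assumption: a dict value is a folder, a list value is a file record
    whose items follow the order given in `fields`.
    """
    expanded = {}
    for name, value in entries.items():
        if isinstance(value, dict):
            expanded[name] = {
                "type": "folder",
                "children": expand_records(fields, value),
            }
        else:
            expanded[name] = {"type": "entry", **dict(zip(fields, value))}
    return expanded


# Hypothetical compact manifest (values are made up).
fields = ["versionId", "lastModified", "size", "etag"]
compact = {
    ".zattrs": ["v1", "2024-02-06T00:00:00Z", 10, "abc"],
    "0": {".zarray": ["v2", "2024-02-06T00:00:01Z", 20, "def"]},
}

verbose = expand_records(fields, compact)
```

The conversion is a single pass over the entries, so the verbose form only has to exist in memory, not on disk or on the wire.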
I shared on GitHub just for initial convenience... and yes -- we will get many more once collecting them is done. In the end I expect us to share them on S3. To completely avoid GitHub and its limits, I am re-sharing them now under https://datasets.datalad.org/?dir=/dandi/zarr-manifests so you can just "browse" https://datasets.datalad.org/dandi/zarr-manifests/zarr-manifests-v2-sorted/ . I will also sort more of the manifests now (around 2k are ready) and push them there shortly.
I guess it is for us to decide but I do think that it would be useful to have some cache, e.g. |
FWIW -- I have now pushed a few thousand to https://datasets.datalad.org/?dir=/dandi/zarr-manifests/zarr-manifests-v2-sorted
done:
FWIW, here is the WSGI script producing that JSON: https://github.com/dandi/zarr-manifests/blob/master/myapp.wsgi in case you want to change the format or see a security concern to address etc.
@yarikoptic Trying to access one of your Zarr manifests at, e.g., https://datasets.datalad.org/dandi/zarr-manifests/zarr-manifests-v2-sorted/741/632/741632e4-18ac-437e-945c-a318c3d46483/c74b10dca1bbcce66db179c1ca8f76da-72142--112269290369.json, currently returns 404. I believe the problem is that the WSGI script doesn't distinguish between paths to directories and paths to manifest files. Also, for the record, https://datasets.datalad.org/?dir=/dandi/zarr-manifests currently shows "ERROR: Could not find metadata for current dataset."
@yarikoptic I think dandi/zarr-manifests#1 should fix the 404 issue.
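For readers following along, the directory-versus-file distinction mentioned above could look roughly like the sketch below. It is not necessarily what dandi/zarr-manifests#1 does; `classify` is a hypothetical helper, and the root directory is whatever the WSGI script serves from.

```python
from pathlib import Path


def classify(root: Path, request_path: str):
    """Classify a request path under `root` as 'dir', 'file', or 'missing'.

    Returns ("dir", sorted child names), ("file", resolved path),
    or ("missing", None).
    """
    root = root.resolve()
    target = (root / request_path.lstrip("/")).resolve()
    # Refuse anything that escapes the served root (e.g. via "..").
    if target != root and root not in target.parents:
        return "missing", None
    if target.is_dir():
        names = sorted(
            child.name + ("/" if child.is_dir() else "")
            for child in target.iterdir()
        )
        return "dir", names
    if target.is_file():
        return "file", target
    return "missing", None
```

A handler would then emit a JSON listing for `"dir"`, stream the manifest for `"file"`, and answer 404 only for `"missing"`.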
Serve Zarrs view via manifests at `/zarrs/`
[Tue Feb 06 13:57:09.972838 2024] [wsgi:error] [pid 3001752:tid 140299577837248] [client 71.198.184.218:56874] Traceback (most recent call last):, referer: dandi/dandidav#43
[Tue Feb 06 13:57:09.973156 2024] [wsgi:error] [pid 3001752:tid 140299577837248] [client 71.198.184.218:56874] File "/srv/datasets.datalad.org/www/dandi/zarr-manifests/myapp.wsgi", line 35, in <lambda>, referer: dandi/dandidav#43
[Tue Feb 06 13:57:09.973186 2024] [wsgi:error] [pid 3001752:tid 140299577837248] [client 71.198.184.218:56874] return iter(lambda: fp.read(65535), b""), referer: dandi/dandidav#43
[Tue Feb 06 13:57:09.973224 2024] [wsgi:error] [pid 3001752:tid 140299577837248] [client 71.198.184.218:56874] ^^^^^^^^^^^^^^, referer: dandi/dandidav#43
[Tue Feb 06 13:57:09.973295 2024] [wsgi:error] [pid 3001752:tid 140299577837248] [client 71.198.184.218:56874] ValueError: read of closed file, referer: dandi/dandidav#43
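One common way to get `ValueError: read of closed file` from a line like the one in the traceback is returning the lazy iterator from inside a `with open(...)` block: the file is closed when the function returns, before the server starts reading. I have not confirmed that this is what `myapp.wsgi` does, so the sketch below is an assumption; `broken_stream` and `stream_file` are hypothetical names.

```python
def broken_stream(path, chunk_size=65535):
    """Reproduces the error: the file is closed before the iterator is consumed."""
    with open(path, "rb") as fp:
        return iter(lambda: fp.read(chunk_size), b"")


def stream_file(path, chunk_size=65535):
    """Generator that owns the file, so it stays open until streaming ends."""
    with open(path, "rb") as fp:
        while chunk := fp.read(chunk_size):
            yield chunk
```

A generator is a valid WSGI response iterable, so the application could return `stream_file(path)` directly after calling `start_response`.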
https://github.com/dandi/zarr-manifests has my very crude helper scripts and 2 sample zarr manifests (if desired, I can share more -- they are being generated on typhon). ATM we do not have multiple versions per zarr, only 1, but that should be OK for a proof of concept. The point is to show/test that we can efficiently access those zarrs
https://dandiarchive.s3.amazonaws.com/zarr/{zarr_id}
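As a rough illustration of how a manifest entry could be turned into a download URL: the sketch below assumes that each entry lives under the zarr prefix above at its relative path, and uses S3's standard `versionId` query parameter to pin a version. The `entry_url` helper is hypothetical.

```python
from urllib.parse import quote, urlencode

S3_ZARR_BASE = "https://dandiarchive.s3.amazonaws.com/zarr"


def entry_url(zarr_id: str, entry_path: str, version_id=None) -> str:
    """Build the S3 URL for one entry of a zarr, optionally pinned to a version."""
    # quote() keeps "/" by default, so nested entry paths stay intact.
    url = f"{S3_ZARR_BASE}/{quote(zarr_id)}/{quote(entry_path)}"
    if version_id:
        url += "?" + urlencode({"versionId": version_id})
    return url
```

With only one version per zarr at the moment, `version_id` can be omitted; once manifests carry several versions, passing the recorded `versionId` would fetch exactly the object the manifest describes.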