[Proposal]: Dump ReferenceFileSystem spec for ZarrTiffStore that can be read natively as zarr #56
I am aware of
|
Makes sense, thanks for the response. I think the multi-file issue could be accommodated by the spec, but I agree the other "features" are incompatible. It's unrealistic to try to map all TIFFs to zarr, but it would be useful to translate the reference store for the subset that can be mapped, e.g.:

```python
import json

from tifffile import imread
from lib_i_wish_existed import TiffStore2Zarr

with imread("data.tiff", aszarr=True) as store:
    converter = TiffStore2Zarr(store, tiff_url)
    ref = converter.translate()  # raises an exception if the tiff can't be mapped directly to zarr
    with open("data.tiff_offsets.json", "w") as f:
        json.dump(ref, f)
```

I'm just leaving the above for reference. Maybe there is a way that tifffile could consolidate certain "features" for the pages in a store to make this (in)compatibility with zarr easier to detect. Either way, the current zarr additions to tifffile have made it substantially easier to explore this idea, so thanks a lot! |
@cgohlke, if I may add: fsspec-reference-maker is definitely experimental, but having your input at this stage would be invaluable. From zarr-developers/zarr-python#556 (comment), if there are any spec changes that would help to support viable TIFF edge cases, it'd be good to capture them. (And either way, I'm still excited by |
Also, adding additional codecs to numcodecs would be generally useful and, as far as I understand, not hard where existing Python or C libraries are available. |
Just to add to Martin's comment, Numcodecs ships both conda and wheel binary packages, so hopefully this makes it a bit easier to use downstream without needing to worry about compiling. We are also looking into making Numcodecs a pure Python package now that Blosc has wheel packages (in addition to conda packages). |
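For context, wrapping an existing decoder library as a numcodecs codec mostly means implementing the small `Codec` interface and registering it. A minimal sketch; the `tiff_lzw` codec id and the stubbed decode are hypothetical, not an existing numcodecs codec:

```python
import numcodecs
from numcodecs.abc import Codec


class TiffLzw(Codec):
    """Hypothetical codec wrapping an existing TIFF-LZW decoder library."""

    codec_id = 'tiff_lzw'

    def encode(self, buf):
        # write support is not needed for a read-only reference store
        raise NotImplementedError

    def decode(self, buf, out=None):
        # delegate to an existing library here (stub in this sketch)
        raise NotImplementedError


# make the codec id resolvable from zarr array metadata
numcodecs.register_codec(TiffLzw)
```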
I have started working on this issue but have some trouble testing. The following code runs on my system without raising an exception, but the data returned by zarr seems random and the web server logs do not show any access to the .tif file. The file is a zlib-compressed pyramidal OME-TIFF. I verified that the offsets and byte counts in the JSON file are correct. A manual range request using the requests library works. Any idea? Is there a way to test the ReferenceFileSystem on a local file system?

```python
import zarr      # 2.6.1
import fsspec    # 0.8.7
import tifffile  # 2021.3.dev

localpath = ''
filename = 'test.ome.tif'
url = 'https://www.lfd.uci.edu/'

# create the reference file
with tifffile.imread(localpath + filename, aszarr=True) as store:
    with open(localpath + filename + '.json', 'w') as fh:
        store.write_fsspec(fh, url)

# open the reference file from the web server
mapper = fsspec.get_mapper(
    'reference://',
    references=url + filename + '.json',
    target_protocol='https',
)
zgrp = zarr.open(mapper, mode='r')
print(zgrp[9].info)  # print info of the last level
im = zgrp[9][:]  # <- random data
```
|
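For reference, a rough, untested sketch of testing against the local file system instead of a web server, reusing the variable names from the snippet above: write the references with the local directory as the target and open them with `target_protocol='file'`.

```python
import fsspec
import tifffile
import zarr

localpath = '/data/'  # absolute path of the directory containing the TIFF
filename = 'test.ome.tif'

# write references that point at the local copy of the file
with tifffile.imread(localpath + filename, aszarr=True) as store:
    with open(localpath + filename + '.json', 'w') as fh:
        store.write_fsspec(fh, localpath)

# open the reference file via the local file system
mapper = fsspec.get_mapper(
    'reference://',
    references=localpath + filename + '.json',
    target_protocol='file',
)
zgrp = zarr.open(mapper, mode='r')
print(zgrp[9].info)
```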
I have so far looked at one key. If I directly make an HTTPFileSystem and access this byte range, it decompresses to the expected number of bytes (if it were random data, there is no way zlib would happen to give that number of bytes out). Looking at the zarr info: are other chunks coming through correctly? For the other blocks I am seeing a lot of zeros. |
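A sketch of the kind of direct HTTPFileSystem check described above; the byte range is the one used in the requests example further down the thread, and any offset/length pair taken from the reference JSON works the same way:

```python
import fsspec

# fetch a single chunk's byte range straight from the web server
fs = fsspec.filesystem('https')
raw = fs.cat_file('https://www.lfd.uci.edu/test.ome.tif',
                  start=6276016176, end=6276049281)
print(len(raw))  # should match the byte count stored in the reference JSON
```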
Yes, by random I meant that I got different numbers every time.

That should be OK according to the TIFF specification. The chunk data in the file should be complete. The OME-XML is written at the end.

No, I tried two other levels. As mentioned, the requests library works:

```python
from matplotlib import pyplot
import numpy
import requests
import zlib

headers = {'Range': 'bytes=6276016176-6276049280'}
r = requests.get('https://www.lfd.uci.edu/test.ome.tif', headers=headers, stream=True)
data = b''.join(chunk for chunk in r.iter_content(1024))
d = zlib.decompress(data)
im = numpy.frombuffer(d, dtype='uint8').reshape(256, 256, 3)
pyplot.imshow(im)
pyplot.show()
```
|
Apparently the key in the reference JSON file must be |
It works now. I can visualize the multiscales zarr Group created from the fsspec ReferenceFileSystem using napari. |
Is this correct, then? Indeed, there are three dimensions, even if the last dimension only ever has one chunk. zarr doesn't know about images having a colour dimension. The |
Thank you! Makes sense. I was setting the fill value to None/null; that's why the chunks were uninitialized. |
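A minimal sketch of the effect, assuming zarr 2.x as in the snippets above: when `fill_value` is null, zarr has nothing to substitute for chunks it cannot find, so reads of missing (or unreachable) chunks come back as uninitialized memory rather than zeros.

```python
import zarr

# no chunks are ever written to either array
a = zarr.open(store={}, mode='w', shape=(4, 4), chunks=(4, 4), dtype='u1', fill_value=0)
b = zarr.open(store={}, mode='w', shape=(4, 4), chunks=(4, 4), dtype='u1', fill_value=None)

print(a[:])  # all zeros: missing chunks are replaced by fill_value
print(b[:])  # arbitrary values: nothing to fill the missing chunks with
```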
I ran into another issue during testing of multi-series TIFF files. It seems it is not possible to use more than one FSMap instance (?). In the following example, the remote TIFF file is never accessed because of a silent RuntimeError (re-raised in the traceback below):

```python
import fsspec
import zarr

# map series 1
mapper1 = fsspec.get_mapper(
    'reference://',
    references='http://localhost:8080/test_zarr_fsspec.ome.tif.s1.json',
    target_protocol='http',
)
# map series 2
mapper2 = fsspec.get_mapper(
    'reference://',
    references='http://localhost:8080/test_zarr_fsspec.ome.tif.s2.json',
    target_protocol='http',
)
za = zarr.open(mapper2, mode='r')
print(za.info)
print(za[:])  # <- zeroed data
```

Output:
{
".zattrs": "{}",
".zarray": "{\n \"chunks\": [\n 219,\n 301,\n 3\n ],\n \"compressor\": null,\n \"dtype\": \"|u1\",\n \"fill_value\": 0,\n \"filters\": null,\n \"order\": \"C\",\n \"shape\": [\n 219,\n 301,\n 3\n ],\n \"zarr_format\": 2\n}",
"0.0.0": ["http://localhost:8080/test_zarr_fsspec.ome.tif", 261136, 197757]
}
{
".zattrs": "{}",
".zarray": "{\n \"chunks\": [\n 1,\n 219,\n 301\n ],\n \"compressor\": null,\n \"dtype\": \"|u1\",\n \"fill_value\": 0,\n \"filters\": null,\n \"order\": \"C\",\n \"shape\": [\n 3,\n 219,\n 301\n ],\n \"zarr_format\": 2\n}",
"0.0.0": ["http://localhost:8080/test_zarr_fsspec.ome.tif", 459136, 65919],
"1.0.0": ["http://localhost:8080/test_zarr_fsspec.ome.tif", 525055, 65919],
"2.0.0": ["http://localhost:8080/test_zarr_fsspec.ome.tif", 590974, 65919]
}

Traceback from re-raising the RuntimeError in fsspec's getitems:

File "test_issue56.py", line 16, in <module>
print(za[:])
File "X:\Python38\lib\site-packages\zarr\core.py", line 571, in __getitem__
return self.get_basic_selection(selection, fields=fields)
File "X:\Python38\lib\site-packages\zarr\core.py", line 696, in get_basic_selection
return self._get_basic_selection_nd(selection=selection, out=out,
File "X:\Python38\lib\site-packages\zarr\core.py", line 739, in _get_basic_selection_nd
return self._get_selection(indexer=indexer, out=out, fields=fields)
File "X:\Python38\lib\site-packages\zarr\core.py", line 1034, in _get_selection
self._chunk_getitems(lchunk_coords, lchunk_selection, out, lout_selection,
File "X:\Python38\lib\site-packages\zarr\core.py", line 1691, in _chunk_getitems
cdatas = self.chunk_store.getitems(ckeys, on_error="omit")
File "X:\Python38\lib\site-packages\fsspec\mapping.py", line 91, in getitems
raise out['0.0.0'] # re-raise RuntimeError
File "X:\Python38\lib\site-packages\fsspec\implementations\reference.py", line 90, in _cat_file
return await self.fs._cat_file(url, start=start, end=end)
File "X:\Python38\lib\site-packages\fsspec\implementations\http.py", line 168, in _cat_file
async with self.session.get(url, **kw) as r:
File "X:\Python38\lib\site-packages\aiohttp\client.py", line 1117, in __aenter__
self._resp = await self._coro
File "X:\Python38\lib\site-packages\aiohttp\client.py", line 448, in _request
with timer:
File "X:\Python38\lib\site-packages\aiohttp\helpers.py", line 635, in __enter__
raise RuntimeError(
RuntimeError: Timeout context manager should be used inside a task |
That exception is a new one for me, and doesn't make much sense to me... I have been trying to simplify the async handling in fsspec; would you mind trying with the fsspec/filesystem_spec#572 version of fsspec (git+https://github.com/martindurant/filesystem_spec.git@ioloop_massage2)? |
fsspec/filesystem_spec#572 fixes the issue for me. The tests pass now. Thank you very much! |
PS: I don't know if you have been following fsspec/kerchunk#17, which establishes a more formal spec for the content of the references JSON file, with some features to make that file more compact. The ReferenceFileSystem implementation (PR) will be backwards compatible. |
Yes, I've seen version 1 of the specification. Using a template for the URL will make the file more compact. But for now I'm going to release tifffile with experimental version 0 support. |
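For reference, a sketch of what the series 2 references from the earlier output might look like in the version 1 layout per fsspec/kerchunk#17. This is only an illustration of the template mechanism, not output produced by tifffile, and the `.zarray` metadata string is abbreviated:

```python
import json

refs_v1 = {
    "version": 1,
    # the repeated URL is factored out into a template ...
    "templates": {"u": "http://localhost:8080/test_zarr_fsspec.ome.tif"},
    "refs": {
        ".zattrs": "{}",
        ".zarray": "{...}",  # same zarr metadata string as in the version 0 file
        # ... and referenced as {{u}} in each chunk entry
        "0.0.0": ["{{u}}", 459136, 65919],
        "1.0.0": ["{{u}}", 525055, 65919],
        "2.0.0": ["{{u}}", 590974, 65919],
    },
}

with open("test_zarr_fsspec.ome.tif.s2.v1.json", "w") as fh:
    json.dump(refs_v1, fh, indent=1)
```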
Tifffile 2021.3.16 adds a store method, write_fsspec:

```python
with tifffile.imread(tiff_filename, aszarr=True) as store:
    store.write_fsspec(tiff_filename + '.json', url)
```

A command-line interface is also available.

The JSON files can get quite large. One of the local WSI test files contains over 23 million tiles, and the JSON file is larger than 1.5 GB. |
@cgohlke Thanks for the release! I tried out the CLI for a couple of images and it worked well. One issue is that I don't think the endianness is handled correctly.

Interactive notebook: https://observablehq.com/d/16524d8e7fd4f9ef

I have shared the reference in a gist. I think this is likely due to lines 8155 to 8161 in b69ddd4.
|
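For anyone following along, a small illustration of why the byte-order character in the zarr dtype string matters: the '<' or '>' prefix has to agree with how the sample values are actually stored in the TIFF file.

```python
import numpy

buf = bytes([0x01, 0x00])  # two bytes encoding a single 16-bit sample
print(numpy.frombuffer(buf, dtype='<u2')[0])  # 1    (little-endian interpretation)
print(numpy.frombuffer(buf, dtype='>u2')[0])  # 256  (big-endian interpretation)
# single-byte dtypes such as '|u1' have no byte order to get wrong
```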
I would love to see this functionality in a blog article somewhere |
You are right. Fixed in v2021.3.17. |
Thank you so much for your work on this project. I just came across the experimental aszarr and ZarrTiffStore and am so excited! I'd written some one-off stores wrapping tifffile to read different pyramidal images as zarr (for napari), but having this in tifffile is incredible!

I'm curious if you've seen the proposed JSON specification for describing a ReferenceFileSystem? Asking naively, and a bit selfishly: would it be possible to detect whether a ZarrTiffStore can be natively read by zarr and "export" one of these references? I work on web-based tools for visualizing OME-TIFF / Zarr data, and it would be really useful to quickly create these references.

Here is an example viewing a multiscale tiff on the web using zarr.js, and this is the python script I wrote with the newest version of tifffile to generate the reference. I wonder if there is some way to generalize this script, but I don't have the familiarity with the underlying formats to know if this is a silly idea. I notice that the TiffZarrStore handles all compression, so I know at least you need to detect whether the chunk compression is supported in Zarr.
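For illustration, a rough sketch of the kind of detection described above. The set of compression tag values is an assumption for this example (1 = NONE, 8 and 32946 = zlib/DEFLATE), not tifffile's actual logic:

```python
import tifffile

# compression tag values assumed to be decodable by plain zarr/numcodecs
ZARR_READABLE_COMPRESSIONS = {1, 8, 32946}

def can_export_references(path):
    """Return True if every page uses a compression zarr could decode natively."""
    with tifffile.TiffFile(path) as tif:
        return all(int(page.compression) in ZARR_READABLE_COMPRESSIONS
                   for page in tif.pages)

print(can_export_references('data.ome.tif'))
```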