Figure out multiband writes #44
Comments
From that link,
One workaround is to create a VRT for each band. However, there is a very good point that GeoTIFFs are pixel interleaved by default, though they can be interleaved by band. While the GDAL GeoTIFF driver supports band interleaving, the GDAL COG driver doesn't. This is even more confusing because the COG standard supports BSQ (band-sequential) writing. So while the VRT approach may work, there are performance considerations on read (as an aside, this may be why some operations on the multiband tide data are so memory intensive). Also consider writing a COG using the GeoTIFF driver, as we used to do.
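To make the interleaving point concrete, here is a toy sketch in plain Python of how the two layouts order values on disk. It does not use GDAL; the names `bands`, `read_band`, and the 4-pixel, 3-band image are made up for illustration. In the GDAL GeoTIFF driver the choice is exposed through the `INTERLEAVE` creation option (`PIXEL` or `BAND`).

```python
# Toy image: 4 pixels, 3 bands. Each value encodes (band, pixel) so the
# ordering is visible. These names are illustrative, not from any library.
n_pixels = 4
n_bands = 3
bands = [[f"b{b}p{p}" for p in range(n_pixels)] for b in range(n_bands)]

# Pixel interleaved: every band's value for pixel 0, then pixel 1, ...
pixel_interleaved = [bands[b][p] for p in range(n_pixels) for b in range(n_bands)]

# Band interleaved (band sequential, BSQ): all of band 0, then band 1, ...
band_interleaved = [bands[b][p] for b in range(n_bands) for p in range(n_pixels)]


def read_band(flat, band, layout):
    """Pull one band back out of a flat buffer in the given layout."""
    if layout == "pixel":
        # Strided access: touches the whole buffer to get one band.
        return flat[band::n_bands]
    # Contiguous access: one slice covers the band.
    start = band * n_pixels
    return flat[start:start + n_pixels]


print(pixel_interleaved[:3])
print(band_interleaved[:4])
```

Reading a single band from the pixel-interleaved buffer strides across every pixel, while the band-interleaved buffer gives it back as one contiguous slice. That is the read-side trade-off mentioned above, in miniature.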
I think the simplest (probably) workable solution is to offer the option to load a dataset right before the write, when the values have (in most cases) been scaled to their minimal representation. This is not dissimilar to what Alex was doing in the PR I refused last week. These shouldn't be that large for the grid size we're dealing with. The only frustration is that we will then have two versions in memory at once (one uncompressed as an xarray, and one compressed as a blob). This precludes the ultimate goal of never having a whole dataset in memory at once, but that hasn't been possible yet (unless we use the dask writer to S3 from odc, which we haven't).
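A rough stdlib-only sketch of the "two versions in memory" cost. It assumes the loaded data is a small integer type and that the in-memory file is DEFLATE-compressed; `zlib` stands in for the TIFF writer, and the array contents are invented.

```python
import io
import zlib
from array import array

# Stand-in for a loaded band already scaled to a compact type (uint8).
raw = array("B", [0, 0, 0, 1, 1, 1, 2, 2] * 1024)
raw_bytes = raw.tobytes()

# Stand-in for writing a compressed file to an in-memory buffer.
blob = io.BytesIO()
blob.write(zlib.compress(raw_bytes))

compressed_size = len(blob.getvalue())
# Both copies are alive until the raw array is released.
peak = len(raw_bytes) + compressed_size

print(len(raw_bytes), compressed_size, peak)
```

The uncompressed copy dominates the peak, so the extra cost of holding the blob as well depends on how well the scaled data compresses.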
I have some hesitations about that writer. It's not using GDAL at all, and I worry about the maintenance of it. I also had an issue when trying to use it, but that might have just been my environment.
I've been having errors when not loading data into memory before writing, possibly only with big dask graphs. Doing the load before writing has proven reliable.
I haven't had errors, but if the bands are written to separate files, it will load common source bands (like qa_pixel) multiple times. My guess is that this is part of the issue you were experiencing that made you implement multithreaded writes.
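A minimal sketch of the repeated-load problem, in plain Python rather than dask. `CountingSource`, `make_band`, and the two write functions are hypothetical stand-ins: the first function mimics one graph run per output file, the second mimics loading the shared source once before deriving every band.

```python
class CountingSource:
    """Hypothetical stand-in for an expensive source read; counts its loads."""

    def __init__(self):
        self.loads = 0

    def load(self, name):
        self.loads += 1
        return [1, 0, 1, 1]  # toy mask values for a band like qa_pixel


def make_band(scale, qa):
    """Derive an output band from the shared source band."""
    return [scale * v for v in qa]


def write_separately(source, scales):
    # One file per band: each output re-reads the shared source.
    return [make_band(s, source.load("qa_pixel")) for s in scales]


def write_together(source, scales):
    # Load once, then derive every band from the in-memory copy.
    qa = source.load("qa_pixel")
    return [make_band(s, qa) for s in scales]


separate = CountingSource()
write_separately(separate, [1, 2, 3])

together = CountingSource()
write_together(together, [1, 2, 3])

print(separate.loads, together.loads)
```

With three output bands the per-band path reads the shared source three times and the load-first path reads it once. The outputs are identical either way, so the difference is purely in repeated work.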
Since the popular approach is to write only single-band TIFFs, datasets whose bands share source data end up running the dask graph once for each band. Threaded writes don't resolve this. Loading into memory before calling the write function works but is wasteful (since writing the corresponding multiband TIFF to memory compresses the data). One solution might be to write to memory, then parse the bands. Or look in the STAC guidance for a possible way to write multiband files.