Figure out multiband writes #44

Open
jessjaco opened this issue Jan 5, 2024 · 5 comments

Comments

jessjaco (Collaborator) commented Jan 5, 2024

Since the popular approach is to write only single-band TIFFs, datasets built from the same source data end up running the dask graph once per band. Threaded writes don't resolve this. Loading into memory before calling the write function works but is wasteful (by contrast, writing the corresponding multiband TIFF to memory at least compresses the data). One solution might be to write the multiband file to memory, then parse out the bands. Or, look in the STAC guidance for a possible way to write multiband assets.
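To make the recomputation concrete, here's a minimal sketch. The dataset construction is a stand-in for a lazily loaded odc.stac.load result, not this repo's actual code path, and rioxarray is assumed as the writer:

```python
import numpy as np
import xarray as xr
import rioxarray  # noqa: F401  (registers the .rio accessor)

# Stand-in for a lazily loaded dataset: two bands derived from one
# shared chunked source, mimicking odc.stac.load output.
src = xr.DataArray(
    np.random.rand(256, 256), dims=("y", "x")
).chunk({"y": 128, "x": 128})
ds = xr.Dataset({"red": src * 2, "nir": src + 1})
ds = ds.rio.write_crs("EPSG:4326")

for name, band in ds.data_vars.items():
    # Each to_raster call re-executes the shared upstream graph
    # (here `src`; in practice the load/mask/scale pipeline).
    band.rio.to_raster(f"{name}.tif", driver="COG")
```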

jessjaco (Collaborator, Author) commented Jan 5, 2024

See gjoseph92/stackstac#62

jessjaco (Collaborator, Author) commented Jan 5, 2024

From that link:

  1. You can write multiband STAC items, but
  2. neither stackstac nor odc.stac.load (I tested it) can read them.

One workaround is to create a VRT for each band, but there is a very good point that GeoTIFFs are pixel-interleaved by default, though they can be interleaved by band. However, while the GDAL GeoTIFF driver supports band interleaving, the GDAL COG driver doesn't. This is even more confusing because the COG standard supports BSQ (band-sequential) layout. So while the VRT approach may work, there are performance considerations on read (as an aside, this may be why some operations on the multiband tide data are so memory intensive). Though also consider writing a COG using the GeoTIFF driver, as we used to do; a sketch of both options follows.
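A hedged sketch of both workarounds using GDAL's Python bindings; the file names, band count, and creation options are illustrative assumptions:

```python
from osgeo import gdal

src = "multiband.tif"  # hypothetical multiband GeoTIFF

# Option 1: one VRT per band. BuildVRT with bandList references a
# single band in place, but reads of a pixel-interleaved source
# still decode the other bands.
for band_num in range(1, 4):
    gdal.BuildVRT(f"band_{band_num}.vrt", [src], bandList=[band_num])

# Option 2: rewrite as a tiled, band-interleaved GeoTIFF with the
# GTiff driver (the COG driver doesn't expose INTERLEAVE=BAND).
gdal.Translate(
    "multiband_bsq.tif",
    src,
    creationOptions=["TILED=YES", "INTERLEAVE=BAND", "COMPRESS=DEFLATE"],
)
```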

jessjaco (Collaborator, Author) commented Jan 8, 2024

I think the simplest (and probably workable) solution is to offer the option to load a dataset into memory right before writing, when the values have (in most cases) been scaled to their minimal representation. Not dissimilar to what Alex was doing in the PR I declined last week. These shouldn't be that large for the grid size we're dealing with. The only frustration is that we will then have two versions in memory at once: one uncompressed as an xarray Dataset, and one compressed as a blob.

This works against the ultimate goal of never having a whole dataset in memory at once, but that hasn't yet been possible (unless we use the dask writer to S3 from odc, which we haven't).
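A minimal sketch of that option, assuming the same lazily loaded `ds` as in the earlier sketch and rioxarray for the writes (output naming is illustrative):

```python
import rioxarray  # noqa: F401  (registers the .rio accessor)

# One pass over the dask graph; all bands end up in memory together.
ds = ds.compute()

for name, band in ds.data_vars.items():
    # Writes now read from memory instead of re-running the graph.
    band.rio.to_raster(f"{name}.tif", driver="COG")
```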

alexgleith (Contributor) commented:

> unless we use the dask writer to s3 from odc

I have some hesitations about that writer. It's not using GDAL at all, and I worry about its maintenance. I also had an issue when trying to use it, but that might have just been my environment.

> the simplest (probably) workable solution is to offer the option to load a dataset right before write

I've been having errors when not loading data into memory before writing, possibly only with big dask graphs. Doing the load before writing has proven reliable.

jessjaco (Collaborator, Author) commented Feb 2, 2024

> > unless we use the dask writer to s3 from odc
>
> I have some hesitations about that writer. It's not using GDAL at all, and I worry about its maintenance. I also had an issue when trying to use it, but that might have just been my environment.
>
> > the simplest (probably) workable solution is to offer the option to load a dataset right before write
>
> I've been having errors when not loading data into memory before writing, possibly only with big dask graphs. Doing the load before writing has proven reliable.

I haven't had errors, but if the bands are written to separate files, it will load common source bands (like qa_pixel) multiple times. My guess is that this is part of the issue you were experiencing that made you implement multithreaded writes. One way around the repeated loads is sketched below.
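For illustration, rioxarray can write a whole Dataset as a single multiband raster in one pass, so shared source bands are read only once. This is a hedged sketch, not this repo's API, and whether a multiband output is readable downstream is exactly the open question of this issue:

```python
import rioxarray  # noqa: F401  (registers the .rio accessor)

# All data_vars become bands of one file, and the shared dask graph
# (including any qa_pixel masking) executes a single time.
# Assumes all data_vars share a dtype, which rioxarray requires here.
ds.rio.to_raster("multiband.tif", driver="COG")
```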
