Options for handling offsets in Sentinel-2 COGs #26
-
E84 is looking for feedback on how to properly handle the offsets in the Sentinel-2 JP2 data when it is converted to COGs. This applies to all new Sentinel-2 data with a processing baseline of 04.00 or higher. That baseline introduced an offset that must be applied to the data and is explained here. For the Sentinel-2 COGs, we have applied the offset to the data since Jan 2022. The offset is specified in the metadata, but so far it has always been -0.1. The scale is 0.0001.
Going forward there are 4 options, ordered from worst to best, in my opinion.

1 - Leave the data as is

The COGs keep the same values as the JP2 files, leaving the user responsible for applying the offset. A serious issue with this is that when calculating simple band indices the scale of 0.0001 cancels out, e.g.,

NDVI = (NIR - Red) / (NIR + Red)

This is how a great number of notebooks and code out in the wild do this. If the offset is not applied to the data, the NDVI calculation on the stored values becomes:

NDVI = (NIR - Red) / (NIR + Red - 2000)

because the offset of -0.1 (-1000 in stored values at a scale of 0.0001) cancels in the numerator but not in the denominator. This would break a large number of implementations. I would not recommend this.

2 - Apply offset as per the S2 Technical Guide

The recommended way of applying the offset is provided by ESA, however this has serious consequences. With this method, any pixels that are 0 or less after applying the offset become equivalent to NODATA. While this seems reasonable at first because the pixel is invalid, the absence of any collected data is not the same as measuring the surface and getting no meaningful signal. These are dark regions where the atmosphere has been overcompensated for. If they become 0, that tells the user there is no data rather than that it is a dark object. This causes issues when visualizing, as these dark regions would become transparent. For analysis this can cause gaps where there should be none. I would not recommend this approach.

3 - Apply offset, keep nodata=0

The current method used for the Sentinel-2 COGs maintains NODATA=0, but any values <=0 after applying the offset are set to a value of 1, which corresponds to a reflectance of 0.0001. The original nodata value locations are preserved. This indicates that data was there, but with a small value well below the noise level of the reflectance measurement. This method would have virtually zero impact on any analysis of the data, however it is a change in the values, as is setting all negative data values to 1. It has the benefit of using the very common nodata value of 0. I find this to be an acceptable change.

4 - Apply offset, set nodata=65535

An alternative to the above is to change the nodata value to 65535, the max value for uint16. This has no chance of conflicting with any valid data values, which should not be higher than 11000 (reflectance ranges from 0-1.0). Then, when the offset is applied, values are clamped to 0 rather than 1.
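To make the difference between Options 2, 3 and 4 concrete, here is a minimal NumPy sketch. It is not the code used to produce the COGs. The function name `apply_offset` and the constants are hypothetical, and it assumes the offset is -1000 in stored values (-0.1 divided by the 0.0001 scale) and that the source nodata value is 0.

```python
import numpy as np

OFFSET_DN = -1000   # assumption: -0.1 reflectance / 0.0001 scale
SRC_NODATA = 0      # assumption: nodata value in the source JP2 data


def apply_offset(dn, option):
    """Apply the offset to stored uint16 values following Option 2, 3 or 4."""
    dn = np.asarray(dn, dtype=np.uint16)
    was_nodata = dn == SRC_NODATA
    # Work in a wider signed type so the subtraction cannot wrap around.
    shifted = dn.astype(np.int32) + OFFSET_DN

    if option == 2:
        # Values <= 0 collapse to 0, which is also the nodata value.
        out = np.clip(shifted, 0, None)
        out[was_nodata] = 0
    elif option == 3:
        # Values <= 0 become 1; 0 stays reserved for the original nodata pixels.
        out = np.clip(shifted, 1, None)
        out[was_nodata] = 0
    elif option == 4:
        # Values < 0 are clamped to 0; nodata moves to 65535.
        out = np.clip(shifted, 0, None)
        out[was_nodata] = 65535
    else:
        raise ValueError("option must be 2, 3 or 4")
    return out.astype(np.uint16)


raw = [0, 500, 1000, 1001, 3000]   # first pixel is true nodata
for opt in (2, 3, 4):
    print(opt, apply_offset(raw, opt).tolist())
```

With Option 2 the dark pixels (500 and 1000) end up indistinguishable from the true nodata pixel, which is the concern raised above. Options 3 and 4 keep them separate.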
Replies: 13 comments
-
I would like to propose an Option 5 that may be consistent with some of the other missions. Option 5: Change the datatype from UInt16 to Int16, apply the offset, set nodata to 32767 or -32768, and keep the scale at 0.0001. This way negative values are also retained and users do not have to worry about applying the offset themselves. If this is not possible, my preference would be option 1: a. By modifying the source data range, we may be precluding uses / applications for which the changes to scaling were introduced in the first place.
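A rough sketch of what Option 5 could look like in NumPy follows. The function name `to_int16_with_offset` is hypothetical, the offset of -1000 in stored values and the source nodata of 0 are assumptions, and -32768 is picked as the nodata value purely for illustration (the proposal leaves the choice between 32767 and -32768 open).

```python
import numpy as np

OFFSET_DN = -1000       # assumption: -0.1 reflectance / 0.0001 scale
SRC_NODATA = 0          # assumption: nodata value in the source data
INT16_NODATA = -32768   # one of the two candidates named in the proposal


def to_int16_with_offset(dn):
    """Shift uint16 values by the offset and store them as int16."""
    dn = np.asarray(dn, dtype=np.uint16)
    shifted = dn.astype(np.int32) + OFFSET_DN
    # int16 cannot hold values above 32767, so very bright values saturate.
    # -32768 is kept free for nodata by clipping at -32767.
    out = np.clip(shifted, -32767, 32767).astype(np.int16)
    out[dn == SRC_NODATA] = INT16_NODATA
    return out


print(to_int16_with_offset([0, 1, 1000, 3000, 65535]).tolist())
```

Negative values survive the conversion, which is the point of the proposal. The trade-off in this sketch is that stored values above 33767 no longer fit in Int16 and would saturate.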
-
I forgot to hit enter, and in the meantime @piyushrpt put something very close to my thought: to change as little data as possible.
Option 4 is the closest to this, and supports the
-
I agree with this. My opinion is that data "re-freers" ("re-sellers", but we don't sell it) shouldn't try to "fix" data. If it comes with warts, we document and help people with the warts, but we don't remove them. The corollary is that correcting actual errors/blunders is fine. This isn't an error, it's just a design choice that is a little awkward.
-
Thanks @piyushrpt, I like that option a lot. I disagree a bit with the idea that we shouldn't try to fix the data. In my opinion the entire goal of this exercise is to make access to and use of Sentinel-2 data easier, because ESA has made decisions about the format and distribution of the data that have put up barriers to that. I think adding the offset falls under "making the data easier to use", especially with @piyushrpt's Option 5, which maintains all of the original values.
-
-
There may be a number of things happening here. I would also look into the values in the SCL band for these pixels. In our experience, using values from the SCL classification is almost always needed. If you provide more information about the tile you are looking at, I can run some tests and confirm.
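As an illustration of using the SCL band alongside the reflectance values, here is a small sketch that masks pixels by scene classification. The class numbers follow the Sentinel-2 L2A scene classification (0 no data, 1 saturated or defective, 3 cloud shadows, 6 water, 8 and 9 cloud medium and high probability, 10 thin cirrus). The function name `mask_with_scl` and the particular set of classes treated as unusable are assumptions made for the example, not a recommendation from this thread.

```python
import numpy as np

# Assumption: classes treated as unusable in this example.
# 0 no data, 1 saturated/defective, 3 cloud shadows,
# 8/9 cloud medium/high probability, 10 thin cirrus.
BAD_SCL_CLASSES = (0, 1, 3, 8, 9, 10)


def mask_with_scl(band, scl, bad_classes=BAD_SCL_CLASSES):
    """Return the band as float32 with unusable pixels set to NaN."""
    band = np.asarray(band, dtype=np.float32)
    scl = np.asarray(scl)
    return np.where(np.isin(scl, bad_classes), np.nan, band)


band = [100, 200, 300, 400]
scl = [6, 9, 4, 0]    # water, cloud (high probability), vegetation, no data
print(mask_with_scl(band, scl))
```

Low values over pixels classified as water (class 6) are kept, while cloud and no-data pixels are dropped.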
-
@piyushrpt The tile ID is
-
At least in this case, all the SCL pixels are correctly labeled as open water, and low reflectance observations over open water are not strange. But there could be other occasions, particularly on land, where this could be an issue. Like you said, this is just one data point.
-
After thinking on this more and talking with more folks, I'm now leaning more toward Option 1 - leave the data as is. The reason for this is 2-fold: 2 - The use of scale/offset is common practice, and while GDAL/rasterio do not automatically apply scale/offset, higher-level tooling often does (e.g., QGIS, TiTiler). Users should always check and apply the scale/offset in the data, and we should not be encouraging bad practices. Setting the offset to 0 does not really make things easier, as I suggested above, because users should be applying scale/offset properly. Note that going forward, regardless of what we end up doing, the scale/offset will be properly set both in the STAC metadata and in the COG files themselves. A final decision has not been made yet; we are still looking for more user feedback.
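For readers wondering what "applying scale/offset properly" means in practice, here is a minimal NumPy sketch. The helper names `to_reflectance` and `ndvi` are hypothetical, and the scale of 0.0001, offset of -0.1 and nodata of 0 are the values mentioned at the top of this thread, hard-coded here as assumptions rather than read from a file.

```python
import numpy as np

SCALE = 0.0001   # from the discussion above
OFFSET = -0.1    # from the discussion above
NODATA = 0       # assumption: nodata value of the stored data


def to_reflectance(dn, scale=SCALE, offset=OFFSET, nodata=NODATA):
    """Convert stored values to reflectance: value * scale + offset."""
    dn = np.asarray(dn)
    refl = dn.astype(np.float64) * scale + offset
    return np.where(dn == nodata, np.nan, refl)


def ndvi(nir, red):
    """Normalized difference of two bands, computed in float64."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red)


nir_dn = np.array([4000], dtype=np.uint16)
red_dn = np.array([2000], dtype=np.uint16)

naive = ndvi(nir_dn, red_dn)                                    # offset ignored
correct = ndvi(to_reflectance(nir_dn), to_reflectance(red_dn))  # offset applied
print(naive, correct)
```

For these sample values the naive result is about 0.33 while the corrected one is about 0.5, so skipping the offset is not a cosmetic difference. When reading with rasterio, the per-band values should be available as `src.scales` and `src.offsets` once they are set in the COG headers.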
-
Agreed with option 1 and leaving the data as is. Your concern about breaking a bunch of examples is not unfounded. This Sentinel-2 collection specifically tends to be the go-to open source data used in many examples and tutorials. But this is not a huge change to make. As others have said, scales/offsets are common and should be in tutorials anyway to introduce the concept.
-
We, at cibolabs.com.au, are comfortable applying the gain and offset ourselves. So, the proposed option 1 is fine. If I understand what you are proposing, then users will have to:
For what it's worth, it will be a big improvement on the current, confusing, situation where we:
Other thoughts:
Finally, thank you. The Sentinel-2 COG collection is awesome. We appreciate your efforts in developing and maintaining it and considering our feedback.
-
Thanks for the input @sdtaylor and @tonykgill. @tonykgill, correct: the gain and offset will be in the STAC Item and will also be set in the header metadata of the COGs themselves.
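As a sketch of how a client might pick the gain and offset up from a STAC Item, the snippet below reads them from an asset. It assumes the values are published through the STAC raster extension's `raster:bands` field (version 1.x layout, with `scale` and `offset` keys); the thread does not say which field will be used, and the item dictionary and the helper name `scale_offset` are made up for illustration.

```python
def scale_offset(asset, band_index=0):
    """Return (scale, offset) for an asset, defaulting to (1.0, 0.0)."""
    bands = asset.get("raster:bands") or []
    if band_index >= len(bands):
        return 1.0, 0.0
    band = bands[band_index]
    return float(band.get("scale", 1.0)), float(band.get("offset", 0.0))


# Hypothetical, heavily trimmed STAC Item used only for this example.
item = {
    "assets": {
        "red": {
            "href": "red.tif",
            "raster:bands": [
                {"nodata": 0, "data_type": "uint16",
                 "scale": 0.0001, "offset": -0.1}
            ],
        },
        "scl": {"href": "scl.tif"},   # classification band, no scaling
    }
}

for name, asset in item["assets"].items():
    print(name, scale_offset(asset))
```

Falling back to a scale of 1.0 and an offset of 0.0 keeps the same code path working for assets such as SCL that carry no scaling.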
-
For the preview dataset, we have decided not to apply scale or offset.