Redundant worldcover.file_id mapping #449

Open
scottyhq opened this issue Dec 6, 2024 · 3 comments

scottyhq commented Dec 6, 2024

I noticed that in the following case I get two file_directory entries for the worldcover sampling that point to the same VRT:

params = {
    "poly": [
        {"lon": -107.04167200088904, "lat": 38.13871558348828},
        {"lon": -106.9906750767, "lat": 38.143321174},
        {"lon": -106.9012192586512, "lat": 38.45076093773869},
        {"lon": -106.88073581223605, "lat": 38.61373659203426},
        {"lon": -106.85459254268015, "lat": 38.821744384031135},
        {"lon": -106.83312685384308, "lat": 38.99253520761745},
        {"lon": -106.8405727056, "lat": 39.0098434489},
        {"lon": -106.8534814495, "lat": 39.0176912842},
        {"lon": -107.057689084, "lat": 39.1253515061},
        {"lon": -107.069284849, "lat": 39.1243151527},
        {"lon": -107.14537429683807, "lat": 39.01866500363045},
        {"lon": -107.08400399803617, "lat": 38.1438656762911},
        {"lon": -107.04167200088904, "lat": 38.13871558348828},
    ],
    "t0": "2019-08-05T20:36:05.101000",
    "t1": "2019-09-09T06:56:31.581000",
    "srt": 0,
    "cnf": 4,
    "ats": 20.0,
    "cnt": 5,
    "len": 40.0,
    "res": 20.0,
    "maxi": 6,
    "atl08_class": ["atl08_ground"],
    "samples": {"worldcover": {"asset": "esa-worldcover-10meter"}},
}

from sliderule import icesat2

# Two ATL03 granules that intersect the polygon above
granule_names = [
    "ATL03_20190805203605_05920406_006_02.h5",
    "ATL03_20190909064801_11180402_006_02.h5",
]

gfsr = icesat2.atl06p(params, resources=granule_names)
gfsr.attrs
{'file_directory': {
4294967296: '/vsis3/sliderule/data/WORLDCOVER/ESA_WorldCover_10m_2021_v200_Map.vrt', 
0: '/vsis3/sliderule/data/WORLDCOVER/ESA_WorldCover_10m_2021_v200_Map.vrt'}}
'client': {'version': 'v4.8.6'},
'server': {'environment': 'v4.8.14-0-g1d1b7edd',

elidwa commented Dec 6, 2024

After reviewing the code, it is functioning as intended. Points from each granule are sampled in parallel by their dedicated RasterSamplers, with each RasterSampler using a different keySpace. In this case, the keySpace values are 0 and 2^32 (4294967296). The keySpace is used in the creation of fileId, which is included in raster samples to identify the file/raster the value came from.

For the GeoRaster class, it is always the same VRT file, which results in duplicate file entries with different fileIds. However, these are necessary because the sampling code assigns the fileId to each sample based on the keySpace.

To clean up the code, we could add logic to identify that all fileIds in the samples correspond to the same VRT file. Alternatively, we could leave the code as is, as it does not impact functionality.
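The client-side cleanup described above can be sketched in a few lines. This is only an illustration, not SlideRule code: `dedupe_file_directory` is a hypothetical helper, and it assumes the `file_directory` attribute is a plain dict of fileId to path, as in the output shown earlier in this thread.

```python
def dedupe_file_directory(file_directory):
    """Collapse entries that point to the same path (hypothetical helper).

    Returns (canonical, remap):
      canonical: one file_id per unique path, mapped to that path
      remap:     every original file_id mapped to its canonical file_id
    """
    first_id_for_path = {}
    remap = {}
    # Visit ids in ascending order so the smallest id becomes the canonical one
    for file_id in sorted(file_directory):
        path = file_directory[file_id]
        canonical_id = first_id_for_path.setdefault(path, file_id)
        remap[file_id] = canonical_id
    canonical = {fid: path for path, fid in first_id_for_path.items()}
    return canonical, remap


# Values copied from the gfsr.attrs output reported in this issue
file_directory = {
    4294967296: "/vsis3/sliderule/data/WORLDCOVER/ESA_WorldCover_10m_2021_v200_Map.vrt",
    0: "/vsis3/sliderule/data/WORLDCOVER/ESA_WorldCover_10m_2021_v200_Map.vrt",
}

canonical, remap = dedupe_file_directory(file_directory)
print(canonical)
print(remap)
```

If the per-sample ids also need to be collapsed, the `remap` dict could be applied to the `worldcover.file_id` column, for example with pandas `Series.map(remap)`.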


scottyhq commented Dec 7, 2024

OK, thanks @elidwa, you're right that it's not a huge issue; I just wanted to document it. I'm also noticing different dtypes coming back from the worldcover sampling. I think the value column should always be uint8:

...
 13  pflags                  10853 non-null  uint16  
 14  cycle                   10853 non-null  uint8   
 15  geometry                10853 non-null  geometry
 16  worldcover.file_id      10853 non-null  int64   
 17  worldcover.value        10853 non-null  float64 
 18  worldcover.flags        10853 non-null  int64   
 19  worldcover.time         10853 non-null  float64 

Edit: I suspect this is in order to handle np.nan values.


elidwa commented Dec 9, 2024

@scottyhq, the value is being returned by the server code as a C++ double (64-bit floating point). This decision was made to use a single data type that can represent all possible pixel values across our datasets.

Most of our datasets use pixel data types that are either floating-point values or signed/unsigned integers up to 32 bits. This approach works well with those assumptions.
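A short sketch of why a double is a safe container for these pixel types, and how a caller might recover integer values on the client. A 64-bit float has a 53-bit significand, so every signed or unsigned integer of up to 32 bits round-trips exactly, and NaN remains available for missing samples. The helper names and the sample values below are hypothetical and are not part of the SlideRule client.

```python
import math

UINT32_MAX = 2**32 - 1
INT32_MIN = -(2**31)


def survives_double(n):
    """True if integer n is unchanged by a round trip through a 64-bit float."""
    return int(float(n)) == n


def to_uint8_or_none(value):
    """Convert a sampled value back to an int in [0, 255], or None for NaN."""
    if math.isnan(value):
        return None
    as_int = int(value)
    if as_int != value or not 0 <= as_int <= 255:
        raise ValueError(f"not a uint8 pixel value: {value!r}")
    return as_int


# Every 32-bit integer is exact in a double; precision is first lost above 2**53
print(survives_double(UINT32_MAX), survives_double(INT32_MIN))
print(survives_double(2**53 + 1))

# Hypothetical worldcover.value entries, including one missing sample
samples = [10.0, float("nan"), 30.0]
converted = [to_uint8_or_none(v) for v in samples]
print(converted)
```

For a whole DataFrame column, a pandas nullable integer dtype such as "UInt8" plays a similar role to the `None` placeholder used here, since a plain NumPy uint8 column has no way to store a missing value.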
