How to pass xarray_kwargs from STAC catalog through to intake #110
Comments
Just to clarify, in your STAC catalog are you using the xarray-assets extension to store those arguments? If so, then I don't believe intake-stac currently uses xarray-assets. There's a bit of discussion of this in #90, but I haven't finished that up to actually use xarray-assets. |
Thanks for your response @TomAugspurger! No I am not using |
I think we're still figuring out best practices :) But yes, I think xarray-assets is appropriate to use in this case. As you say, you can store whatever you want on the object and it'll be available in extra_fields. But I think that a bit of structure is helpful here, so that tools like intake-stac have a standardized set of keys to check. I'm happy to hear what you / others think about this though. I think examples would be helpful here. https://github.com/TomAugspurger/xstac/blob/main/examples/daymet/daily/hi/collection.json has an example collection, and https://planetarycomputer.microsoft.com/docs/quickstarts/reading-zarr-data/ has an example using it to load the data. IMO, intake-stac should be updated to do something similar to
import fsspec
import xarray as xr
store = fsspec.get_mapper(asset.href, **asset.extra_fields["xarray:storage_options"])
ds = xr.open_zarr(store, **asset.extra_fields["xarray:open_kwargs"])
|
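The snippet above indexes `extra_fields` directly, which raises a `KeyError` when an asset does not carry the xarray-assets keys. A minimal sketch of a more forgiving lookup follows. The helper name `xarray_asset_options` is hypothetical and not part of intake-stac or pystac, and a plain dict stands in for `asset.extra_fields`.

```python
from typing import Any, Dict, Mapping, Tuple

# Key names are taken from the snippet above (xarray-assets extension).
STORAGE_KEY = "xarray:storage_options"
OPEN_KEY = "xarray:open_kwargs"


def xarray_asset_options(
    extra_fields: Mapping[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (storage_options, open_kwargs), each defaulting to an empty dict.

    Hypothetical helper: it only reads the two keys and copies their values.
    """
    storage_options = dict(extra_fields.get(STORAGE_KEY) or {})
    open_kwargs = dict(extra_fields.get(OPEN_KEY) or {})
    return storage_options, open_kwargs


# A plain dict standing in for asset.extra_fields; only one key is present.
extra_fields = {"xarray:open_kwargs": {"consolidated": True}}
storage_options, open_kwargs = xarray_asset_options(extra_fields)
```

The two returned dicts could then be unpacked into `fsspec.get_mapper` and `xr.open_zarr` exactly as in the snippet above.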
Ok I think that makes sense. I can incorporate In case it matters, I am currently just needing to access netcdf files but I assume that the same workflow will also work for not-zarr files. I'm more likely at this point to be able to use dask than zarr, but maybe also zarr in the future. I should have said before when I mentioned seeing tools available but being unsure which to use, I've primarily gotten information from following your Pangeo Discourse thread which has been helpful for knowing what people are working on (though a bit confusing at times). Thanks so much for your work on these packages! |
@TomAugspurger In what sort of timeline might you plan to have your |
Sorry for the delay. Let me get #90 updated now, it shouldn't take too long. That's also adding a new API for loading collection-level assets that I'd appreciate some input on. See #90 (comment) (cc @jhamman and @scottyhq if either of you have thoughts). And if that API discussion is a sticking point, I'll see if I can extract the xarray-assets section. |
@kthyng we can continue debugging the issue from #90 (comment) here. I noticed the snippet I shared was incorrect, I had the wrong key. Instead of
I wonder if that would fix it for you? If you're able to share the STAC item (and maybe NetCDF file too) publicly I'd be happy to take a closer look. |
Thanks @TomAugspurger! Right I should have updated that but wasn't sure if you did intend to have it like that since that is the syntax I've seen in ... I tried with Here these should work for you. I updated the file name in the json file to match test.nc — hopefully I'm not missing anything to make it inconsistent. |
@TomAugspurger I thought of a few more things. I have been using |
@TomAugspurger Sorry for all the notes but I wanted to come back to one difference between what we have each been saying with "properties" vs. "extra_fields" (it is "properties" for the item and "extra_fields" for the asset apparently for pystac to accept the inputs). I don't know if the difference is significant but didn't want to neglect it just in case. Here is sample code for how I define an item and then its asset using
|
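For orientation, here is a rough sketch of where the two kinds of fields end up in the serialized item JSON, written as a plain Python dict so it does not depend on a particular pystac version. The item id, asset key, and datetime are made-up placeholders, and the set of "core" asset fields is an assumption used only to illustrate what would be left over as extra fields.

```python
import json

# Placeholder item: item-level metadata lives under "properties", while
# extra asset fields sit directly on the asset object next to "href".
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "test-item",  # placeholder id
    "geometry": None,
    "properties": {"datetime": "2021-01-01T00:00:00Z"},  # placeholder date
    "links": [],
    "assets": {
        "data": {  # placeholder asset key
            "href": "test.nc",
            "type": "application/x-netcdf",
            "xarray:open_kwargs": {
                "xarray_kwargs": {"drop_variables": ["obs", "driver_timestamp"]}
            },
        }
    },
}

# Assumed list of core asset fields; everything else counts as "extra".
CORE_ASSET_FIELDS = {"href", "title", "description", "type", "roles"}

asset = item["assets"]["data"]
extra_fields = {k: v for k, v in asset.items() if k not in CORE_ASSET_FIELDS}

# The structure is plain JSON, so it round-trips unchanged.
round_tripped = json.loads(json.dumps(item))
```

Under these assumptions, `extra_fields` holds only the `xarray:open_kwargs` entry, while `start_datetime`-style metadata would go under `item["properties"]`.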
@kthyng thanks for sharing those files. Can you try with this for the
"xarray:open_kwargs": {
    "xarray_kwargs": {
        "drop_variables": [
            "obs",
            "driver_timestamp"
        ]
    }
},
I think in your pystac code it'll be
properties = {'fields': fields, 'plots': plots, 'varnames': keysused,
              'data_vars': list(ds.data_vars.keys()),
              'start_datetime': start_date.strftime("%Y-%m-%dT%H"),
              'end_datetime': end_date.strftime("%Y-%m-%dT%H"),
              'xarray:open_kwargs': {"xarray_kwargs": {'drop_variables': ['obs', 'driver_timestamp']}}})
but that's untested. To unpack what's going on there:
So there are multiple layers, with similar but slightly different names. With that change to the item, I have
In [1]: import intake
In [2]: item = intake.open_stac_item("test-item.json")
In [3]: ds = item[list(item)[0]].to_dask()
In [4]: "obs" in ds.variables
Out[4]: False
I'll take a look at your question about sat-search a bit later. |
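The layering described above can be sketched with stand-in functions, so it runs without intake or xarray installed. This assumes that the value of `xarray:open_kwargs` is unpacked into the intake driver and that the driver forwards its `xarray_kwargs` argument to the xarray reader. `fake_driver` and `fake_open_dataset` are hypothetical stand-ins, not real intake-stac or xarray functions.

```python
def fake_open_dataset(path, **kwargs):
    """Stand-in for the xarray reader: records the keywords it receives."""
    return {"path": path, "kwargs": kwargs}


def fake_driver(urlpath, xarray_kwargs=None, **driver_kwargs):
    """Stand-in for an intake driver that forwards xarray_kwargs to the reader."""
    return fake_open_dataset(urlpath, **(xarray_kwargs or {}))


# Asset fields as suggested above: the outer key is read from the STAC asset,
# the inner "xarray_kwargs" key is the driver argument.
asset = {
    "href": "test.nc",
    "xarray:open_kwargs": {
        "xarray_kwargs": {"drop_variables": ["obs", "driver_timestamp"]}
    },
}

# Layer 1: pull the extension field off the asset.
open_kwargs = asset.get("xarray:open_kwargs", {})
# Layer 2: unpack it into the driver, which unpacks xarray_kwargs in turn.
result = fake_driver(asset["href"], **open_kwargs)
```

In this sketch only `drop_variables` reaches the innermost call, which matches the session above where `"obs"` is absent from `ds.variables`.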
@TomAugspurger Thanks again, this is really clear and you are helping me a ton, I appreciate it. Ok so I edited the file directly as you suggested and at first I got
Then I installed the dev version of
So looks like it is important to use the newer changes in I also verified that I could modify the code as you wrote with
and then was able to have the keywords come through as checked by:
Ok so this fixes many of my problems but I do have a couple of questions:
(I think this all supersedes the other notes I gave before and about |
About sat-search vs. pystac-client, I'd recommend pystac-client. It's the library being actively maintained these days.
There's an example at https://planetarycomputer.microsoft.com/docs/tutorials/cloudless-mosaic-sentinel2/#Discover-data using Using pystac-client (or sat-search) currently requires a STAC API endpoint to query against. In other words, it doesn't work with a static STAC catalog (a list of STAC items on disk). There's a feature request at stac-utils/pystac-client#66 for pystac-client to work with static STAC catalogs, but it's not possible today.
I'm not sure, because 1.) I've only ever filtered on basic things like The STAC API item search specification is under some flux right now, so this could change over the next few months / years. |
Thanks for all this information @TomAugspurger. The original question has been answered, which is that if you are using The other questions are more long term. Thanks again for your help and I'm sure I'll be back around soon! |
Hi! I am setting up a STAC catalog which is to be searched using pystac-client and the result then read into Python with intake, intake-stac, and intake-xarray. I would like to be able to pass through xarray_kwargs from my STAC catalog all the way through to the resulting intake catalog so that I can read in the datasets to xarray directly using information stored in the STAC catalog entries. I can pass xarray_kwargs into the "metadata" section of an intake entry, but not into a known xarray_kwargs attribute that is actually used when the dataset is read in. Is there a way to encode this properly in the STAC catalog so it passes all the way through? Thank you for any help.