-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Design] Integrate xarray-datatree #2352
Comments
Very good write up, thanks a lot! I'll try to comment on this soon. |
Regarding the integration options, I would start with the readers, so option 3. But on top of implementing open_mfdatatree, that would allow developers to free themselves from the file handler system, which can be a bit too constrained for datasets with multiple interdependent files. I think that would make for a clearer api boundary between the reader and the scene, with the reader returning a single datatree with all the data. In my mind, this single datatree would contain a lazy representation of all the contents of the read file(s). |
I'm not sure I see how this transition happens. In my original option 3 I was thinking of a new top-level interface to "play around with" that in one way or another returns a DataTree object. Very likely the first implementation would be use the Scene and like a You are talking about having readers return a DataTree. I think I would consider this a new integration option 4, but would like get implemented near (alongside) option 2 where the Scene would use that DataTree or a merge of multiple readers' DataTrees as its internal data container.
I think we talked about this in the last meeting and that this isn't currently possible due to performance. If we want to move to this in the future (which I'm OK with) then keeping it in mind as we design something that is do-able now is important. The hard part about doing anything with readers is we have to do it one reader at a time and support both DataTree and DataArray readers in the Scene handling, but with no changes to the user-side of things. In my mind this DataTree integration should be user-facing first and as a way to advertise Satpy to more users and more use cases. I'm not sure changing the internals/design of readers as a first step gets us much unless it is in direct support of one of the other user-facing options. |
Only if we use We wouldn't be able to use |
I think what Martin is saying @pnuu is that the Scene would essentially do:
And |
Ok, so what I meant is actually a bit of both. |
Is the DataTree returned by these engines/readers in a Satpy scheme or the file's scheme? I mean, Satpy readers currently rename a lot of things like variable names or dimensions. |
I was planning on a Satpy scheme |
Just for reference, I can confirm that xarray-datatree is at the moment in the process of being included in the xarray repo. |
Overview
A relatively new package called xarray-datatree introduces a new xarray high-level container called a DataTree. A DataTree contains xarray Datasets which themselves contain xarray DataArrays. In current Satpy our main container object is the
Scene
which stores xarray DataArray objects in a flat dictionary structure, but with complexDataID
objects as keys that support complex dictionary lookups withDataQuery
objects. It is my opinion that Satpy could benefit from using DataTree objects in various parts of Satpy. A DataTree could provide an xarray "builtin" (the expectation being that DataTrees are merged into upstream xarray) container that requires little to no special code in Satpy to work with DataArrays/Sets/Trees. This could mean getting rid of the satpyScene
where Satpy is then focused on providing helper functions and accessors on xarray objects for doing the usual tasks that the Scene is used for.This issue is meant to be a discussion of the various options for where DataTree could be used (on a high-level, not reader/writer internals), whether they are a good idea, and what priority should be given to implementing them. Feel free to comment with other ideas or feedback about the options described below.
The fact that this stuff is being discussed does NOT mean that we're breaking backwards compatibility in Satpy any time soon.
Integration Options
Scene.to_xarray_datatree
: Similar to the existingScene.to_xarray_dataset
, we could add ato_xarray_datatree
method that converts a Scene's collection of DataArrays to a DataTree object. There are a couple different ways that a DataTree could be organized so this method may need to take keyword arguments to control the grouping of the DataArrays. See next section.Scene
internal containers: Basically replaceScene._datasets: DatasetDict
with aDataTree
. This should have little to no effect on users as this should be an internal data structure. The question is whether or not this benefits the Scene (and us, the developers). Again this depends on what type of grouping scheme is used. See next section. The other question here is what role does the existingDataID/DataQuery
play with a new internal structure like this.satpy.open_mfdatatree
: See this datatree issue for a short discussion on the initial idea of anopen_mfdatatree
function. In the past some Satpy developers have discussed redesigning the Scene and readers to be more separated. For example, a DataSource that provides data to one or more Reader objects, and these Readers are passed to some type ofload
functionality that reads/loads/generates the proper DataArrays from files and composites if requested. Along these lines it would be nice to provide a genericopen_mfdatatree
function that takes a set of files and a reader name and perhaps optionally a set of products to load/create and then a DataTree is returned. Combined with many of the other discussions going on all over the place regarding xarray accessors, or splitting other parts of the Scene up, this would abstract away a lot of the complexity of readers and composites and everything else...but it would also take a lot of control away from Satpy users used to the Scene andreader_kwargs
andavailable_dataset_ids
.Grouping Schemes
This is all I can think of for now. Let me know what you think.
The text was updated successfully, but these errors were encountered: