Move ctype/segments from Coordinates to DataSource #357
Comments
Ah, I guess I brought this up before in #266. We've never really discussed it. |
This kept getting pushed off, but I think you've finally convinced me. My main question is how the segment information would be propagated through an algorithm pipeline. I think metadata in the xarray data array should suffice, but we'd have to think carefully about how that would look. This is important because, fundamentally, the question is: what does the data point actually represent? Is it data at a point, or over an area? Is it a probe in the ground, or active over a pixel of an instrument? Is it the value of the sensor at a point in time, or the number of counts in an hour, or a running count? That type of thing. |
AFAIK, this information is not currently propagated in any way, so this would be a new feature. Are you thinking of something like a dictionary mapping each datasource name to its segment information? Like a flat dictionary with the same names as we use in the pipeline definitions? I could write out a spec. |
Yeah, some sort of dictionary. We can put it in the attrs of the xarray DataArray... or pass it around like we do the coordinates and make it optional... either way the interpolator will know where to pick it up:
    {
        'segment_lengths': {
            'lat': 0.5, 'lon': 0.25, 'time': '1,D'
        },
        'segment_position': {
            'lat': 'left', 'lon': 'middle', 'time': 'right'
        }
    }
Then if it's missing, the default is 0 (a point) and 'midpoint'. That would be the basics for a uniform grid, or a point sensor with a defined area of effect (i.e. COSMOS). (It would be nice if we could support a circular footprint as well, maybe by setting 'lat_lon': 5 it implies a radius of 5...) This approach is pretty optimized in terms of storage and probably covers 90% of the cases. For greater generality, e.g. a non-uniform grid, say with
    'segment_coords': {  # or segment_bounds?
        'lat': [-0.5, 0.5, 2, 4], ...
And then for dependent coordinates you can have an N-D array of coordinates describing the boundary of each point (so the size is shape + 1, fence-posts). Actually, you could probably implement I'd have to think more about how this would affect stacked coordinates. That means Now the segment information and the coordinates are completely separated. |
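As a rough illustration of the dictionary proposed above, here is a minimal sketch of how an interpolator might read it and fall back to the defaults (length 0, 'midpoint'). The helper names `segment_for` and `segment_bounds` are hypothetical and not part of podpac. The sketch also assumes that 'left' means the coordinate sits on the lower edge of its segment and 'right' on the upper edge, and it only handles numeric lengths (not time strings such as '1,D').

```python
# Hypothetical helpers for illustration only; not podpac API.
DEFAULT_LENGTH = 0             # a missing entry means a point
DEFAULT_POSITION = "midpoint"  # a missing entry means a centered coordinate


def segment_for(segment_info, dim):
    """Return (length, position) for `dim`, using the defaults when missing."""
    lengths = segment_info.get("segment_lengths", {})
    positions = segment_info.get("segment_position", {})
    return lengths.get(dim, DEFAULT_LENGTH), positions.get(dim, DEFAULT_POSITION)


def segment_bounds(coord, length, position):
    """Return the (low, high) extent of a numeric segment around `coord`.

    Assumption: 'left' puts the coordinate on the lower edge, 'right' on the
    upper edge, and 'middle'/'midpoint' in the center of the segment.
    """
    if position == "left":
        return (coord, coord + length)
    if position == "right":
        return (coord - length, coord)
    if position in ("middle", "midpoint"):
        return (coord - length / 2, coord + length / 2)
    raise ValueError(f"unknown segment position: {position!r}")


info = {
    "segment_lengths": {"lat": 0.5, "lon": 0.25},
    "segment_position": {"lat": "left", "lon": "middle"},
}

lat_length, lat_position = segment_for(info, "lat")
lat_extent = segment_bounds(10.0, lat_length, lat_position)

# 'alt' is not listed, so it is treated as a point at its midpoint.
alt_length, alt_position = segment_for(info, "alt")
alt_extent = segment_bounds(3.0, alt_length, alt_position)
```

With these assumptions a missing dimension collapses to a zero-width extent, so point data and segment data go through the same code path.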
Oh, by "propagated through an algorithm pipeline" you are just talking about how to represent the segments in the datasource and pass them to the interpolator. We're basically on the same page, then. They just need to be passed to the interpolator within the DataSource; that's not a problem. I don't think they need to propagate anywhere else, since they are unnecessary once the datasource is evaluated. It seems that there are 4 ways to define the segments for a single dimension:
What do you think about supporting just (1) simple segment length and position and (4) array of bounds? The simple length (1) is the most common, and the bounds (4) is the most flexible and covers all use cases. Both are easy to implement and understand. Plus, (4) is generalizable to non-square N-d boundaries. Whereas fence posts (3) is limited because it assumes that the segments cover the entire space, and a segment lengths array (2) is limited because all of the coordinates must be positioned in the same location within their segment. (Less important, but with the fence posts it is not always possible to index, e.g. |
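To make the relationship between the options concrete, here is a small sketch of how the simple length-and-position form (1) and the fence-post form (3) could both be expanded into an explicit array of bounds (4). The function names are hypothetical, and the meaning of 'left' and 'right' (coordinate on the lower or upper edge of its segment) is an assumption.

```python
# Hypothetical helpers for illustration only; not podpac API.

def bounds_from_uniform(coords, length, position="midpoint"):
    """Expand option (1), one length and one position, into (low, high) pairs."""
    # Fraction of the segment that lies below the coordinate (assumed meaning).
    offsets = {"left": 0.0, "midpoint": 0.5, "middle": 0.5, "right": 1.0}
    frac = offsets[position]
    return [(c - frac * length, c + (1 - frac) * length) for c in coords]


def bounds_from_fence_posts(posts):
    """Expand option (3), n + 1 fence posts, into n contiguous (low, high) pairs."""
    return list(zip(posts[:-1], posts[1:]))


# Segments of length 0.5 centered on coordinates spaced 1.0 apart leave gaps,
# which an explicit bounds array can represent.
uniform = bounds_from_uniform([0.0, 1.0, 2.0], 0.5)

# Fence posts always tile the whole range: each segment ends where the next begins.
fenced = bounds_from_fence_posts([-0.5, 0.5, 2, 4])
```

The bounds form is a superset of the other two here: the uniform case produces bounds with gaps between segments, while fence posts can only describe segments that share edges.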
Currently it is possible to specify the segment type and podpac detects the segment lengths automatically. Do you still want to do that autodetection? |
Uniform segment lengths:
Uniform time segments, lat/lon points:
Uniform time segments, fully defined lat/lon segments:
Autodetect segments (for convenience):
Maps to a helper function, e.g.:
Lastly, as far as a technical spec goes, I might make a light
|
I like your proposal of doing (1) and (4). So, for dependent coordinates, instead of (n, 2) and (m, 2) for lat and lon, you could have (n*m, 4) and (n*m, 4) to give the lat/lon coordinates of the boxes surrounding each node. And then, as you pointed out, you can have (n*m, x) for an x-sided polygon. Thanks for the examples above. I think it's worth thinking about stacked coordinates as well... |
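A minimal sketch of the (n, 2) and (m, 2) to (n*m, 4) expansion described above, assuming rectangular boxes. The function name `box_corners` and the corner ordering are hypothetical choices made for this illustration.

```python
# Hypothetical helper for illustration only; not podpac API.

def box_corners(lat_bounds, lon_bounds):
    """Expand per-dimension bounds into per-node box corners.

    lat_bounds has n (low, high) pairs and lon_bounds has m (low, high) pairs.
    Returns two lists with n*m rows of 4 values each: the lat and the lon of
    every corner of the box around each node. Corner order (an assumption):
    (low lat, low lon), (low lat, high lon), (high lat, high lon), (high lat, low lon).
    """
    lat_corners, lon_corners = [], []
    for lat_lo, lat_hi in lat_bounds:
        for lon_lo, lon_hi in lon_bounds:
            lat_corners.append([lat_lo, lat_lo, lat_hi, lat_hi])
            lon_corners.append([lon_lo, lon_hi, lon_hi, lon_lo])
    return lat_corners, lon_corners


lat_c, lon_c = box_corners([(0, 1), (1, 2)], [(10, 11), (11, 13), (13, 14)])
```

Rows with more than 4 entries would describe polygons with more sides in the same layout.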
Okay, let me think about that a little bit. Stacked coordinates with uniform rectangular regions are the same:
Stacked coordinates with nonuniform rectangular regions could be the same, too:
Polygons in general can be the same as the dependent coordinates, except that only 1-d coordinates and 1-d boundary entries are allowed:
(where each For a circle we would need a new type, maybe
So then there is the question of whether the dictionary should be nested to match the coordinates instead, e.g.
I'll have to see what is better in practice. The nested version will make grabbing the segment info for the stacked coordinates in one go easier (e.g.
But on the other hand, there can be stacked coordinates with mixed regions such as
We'll see. |
I really like the I think I prefer the @mlshapiro any opinions? |
see #395 |
I think we should remove segment_lengths from Coordinates.
Users would only be able to evaluate point coordinates. I think this makes sense and would be simpler for users and would be much simpler to code/maintain. (Note that when a user wants to average over segments, an explicit convolution is better and is already implemented.)
Data sources are segmented, not coordinates. The segment information would be moved to the DataSource node alongside the native_coordinates. It would be very clear that the interpolator is responsible for handling segment data sources correctly, and I believe this is the only use-case for segments. I think this would actually be fairly straightforward in most cases (it would make "Interpolation ignores ctype" #238 less daunting).
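To illustrate the division of responsibility, here is a toy sketch in which the data source carries the segment bounds and the requested coordinates are plain points. The class `SegmentedSource` and its method are hypothetical and do not reflect podpac's actual DataSource or interpolator classes. Half-open segments (low inclusive, high exclusive) are an assumption.

```python
import math


# Hypothetical class for illustration only; not podpac's DataSource.
class SegmentedSource:
    """Toy 1-D data source whose values apply over segments, not points."""

    def __init__(self, bounds, values):
        if len(bounds) != len(values):
            raise ValueError("need one (low, high) pair per value")
        self.bounds = bounds  # list of (low, high) pairs, one per value
        self.values = values

    def eval_point(self, x):
        """Return the value of the segment containing point `x`, else NaN.

        Assumption: segments are half-open, low <= x < high.
        """
        for (low, high), value in zip(self.bounds, self.values):
            if low <= x < high:
                return value
        return math.nan


source = SegmentedSource(bounds=[(0, 1), (1, 2), (3, 4)], values=[10, 20, 30])
inside = source.eval_point(0.5)
on_edge = source.eval_point(1.0)
in_gap = source.eval_point(2.5)
```

The request never carries segment information: only the source knows its bounds, and a point that falls in a gap between segments gets no data.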
We would not need to transform the segments, which I don't believe is possible to do reasonably. See "Convert segment_lengths when transforming lat and lon coordinates" #252.
The Coordinates would be so much simpler.
This would really need to be done in 2.0. I note that #252 and #238 are included in 2.0.