Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move segments from coordinates to datasource #395

Merged
merged 15 commits into from
Apr 20, 2020
Merged

Conversation

jmilloy
Copy link
Collaborator

@jmilloy jmilloy commented Apr 17, 2020

Summary

  • coordinates ctype and segment_lengths are replaced by DataSource boundary
  • coordinates area_bounds is removed
  • rasterio reprojection bounds (and WCS request bounds) are calculated from the coordinates start, stop, and step
  • coordinates geotransform supports uniform ArrayCoordinates1d

Datasource coordinates boundary

The Datasource boundary is a dictionary of coordinate boundaries (one for each dimension). Point coordinates are omitted or None. The trait is validated and tested, but it is unused at this point and raises NotImplementedError in some cases. Note that all of these are coordinate offsets.

# simple centered boxes/ranges
boundary={'lat': 0.5, 'lon': 0.5}
boundary={'time': "1,D"}

# simple non-centered boxes/ranges
boundary={'lat': [0, 0.5], 'lon': [0, 0.5]}
boundary={'time': ["0,D", "1,D"]}

# polygons 
boundary={'lat': [0.0, -0.5, 0.0, 0.5], 'lon': [-0.5, 0.0, 0.5, 0.0]}

# array of boundaries, specified for each coordinate
boundary={"lat": [[-0.1, 0.4], [-0.2, 0.3], [-0.3, 0.2]], "lon": [[-0.1, 0.4], [-0.2, 0.3]]}

get_area_bounds

Area bounds are not currently used anywhere, but the logic to compute area bounds from coordinates and a boundary has been updated and may be useful. See get_area_bounds and associated tests. It supports all of the above boundary types.

jmilloy added 2 commits April 17, 2020 10:21
The area_bounds property is removed as well (it depends on these).
Note that ctype and segment_lengths are not used in any tests or core podpac code. The midpoint default for uniform coordinates is expected for some tests the rely on interpolation, which now fail.
@jmilloy jmilloy self-assigned this Apr 17, 2020
@coveralls
Copy link

coveralls commented Apr 17, 2020

Coverage Status

Coverage decreased (-0.04%) to 91.302% when pulling 0ca7e5c on feature/move-segments into 0fcf40f on develop.

jmilloy added 5 commits April 17, 2020 17:20
…_bounds method, and uses both in interpolation.
…e indexes non-uniform boundaries during evaluation as needed, but interpolation raises an exception for non-uniform or non-centered boundaries.
@mpu-creare
Copy link
Contributor

#396 touches segments / ctypes in a few minor places. I think we should merge that FIRST, as this one probably has a lot of changes / updates.

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 19, 2020

I admit that I only have a cursory understanding, but I think that this is revealing to me the extent to which we are mis-using rasterio reprojection to do interpolation and getting lucky that our defaults usually work.

Basically, we infer the rasterio transform bounds from the coordinates within the coordinates object via the inferred segment_lengths. If the segment lengths cannot be inferred, or if the user supplies their own segment_lengths, the rasterio "interpolation" will be incorrect or raise an exception. (This is why you get an exception when you request coordinates with only one value in either dimension: we cannot infer the segment_lengths, so we silently fall back on the bounds which make an invalid geotransform because the high and low bonud is the same!).

I think I can make it work by inferring the area_bounds in the rasterio interpolate method, which will make everything a lot clearer. It will be clear what's actually going on, and I will be able to raise the right exceptions at that time.

Note that nothing else uses the soon-to-be-removed area_bounds. I believe that rasterio is unable to handle anything but centered rectangular coordinates, and that non-default coordinate boundaries (e.g. ctype left or non-uniform segment_lengths) must use a different interpolator.

@mpu-creare
Copy link
Contributor

See if the new Coordinates.geotransform will serve the needed purpose in the rasterio interpolator.

Perhaps we can return False for the rasterio interpolator in cases where area_bounds cannot be inferred.

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 19, 2020

Perhaps we can return False for the rasterio interpolator in cases where area_bounds cannot be inferred.

Yes, the can_interpolate method is doing a better job that I realized.

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 19, 2020

See if the new Coordinates.geotransform will serve the needed purpose in the rasterio interpolator.

I haven't looked closely, but it is giving be the exception when I don't think it should.

self.coordinates.geotransform
*** TypeError: Only 2-D coordinates that are uniform or rotated have a GDAL transform. These coordinates Coordinates (EPSG:4326)
	lat: ArrayCoordinates1d(lat): Bounds[1.0, 3.0], N[3], ctype['midpoint']
	lon: ArrayCoordinates1d(lon): Bounds[1.0, 3.0], N[3], ctype['midpoint'] do not.

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 19, 2020

Ahh, you check isinstance(self._coord['lat'], UniformCoordinats1d) when you could check self._coord[lat].is_uniform.

jmilloy added 2 commits April 19, 2020 11:42
…g the coordinates start, stop and step.

* The boundary does not currently need to be passed through to the interpolator.
* Adds start, stop, and step properties to the ArrayCoordinates1d (only valid when the coordinates are uniform).
* Coordinates geotransform property supports uniform ArrayCoordinates1d.
* Singleton ArrayCoordinates1d objects are not uniform.
* WCS node also computes bounds using start, stop, and step.
* Datasource boundary defaults are removed, validation and tests added.
@jmilloy jmilloy changed the title WIP: move segments from coordinates to datasource Move segments from coordinates to datasource Apr 19, 2020
@jmilloy jmilloy requested a review from mpu-creare April 19, 2020 16:16
Comment on lines 468 to 470
# get indexed boundary
self._requested_source_boundary = self.get_boundary(self._requested_source_coordinates_index)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call (and the get_boundary method) is only necessary for arrays of non-uniform boundaries. And since we aren't actually using the boundary anywhere in the interpolation, it isn't necessary at all. But I didn't want to forget about it. Otherwise, the datasource eval method is fundamentally unchanged.

Comment on lines 273 to 278
west = c["lon"].bounds[0] - c["lon"].step / 2.0
east = c["lon"].bounds[1] + c["lon"].step / 2.0
south = c["lat"].bounds[0] - c["lat"].step / 2.0
north = c["lat"].bounds[1] + c["lat"].step / 2.0
cols = c["lon"].size
rows = c["lat"].size
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpu-creare This is what we would replace c.geotransform if possible, but I couldn't figure out how exactly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. As a one-liner, it would have been

return rasterio.Affine.from_gdal(*c.geotransform)

But this is pretty clear in case of future debugging...

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 19, 2020

@mpu-creare This is all set. Note that the interpolation tests are completed unchanged, I just broke them up into three files. Let me know if you to change anything in the boundary spec (it is similar but I think better to what we discussed in #395).

@mpu-creare
Copy link
Contributor

Ahh, you check isinstance(self._coord['lat'], UniformCoordinats1d) when you could check self._coord[lat].is_uniform.

OH! That's good to know! I might have to fix that in #396 as well.

Copy link
Contributor

@mpu-creare mpu-creare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look pretty good. A few questions (see below).
Unit tests are failing.
I also want to play with it a little bit on my end.

podpac/core/compositor/tile_compositor.py Show resolved Hide resolved
---------
boundary : float, timedelta, array, None
Boundary offsets in this dimension.
* For a centered uniform boundary, use a single positive float or timedelta offset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be step / 2 ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise in the offsets below you have to divide boundary by 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get it down. I'll make the documentation a little clearer.

# uniform segment_lengths
c = ArrayCoordinates1d([1, 2, 4, 5], segment_lengths=0.5)
# polygon (i.e. there would be corresponding offets for another dimension)
area_bounds = c.get_area_bounds([-0.2, -0.5, 0.7, 0.5])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand this one...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we will ever use this one, but it is very general.

For a centered square, you could use boundary = {'lat': [-1.0, -1.0, 1.0, 1.0], 'lon': [-1.0, 1.0, -1.0, 1.0]}. So for lat, you get boundary offsets [-1.0, -1.0, 1.0, 1.0], which appear a bit meaningless at that point.

For area bounds, all you care about is the min and max offset, and currently Coordinates1d get_area_bounds does that conversion (from boundary offsets to area bounds offsets). I guess it could be more clear if full Coordinates get_area_bounds method did the conversion, e.g:

class Coordinates:
    def get_area_bounds(self, boundary):
        offsets = {}
        for dim, value in boundary.items():
            if np.ndarray(vaue).ndim == 0:
               value = make_coord_delta(value)
               offsets[dim] = [-value, value]
            elif np.ndarray(value).ndim == 1:
                value = make_coord_delta_array(value)
                offsets[dim] = [min(value), max(value)]
            else:
                value = np.array([make_coord_delta_array(elem) for elem in value])
                offsets[dim] = np.array(np.min(value, axis=1), np.max(value, axis=1)).T
                
        return {dim: self[dim].get_area_bounds(offsets.get(dim)) for dim in self.udims}

And then Coordinates1d.get_area_bounds would expect offsets, and be a bit clearer:

class Coordinates1d:
    def get_area_bounds(self, offsets):
        # point coordinates
        if offsets is None:
            return self.bounds

        # empty coordinates
        if self.size == 0:
            return self.bounds

        if np.array(offsets).ndim == 1:
            # uniform boundary
            lo_offset, hi_offset = offsets
        else:
            # non-uniform boundary
            L, H = self.argbounds
            lo_offset = offsets[L][0]
            hi_offset = offsets[H][1]

        lo, hi = self.bounds
        lo = add_coord(lo, lo_offset)
        hi = add_coord(hi, hi_offset)

        return lo, hi

That is a little bit less efficient, but maybe worth it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure when/where we will need the area_bounds though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is great for future. Nice. Good spec.

podpac/core/data/datasource.py Show resolved Hide resolved
Comment on lines 273 to 278
west = c["lon"].bounds[0] - c["lon"].step / 2.0
east = c["lon"].bounds[1] + c["lon"].step / 2.0
south = c["lat"].bounds[0] - c["lat"].step / 2.0
north = c["lat"].bounds[1] + c["lat"].step / 2.0
cols = c["lon"].size
rows = c["lat"].size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. As a one-liner, it would have been

return rasterio.Affine.from_gdal(*c.geotransform)

But this is pretty clear in case of future debugging...

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 20, 2020

return rasterio.Affine.from_gdal(*c.geotransform)

I think you can see that I tried that and left it commented it out. It is not the same and doesn't work.

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 20, 2020

Okay fixed that typo in the tests.

@mpu-creare
Copy link
Contributor

return rasterio.Affine.from_gdal(*c.geotransform)

I think you can see that I tried that and left it commented it out. It is not the same and doesn't work.

Yeah, I'll need to look into that because it should work. They are doing the same thing, or should be. This means one of the two are wrong.

@mpu-creare
Copy link
Contributor

return rasterio.Affine.from_gdal(*c.geotransform)

I think you can see that I tried that and left it commented it out. It is not the same and doesn't work.

Yeah, I'll need to look into that because it should work. They are doing the same thing, or should be. This means one of the two are wrong.

Yeah, I see the issue. The coordinates.geotransform "naturally" handles the order of the dimensions (is_descending). For example, these give the same results:

coords_src = Coordinates([clinspace(10, 0, 3), clinspace(0, 10, 5)], dims=["lat", "lon"])
rasterio.Affine.from_gdal(*coords_src.geotransform)
Affine(2.5, 0.0, -1.25,
       0.0, -5.0, 12.5)
coords_src = Coordinates([clinspace(0, 10, 3), clinspace(0, 10, 5)], dims=["lat", "lon"])
get_rasterio_transform(coords_src)
Affine(2.5, 0.0, -1.25,
       0.0, -5.0, 12.5)

But this does not give the correct answer (since step is < 0 and is not accounted for in get_rasterio_transform

coords_src = Coordinates([clinspace(10, 0, 3), clinspace(0, 10, 5)], dims=["lat", "lon"])
get_rasterio_transform(coords_src)
Affine(2.5, 0.0, -1.25,
       0.0, -1.6666666666666667, 7.5)

So there is a small bug there... I'll fix it.

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 20, 2020

return rasterio.Affine.from_gdal(*c.geotransform)

I think you can see that I tried that and left it commented it out. It is not the same and doesn't work.

Yeah, I'll need to look into that because it should work. They are doing the same thing, or should be. This means one of the two are wrong.

Yeah, I see the issue. The coordinates.geotransform "naturally" handles the order of the dimensions (is_descending). For example, these give the same results:

coords_src = Coordinates([clinspace(10, 0, 3), clinspace(0, 10, 5)], dims=["lat", "lon"])
rasterio.Affine.from_gdal(*coords_src.geotransform)
Affine(2.5, 0.0, -1.25,
       0.0, -5.0, 12.5)
coords_src = Coordinates([clinspace(0, 10, 3), clinspace(0, 10, 5)], dims=["lat", "lon"])
get_rasterio_transform(coords_src)
Affine(2.5, 0.0, -1.25,
       0.0, -5.0, 12.5)

But this does not give the correct answer (since step is < 0 and is not accounted for in get_rasterio_transform

coords_src = Coordinates([clinspace(10, 0, 3), clinspace(0, 10, 5)], dims=["lat", "lon"])
get_rasterio_transform(coords_src)
Affine(2.5, 0.0, -1.25,
       0.0, -1.6666666666666667, 7.5)

So there is a small bug there... I'll fix it.

Aha! Thanks.

@jmilloy
Copy link
Collaborator Author

jmilloy commented Apr 20, 2020

Well, instead of fixing the bug in get_rasterio_transform, I would prefer to use geotransform here if that works. I feel like when I tried it, the tests didn't pass... maybe even threw an exception, but I could be wrong.

@mpu-creare
Copy link
Contributor

Well, instead of fixing the bug in get_rasterio_transform, I would prefer to use geotransform here if that works. I feel like when I tried it, the tests didn't pass... maybe even threw an exception, but I could be wrong.

Yep, that's the plan! I think tests are now passing...

…Also updating unit tests to catch an error in case of descening 'lat' coordinates.
@mpu-creare mpu-creare merged commit bc7c1d9 into develop Apr 20, 2020
@mpu-creare mpu-creare deleted the feature/move-segments branch April 20, 2020 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants