Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdem.coreg.ICP.fit_pts fails for unknown reason with a regular dataset #423

Closed
erikmannerfelt opened this issue Aug 31, 2023 · 4 comments · Fixed by #585
Closed

xdem.coreg.ICP.fit_pts fails for unknown reason with a regular dataset #423

erikmannerfelt opened this issue Aug 31, 2023 · 4 comments · Fixed by #585
Labels
bug Something isn't working

Comments

@erikmannerfelt
Copy link
Contributor

erikmannerfelt commented Aug 31, 2023

As promised in #422 , here's the other issue hehe.

This is a strange one. I already have a fix, but I don't understand why it helps!

Problem description

When running ICP.fit_pts() I ran into the issue that led to #422; my points were all filtered away for some reason. After some debugging, I found where this happens, but I still don't know why.

First, rasters containing information on the normals are sampled using regular Raster.interp_points calls:

xdem/xdem/coreg/affine.py

Lines 525 to 529 in 3956073

for key, raster in [("nx", normal_east), ("ny", normal_north), ("nz", normal_up)]:
raster.tags["AREA_OR_POINT"] = "Area"
ref_dem[key] = raster.interp_points(
ref_dem[["E", "N"]].values, shift_area_or_point=True, mode="nearest"
)

Then, when the rasters have been sampled, a convoluted nan removal line removes the unsuccessful samples:

xdem/xdem/coreg/affine.py

Lines 536 to 537 in 3956073

for key in points:
points[key] = points[key][~np.any(np.isnan(points[key]), axis=1)].astype("float32")

This led to my ref_dem being empty, as the above interp_points for some reason only returned nans! I figured out a fix for this problem, but I still don't understand the symptom...

The solution

Instead of the sampling in the code (shown above), I changed it to:

normal_east.tags["AREA_OR_POINT"] = "Area"
rowcol = np.round(np.dstack(normal_east.xy2ij(ref_dem["E"], ref_dem["N"], shift_area_or_point=True)).squeeze()).astype(int)

if any(col not in ref_dem for col in ["nx", "ny", "nz"]):
    for key, raster in [("nx", normal_east), ("ny", normal_north), ("nz", normal_up)]:
        ref_dem[key] = raster.data[rowcol[:, 0], rowcol[:, 1]].filled(np.nan)

Basically, I make a more efficient nearest neighbour sampling by first determining the row and column indices, then directly indexing the underlying raster. This should be much more efficient, as that part is only done once instead of three times, and for some reason it fixes my problem with the sampling.

What's the actual problem?

Without a minimal (non-)working example, it's hard for me to figure out exactly what's wrong... Honestly, I think it boils down to one of two potential issues:

  1. There's something wrong with Raster.interp_points in geoutils
  2. There's something wrong with my use of the functionality

Honestly, and argument for making a PR to "fix" this can simply be motivated by performance. If Raster.interp_points is not used in the method, no Rasters have to be instantiated and I can just use the rasterio coords to ij functions. That'd reduce the amount of copying that might happen and it will probably make the code look cleaner. I might open a PR to improve the code and then add an associated geoutils issue for tackling the underlying problem.

Still, it would be great to get to the bottom of why this actually fails... @rhugonnet or @adehecq, have you experienced something similar before with this function?

@erikmannerfelt erikmannerfelt added the bug Something isn't working label Aug 31, 2023
@erikmannerfelt
Copy link
Contributor Author

Also, @rhugonnet and @adehecq, if you have any recommendations of how to make a minimal (non-)working example, I'm all ears. It's a float32 DEM with ICESat-2 points that should work just like in the test. But it doesn't!

@erikmannerfelt
Copy link
Contributor Author

Wait what the heck is the difference between Raster.value_at_coords and Raster.interp_points?? I see that they have almost the same docstring and I've invariably used both interchangeably.

@rhugonnet
Copy link
Contributor

I have a deadline today, will look in more details tomorrow.

Quick answer on your last question:

  • The value_at_coords function extracts the closest point value to the coords or its average (any type of reductor function) within a square window of a certain size (doesn't require to load the entire Raster).
  • The interp_points function performs a regular grid interpolation (nearest, linear, cubic) to extract the values at the coords.

More details in the examples: https://geoutils.readthedocs.io/en/stable/analysis_examples/index.html#point-extraction
But, yes, we definitely need to change the names and improve the docstrings of those functions...

@erikmannerfelt
Copy link
Contributor Author

I have a deadline today, will look in more details tomorrow.

Quick answer on your last question:

* The `value_at_coords` function extracts the closest point value to the coords or its average (any type of reductor function) within a square window of a certain size (doesn't require to load the entire Raster).

* The `interp_points` function performs a regular grid interpolation (nearest, linear, cubic) to extract the values at the coords.

More details in the examples: https://geoutils.readthedocs.io/en/stable/analysis_examples/index.html#point-extraction But, yes, we definitely need to change the names and improve the docstrings of those functions...

Thanks for the info! That makes more sense now. But perhaps they could just be one function with an "interpolate" kwarg to value_at_coords? Either way, a docstring improvement would be great! I'll add an issue in geoutils.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants