Rework affine transformation re-gridding (from 3D point cloud to 2.5D DEM) for accuracy and performance #531

rhugonnet · 2024-05-24T04:12:44Z

This PR updates apply_matrix to use efficient and accurate 3D affine transformation for rasters, for any transformation.

It also modifies the exact affine transformation of a point cloud to use only NumPy instead of OpenCV, which brings us one step closer to not depend on OpenCV (which is a pretty heavy dependency to have). The only remaining functionality relying on OpenCV is the ICP algorithm, which will be addressed in #483.

Problem and summary of previous changes

In apply_matrix for rasters, after applying the affine transformation on the 3D point cloud converted from the DEM (which is an exact conversion, in this direction), we need to then approximate a re-gridded DEM from the transformed point cloud (for which the coordinates do not align with any grid anymore). This is an irregular gridding problem typically done using triangulation (Delauney triangles), and interpolating linearly in each (as done for point clouds by https://gdal.org/programs/gdal_grid.html#gdal-grid, PDAL through GDAL, and in Python by scipy.interpolate.griddata).

In #87 we removed scipy.interpolate.griddata as being too slow for apply_matrix for rasters, and replaced it by a 2-step routine: (1) correct translations with scipy.interpolate.BivariateSpline + (2) correct small rotations with 2D deramping. This increased the speed a lot for everything, but deramping creates a lot of artefacts when approximating a rotation (as explained by @erikmannerfelt), and does not support scaling.

New logic for affine transformation of rasters

This PR keeps the logic of #87 to differentiate the case of having only translations (to accelerate horizontal/vertical grid shifts), but implements a new iterative method to accurately and quickly resolve the regridding for small rotations, and re-introduces griddata (a bit slower but accurate) for the general case of any rotations/scaling. The artefacts @erikmannerfelt previously noticed from the griddata implementation were due to a bad propagation of NaNs during griddata (need to be derived as a function of the distance to the transformed points), so I also fixed this here by implementing a tested point cloud gridding function in GeoUtils: GlacioHack/geoutils#553.

I also optimized the performance and added more flexibility to of a couple point cloud utilities: GlacioHack/geoutils#549, GlacioHack/geoutils#546, GlacioHack/geoutils#547, GlacioHack/geoutils#554.

Schematic of new implementation

The iterative method for small rotations aims to find the (unknown) elevation Z at (unknown) irregular coordinates X, Y which transform exactly onto the (known) regular grid X0, Y0 with (unknown = objective) elevation Z.

Because we know everything is linked by an exact affine transformation and that the initial DEM can be interpolated at any 2D coordinates X,Y to get Z, we can go faster than griddata. We search for coordinates X,Y derived from the inverse affine transform of X0, Y0 and an (unknown) Z, initialized with an initial closest-neighbor search, and then refined iteratively by interpolating the original DEM at coordinates X,Y (and we can re-use the same interpolator, which is quite fast).
The algorithm typically converges (to X0,Y0 within 10e-5) independently for 99% of points in 4/5 iterations, and the last 1% of points take 4/5 more to reach the tolerance.
All grid points converges independently, and this works well even on a 1,000,000 point real-world DEM with outliers, noise, etc (our Longyearbyen example in this case).

Tests

This PR adds a lot of new tests for both implementations (griddata and the iterative one), some that live in GeoUtils with the underlying functionality.
Both methods are able to reproduce the real transformed elevations of the transformed DEM at a precision of 10e-5 for a constant slope DEM 🙂. For varying slopes, more differences arise as expected (near 90° slopes have multiple solutions), but mostly because we cannot check as robustly (2D linear interpolation is not equivalent before/after transformation anymore). The tests also check the spreading of NaNs through both methods to ensure they behave as expected.

Performance

Independently of the nearest neighbour search (detailed below) and data conversions (which we could try to minimize mroe generally at the package scale), the iterative algorithm is much faster than griddata, about 7 times faster for the Longyearbyen DEM! 🥳

Elapsed for iterative algo: 1.36470627784729
Elapsed for griddata: 9.616541385650635

Running a nearest neighbour search between two points clouds is necessary in both methods (to initialize the Z for the iterative affine method, and to get the locations of NaNs for the griddata method), and is currently one of the limiting factors (takes about 5 seconds on the Longyearbyen dataset, so almost as slow as griddata by just a factor 2).

But using scipy.KDTree or refining the max_distance parameter of geopandas.sjoin_nearest could both improve speed a lot for this part of the methods: https://gis.stackexchange.com/questions/222315/finding-nearest-point-in-other-geodataframe-using-geopandas.

UPDATE: Neither using the max_distance parameter of sjoin_nearest, nor using scipy.spatial.KDTree improved the speed at all, so I'm not including any additional changes on this in this PR. There might be a better way to initialize Z than with a nearest neighbour search (also utilizing knowledge from the affine transformation), which would make the "iterative" algorithm much faster overall, but I'm not inspired right now 🤔.

UPDATE 2: Actually using the transformed elevation associated with the original point (even if not aligned very well in X/Y) works as well as a nearest neighbour to initialize, and only takes a couple more iterations (which are ~100x faster than a nearest neighbour search)! Problem solved, and we officially have a re-gridding for affine transforms around 10x faster than the usual irregular point re-gridding and as accurate! 🥳 🤟

Details

This PR:

Replaces OpenCV by NumPy when applying a 4x4 matrix to a 3D point cloud to reduce the need for the optional dependency (tested locally, not added to the tests yet; gives exactly the same result 🎉),
Adds a new implementation that finds a fast nearly exact solution by iteration for small rotations,
Adds a schematic to explain that implementation,
Adds tests to check nodata propagation of "iterative" method,
Optimizes speed of iterative method,
Speeds-up to_pointcloud for subsample=1, see Small fixes and optimizations for to_pointcloud and interp_points geoutils#549,
Adds the griddata implementation for large rotations, see Add point cloud gridding geoutils#553,
Adds tests checking the new implementations of apply_matrix.

I will soon add a Python version of ICP, which will reduce even more the need for having OpenCV as an optional dependency (only if using ICP + asking for method="opencv").

Resolves #110
Resolves #499

rhugonnet · 2024-05-24T04:30:26Z

@erikmannerfelt @adehecq I want to check if you both agree on the way forward to finalize the apply_matrix function to support any affine transform matrix without introducing approximation biases (minimizing them at least).

In apply_matrix, after applying the affine transformation on the 3D point cloud converted from the DEM (which is an exact conversion, in this direction), we need to then approximate a re-gridded DEM from that new point cloud (for which the coordinates do not align with any grid anymore). Currently, we use deramping() for this. However, as @erikmannerfelt pointed out, this is just an approximation for very small rotation angles, which actually introduces some errors. Plus, it does not account for any scaling that could be passed in the affine matrix.

Instead, I think we should re-grid using scipy.interpolate.griddata, which is a delauney triangle-based interpolation, same as used in gdal_grid (https://gdal.org/programs/gdal_grid.html#gdal-grid, and used by PDAL through GDAL too for lidar point clouds). The interpolation in the triangles can be done as "nearest", "linear" or "cubic", and should always perform well when re-gridding a dense point cloud from a DEM back into another DEM, and for any type of affine transformation (translation, rotation, scaling). I'm not sure we could do better than that!

And we would maintain the case of an horizontal shift separate for computational efficiency.

The only delicate part of the new routine should be to re-create NaNs where we want to have NaNs after interpolation (any pixel that does not have a shifted data point), but I think I can figure that out.

adehecq · 2024-05-28T12:33:15Z

I totally agree that we should use an exact regridding for the apply_matrix function and not a workaround. But in the end, you plan to use the same function as before #87, so it will be very slow for other affine transformations than horizontal/vertical shifts, right? Any chance there would be a more efficient option than griddata 3 years later?
And what about scipy.interpolate.LinearNDInterpolator ? Of course, it provides only linear interpolation.

rhugonnet · 2024-05-30T05:00:09Z

So gridddata is actually a wrapper around LinearNDInterpolator, both are the same thing!

I looked into the issue in details over the weekend, and it looks like there is no "exact" solution for the reprojection to a regular 2D grid (for instance, rotating the DEM by 90°, so essentially turning it vertical along one axis, many possible elevation points could intersect a similar regular X/Y grid point).

So it's not trivial...
But still, using griddata to estimate the new DEM still seemed like a bit of a computational waste, because we know the exact affine transform function + we have points on a regular grid initially (so nicely spread out, while griddata is especially useful to work on completely irregular points).
Using these two characteristics, I was able to write a little inversion algorithm to find out the X',Y' coordinates on the original DEM grid that transform into a regular grid X,Y for the transformed DEM! It converges fast towards the solution for small rotations (which is what we have in 95% of the cases).
I finished testing it today on a synthetic DEM + the Longyearbyen DEM, and it works well for both! All pixels (that is, more than 1 million pixel for the Longyearbyen DEM) converge independently in just 4/5 iterations, for rotations smaller than 20°. And this procedure only involves affine transformations (fast, exact matrix multiplication) and 2D interpolation on the DEM regular grid (very fast compared to triangulation of irregular points)! 🥳

The longest computational aspect is the search for a nearest neighbour elevation to initialize the algorithm, but this should also be much faster than a triangulation.

In the next days I will try to make a couple plots to characterize convergence/divergence for rotation amplitude + estimate the speed-up compared to griddata (that I will still add for large rotations, even though the message is: for big rotations, to conserve the 3D shape, don't reproject into a 2.5D DEM or you lose info! do everything with point clouds.)

I will also make a little schematic to explain the algorithm 😉

rhugonnet · 2024-06-03T19:46:18Z

@erikmannerfelt @adehecq This PR is finished and ready for your review! 😄 (see updated description at the top)

rhugonnet · 2024-06-07T04:04:15Z

Alright, I had to fix a couple issues in the documentation + optimize the speed of some functions. Now all done and ready to merge! 😁

xdem/coreg/base.py

tests/test_coreg/test_base.py

adehecq · 2024-06-07T10:02:38Z

Fantastic !! Nice work ! 🙌
Very smart algorithm and fancy scheme! 😄
Although I get what you do (trying to find from which point of the original DEM each regularly gridded point comes from), I still struggle to follow the steps in the code. Maybe an illustration with a 4-5 points line and a rotation, showing what is done at each step, would help but it's also ok if I don't fully understand everything at the moment.
The tests look robust and I trust the outputs anyway.
Just a few minor comments above.

adehecq

Forgot to approve!

rhugonnet · 2024-06-07T15:39:15Z

Glad you like it, thanks for the swift review! 😄

Replace opencv by numpy for affine transformation

03ce388

rhugonnet mentioned this pull request May 24, 2024

Add skip_nodata argument to Raster.to_pointcloud GlacioHack/geoutils#546

Merged

Incremental commit on apply func

fcf81e4

rhugonnet marked this pull request as draft May 30, 2024 04:48

rhugonnet added 6 commits May 31, 2024 13:41

Incremental commit on new apply

f1b70af

Linting

1c05fe3

Add test for nodata propag

3820df0

Improve speed of iterative algo

2ab3eef

Linting

f4da38e

Merge remote-tracking branch 'upstream/main' into rework_apply_matrix

cf568cc

rhugonnet marked this pull request as ready for review June 3, 2024 19:46

rhugonnet requested review from adehecq and erikmannerfelt June 3, 2024 19:46

rhugonnet added 3 commits June 3, 2024 11:52

Update geoutils version dependency

b343d22

Modify interp points input

b18d1fe

Linting

a0b1a12

rhugonnet changed the title ~~Rework affine transformation to be more accurate~~ Rework affine transformation re-gridding (from 3D point cloud to 2.5D DEM) for accuracy and performance Jun 4, 2024

rhugonnet added 5 commits June 3, 2024 17:17

Improve speed of iterative algorithm and clear out the time prints

fcfaafa

Finalize last test

b219360

Update function description

3defc09

Move description to main apply_matrix function

d0efd79

Linting

9b48134

ameliefroessl mentioned this pull request Jun 6, 2024

Dask for coregistration #525

Draft

14 tasks

rhugonnet added 3 commits June 6, 2024 11:09

Merge remote-tracking branch 'upstream/main' into rework_apply_matrix

acdc5d7

Add back pip dependency

85a7439

Fix test matrix definition

4cf00fb

Add full traceback print to stderr on execution failure

8436071

rhugonnet mentioned this pull request Jun 6, 2024

Show full traceback on notebook execution error GlacioHack/geoutils#557

Merged

rhugonnet added 2 commits June 6, 2024 15:12

Increase time out

0b4c8be

Update method to use transformed z as initialization of algo

ea0882e

adehecq reviewed Jun 7, 2024

View reviewed changes

xdem/coreg/base.py Show resolved Hide resolved

adehecq reviewed Jun 7, 2024

View reviewed changes

tests/test_coreg/test_base.py Show resolved Hide resolved

adehecq reviewed Jun 7, 2024

View reviewed changes

tests/test_coreg/test_base.py Outdated Show resolved Hide resolved

adehecq reviewed Jun 7, 2024

View reviewed changes

tests/test_coreg/test_base.py Show resolved Hide resolved

adehecq previously approved these changes Jun 7, 2024

View reviewed changes

Account for amaurys comments

1bf3799

rhugonnet dismissed adehecq’s stale review via 1bf3799 June 7, 2024 15:38

rhugonnet merged commit 4ce2b44 into GlacioHack:main Jun 7, 2024
10 checks passed

rhugonnet deleted the rework_apply_matrix branch June 7, 2024 16:13

This was referenced Aug 28, 2024

Re-structure coregistration affine methods, point-raster pre-processing, binning and fitting, and add pixel interpretation support #530

Merged

Speed-up ICP-related tests #574

Closed

[POC] [STORY] Makefile #564

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework affine transformation re-gridding (from 3D point cloud to 2.5D DEM) for accuracy and performance #531

Rework affine transformation re-gridding (from 3D point cloud to 2.5D DEM) for accuracy and performance #531

rhugonnet commented May 24, 2024 •

edited

Loading

rhugonnet commented May 24, 2024 •

edited

Loading

adehecq commented May 28, 2024

rhugonnet commented May 30, 2024 •

edited

Loading

rhugonnet commented Jun 3, 2024 •

edited

Loading

rhugonnet commented Jun 7, 2024

adehecq commented Jun 7, 2024

adehecq left a comment

rhugonnet commented Jun 7, 2024

Rework affine transformation re-gridding (from 3D point cloud to 2.5D DEM) for accuracy and performance #531

Rework affine transformation re-gridding (from 3D point cloud to 2.5D DEM) for accuracy and performance #531

Conversation

rhugonnet commented May 24, 2024 • edited Loading

Problem and summary of previous changes

New logic for affine transformation of rasters

Schematic of new implementation

Tests

Performance

Details

rhugonnet commented May 24, 2024 • edited Loading

adehecq commented May 28, 2024

rhugonnet commented May 30, 2024 • edited Loading

rhugonnet commented Jun 3, 2024 • edited Loading

rhugonnet commented Jun 7, 2024

adehecq commented Jun 7, 2024

adehecq left a comment

Choose a reason for hiding this comment

rhugonnet commented Jun 7, 2024

rhugonnet commented May 24, 2024 •

edited

Loading

rhugonnet commented May 24, 2024 •

edited

Loading

rhugonnet commented May 30, 2024 •

edited

Loading

rhugonnet commented Jun 3, 2024 •

edited

Loading