Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-structure coregistration affine methods, point-raster pre-processing, binning and fitting, and add pixel interpretation support #530

Merged
merged 30 commits into from
Sep 5, 2024

Conversation

rhugonnet
Copy link
Contributor

@rhugonnet rhugonnet commented May 24, 2024

This PR restructures affine coregistration methods and many generic subfunctions to be consistent and modular.
Primarily, it makes 1/ "point-raster" subsampling, 2/ any binning and/or fitting, and 3/ iteration over methods, the same across all Affine and BiasCorr methods with extensive modularity.
Additionally, this PR updates horizontal shift reprojection from using a different implementation in every method (e.g. Nuth and Kääb used scipy.RectBivariateSpline) to using consistently interp from GeoUtils which supports any resampling method (for which details can be chosen in global config parameters), with consistent nodata propagation built on top, and preserves computational efficiency by allowing to return a single interpolator object re-used (GlacioHack/geoutils#560). This also allows to avoid sub-pixel errors of Rasterio reprojection.
It also re-structures the code of affine methods, defining core functions outside the classes' _fit_rst_rst or _fit_rst_pts, which allows to have consistent pre-processing steps, and clear input/outputs.
Finally, this PR adds support for pixel interpretation for coregistration methods (important for point-raster operations).

Resolves #234
Resolves #574
Resolves #561
Resolves #546
Resolves #573
Resolves #571
Resolves #547
Resolves #327 (except plot but those are mentioned in #139)
Big step forward for #435

Opens #572
Opens #575
Opens #579

Summary of changes

1/ New core processing functions

For consistent binning/fitting:
Restructures the coreg/affine module to support the fit_or_bin logic previously added in coreg/biascorr: However, as in the case of coregistration, the fit function is fixed (that of the method; for instance dh/slopetan = aspect for Nuth and Kääb), the "bin" option alone is impossible and only "fit" or "bin_and_fit" are supported. Therefore, the mirrored structure simply allows to pass a boolean argument bin_before_fit to perform binning before fitting, as well as any fit optimizer, bin statistic, and bin sizes.
To do this, this PR modifies the bin and fit of BiasCorr into a single _bin_or_and_fit_nd function in coreg.base used by all Affine and BiasCorr methods.

For consistent point-raster/raster-raster subsampling:
This pre-processing step is different from that running directly in Coreg.fit(), which simply extracts the transform, crs, and others. This step revolves around subsampling the point-raster (raster-raster, or point-raster) at the same points for a valid subsample (no NaNs) of a certain size. This cannot be done consistently in fit() before _fit_xxx, because auxiliary variables might need to be derived by the method on the full arrays (such as slope, aspect, etc), which propagate NaNs. In particular, the tricky aspect is the case of point-raster, where we need to first interpolate the raster at the coordinates of the points to get an idea of the available valid subsample (depending here also on NaN propagation).
To do this, this PR modifies the subsampling that existed in BiasCorr into a single _preprocess_pts_rst_subsample function in coreg.base relying on two steps: _get_subsample_mask_pts_rst and _subsample_on_mask. In order to allow to return a dh interpolator that can be re-used iteratively for some affine methods, another function _preprocess_pts_rst_subsample_with_dhinterpolator in coreg.affine relies on a different second step _subsample_on_mask_with_dhinterpolator.

For consistent iterative methods:
Methods such as ICP, Nuth and Kääb, etc, all work based on iterations. To have consistent treatment of inputs/outputs and iterative arguments (tolerance to reach, maximum number of iterations), those are now grouped through an _iterate_method function.

Ensuring consistent argument across these core functions and other parameters:
To ensure the argument passed to these three core functions (subsampling, binning/fitting, and iteration) are consistent, new dictionary types are defined for each which defines their key names and value types: InFitOrBinDict ("In" for input), InRandomDict and InIterativeDict. Additionally InAffineDict for affine-related parameters, and InSpecificDict for any method-specific parameters that does not fit other categories.
Same is done for outputs: OutRandomDict, etc.
See #573 for details on the new organization.

2/ Robust and consistent horizontal shift reprojection

The core of fit_ functions of horizontal methods such as NuthKaab and GradientDescending has been modify to rely on the _preprocess_pts_rst_subsample_with_dhinterpolator mentioned above.
This function relies on geoutils.interp_points (directly for point-raster, and through a new function _reproject_horizontal_shift_samecrs for raster-raster) to perform a consistent 2D grid interpolation with nodata propagation supporting any SciPy method, see details here: GlacioHack/geoutils#560.
This PR adds a test checking that the result of this functionality matches GDAL reprojection, which ensures Rasterio sub-pixel reprojection errors are gone.

3/ Re-structuration of affine classes

All code from affine classes is moved out of the _fit_xxx functions into functions that rely on the same steps where possible.
For instance, the NuthKaab method is divided into a _nuthkaab_fit_func to define the function to optimize, a _nuthkaab_fit which wraps _bin_or_and_fit_nd with some initial guess, a _nuthkaab_aux_vars that derives auxiliary variables required for the fit (slope, aspect), and a _nuthkaab_iteration_step that is passed to _iterate_method.

All the _apply_xxx affine functions are removed to avoid unnecessary redundancies/inconsistencies, as the apply_matrix is the same across all affine methods, and since #531 supports resample=False that modifies only the geotransform for horizontal shifts.

4/ Add support for pixel interpretation

Dealing with area_or_point was previously done arbitrarily in each subclass. Using the above interpolation/reprojection (through geoutils.interp_points), we can now simply pass the argument to GeoUtils who accounts for it when comparing point-raster, or raises a warning for raster-raster having a different interpretation.
It is only required for the fit part of coregistration, and thus added to all related functions.

5/ Removal of Tilt, and other subfunctions

With the above changes, the affine Tilt function corresponds exactly to Deramp(order=1) (instead of having its own, separate implementation). Additionally, it is not truly affine (cannot be modelled by an affine matrix), and thus has no place in affine methods. We can simply remove, and users can rely on Deramp instead.
All sub-functions that were created to perform some processing steps in coregistration modules (many times duplicated from another method, or from a GeoUtils functionality) for a specific affine method are now removed. This is largely thanks to GlacioHack/geoutils#560 which re-structures GeoUtils raster functionalities to be called without having to build a Raster object (simply passing array, transform, CRS, etc).
Until now, we were deconstructing/reconstructing Raster objects everywhere to call simple functions (interp_points, coords, etc), which made us lose performance. This is now largely solved too!
A couple exceptions remains: warp_dem used only by BlockwiseCoreg and _mask_as_array used only by the dDEM class, but their use seems more specific and I'm not sure it is covered by anything in GeoUtils right now.

Tests

I did not write new tests on purpose in this PR to ensure I could check that all these changes do not affect our current methods significantly. Improving tests (in particular for affine methods) will be addressed in a next PR.

And indeed, all tests are passing with minor modifications! 🥳

Additionally, I increased the speed of the tests following #574 (ICP tests were extremely slow and cumulated to 5min+, or half of xDEM's test length). Now nearly all the tests of test_coreg/ run on a cropped version of the reference/tba example DEMs.
And all of xDEM's test run in 5min instead of 12min+. 😃

To-do-list

  • Fix resolution error in NuthKaab of Bug with raster-point coregistration #546,
  • Finalize Define consistent nodata propagation for interp_points and allow to return interpolator geoutils#560,
  • Finalize same-CRS SciPy interpolator reprojection to avoid Rasterio sub-pixel errors,
  • Move subsampling out of the "bin_and_fit" functionality,
  • Make "bin_and_or_fit" a generic function applicable to other functions (not only BiasCorr subclasses),
  • Split preprocessing into consistent functions: subsampling supporting raster-raster or point-raster accounting for invalid data in other variables (bias vars, slope, etc), interpolators supporting raster-raster and point-raster inputs,
  • Add a generic iterator function for iterative methods (Nuth and Kääb, etc),
  • Migrate Affine functions out of the classes, and adapt them to work with the new generic core functions,
  • Add is_translation property to trigger error for unsupported resample=False,
  • Consistently pass "area_or_point" to Coreg subfunctions,
  • Finalize mirroring BiasCorr parameters for modular bin and fit,
  • Add nested, typed dictionary structure to organize coregistration parameters and ensure error-free implementations.
  • Add Coreg.info() to summarize coregistration information, inputs and outputs.
  • Save final subsample size for affine classes (pass through preproc function),
  • Use _check_bin_or_fit function for affine classes (opened Add consistent checks for arguments of coregistration classes #575).

Will do in separate PRs

@rhugonnet rhugonnet marked this pull request as draft June 7, 2024 16:14
@rhugonnet rhugonnet changed the title Re-structure AffineCoreg subclasses to be more modular Re-structure AffineCoreg subclasses to be more consistent and modular Aug 9, 2024
@rhugonnet rhugonnet changed the title Re-structure AffineCoreg subclasses to be more consistent and modular Consistent coregistration: re-structure affine methods, group point-raster pre-processing, binning a and pixel interpretation support Aug 28, 2024
@rhugonnet rhugonnet changed the title Consistent coregistration: re-structure affine methods, group point-raster pre-processing, binning a and pixel interpretation support Consistent coregistration: re-structure affine methods, point-raster pre-processing, binning and fitting, and add pixel interpretation support Aug 28, 2024
@rhugonnet rhugonnet changed the title Consistent coregistration: re-structure affine methods, point-raster pre-processing, binning and fitting, and add pixel interpretation support Re-structure coregistration affine methods, point-raster pre-processing, binning and fitting, and add pixel interpretation support Aug 28, 2024
@rhugonnet
Copy link
Contributor Author

rhugonnet commented Aug 28, 2024

@adehecq @erikmannerfelt This PR is ready for your review!
All tests passing locally with the GeoUtils branch GlacioHack/geoutils#560. 🥳

I have two main aspects on which I'm hesitating on how to move forward, and would especially like your opinion on:

  1. Currently I defined a nuth_kaab main function called by NuthKaab._fit_xxx directly (and this for all methods), which is itself made of some of the new core subfunctions (listed in the main description, point 1/) and a method-specific one. This ensure things are consistent, but is annoying to save variables to the Coreg metadata that are only available at a given step. For instance, for an easy one: What is the final subsample size on valid values? Not available right now. A harder one: What is the binning/parameters at a given iteration step? (might not be necessary, but could become a need in the future for plotting or other)
    I see two solutions here to make that easier:
    a) Either we move the content of nuth_kaab to NuthKaab ._fit_xxx, and wrap the subfunctions called in nuth_kaab as methods of Coreg which always pass/save the metadata consistently when they run (you can see I already did this with _bin_or_and_fit_nd for BiasCorr, it is duplicated as a Coreg method in order to pass/write the metadata). It would make it impossible to call the same nuth_kaab separately than through the Coreg class (but I'm not sure that's a need?), but make those new class methods consistent.
    b) Or we define new function inputs/outputs (dictionaries?) for each of these subfunctions, to pass metadata of interest through the functions back to the class (could become a hassle very fast).

  2. The CoregDict is getting busy and all over the place. It seems like it could become a combination of the new dicts created in this PR: RandomDict, FitOrBinDict, IterativeDict (and maybe an output dict on the results?)...
    How should we structure this? Having a nested dictionary would make it well organized, but maybe harder for users to access a specific key easily, even though it's all there. If we did this, we could add an info() method that prints the metadata nicely to partially remedy that. Any other ideas?

@rhugonnet rhugonnet marked this pull request as ready for review August 28, 2024 21:56
@rhugonnet
Copy link
Contributor Author

Agreed on all.
Except maybe for "it is certainly more difficult to understand when reading the code, maintain and debug.". I think in time (getting used to the functions) and maybe with a Wiki page, it will be much easier to maintain and debug a couple consistent core functions, rather than each Coreg subclass doing its own thing 😉.

For computation times: Similar for Raster-Raster, and faster for Point-Raster (due to avoiding rebuilding a Raster to call interp_points, now this is done from the array/transform directly 😄).

Good idea, I will try to write a short Wiki page why this is fresh!

@rhugonnet
Copy link
Contributor Author

Merging like this!
Want to know @erikmannerfelt's opinion before changing Coreg.meta into Coreg.inputs and Coreg.outputs. I actually don't dislike meta too much (data about other data: the coregistration algorithm in this case), as long as it's organized inside.

@rhugonnet rhugonnet merged commit 86f8a57 into GlacioHack:main Sep 5, 2024
21 checks passed
@rhugonnet rhugonnet deleted the reorg_affine branch September 5, 2024 01:10
@rhugonnet
Copy link
Contributor Author

FYI: I've added a Wiki page here on the Coreg structure and steps to implement a new method: https://github.com/GlacioHack/xdem/wiki/Things-to-know-on-coregistration-class-structure

@adehecq
Copy link
Member

adehecq commented Sep 6, 2024

FYI: I've added a Wiki page here on the Coreg structure and steps to implement a new method: https://github.com/GlacioHack/xdem/wiki/Things-to-know-on-coregistration-class-structure

Perfect, thanks a lot ! That's super useful !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment