-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Local hypsometric interpolation #36
Local hypsometric interpolation #36
Conversation
Aand I screwed up the double PRs again... I'm too tired to do git properly! |
… added documentation
…ck to handle nans appropriately
For the hypsometry interpolation/extrapolation, I went with the easiest possible approach using the builtin One could go a bit further and add polynomial estimation using RANSAC to filter bad bins. What do you guys think? |
xdem/volume.py
Outdated
return output | ||
|
||
|
||
def interpolate_hypsometric_bins(hypsometric_bins: pd.DataFrame, height_column="median", method="polynomial", order=3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
…it only gives one per default
I'm fine adding scikit-learn with RANSAC, it would be a good addition. Although to be honest I'm not sure it will outperform traditional polynomial by much. I think the issue is mostly with the parametric nature of the polynomial regression, rather than the mitigation of outliers. I had a good performance when implementing 1D lowess, which is nonparametric. It was a while ago and I had to code most of it by hand (https://github.com/iamdonovan/pyddem/blob/master/pyddem/volint_tools.py#L148), not sure what is available now. |
I just added a hypsometry area calculation script. The workflow would now be: ddem_bins = xdem.volume.hypsometric_binning(ddem, reference_dem)
interpolated_bins = xdem.volume.interpolate_hypsometric_bins(ddem_bins)
bin_area = xdem.volume.calculate_hypsometry_area(interpolated_ddem_bins, reference_dem, pixel_size=pixel_size)
# This will be a float
volume_change = (ddem_bins * bin_area).sum() Thoughts? |
It make sense. But what exactly is in ddem_bins and interpolated_ddem_bins (tthere's a typo I think as you called it interpolated_bins elsewhere too)? |
@adehecq value count
(225.28848266601562, 275.2884826660156] -19.098885 724.0
(275.2884826660156, 325.2884826660156] -36.254318 1034.0
(325.2884826660156, 375.2884826660156] -31.608276 617.0
(375.2884826660156, 425.2884826660156] -22.635315 752.0
(425.2884826660156, 475.2884826660156] -19.853256 1294.0
(475.2884826660156, 525.2884826660156] -15.140961 1321.0
(525.2884826660156, 575.2884826660156] -10.407410 997.0
(575.2884826660156, 625.2884826660156] -3.927429 1021.0
(625.2884826660156, 675.2884826660156] 2.108887 791.0
(675.2884826660156, 725.2884826660156] 1.071594 530.0
(725.2884826660156, 775.2884826660156] 0.579987 296.0
(775.2884826660156, 825.2884826660156] -1.488831 302.0
(825.2884826660156, 875.2884826660156] 1.046814 282.0
(875.2884826660156, 925.2884826660156] 0.201599 248.0
(925.2884826660156, 975.2884826660156] -2.662354 161.0 Indeed, The reason ddem_func = scipy.interpolate.interp1d(ddem_bins.index.mid, ddem_bins.values, kind="linear", fill_value="extrapolate")
mean_dem = ref_dem - (ddem_func(ref_dem) / 2) Would it make more sense to make |
But should calculate_hypsometry_area not only depend on the bin edges and a reference DEM? In that case, I would rather just pass the elevation bins, rather than ddem_bins. |
Is this approved, @rhugonnet and @adehecq, or should we discuss something more? |
I've added "equal area" binning in 3b33d4b, according to @rhugonnet's approach in #39. Right now, the terminology is a bit all over the place. This is how I've made the options, but feel free to suggest changes: volume.hypsometric_binning(ddem, ref_dem, bins=50, kind="see below") "equal_height": The elevation range of each bin is 50 georeferenced units. "fixed_count": There are 50 bins in total with equal elevation range. "equal_area": There are 50 bins in total, and all cover almost exactly the same area (#39). To exemplify the outputs (the values don't mean much because I've neither coregistered nor masked the DEMs): In [1]: import xdem
In [2]: ref_dem = xdem.dem.DEM(xdem.examples.FILEPATHS["longyearbyen_ref_dem"])
No metadata could be read from filename.
In [3]: tba_dem = xdem.dem.DEM(xdem.examples.FILEPATHS["longyearbyen_tba_dem"])
No metadata could be read from filename.
In [4]: ddem = ref_dem.data - tba_dem.data
In [5]: xdem.volume.hypsometric_binning(ddem, ref_dem.data, bins=10, kind="equal_height")
Out[5]:
value count
(8.6741304397583, 18.6741304397583] -1.760605 15780.0
(18.6741304397583, 28.6741304397583] -2.291516 45596.0
(28.6741304397583, 38.6741304397583] -1.547386 33645.0
(38.6741304397583, 48.6741304397583] -1.166397 26651.0
(48.6741304397583, 58.6741304397583] -1.175062 27800.0
... ... ...
(978.6741304397583, 988.6741304397583] -1.051819 265.0
(988.6741304397583, 998.6741304397583] -0.553040 203.0
(998.6741304397583, 1008.6741304397583] -0.421906 148.0
(1008.6741304397583, 1018.6741304397583] 0.948212 68.0
(1018.6741304397583, 1028.6741304397583] 0.209137 16.0
[102 rows x 2 columns]
In [6]: xdem.volume.hypsometric_binning(ddem, ref_dem.data, bins=10, kind="fixed_count")
Out[6]:
value count
(8.6741304397583, 109.9881199936731] -1.807365 242028.0
(109.9881199936731, 211.3021095475879] -2.855278 142598.0
(211.3021095475879, 312.6160991015027] -2.900841 183366.0
(312.6160991015027, 413.9300886554175] -3.097565 177362.0
(413.9300886554175, 515.2440782093322] -3.637207 160876.0
(515.2440782093322, 616.5580677632471] -3.866760 144277.0
(616.5580677632471, 717.8720573171619] -3.371338 121107.0
(717.8720573171619, 819.1860468710767] -3.312836 90834.0
(819.1860468710767, 920.5000364249914] -2.082214 41171.0
(920.5000364249914, 1021.8140259789062] -0.364868 8401.0
In [7]: xdem.volume.hypsometric_binning(ddem, ref_dem.data, bins=10, kind="equal_area")
Out[7]:
value count
(8.6741304397583, 51.949240112304686] -1.686300 131202.0
(51.949240112304686, 126.75953216552733] -2.185017 131202.0
(126.75953216552733, 216.52808837890626] -2.811653 131202.0
(216.52808837890626, 287.96279296875] -2.922958 131202.0
(287.96279296875, 362.9493103027344] -3.012344 131202.0
(362.9493103027344, 439.10780029296876] -3.240845 131202.0
(439.10780029296876, 523.1372314453124] -3.678909 131202.0
(523.1372314453124, 615.8244506835937] -3.877625 131202.0
(615.8244506835937, 726.0829772949219] -3.362030 131202.0
(726.0829772949219, 1021.8140268789062] -2.746582 131202.0 EDIT: The text below was written before the name change "equal_count" -> "fixed_count" One potential limitation in this terminology is that "equal_count" and the "count" column are not really related. "equal_count" is an equal bin count, while the "count" column is the pixel/value count for each bin! Is this confusing or should we just leave it as is? |
Crazy how much a 1 minute bathroom break's reflection can matter. What about Especially the kind EDIT: I've replaced all instances of |
Oh no, I just found a large unintended feature in this approach. The "count" is not analogous to "area" when there are nans in the dDEM! That is why I didn't want to mix those terms in the first place, I recall... What are you guys' opinion on this, @rhugonnet and @adehecq . To be clear, the "equal_area" binning kind makes it so that bins are created with an equal amount of finite dDEM values, not an equal glacier area. That might be what we want, but it wasn't what I intended! |
Another suggestion for alternate names: |
Intuitively, I expect the most "performant" one to be a uniform (equal_count) binning accounting for NaNs. That is to say, equal count for the hypsometric binning, independently of what values are passed in the ddem array. |
Please correct me if I'm mistaken, @rhugonnet , but that implies the bins have to be calculated using the reference DEM elevations, not the mean elevations. This may be problematic with large change, like we discussed before: If a glacier has lost 100 m at 1000-1100 m a.s.l. between 1900 and 2000, the 2000 area may be 0, so the apparent volume change will be 0. |
FYI, pandas naming: The problem, I think, is that "fixed_bincount" is a sub-category of equal height! And equal_area is actually also based on height binning (although it translate into area). So using "height" or "area" is a bit confusing to me. You have only 1 category (height), and then several ways of dividing it. binning_type = ['custom','fixed','quantile','count'] #you can bind by a fixed number of height or areas
binning_width = [[],50 m , 10%, 100] For count, we could let the user one/two-line (elevation range/bin_count) and use "fixed" if desired? |
I'm not sure this makes any difference: if the elevation change is 0 at lower elevation, you can propagate that 0 to the rest of the bin without any issue. |
I would say, Romain's last suggestion is the most instuitive? So |
Sorry, @rhugonnet, but I honestly don't understand what you mean. If we have a glacier:
etc., the integrated volume change in the first elevation bin would be 0, assuming one uses the modern hypsometry. How and where is this propagated? |
I agree with "fixed", "quantile" and "count", like @rhugonnet suggested. I'll implement that! |
Done in 5646dca |
The date of the reference DEM just shifts all bins, you don't really lose any elevation change information:
If you use a DEM_post:
Hope this makes sense in relation to the comment ! |
Ooooooh, of course! Wow now I feel stupid. I've made things so hard for myself the whole way I'll correct this (meaning this will be true: "area" == "count_sum" * "pixel_area") and set the default |
Since the latest commit, I daresay this is ready. We solved the issue I had with reference/mean elevations (turns out I was just poorly informed) and the functions are now in a working state. Once merged, I can start implementing them in the DEMCollection class. Is there anything else we should consider first? |
Feel free to merge and move on with the DEMCollection class. Maybe we should have a meeting next week to make sure we agree on the features to implement in xdem. |
Okay! Sounds great to have a discussion about moving forward next week. |
I've added a function
xdem.volume.hypsometric_binning()
which takes two 1D arrays and returns the following dataframe (on a glacier on Svalbard):The index is a
pd.IntervalIndex
instance, with nice convenience properties likedf.index.mid
for the midpoint of the intervals.What do you think of this structure?