Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Definitive fix of negative uncertainties from linear extrapolation #446

Merged
merged 7 commits into from
Nov 16, 2023

Conversation

MatteaE
Copy link
Contributor

@MatteaE MatteaE commented Nov 9, 2023

EDIT by @rhugonnet:

  • Also adds support for using plot of 1d/2d binning with a pd.Dataframe containing pd.Interval written as strings.

@rhugonnet
Copy link
Contributor

Amazing, thanks @MatteaE!

I had to trigger the test workflows (because it's your first time contributing, then it does it automatically): All tests passing 😄! (except a Windows one that fails due to different random sampling between OSs from a recent PR, but I'll fix that separately)

We should add a check related to your PR in test_interp_nd_binning_artificial_data: https://github.com/GlacioHack/xdem/blob/main/tests/test_spatialstats.py#L148.
Do you think you can reproduce this issue with some artificial data there, which would fail with current xDEM and pass with your PR?

@MatteaE
Copy link
Contributor Author

MatteaE commented Nov 13, 2023

Cool, thanks a lot @rhugonnet!
I've tried to reproduce the issue with artificial data, but so far I haven't managed to (I find the result of the N-D binning somewhat hard to predict, and the problem with interpolation is quite an edge case). Can the test include some real data as an alternative?

@rhugonnet
Copy link
Contributor

Mmhh.. introducing a new dataset through xdem-data is quite a bit of work.

Maybe a solution in this case would be to simply copy the binning dataframe you had with your real data that triggered the issue in interp_nd_binning directly. This should only be a couple lines of real values to write in the code (and could potentially help us craft an artificial example if we understand the pattern that triggered the issue).

@MatteaE
Copy link
Contributor Author

MatteaE commented Nov 14, 2023

Ok good point, here is the binning data frame which triggers the error - let me know if it is enough!

df_negative_err.csv

@rhugonnet
Copy link
Contributor

Thanks! I also need the locations where it gets negative, I couldn't find it by searching around:

import xdem
import pandas as pd

fn_csv = "/home/atom/downloads/df_negative_err.csv"
df = pd.read_csv(fn_csv)

fn = xdem.spatialstats.interp_nd_binning(df, list_var_names=["slope", "maxc"])

print(fn((0, 0)))
print(fn((48, 2)))
print(fn((100, 50)))
print(fn((100, 0)))
print(fn((0, 50)))
2.0653602539062508
10.163194042968751
5468.51238649639
8.765966447451888
8.574538916015399

@MatteaE
Copy link
Contributor Author

MatteaE commented Nov 15, 2023

Here it is, sorted from most to least negative: df_neg_sorted.csv
Numbers are computed with

fn = xdem.spatialstats.interp_nd_binning(df, list_var_names=["slope", "maxc"],  statistic="nmad", min_count=30)
zscores, dh_err_fun = xdem.spatialstats.two_step_standardization(
    dh_arr, list_var=[slope_arr, maxc_arr], unscaled_error_fun=unscaled_dh_err_fun
)
dh_err = dh.copy(new_array=dh_err_fun((slope.data, maxc.data)))

(Probably a different value of min_count produces a different dh_err_fun and the locations would also change)

@rhugonnet
Copy link
Contributor

Using your data I managed to reproduce the error in an artificial data example, and added it to the tests. Was also the occasion to look at the behaviour in 2/3D, and the fix looks durable. Thanks a lot!
Will merge as soon as tests finish running and pass.

Copy link
Contributor

@rhugonnet rhugonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All passing, merging! Thanks again @MatteaE

@rhugonnet rhugonnet merged commit a72b910 into GlacioHack:main Nov 16, 2023
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Negative values predicted by interp_nd_binning problematic for error analysis
2 participants