Mitch predict_xr #1270

mitchest · 2024-09-30T03:57:02Z

Proposed changes

Updating the predict_xr() function so it can return the full array of prediction probabilities from predict_proba() (one probability per class), rather than only the maximum probability. Returning the full array is the default behaviour of predict_proba() itself.
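To make the difference between the two modes concrete, here is a minimal NumPy sketch. It takes a predict_proba()-style array (one row per pixel, one column per class) and reshapes it back into image space, either keeping only the highest class probability per pixel or keeping one band per class. The helper reshape_proba() and its max_only flag are hypothetical stand-ins for the proba_max behaviour discussed in this PR; this is not the predict_xr() implementation.

```python
import numpy as np


def reshape_proba(proba, shape, max_only=True):
    """Reshape (n_pixels, n_classes) class probabilities into image space.

    Hypothetical helper for illustration only; not part of dea_tools.
    """
    n_pixels, n_classes = proba.shape
    rows, cols = shape
    if n_pixels != rows * cols:
        raise ValueError("number of proba rows must equal rows * cols")
    if max_only:
        # Single band: probability of the winning class at each pixel
        return proba.max(axis=1).reshape(rows, cols)
    # One band per class, shaped (n_classes, rows, cols)
    return proba.T.reshape(n_classes, rows, cols)


# Four pixels (a 2 x 2 image) and three classes; each row sums to 1
proba = np.array(
    [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.3, 0.3, 0.4],
        [0.5, 0.25, 0.25],
    ]
)

print(reshape_proba(proba, (2, 2), max_only=True))
print(reshape_proba(proba, (2, 2), max_only=False).shape)  # (3, 2, 2)
```

The single-band output is convenient for a quick confidence map, while the per-class bands keep enough information to, for example, threshold one specific class.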

Closes issues (optional)

Closes Issue #000

Checklist

(Replace [ ] with [x] to check off)

[x] Notebook created using the DEA-notebooks template
[x] Remove any unused Python packages from Load packages
[x] Remove any unused/empty code cells
[x] Remove any guidance cells (e.g. General advice)
[x] Ensure that all code cells follow the PEP8 standard for code. The jupyterlab_code_formatter tool can be used to format code cells to a consistent style: select each code cell, then click Edit and then one of the Apply X Formatter options (YAPF or Black are recommended).
[x] Include relevant tags in the final notebook cell (refer to the DEA Tags Index, and re-use tags if possible)
[x] Clear all outputs, run notebook from start to finish, and save the notebook in the state where all cells have been sequentially evaluated
[ ] Test notebook on both the NCI and DEA Sandbox (flag if not working as part of PR and ask for help to solve if needed)
[ ] If applicable, update the Notebook currently compatible with the NCI|DEA Sandbox environment only line below the notebook title to reflect the environments the notebook is compatible with
[x] Check for any spelling mistakes using the DEA Sandbox's built-in spellchecker (double click on markdown cells then right-click on pink highlighted words).

robbibt · 2024-09-30T04:05:03Z

Hey @mitchest, this looks great to me - do you have a screenshot of the outputs of proba_max=True and proba_max=False that you could share here just to clarify what results you get using each option?

mitchest · 2024-09-30T04:07:38Z

> Hey @mitchest, this looks great to me - do you have a screenshot of the outputs of proba_max=True and proba_max=False that you could share here just to clarify what results you get using each option?

Yer - no worries

Here's a screenshot of just the outputs (plots 1 and 3 are proba_max=False, plot 2 is proba_max=True)

Here's a screenshot of the coastal binary layer actually being used to mask another product
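The general pattern of turning one class's probability band into a binary layer and using it to mask another product can be sketched with NumPy. The 0.5 threshold, the class index, and the helper mask_with_class_proba() are assumptions for illustration only; they are not the settings or code used to build the coastal layer in the screenshot.

```python
import numpy as np


def mask_with_class_proba(proba_bands, class_index, other, threshold=0.5):
    """Keep values of `other` only where P(class_index) exceeds `threshold`.

    proba_bands: array of shape (n_classes, rows, cols), one band per class.
    other: array of shape (rows, cols) to be masked.
    Hypothetical helper for illustration only; not part of dea_tools.
    """
    # Boolean (rows, cols) layer derived from a single class's probabilities
    binary = proba_bands[class_index] > threshold
    # Pixels outside the binary layer are set to NaN
    return np.where(binary, other, np.nan), binary


proba_bands = np.array(
    [
        [[0.9, 0.2], [0.6, 0.1]],  # class 0 (e.g. the class of interest)
        [[0.1, 0.8], [0.4, 0.9]],  # class 1
    ]
)
other = np.array([[10.0, 20.0], [30.0, 40.0]])

masked, binary = mask_with_class_proba(proba_bands, class_index=0, other=other)
print(binary)
print(masked)
```

This only works with per-class bands (the proba_max=False style of output); a single max-probability band does not say which class the probability belongs to.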

* Fix deprecation warnings and speed up code
* Change last modified date
* Intertidal exposure (#1261)
* New files for intertidal exposure notebook
* minor editing to Load packages cells
* Updated discord and removed slack links
* Improved markdown linking to image and gif in Introduction
* Incorporated SS and MA reviews
* Updated to include RBT reviews
* Minor pip install and notebook naming edits. Add notebook to README
* adding renamed exposure notebook back into PR
* adding global SAR access through microsoft planetary compute (#1263)
* adding global SAR access through microsoft planetary compute
* Make minor spelling and formatting amendments.
* small changes for PR
---------
Co-authored-by: geoscience-aman <96451725+geoscience-aman@users.noreply.github.com>
* Update USAGE.rst (#1268)
Add Swinburne course for 2024
* Minor compatibility change for tide modelling package (#1269)
* Mitch predict_xr (#1270)
* add probability array output to predict_xr
* predict_xr at proba_max args
* predict_xr match arg names
* xr_predict deal with multiband prob outout
* xr_predict merge output probs
* clean up comments and spacing
* Update USAGE.rst (#1272)
Add new reference, Burton et al 2024 Enhancing long-term vegetation monitoring in Australia: a new approach for harmonising the Advanced Very High Resolution Radiometer normalised-difference vegetation (NVDI) with MODIS NDVI
* Fix broken code on `unstable` Sandbox image (#1274)
* Updates for pyTMD
* Fix contours bug due to groupby squeeze
* Try loosening pyTMD requirements
* Update tests to pass on both stable and unstable sandbox
* Fix pansharpening bug
---------
Co-authored-by: Aman Chopra <aman.chopra@ga.gov.au>
Co-authored-by: geoscience-aman <96451725+geoscience-aman@users.noreply.github.com>
Co-authored-by: ClaireP <claire.phillips@ga.gov.au>
Co-authored-by: Alex Bradley <55119000+abradley60@users.noreply.github.com>
Co-authored-by: Bex Dunn <BexDunn@users.noreply.github.com>
Co-authored-by: Mitchell Lyons <mitchell.lyons@gmail.com>

jessjaco · 2024-12-07T00:09:23Z

I noticed reduced performance after this was merged. After reviewing the code, it looks like these two lines no longer have an effect, and that clean=True is perhaps broken. I think they need to be moved back up to where they were.

dea-notebooks/Tools/dea_tools/classification.py

Lines 365 to 366 in a6e937f

    if clean == True:
        out_proba = da.where(da.isfinite(out_proba), out_proba, 0)
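The thread does not show the full diff, but one way a cleaning step can silently stop having an effect is if it runs after the array it modifies has already been placed into the output. The toy functions below are hypothetical and are not the dea_tools code; they use NumPy in place of Dask and only illustrate why the position of the clean block matters.

```python
import numpy as np


def toy_output_buggy(proba, clean=False):
    """Toy example (hypothetical): cleaning runs too late to affect the output."""
    out_proba = proba.max(axis=1)
    result = {"Probabilities": out_proba}  # output already assembled here
    if clean:
        # Rebinding out_proba does not change result["Probabilities"]
        out_proba = np.where(np.isfinite(out_proba), out_proba, 0)
    return result


def toy_output_fixed(proba, clean=False):
    """Toy example (hypothetical): clean before the output is assembled."""
    out_proba = proba.max(axis=1)
    if clean:
        # Replace NaN/inf with 0, analogous to da.where(da.isfinite(x), x, 0)
        out_proba = np.where(np.isfinite(out_proba), out_proba, 0)
    return {"Probabilities": out_proba}


# The middle pixel is fully masked, so its max probability is NaN
proba = np.array([[0.7, 0.3], [np.nan, np.nan], [0.2, 0.8]])
print(toy_output_buggy(proba, clean=True)["Probabilities"])
print(toy_output_fixed(proba, clean=True)["Probabilities"])
```

With the fixed ordering, masked pixels show up as zeros in Probabilities, which is exactly what the reproduction script further down checks for.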

robbibt · 2024-12-10T04:31:44Z

@jessjaco By performance, do you mean speed is slower, or that the results are lower quality? Any extra info (e.g. screenshots or timings) would be awesome - we'll have a look!

Are you running your code with max_proba=True and clean=True?

jessjaco · 2024-12-10T18:40:36Z

I was noticing performance issues, and I'm not positive why. But the results also appear to be incorrect. Please see below.

from dea_tools.classification import predict_xr
import numpy as np
import odc.geo.xr

from sklearn.tree import DecisionTreeClassifier
import xarray as xr

# Dummy data & model
X = np.random.rand(1000, 1)
B = 3
e = np.random.rand(1000, 1)

y = ((X * B + e) // 1).astype(int) + 1

model = DecisionTreeClassifier()
model.fit(X, y)

size_1d = 1_000
X_new = np.random.rand(size_1d, size_1d)
X_new_ds = (
    xr.DataArray(
        X_new,
        dims=("x", "y"),
        coords={"x": np.arange(size_1d), "y": np.arange(size_1d)},
    )
    .odc.assign_crs(26912)
    .to_dataset(name="data")
)

prediction = predict_xr(model, X_new_ds, chunk_size=10, proba=True)
print((prediction.Probabilities == 0).any())
# False

X_new_ds_masked = X_new_ds.where(X_new_ds < 0.9)
masked_prediction = predict_xr(
    model, X_new_ds_masked, chunk_size=10, proba=True, clean=True
)

print(X_new_ds_masked.data.isnull().any())
# True, as expected

print((masked_prediction.Probabilities == 0).any())
# False, but should be True, since clean=True should convert the NaNs to zero

mitchest added 5 commits September 30, 2024 10:38

72bb55f add probability array output to predict_xr
507f53b predict_xr at proba_max args
9a00722 predict_xr match arg names
8d11aa6 xr_predict deal with multiband prob outout
c50ff73 xr_predict merge output probs

mitchest requested review from vnewey, robbibt and erialC-P September 30, 2024 03:57

mitchest requested review from BexDunn, uchchwhash, Kooie-cate, geoscience-aman, JM-GA, margaretharrison, Ariana-B, amanda2099, supermarkion and erin-telfer as code owners September 30, 2024 03:57

a6e937f clean up comments and spacing

robbibt approved these changes Oct 9, 2024


robbibt merged commit 9710bb5 into develop Oct 9, 2024
1 check passed

robbibt deleted the mitch-predictxr branch October 9, 2024 02:29
