Mitch predict_xr #1270

Merged
merged 6 commits into from
Oct 9, 2024

mitchest
Collaborator

Proposed changes

Updating the predict_xr() function so it can return the full array of prediction probabilities from predict_proba() (one probability per class), which is what predict_proba() returns by default.
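For context, here is a minimal scikit-learn sketch of the two kinds of output. predict_proba() returns one column per class, while taking the maximum across classes collapses this to a single value per sample, which is presumably what the proba_max=True option keeps. The toy data and model settings are illustrative assumptions, not taken from predict_xr.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy training data (illustrative only): one feature, three classes 0, 1, 2
rng = np.random.default_rng(0)
X = rng.random((300, 1))
y = np.digitize(X[:, 0], [1 / 3, 2 / 3])

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
X_new = rng.random((5, 1))

# Default predict_proba output: one column per class, each row sums to 1
proba_full = model.predict_proba(X_new)

# Collapsed output: only the probability of the most likely class per sample
proba_max = proba_full.max(axis=1)

print(proba_full.shape)  # (5, 3)
print(proba_max.shape)   # (5,)
```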

Closes issues (optional)

  • Closes Issue #000

Checklist

(Replace [ ] with [x] to check off)

  • [x] Notebook created using the DEA-notebooks template
  • [x] Remove any unused Python packages from Load packages
  • [x] Remove any unused/empty code cells
  • [x] Remove any guidance cells (e.g. General advice)
  • [x] Ensure that all code cells follow the PEP8 standard for code. The jupyterlab_code_formatter tool can be used to format code cells to a consistent style: select each code cell, then click Edit and then one of the Apply X Formatter options (YAPF or Black are recommended).
  • [x] Include relevant tags in the final notebook cell (refer to the DEA Tags Index, and re-use tags if possible)
  • [x] Clear all outputs, run notebook from start to finish, and save the notebook in the state where all cells have been sequentially evaluated
  • Test notebook on both the NCI and DEA Sandbox (flag if not working as part of PR and ask for help to solve if needed)
  • If applicable, update the Notebook currently compatible with the NCI|DEA Sandbox environment only line below the notebook title to reflect the environments the notebook is compatible with
  • [x] Check for any spelling mistakes using the DEA Sandbox's built-in spellchecker (double click on markdown cells then right-click on pink highlighted words). For example:

[Image: sandbox_spellchecker]

@robbibt
Member

robbibt commented Sep 30, 2024

Hey @mitchest, this looks great to me - do you have a screenshot of the outputs of proba_max=True and proba_max=False that you could share here just to clarify what results you get using each option?

@mitchest
Collaborator Author

mitchest commented Sep 30, 2024

Yer - no worries

Here's a screenshot of just the outputs (plots 1 and 3 are proba_max=False, plot 2 is proba_max=True):
[Screenshot: prediction outputs for proba_max=False and proba_max=True]
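As a rough illustration of the difference between the two options, the sketch below reshapes a flattened (pixels x classes) probability array into an xarray object with one layer per class, then reduces it to a single max-probability layer. The grid size, the random probabilities and the "class" dimension name are hypothetical; the exact layout predict_xr uses for its multi-band output may differ.

```python
import numpy as np
import xarray as xr

# Hypothetical flattened classifier output for a 4 x 5 image with 3 classes
ny, nx, n_classes = 4, 5, 3
rng = np.random.default_rng(1)
raw = rng.random((ny * nx, n_classes))
proba = raw / raw.sum(axis=1, keepdims=True)  # rows sum to 1, like predict_proba

# proba_max=False style (assumed layout): one probability layer per class
proba_full = xr.DataArray(
    proba.reshape(ny, nx, n_classes),
    dims=("y", "x", "class"),
    coords={"y": np.arange(ny), "x": np.arange(nx), "class": np.arange(n_classes)},
    name="Probabilities",
)

# proba_max=True style: a single layer holding the highest class probability
proba_max = proba_full.max(dim="class")
```

Keeping every class layer lets you threshold or compare individual classes later, at the cost of n_classes times the memory of the single max layer.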

Here's a screenshot actually using the coastal binary layer to mask another product:
[Screenshot: coastal binary layer used to mask another product]
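Masking another product with a binary layer like this can be done with xarray's DataArray.where. A minimal sketch, assuming a hypothetical 0/1 coastal layer (for example derived from the classification output) and a second product that already share the same grid:

```python
import numpy as np
import xarray as xr

coords = {"y": np.arange(3), "x": np.arange(3)}

# Hypothetical binary coastal layer (1 = coastal, 0 = not coastal)
coastal = xr.DataArray(
    np.array([[1, 1, 0], [1, 0, 0], [0, 0, 0]]), dims=("y", "x"), coords=coords
)

# Hypothetical second product on the same grid, e.g. a continuous index
product = xr.DataArray(
    np.arange(9, dtype=float).reshape(3, 3), dims=("y", "x"), coords=coords
)

# Keep product values only where the coastal layer is 1; all other pixels become NaN
product_coastal = product.where(coastal == 1)
```

Because masked pixels become NaN, they are skipped by NaN-aware reductions such as product_coastal.mean().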

@robbibt robbibt merged commit 9710bb5 into develop Oct 9, 2024
1 check passed
@robbibt robbibt deleted the mitch-predictxr branch October 9, 2024 02:29
robbibt added a commit that referenced this pull request Oct 9, 2024
* Fix deprecation warnings and speed up code

* Change last modified date

* Intertidal exposure (#1261)

* New files for intertidal exposure notebook

* minor editing to Load packages cells

* Updated discord and removed slack links

* Improved markdown linking to image and gif in Introduction

* Incorporated SS and MA reviews

* Updated to include RBT reviews

* Minor pip install and notebook naming edits. Add notebook to README

* adding renamed exposure notebook back into PR

* adding global SAR access through microsoft planetary compute (#1263)

* adding global SAR access through microsoft planetary compute

* Make minor spelling and formatting amendments.

* small changes for PR

---------

Co-authored-by: geoscience-aman <96451725+geoscience-aman@users.noreply.github.com>

* Update USAGE.rst (#1268)

Add Swinburne course for 2024

* Minor compatibility change for tide modelling package (#1269)

* Mitch predict_xr (#1270)

* add probability array output to predict_xr

* predict_xr at proba_max args

* predict_xr match arg names

* xr_predict deal with multiband prob output

* xr_predict merge output probs

* clean up comments and spacing

* Update USAGE.rst (#1272)

Add new reference, Burton et al 2024 Enhancing long-term vegetation monitoring in Australia: a new approach for harmonising the Advanced Very High Resolution Radiometer normalised-difference vegetation (NVDI) with MODIS NDVI

* Fix broken code on `unstable` Sandbox image (#1274)

* Updates for pyTMD

* Fix contours bug due to groupby squeeze

* Try loosening pyTMD requirements

* Update tests to pass on both stable and unstable sandbox

* Fix pansharpening bug

---------

Co-authored-by: Aman Chopra <aman.chopra@ga.gov.au>
Co-authored-by: geoscience-aman <96451725+geoscience-aman@users.noreply.github.com>
Co-authored-by: ClaireP <claire.phillips@ga.gov.au>
Co-authored-by: Alex Bradley <55119000+abradley60@users.noreply.github.com>
Co-authored-by: Bex Dunn <BexDunn@users.noreply.github.com>
Co-authored-by: Mitchell Lyons <mitchell.lyons@gmail.com>
@jessjaco

jessjaco commented Dec 7, 2024

I noticed reduced performance after this was merged. After reviewing the code, it looks like these two lines no longer have any effect, and that clean=True may be broken. I think they need to be moved back up to where they were:

if clean == True:
    out_proba = da.where(da.isfinite(out_proba), out_proba, 0)
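The ordering problem described above can be sketched with plain NumPy (np.where stands in for dask's da.where here; the function names and the dict-based output are hypothetical and are not the actual predict_xr code). If the output is assembled before out_proba is cleaned, reassigning the variable afterwards only rebinds the local name, so the NaNs survive in what gets returned:

```python
import numpy as np


def build_output_buggy(out_proba, clean=True):
    # Output is assembled first, holding a reference to the uncleaned array
    output = {"Probabilities": out_proba}
    if clean:
        # Rebinding the local name leaves the array already stored in output untouched
        out_proba = np.where(np.isfinite(out_proba), out_proba, 0)
    return output


def build_output_fixed(out_proba, clean=True):
    if clean:
        # Replace NaN/inf with 0 before the array is placed in the output
        out_proba = np.where(np.isfinite(out_proba), out_proba, 0)
    return {"Probabilities": out_proba}


proba = np.array([0.2, np.nan, 0.9])
print(build_output_buggy(proba)["Probabilities"])
print(build_output_fixed(proba)["Probabilities"])
```

Moving the clean step ahead of wherever the output is built, as suggested, is what makes the replacement reach the returned array.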

@robbibt
Member

robbibt commented Dec 10, 2024

@jessjaco By performance, do you mean speed is slower, or that the results are lower quality? Any extra info (e.g. screenshots or timings) would be awesome - we'll have a look!

Are you running your code with proba_max=True and clean=True?

@jessjaco

I was noticing performance issues, and I'm not positive why. But the results also appear to be incorrect. Please see below.

from dea_tools.classification import predict_xr
import numpy as np
import odc.geo.xr

from sklearn.tree import DecisionTreeClassifier
import xarray as xr

# Dummy data & model
X = np.random.rand(1000, 1)
B = 3
e = np.random.rand(1000, 1)

y = ((X * B + e) // 1).astype(int) + 1

model = DecisionTreeClassifier()
model.fit(X, y)

size_1d = 1_000
X_new = np.random.rand(size_1d, size_1d)
X_new_ds = (
    xr.DataArray(
        X_new,
        dims=("x", "y"),
        coords={"x": np.arange(size_1d), "y": np.arange(size_1d)},
    )
    .odc.assign_crs(26912)
    .to_dataset(name="data")
)

prediction = predict_xr(model, X_new_ds, chunk_size=10, proba=True)
print((prediction.Probabilities == 0).any())
# False

X_new_ds_masked = X_new_ds.where(X_new_ds < 0.9)
masked_prediction = predict_xr(
    model, X_new_ds_masked, chunk_size=10, proba=True, clean=True
)

print(X_new_ds_masked.data.isnull().any())
# True, as expected

print((masked_prediction.Probabilities == 0).any())
# False, but should be True, since clean=True should convert the NaNs to zero
