openEO random_forest_classifier validation/inference fails in CDSE #562

Closed
Pratichhya opened this issue Oct 30, 2023 · 7 comments · May be fixed by Open-EO/openeo-geopyspark-integrationtests#16
Labels
bug CDSE Copernicus Data Space Ecosystem

Comments

@Pratichhya

When following https://open-eo.github.io/openeo-python-client/machine_learning.html, training the model and downloading the model weights both work. However, validation/inference with the trained model fails with the following error:

"OpenEO batch job failed: OpenEOApiException(status_code=400, code='Internal', message='No random forest model found for job j-231029b6164049ba92c3eada7603f037', id='no-request')"

However, this only happens when running on openeo.dataspace.copernicus.eu; the same workflow works well on openeo.cloud and openeo.vito.be.
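For readers who want to reproduce the report, the train-then-infer workflow from the linked documentation can be sketched roughly as below. This is only an illustration: `train_and_predict` and its parameters are hypothetical names, `connection` is assumed to be an already authenticated `openeo.Connection`, and the collection ID, extents, bands and training features must be supplied by the caller. The temporal reduction with `mean` is an assumption made to get a time-free composite.

```python
def train_and_predict(connection, collection_id, spatial_extent, temporal_extent,
                      bands, training_features, num_trees=200):
    """Train a random forest in a batch job, then build an inference cube.

    Hypothetical helper. `connection` is assumed to be an authenticated
    openeo.Connection; `training_features` is assumed to be a GeoJSON
    FeatureCollection whose features carry the target class.
    """
    cube = connection.load_collection(
        collection_id,
        spatial_extent=spatial_extent,
        temporal_extent=temporal_extent,
        bands=bands,
    )
    # Assumption: collapse the time dimension so each pixel has one value per band.
    composite = cube.reduce_dimension(dimension="t", reducer="mean")

    # Training: aggregate the predictors per polygon, fit, and save the model.
    predictors = composite.aggregate_spatial(geometries=training_features, reducer="mean")
    model = predictors.fit_class_random_forest(target=training_features, num_trees=num_trees)
    model = model.save_ml_model()
    training_job = model.create_job()
    training_job.start_and_wait()

    # Inference: the model is referenced by the training job ID.
    # This is the step that failed on CDSE in this report.
    predicted = composite.predict_random_forest(model=training_job.job_id, dimension="bands")
    return training_job.job_id, predicted
```

The returned `predicted` cube still has to be executed, for example as a batch job or a download, before the "No random forest model found" error would surface.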

@soxofaan transferred this issue from Open-EO/openeo-python-client on Nov 2, 2023
@soxofaan added bug CDSE Copernicus Data Space Ecosystem labels on Nov 2, 2023
@jdries
Contributor

jdries commented Feb 29, 2024

FYI, we have now mounted the object storage bucket as a POSIX directory in the containers.
Hence models can be saved to and read from there, just like on YARN, so this should start working again with limited remaining effort.
Note that the new bucket functionality may still be behind a feature flag; to be checked with @tcassaert.
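Because the bucket is exposed as an ordinary directory, checking whether a training job actually left its model files behind reduces to a plain filesystem listing. The sketch below assumes the per-job layout `/OpenEO-data/batch_jobs/<job_id>` that appears in a later comment in this thread; whether that path is the mount point inside the containers is an assumption, and `list_job_files` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: batch job output is visible under this directory in the container.
DEFAULT_BATCH_JOBS_ROOT = Path("/OpenEO-data/batch_jobs")


def list_job_files(job_id, root=DEFAULT_BATCH_JOBS_ROOT):
    """Return the sorted relative paths of all files under <root>/<job_id>.

    Returns an empty list when the job directory does not exist.
    """
    job_dir = Path(root) / job_id
    if not job_dir.is_dir():
        return []
    return sorted(
        path.relative_to(job_dir).as_posix()
        for path in job_dir.rglob("*")
        if path.is_file()
    )
```

An empty result for a finished training job would suggest that the model was never written to, or is not visible through, the mounted bucket.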

@tcassaert
Contributor

There's a PR (#684) for this. It's not yet in use.

@EmileSonneveld
Contributor

Hey, I re-launched job j-231029b6164049ba92c3eada7603f037 on https://openeo.dataspace.copernicus.eu/ and got a result (my job ID was j-240308cce2b542f691fdab4b0c088f1b).
As I understand it, @tcassaert's fix did indeed work.

@tcassaert
Contributor

This is not on prod yet, so something else must have made it work.

@EmileSonneveld
Contributor

CDSE staging and prod assign a different href to the model asset. The s3:// version works best.
[screenshot]

Both backends have all files:
dev: /OpenEO-data/batch_jobs/j-240327428e7f48269cbf04840296cb9f
staging: /OpenEO-data/batch_jobs/j-240322a516184973a25ae51c55005416
[two screenshots]
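If a client ends up with more than one candidate href for the same model asset, the preference for the `s3://` form could be expressed as in the sketch below. The actual hrefs are only visible in the screenshots, so the function name `pick_model_href` and the idea of passing in a list of candidate hrefs are assumptions for illustration.

```python
from urllib.parse import urlparse


def pick_model_href(hrefs, preferred_schemes=("s3",)):
    """Return the first href with a preferred URL scheme.

    Falls back to the first href when none matches, and to None when
    the input is empty. Hypothetical helper, not part of any openEO API.
    """
    hrefs = list(hrefs)
    for scheme in preferred_schemes:
        for href in hrefs:
            if urlparse(href).scheme == scheme:
                return href
    return hrefs[0] if hrefs else None
```

The example inputs in any usage of this helper are made up; only the preference for `s3://` comes from the comment above.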

@JeroenVerstraelen
Contributor

Currently waiting on https://github.com/eu-cdse/openeo-cdse-infra/issues/115 to deploy the required mounts on CDSE prod. Once that is done, this issue should be resolved.

@JeroenVerstraelen
Contributor

JeroenVerstraelen commented May 6, 2024

@Pratichhya You can use the following code for training and inference.

Training:
https://gist.github.com/JeroenVerstraelen/ddaf818b5ae71f0fca2dfc68271cec1b

Inference:
https://gist.github.com/JeroenVerstraelen/b26d89cf9c84e2dc4435fb47802ebaed

The training extent and polygons still need some adjustment, because the training script currently outputs an empty random forest model.

6 participants