Strange classification when using rf_to_strings trees in EE RF? #528
Comments
Hi @matthiasdemuzere 👋 Thanks for the feedback! Glad you are trying this capability out and sorry it is not giving what is expected. It seems like some internal dropping of labels is going on when creating the label values for the trees. As you noted in your notebook, the training data has specific class values that are not preserved in the output. We can get the class labels directly from the rf classifier and use those to maintain the correct values. I am working on an update based on your example and should have it completed soon.
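(Not from the original thread — a minimal sketch of the idea, assuming a fitted scikit-learn classifier; `classes_` is sklearn's standard attribute holding the original label values, and the toy data below is hypothetical.)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# hypothetical toy data: 6 features, labels drawn from a subset of 1..17
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = rng.choice([1, 2, 5, 11, 12, 14, 17], size=200)

rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# the fitted estimator keeps the original label values, sorted
print(rf.classes_)  # -> [ 1  2  5 11 12 14 17]

# if an exported tree only carries internal indices 0..n_classes-1,
# a look-up table can map them back to the true class labels
lookup = {i: int(c) for i, c in enumerate(rf.classes_)}
print(lookup)
```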
Hey hey @KMarkert, thanks for following up. This classification problem indeed has 17 classes, yet not all labels are always present. For another region of interest, the distribution could be very different. So it would be nice indeed to maintain the actual class labels; doing this via a look-up table is probably not a bad idea. Thanks for looking into this, I can always test this out once done.
This seems to be fixed with 200b39c. Here is the notebook that I tried testing on: https://colab.research.google.com/drive/1WkqJ9mSY9Al4sRmJLdi2Ab4puafIe6BQ?usp=sharing @matthiasdemuzere, if you would be so kind as to test on your end and confirm that this bugfix is providing the correct labels. Once confirmed, we can close this issue and submit a PR to merge the bugfix into master.
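(Not from the thread — a rough way to spot-check whether the output labels match the sklearn model, assuming an initialized Earth Engine session, a fitted classifier `rf`, a classified `ee.Image` named `classified` with a `classification` band, and a `region` geometry; all of these names are placeholders.)

```python
import ee
ee.Initialize()

# `classified`, `region`, and `rf` are placeholders from an earlier workflow
hist = classified.select("classification").reduceRegion(
    reducer=ee.Reducer.frequencyHistogram(),
    geometry=region,
    scale=30,
    maxPixels=1e9,
).getInfo()

ee_labels = sorted(int(float(k)) for k in hist["classification"].keys())
print("labels in EE output:    ", ee_labels)
print("labels in sklearn model:", list(rf.classes_))
```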
Hey @KMarkert, I have tried the notebook, and there it seems to work as expected now. Yet I also tried another example, adding more input features (33 instead of 6), a larger training sample, and more RF trees (well, I tried both 10 and 30 - the latter being the default I normally use). At least when using 30 trees, the class numbers seem very off again in the classified image. Should I update the notebook with the latter case, so that you can have another look?
Yes, please share the notebook that is not producing the expected results. I can look into the issue with scaling to more trees/features. The bug fix is pulling the labels directly from the sklearn classifier object, so I am really curious as to why (or how) the output labels are not correct...
Ok, so I prepared a notebook using my extended data (more input features, more training labels, more trees). See here. Interestingly, this result looks fine, yet different than the result obtained when running with my local Python install? As this is a conda environment, I thought there was maybe a conflict when pip installing the geemap fix. So I reinstalled it, yet I still get wrong labels in the classified image (e.g. water is 12 instead of 17). I checked whether the Python version influences the result by making a conda Python 3.7 install, but with this I also get a wrong result when executing the notebook locally ... So I finally checked my local geemap install, which did not seem to contain the fix, and just copied your ml.py routine into my install and used that. That seems to work? So long story short: your bug fix seems to work (great!), yet I seem unable to install this bug fix properly via pip?
Glad to hear the bug fix was able to produce the expected results! It seems like the challenge now is installing the package locally from a specific git branch. Without having a look at your environment, my first guess is that there is a locally cached geemap v0.8.17 that pip is installing instead of downloading the specific branch you are pointing to. This sometimes happens when there isn't a version change for new code (see here), but this is just speculation and could be any number of things 🤷‍♂️. I imagine once we merge the bug fix to the master branch and release a new version, this install issue will be resolved by updating the package locally. So, I will submit the PR to get the bug fix merged into the master branch. If this bug persists after a new release and an update on your end, then please feel free to reopen. Thanks for finding this bug and helping work through it!
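(Not from the thread — a quick sketch of how one might check which geemap build Python is actually importing and force pip past a cached install; the git URL and branch name are placeholders.)

```python
import geemap
from geemap import ml

# confirm the version and the file that Python actually imports
print(geemap.__version__)
print(ml.__file__)  # inspect this file to see whether it contains the fix

# to reinstall from a specific branch while bypassing any cached build,
# something like this in a shell (placeholder URL/branch):
#   pip install --force-reinstall --no-cache-dir \
#       "git+https://github.com/<user>/geemap.git@<branch>"
```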
Great, thanks for following up on this. Once it is in the official repo, I'll continue testing this.
@matthiasdemuzere Run geemap.update_package() once to install the package development version.
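(For reference, a minimal usage sketch; the note about restarting the session afterwards so the updated module is re-imported is my assumption, not stated in the thread.)

```python
import geemap

# pull and install the latest development version of geemap from GitHub
geemap.update_package()

# then restart the kernel/session so the updated ml.py is the one imported
```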
Environment Information
Description
I am very enthusiastic about this new ml package and the ability to train a random forest locally and upload the trees into EE.
I gave it a spin with a reduced set of my own data (a 17-class classification problem). Yet for some reason, the classified classes come out all wrong, with the classified image even showing class numbers that are not in the training data.
I am not sure where the problem lies ... As I only use a reduced dataset with limited trees, I don't think my decision tree strings are already too large, as indicated by @KMarkert. So I get the feeling something goes wrong when parsing the trees to strings with rf_to_strings?
What I Did
I followed the tutorial notebook, and just plugged in my data where needed:
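(The original code is not preserved here; below is a rough sketch of the tutorial-notebook workflow being described, with placeholder file, feature, and image names.)

```python
import pandas as pd
import ee
from sklearn.ensemble import RandomForestClassifier
from geemap import ml

ee.Initialize()

# placeholder training table: one column per input feature plus a label column
df = pd.read_csv("training_samples.csv")  # hypothetical file
feature_names = [c for c in df.columns if c != "label"]

# train the random forest locally
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(df[feature_names], df["label"])

# convert the sklearn trees to Earth Engine decision-tree strings
trees = ml.rf_to_strings(rf, feature_names)

# build an ee.Classifier from the strings and classify an image
ee_classifier = ml.strings_to_classifier(trees)
image = ee.Image("placeholder/asset_id")  # hypothetical input image
classified = image.select(feature_names).classify(ee_classifier)
```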