balance dataset #14
Balancing approaches are implemented in #22. Both @bw4sz and I have experimented with versions of this, both with and without floors/ceilings, and have not seen any improvement (see also #21). This is surprising, but it may be because Focal Loss has already addressed the class imbalance as far as it can be addressed.
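One way to see why Focal Loss might overlap with balancing: it down-weights examples the model already classifies confidently, and in imbalanced data those are mostly from the common classes. A minimal sketch of the standard per-example formulation, assuming gamma=2 and alpha=1 (the settings actually used in this project are not shown here):

```python
import math


def cross_entropy(p_true: float) -> float:
    """Cross-entropy for one example, given the predicted probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Standard focal loss for one example: -alpha * (1 - p)^gamma * log(p)."""
    # The (1 - p)**gamma factor shrinks the loss of confidently correct examples,
    # which in an imbalanced dataset are mostly drawn from the common classes.
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        ratio = focal_loss(p) / cross_entropy(p)
        print(f"p={p:.1f}  CE={cross_entropy(p):.3f}  FL={focal_loss(p):.3f}  FL/CE={ratio:.2f}")
```

With gamma=2, an example predicted at p=0.9 keeps only 1% of its cross-entropy loss, while one at p=0.1 keeps 81%, so the gradient is already concentrated on hard (often rare-class) examples before any resampling happens.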
I want to keep coming back to this; it just feels too vital to let go. I've had no success outside of completely balanced data, but I think the sampling process is causing a lot of inter-run variability, and overall it feels like we are denying the model at least some reasonable prevalence information. Especially since we use site-level metadata, it feels like we are already past the overfitting argument here: we are already providing site-specific information.
One of the challenges of this research program is that each decision seems to cascade and affect others. I certainly tested the sampler when we added it to species classification. Now, on a first try, I cannot find any difference between balanced and unbalanced ('raw') sampling. This means either that the sampler is not working as intended, or that some other change in the meantime has made it irrelevant. I will continue to follow this.
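A quick way to separate "the sampler is broken" from "balancing no longer matters" is to count the classes the sampler actually draws. The sketch below is hypothetical: it uses inverse-frequency weights and Python's `random.choices` (sampling with replacement) in place of the project's real sampler, and the species labels are made up:

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weight of 1 / (size of the sample's class), so every class has equal total weight."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def drawn_class_counts(labels, weights, num_draws, seed=0):
    """Draw indices with replacement in proportion to `weights` and count the classes drawn."""
    rng = random.Random(seed)
    indices = rng.choices(range(len(labels)), weights=weights, k=num_draws)
    return Counter(labels[i] for i in indices)


# Hypothetical, deliberately imbalanced label list (900 / 90 / 10).
labels = ["oak"] * 900 + ["maple"] * 90 + ["birch"] * 10

raw = drawn_class_counts(labels, [1.0] * len(labels), num_draws=30_000)
balanced = drawn_class_counts(labels, inverse_frequency_weights(labels), num_draws=30_000)
print("raw:     ", dict(raw))
print("balanced:", dict(balanced))
```

If the real sampler's "balanced" draws look like the raw ones, the weights are probably not reaching the sampler; if they come out roughly equal per class and results still don't change, balancing itself is likely what has stopped mattering.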
Balancing with a ceiling now wins, but only by the smallest margin. I find the code confusing, though; it needs more thought.
Basically, by undersampling the top class we oversample the bottom ones, which is strange because replacement=True is the default.
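The floor/ceiling logic in #22 isn't reproduced here, so this is a hypothetical scheme: each class's total sampling weight is its count clipped to [floor, ceiling], spread evenly over its samples. It assumes a sampler that draws a fixed number of samples per epoch with replacement in proportion to the weights (as PyTorch's `torch.utils.data.WeightedRandomSampler` does with its default `replacement=True`), and computes each class's expected share of draws:

```python
from collections import Counter


def clipped_sample_weights(labels, floor=None, ceiling=None):
    """Per-sample weights so each class's total weight equals its count clipped to [floor, ceiling]."""
    counts = Counter(labels)
    targets = {}
    for cls, n in counts.items():
        target = n
        if ceiling is not None:
            target = min(target, ceiling)  # cap large classes
        if floor is not None:
            target = max(target, floor)  # lift small classes
        targets[cls] = target
    # Spread each class's target weight evenly over that class's samples.
    return [targets[y] / counts[y] for y in labels]


def expected_class_share(labels, weights):
    """Expected fraction of draws per class when sampling in proportion to `weights`."""
    total = sum(weights)
    share = Counter()
    for y, w in zip(labels, weights):
        share[y] += w / total
    return dict(share)


labels = ["oak"] * 900 + ["maple"] * 90 + ["birch"] * 10  # hypothetical labels
no_clip = expected_class_share(labels, clipped_sample_weights(labels))
capped = expected_class_share(labels, clipped_sample_weights(labels, ceiling=100))
print("no floor/ceiling:", no_clip)
print("ceiling=100:     ", capped)
```

Under these assumptions, capping only the top class at 100 moves oak from 90% to 50% of draws and birch from 1% to 5%. Because the number of draws per epoch is fixed, lowering the top class's weight necessarily raises every other class's share, and with replacement the 10 birch samples are simply drawn more often.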
Try balancing without any floor or ceiling resampling value.