Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Introduce BalancedBagging #2

Merged
merged 15 commits into from
Sep 27, 2023
Merged

✨ Introduce BalancedBagging #2

merged 15 commits into from
Sep 27, 2023

Conversation

EssamWisam
Copy link
Collaborator

Description
This PR adds BalancedModel which allows parallel composition of random undersampling models followed by a probabilistic classification model. In other words, it's the EasyEnsemble algorithm but for a generic probabilistic classifier rather than Adaboost.

@ablaom could you please review the additions when you get the chance.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

- '1.8'
- 'nightly'
os:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use 1 and not 1.0 to get the latest stable release. And include 1.6 as this is the Long Term Stable release.

Copy link
Collaborator Author

@EssamWisam EssamWisam Sep 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ablaom, for some reason Imbalance had julia = "1.8" when we released it. I changed it to 1.6 but we won't be able to test on 1.6 so long as Imbalance is a dependency or the next release isn't released yet. For now will keep 1.8 and 1 until Imbalance's next release.

Project.toml Outdated Show resolved Hide resolved
Project.toml Outdated Show resolved Hide resolved
"""
function group_inds(y::AbstractVector{T}) where {T}
result = LittleDict{T,AbstractVector{Int}}()
freeze(result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

freeze is not mutating. So this line achieves nothing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean the first line is not mutating? so freeze achieves nothing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean that you do not use the output of line 8 (you do not assign it to any variable used later) so this line might as well not exist. My impression was you thought that freeze(result) mutated result but it does not. It just returns a new immutable dictionary with faster look-up and smaller allocation than result.

src/BalancedBagging.jl Outdated Show resolved Hide resolved
src/BalancedBagging.jl Outdated Show resolved Hide resolved
src/BalancedBagging.jl Outdated Show resolved Hide resolved
src/BalancedBagging.jl Outdated Show resolved Hide resolved
src/BalancedBagging.jl Outdated Show resolved Hide resolved
end

"""
$(MMI.doc_header(BalancedBaggingClassifier))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are going to explicitly require specification of a hyperparameter, because it has not default (in this case model),then you cannot use the doc_header shortcut, because it says that the zero-argument constructor works: https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/#MLJModelInterface.doc_header

So you will have to customise this part. Ditto the other wrapper.

Copy link
Collaborator Author

@EssamWisam EssamWisam Sep 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solved for this wrapper. The other wrapper doesn't have an MLJ-compliant docstring, only the quick start in the README and basic documentation. Would you like me to add one for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, wrappers don't need to have compliant doc-strings.

src/BalancedBagging.jl Outdated Show resolved Hide resolved
- `X`: any table of input features (e.g., a `DataFrame`) so long as elements in each column
are subtypes of either the `Finite` or `Infinite` scientific types.

- `y`: the binary target, which can be any `AbstractVector` where `length(unique(y)) == 2`
Copy link
Member

@ablaom ablaom Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't see this needs to be binary, but again this depends on the atomic classifier being wrapped.

Copy link
Collaborator Author

@EssamWisam EssamWisam Sep 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ablaom I throw an error when y is not binary. The resampling scheme proposed by the paper assumes binary data.

In this paper, we examine only binary classification problems.

It's not trivial to generalize but I may look into that later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Noted, thanks.

src/BalancedBagging.jl Outdated Show resolved Hide resolved
Comment on lines 197 to 199
- `predict(mach, Xnew)`: return predictions of the target given
features `Xnew` having the same scitype as `X` above. Predictions
are probabilistic, but uncalibrated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `predict(mach, Xnew)`: return predictions of the target given
features `Xnew` having the same scitype as `X` above. Predictions
are probabilistic, but uncalibrated.
- `predict(mach, Xnew)`: return predictions of the target given
features `Xnew` having the same scitype as `X` above. Predictions
have the same form as that of the classifier, `model`

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ablaom model has to be probabilistic. I thought we agreed in the meeting to write that they are uncalibrated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right!

Comment on lines 201 to 202
- `predict_mode(mach, Xnew)`: instead return the mode of each
prediction above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `predict_mode(mach, Xnew)`: instead return the mode of each
prediction above.
- `predict_mode(mach, Xnew)`: if `model` is a `Probabilistic` model, return the mode of each
prediction above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ablaom, I constraint that the model has to be probabilistic so I will ignore "if model is a Probabilistic model".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go it!

src/BalancedBagging.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/utils.jl Outdated Show resolved Hide resolved
Copy link
Member

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work.

@EssamWisam
Copy link
Collaborator Author

Tests aren't failing, Github just reached it's limit with me for today.

@ablaom
Copy link
Member

ablaom commented Sep 24, 2023

Is this a daily limit, then? It seems we don't get a reset until October 1st, and this effects all JuliaAI private repos. I think you can go public to resolve early.

@EssamWisam
Copy link
Collaborator Author

Is this a daily limit, then? It seems we don't get a reset until October 1st, and this effects all JuliaAI private repos. I think you can go public to resolve early.

Good point. Let me make is public immediately.

@EssamWisam EssamWisam merged commit 84572f3 into main Sep 27, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants