AMFRegressor #1166

Merged
merged 77 commits into from
Jul 3, 2023

Conversation

kenzabenjelloun
Contributor

Hi! I am working on the AMF Regressor. This is what has been done so far; I tried to keep the same structure as the classifier. There is still a bug somewhere that I can't seem to find. Thank you for your help!

AlexandreChaussard and others added 30 commits December 30, 2022 16:51
- Adding a "mondrian" folder in the "tree" folder for better file structure
- Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py"
- Removing the "__repr__" method of AMF
- Removing the @setter and @getter
- Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end
- Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input
- `predict_proba_one` outputs a dictionary of scores with matching labels
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
- Leaving `__all__` in alphabetical order for the classifiers
- Removing type parameters in the description of `log_2_sum` of math utils
- Replacing java-like getters and setters by python-like properties and setter
- Replacing Overflow from infinity to maximum possible float (so it makes computations still possible)
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
- Fixing import order in __init__ file of ensemble
- Using LaTeX formulation in AMFClassifier description
- Making all nodes related methods private (it shouldn't be used outside)
- Docstring syntax update and fixes
- Importing river.base instead of typing module for better readability
- Adding a short description to the MondrianTreeClassifier
- Renaming MondrianTreeLeaf into MondrianLeaf
- Reordering functions in MondrianTreeClassifier for better readability
- Trying to fix the left-right issue uppercast (that shouldn't be a problem normally, but mypy keeps being unhappy)
- Fixing assignment issue to the parent during upward procedure
- Fixing type assignment to the root branch of the tree
- Fixing arg-type for list of intensities
- Fixing arg-type issue with current samples proceeding
- Fixing dirichlet arg-type issue
- Fixing some typing issues
- Removing call-overload as int in the memories features range list
- Correcting output of predict function
suggestions and style issues fix
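
One of the commits above swaps a custom `sample_discrete` helper for the standard library's `random.choices`. A minimal sketch of what weighted discrete sampling with `random.choices` can look like is given below. The helper name `sample_index` and the example weights are hypothetical and only serve the illustration; the actual call site in the PR is not shown in this thread.

```python
import random


def sample_index(weights, rng):
    """Draw one index with probability proportional to its weight.

    Hypothetical helper for illustration; not the code from the PR.
    """
    # random.choices returns a list of k items, so take the single element.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# A dedicated Random instance keeps the draws reproducible.
rng = random.Random(42)
weights = [0.1, 0.0, 0.9]

draws = [sample_index(weights, rng) for _ in range(1000)]
counts = [draws.count(i) for i in range(len(weights))]
print(counts)
```

The weights do not need to sum to one, and an index with zero weight is never drawn.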
AlexandreChaussard and others added 12 commits January 15, 2023 22:22
fix remaining tests and remove duplicated method call
- Adding examples for AMF & Mondrian Tree Classifiers
- Reordering __init__ in alphabetical order
- Cleaning the comments
- Adding string representation for nodes
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
@MaxHalford
Member

Hey there, great work! I hope you still managed to find time to enjoy your weekend :)

I'm guessing Saulo will be ok reviewing this PR, although I know he's a bit busy at the moment.

One first comment: since we merged Alexandre's PR, we reorganized things and created a new forest module. Your fork is missing these changes. At the root of the directory, you can run `make rebase` to pull in the changes from the main branch. You might have some conflicts to solve, which you will have to do manually. Let me know if this isn't clear. This is actually a great way to learn about git conflicts if this is your first time.

@smastelini
Member

Hi @kenzabenjelloun, thanks for the PR!

As @MaxHalford mentioned, there are some conflicts to solve. Once everything is in place, I will start reviewing the code :D

Kenza Ben jelloun added 2 commits January 24, 2023 18:13
Member

@smastelini left a comment

Hi @kenzabenjelloun, thanks for bringing the AMF Regressor! I can't wait to put this model to the test :)

I have left some comments in your code, as well as some questions for further clarification.

Some time ago, you mentioned there was a bug that affected performance. Were you able to track this down?

river/ensemble/aggregated_mondrian_forest.py (outdated, resolved)
river/forest/aggregated_mondrian_forest.py (outdated, resolved)
Comment on lines +259 to +262
split_pure
Controls if nodes that contains only sample of the same class should be
split ("pure" nodes). Default is `False`, namely pure nodes are not split,
but `True` can be sometimes better.
Member

I am not sure this applies to regression, since there are no classes.

Comment on lines +266 to +269
Note
----
All the parameters of ``AMFRegressor`` become **read-only** after the first call
to ``partial_fit``.
Member

I'm also not sure this applies in this case.

river/forest/aggregated_mondrian_forest.py (outdated, resolved)
super().__init__(*args, **kwargs)

self.n_samples = 0
self.mean = 0.0
Member

You can use an instance of stats.Mean() instead, to avoid hardcoding the mean update.
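
To illustrate the suggestion, here is a minimal sketch of delegating the mean update to a small statistics object instead of keeping `n_samples` and `mean` by hand. `RunningMean` is a hypothetical stand-in so the snippet runs without river installed; it mirrors the `update`/`get` interface of river's `stats.Mean`. `LeafStats` is also a hypothetical name, not a class from this PR.

```python
class RunningMean:
    """Stand-in for an incremental mean (hypothetical, not river's class)."""

    def __init__(self):
        self.n = 0.0
        self.mean = 0.0

    def update(self, x, w=1.0):
        # Incremental (Welford-style) mean update with an optional weight.
        self.n += w
        self.mean += (w / self.n) * (x - self.mean)

    def get(self):
        return self.mean


class LeafStats:
    """Hypothetical node that tracks the mean of the targets it has seen."""

    def __init__(self):
        # With river installed, this could be `stats.Mean()` instead.
        self.target_mean = RunningMean()

    def learn(self, y):
        self.target_mean.update(y)

    def predict(self):
        return self.target_mean.get()


leaf = LeafStats()
for y in [2.0, 4.0, 9.0]:
    leaf.learn(y)
print(leaf.predict())
```

The node no longer needs to know how the mean is maintained, so the update formula lives in one place.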

river/tree/mondrian/mondrian_tree_regressor.py (outdated, resolved)
river/tree/mondrian/mondrian_tree_nodes.py (resolved)
@smastelini
Member

Hi @kenzabenjelloun, was there any progress in finding the performance bug?

I just fixed a conflict with the main branch and resolved some comments you already addressed.

@smastelini changed the base branch from main to amf-reg on July 3, 2023, 14:47
@smastelini
Member

Hi @kenzabenjelloun and @MaxHalford. I've changed the target branch of this PR to a new branch. If you agree, I will merge the PR and take on the remaining work.

@kenzabenjelloun
Contributor Author

Hi @smastelini! Yes, perfect, thank you.

@smastelini merged commit c16f393 into online-ml:amf-reg on Jul 3, 2023
smastelini added a commit that referenced this pull request Jul 6, 2023
* AMF Classifier & Mondrian Tree Classifier implementation

* [Pull request Update]
- Adding a "mondrian" folder in the "tree" folder for better file structure
- Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py"

* [Pull Request]
- Removing the "__repr__" method of AMF
- Removing the @setter and @getter
- Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end

* Updating docstring

* [Pull request]
- Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input
- `predict_proba_one` outputs a dictionary of scores with matching labels

* [Fix] Readability

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Fix] Language

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Fix] Language

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Fix] math package implementation usage

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Pull request]
- Leaving `__all__` in alphabetical order for the classifiers
- Removing type parameters in the description of `log_2_sum` of math utils
- Replacing java-like getters and setters by python-like properties and setter

* - Adding support for random state (seed)
- Replacing Overflow from infinity to maximum possible float (so it makes computations still possible)

* [Ignoring testing environment]

* Fixing style & typos

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Pull request]
- Fixing import order in __init__ file of ensemble
- Using LaTeX formulation in AMFClassifier description
- Making all nodes related methods private (it shouldn't be used outside)
- Docstring syntax update and fixes
- Importing river.base instead of typing module for better readability
- Adding a short description to the MondrianTreeClassifier
- Renaming MondrianTreeLeaf into MondrianLeaf
- Reordering functions in MondrianTreeClassifier for better readability

* Pre-commit clean up

* Pre-commit clean up

* [MyPy issue]
- Trying to fix the left-right issue uppercast (that shouldn't be a problem normally, but mypy keeps being unhappy)
- Fixing assignment issue to the parent during upward procedure
- Fixing type assignment to the root branch of the tree
- Fixing arg-type for list of intensities
- Fixing arg-type issue with current samples proceeding
- Fixing dirichlet arg-type issue
- Fixing some typing issues
- Removing call-overload as int in the memories features range list
- Correcting output of predict function

* Fixing MyPy issues (detyping)

* suggestions and style issues fix

* adding necessary files, classes and methods for regressor

* minor import modifications

* minor list to typing.List and dict to typing.Dict modifs

* minor modifs to pass tests

* minor changes

* changing names

* Fixing predict function to support the "model not trained" situation instead of raising an exception

* more style suggestions

* testing

* regressor fix

* fixing docstring

* [Pull request Update]
- Fixing some TODOs from Mastelini suggestions
- Factorizing a bit of code from nodes that should be shared with regressor
- Removing branch structure as of now for future changes

* Removing all "array-like" structure for full dict support

* Pre-commit hookups fixes

* regressor fix

* Delete tests.py

* [Pull request]
- Adding suggestions from Mastelini on keys usage
- Removing useless initialization of scores in the MondrianTreeClassifier

* bug fix

* fix conflicts

* refactored, but has bugs

* remove mypy skip

* tests

* tests

* cleanup

* better, but not fixed

* minor fix

* [Fixes]
- Fixing scoring bug (no propagation of counts)
- Removing unused parameters in docs
- Replacing type union of Python 3.10 in 3.9 annotations
- Adding little description for MondrianBranch

* Pre-commit hookups fixes

* fix some tests

* Reworking intensities

* fix remaining tests and remove duplicated method call

* [Pull request]
- Adding examples for AMF & Mondrian Tree Classifiers
- Reordering __init__ in alphabetical order
- Cleaning the comments
- Adding string representation for nodes

* Hiding MondrianTree from users visibility

* Fixing import on Mondrian Tree example

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* tests

* merge fix

* merge fix

* docstring fixes

---------

Co-authored-by: AlexandreChaussard <alexandre.chaussard@telecom-sudparis.eu>
Co-authored-by: Alexandre Chaussard <78101027+AlexandreChaussard@users.noreply.github.com>
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
Co-authored-by: Kenza Ben jelloun <kenza.ben_jelloun@telecom-sudparis.eu>
Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com>
smastelini added a commit that referenced this pull request Jul 11, 2023
* AMFRegressor (#1166)

* fix exponential sampling

* lint

* mypy

* AMF Reg

* documentation

* expose exponential sampling

* fix mypy

* release notes

* Update docs/releases/unreleased.md

Co-authored-by: Max Halford <maxhalford25@gmail.com>

* Update river/forest/aggregated_mondrian_forest.py

Co-authored-by: Max Halford <maxhalford25@gmail.com>

* Update river/utils/random.py

Co-authored-by: Max Halford <maxhalford25@gmail.com>

* fix docstrings

* add missing import

---------

Co-authored-by: kenzabenjelloun <74252706+kenzabenjelloun@users.noreply.github.com>
Co-authored-by: AlexandreChaussard <alexandre.chaussard@telecom-sudparis.eu>
Co-authored-by: Alexandre Chaussard <78101027+AlexandreChaussard@users.noreply.github.com>
Co-authored-by: Kenza Ben jelloun <kenza.ben_jelloun@telecom-sudparis.eu>
Co-authored-by: Max Halford <maxhalford25@gmail.com>
4 participants