AMFRegressor #1166
- Adding a "mondrian" folder in the "tree" folder for better file structure
- Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py"
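
For readers unfamiliar with the swap mentioned above, here is a minimal sketch of weighted index sampling with the standard library's `random.choices`. The helper name `sample_index` and its signature are assumptions for illustration; the removed `sample_discrete` function is not shown in this thread, so this is not a reproduction of it.

```python
import random


def sample_index(weights, rng=None):
    """Draw one index with probability proportional to its weight.

    Hypothetical helper: illustrates random.choices, not the PR's actual code.
    """
    if rng is None:
        rng = random.Random()
    # random.choices returns a list of k items; take the single draw.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Seeded generator so the draws are reproducible.
rng = random.Random(42)
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_index([0.2, 0.0, 0.8], rng)] += 1
```

An index with zero weight is never drawn, and passing a seeded `random.Random` instance keeps the sampling reproducible.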
- Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input
- `predict_proba_one` outputs a dictionary of scores with matching labels
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
- Leaving `__all__` in alphabetical order for the classifiers
- Removing type parameters in the description of `log_2_sum` of math utils
- Replacing java-like getters and setters by python-like properties and setter
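
As an illustration of the getter/setter change described above, the following is a minimal sketch of the Python property idiom. The class `Node` and the attribute `weight` are hypothetical names chosen for the example, not identifiers taken from the PR.

```python
class Node:
    """Hypothetical node class used only to show the property idiom."""

    def __init__(self, weight=0.0):
        self._weight = float(weight)

    @property
    def weight(self):
        # Replaces a Java-style get_weight() method.
        return self._weight

    @weight.setter
    def weight(self, value):
        # Replaces a Java-style set_weight(value) method.
        self._weight = float(value)


node = Node()
node.weight = 3  # attribute syntax, but the setter runs
```

Callers read and write `node.weight` like a plain attribute, while the class keeps the option of validating or converting the value.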
- Replacing Overflow from infinity to maximum possible float (so it makes computations still possible)
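
One way to read the overflow change above is sketched below: when an exponential overflows, fall back to the largest finite float so that later arithmetic stays finite. The function name `safe_exp` is an assumption for illustration, and the PR's actual handling may differ.

```python
import math
import sys

# Largest finite double-precision value.
MAX_FLOAT = sys.float_info.max


def safe_exp(x):
    """Return exp(x), capped at the largest finite float on overflow.

    Hypothetical helper: a sketch of the idea, not the PR's code.
    """
    try:
        return math.exp(x)
    except OverflowError:
        return MAX_FLOAT


capped = safe_exp(1000.0)
# With infinity, inf - inf would be nan; the capped value stays usable.
difference = capped - capped
```

The design trade-off is that the capped value is no longer exact, but differences and logarithms of it remain finite instead of producing `nan` or `inf`.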
- Fixing import order in __init__ file of ensemble
- Using LaTeX formulation in AMFClassifier description
- Making all nodes related methods private (it shouldn't be used outside)
- Docstring syntax update and fixes
- Importing river.base instead of typing module for better readability
- Adding a short description to the MondrianTreeClassifier
- Renaming MondrianTreeLeaf into MondrianLeaf
- Reordering functions in MondrianTreeClassifier for better readability
- Trying to fix the left-right issue uppercast (that shouldn't be a problem normally, but mypy keeps being unhappy)
- Fixing assignment issue to the parent during upward procedure
- Fixing type assignment to the root branch of the tree
- Fixing arg-type for list of intensities
- Fixing arg-type issue with current samples proceeding
- Fixing dirichlet arg-type issue
- Fixing some typing issues
- Removing call-overload as int in the memories features range list
- Correcting output of predict function
suggestions and style issues fix
…instead of raising an exception
fix remaining tests and remove duplicated method call
- Adding examples for AMF & Mondrian Tree Classifiers
- Reordering __init__ in alphabetical order
- Cleaning the comments
- Adding string representation for nodes
Hey there, great work! I hope you still managed to find time to enjoy your weekend :) I'm guessing Saulo will be ok reviewing this PR, although I know he's a bit busy at the moment. One first comment: since we merged Alexandre's PR, we reorganized stuff and created a new
Hi @kenzabenjelloun, thanks for the PR! As @MaxHalford mentioned, there are some conflicts to solve. Once everything is in place, I will start reviewing the code :D
Hi @kenzabenjelloun, thanks for bringing the AMF Regressor! I can't wait to put this model to the test :)
I have left some comments in your code, as well as some questions for further clarification.
Some time ago, you mentioned there was a bug that affected performance. Were you able to track this down?
split_pure
    Controls if nodes that contains only sample of the same class should be
    split ("pure" nodes). Default is `False`, namely pure nodes are not split,
    but `True` can be sometimes better.
I am not sure this applies to regression, since there are no classes.
Note
----
All the parameters of ``AMFRegressor`` become **read-only** after the first call
to ``partial_fit``.
I'm also not sure this applies in this case.
super().__init__(*args, **kwargs)

self.n_samples = 0
self.mean = 0.0
You can use an instance of `stats.Mean()` instead, to avoid hardcoding the mean update.
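
To make the suggestion concrete, here is a minimal sketch of delegating the mean to a small statistic object instead of tracking `n_samples` and `mean` by hand. `RunningMean` is a hypothetical stand-in written so the example runs without river installed; it mirrors the `update`/`get` interface that river's `stats.Mean` exposes. `Leaf` is likewise an illustrative name, not the PR's node class.

```python
class RunningMean:
    """Stand-in for a running-mean statistic (hypothetical, for illustration)."""

    def __init__(self):
        self.n = 0
        self._mean = 0.0

    def update(self, x):
        # Incremental mean: no need to store past samples.
        self.n += 1
        self._mean += (x - self._mean) / self.n

    def get(self):
        return self._mean


class Leaf:
    """Hypothetical node that delegates the mean to a statistic object."""

    def __init__(self):
        # Replaces the pair `self.n_samples = 0` / `self.mean = 0.0`.
        self.mean = RunningMean()

    def learn_one(self, y):
        self.mean.update(y)

    def predict_one(self):
        return self.mean.get()


leaf = Leaf()
for y in [1.0, 2.0, 3.0, 4.0]:
    leaf.learn_one(y)
prediction = leaf.predict_one()
```

With river available, `RunningMean()` could presumably be swapped for `stats.Mean()` without changing the `Leaf` code, which is the point of the review comment.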
Hi @kenzabenjelloun, was there any progress in finding the performance bug? I just fixed a conflict with the main branch and resolved some comments you already addressed.
Hi @kenzabenjelloun and @MaxHalford. I've changed the target branch of this PR to a new branch. If you agree, I will merge the PR and take on the remaining work.
Hi @smastelini! Yes, perfect, thank you.
* AMF Classifier & Mondrian Tree Classifier implementation * [Pull request Update] - Adding a "mondrian" folder in the "tree" folder for better file structure - Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py" * [Pull Request] - Removing the "__repr__" method of AMF - Removing the @Setter and @getter - Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end * Updating docstring * [Pull request] - Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input - `predict_proba_one` outputs a dictionary of scores with matching labels * [Fix] Reability Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Fix] Language Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Fix] Language Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Fix] math package implementation usage Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Pull request] - Leaving `__all__` in alphabetical order for the classifiers - Removing type parameters in the description of `log_2_sum` of math utils - Replacing java-like getters and setters by python-like properties and setter * - Adding support for random state (seed) - Replacing Overflow from infinity to maximum possible float (so it makes computations still possible) * [Ignoring testing environment] * Fixing style & typos Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Pull request] - Fixing import order in __init__ file of ensemble - Using LaTeX formulation in AMFClassifier description - Making all nodes related methods private (it shouldn't be used outside) - Docstring syntax update and fixes - Importing river.base instead of typing module for better readability - Adding a short description to the MondrianTreeClassifier - Renaming MondrianTreeLeaf into MondrianLeaf - Reordering functions in MondrianTreeClassifier for 
better readability * Pre-commit clean up * Pre-commit clean up * [MyPy issue] - Trying to fix the left-right issue uppercast (that shouldn't be a problem normally, but mypy keeps being unhappy) - Fixing assignment issue to the parent during upward procedure - Fixing type assignment to the root branch of the tree - Fixing arg-type for list of intensities - Fixing arg-type issue with current samples proceeding - Fixing dirichlet arg-type issue - Fixing some typing issues - Removing call-overload as int in the memories features range list - Correcting output of predict function * Fixing MyPy issues (detyping) * suggestions and style issues fix * addingnecessary files, classes and methods for regressor * minor import modifications * minor list to typing.List and dict to typing.Dict modifs * minor modifs to pass tests * minor changes * changing names * Fixing predict function to support the "model not trained" situation instead of raising an exception * more style suggestions * testing * regressor fix * fixing docstring * [Pull request Update] - Fixing some TODOs from Mastelini suggestions - Factorizing a bit of code from nodes that should be shared with regressor - Removing branch structure as of now for future changes * Removing all "array-like" structure for full dict support * Pre-commit hookups fixes * regressor fix * Delete tests.py * [Pull request] - Adding suggestions from Mastelini on keys usage - Removing useless initialization of scores in the MondrianTreeClassifier * bug fix * fix conflicts * refactored, but has bugs * remove mypy skip * tests * tests * cleanup * better, but not fixed * minor fix * [Fixes] - Fixing scoring bug (no propagation of counts) - Removing unused parameters in docs - Replacing type union of Python 3.10 in 3.9 annotations - Adding little description for MondrianBranch * Pre-commit hookups fixes * fix some tests * Reworking intensities * fix remaining tests and remove duplicated method call * [Pull request] - Adding examples for AMF & 
Mondrian Tree Classifiers - Reordering __init__ in alphabetical order - Cleaning the comments - Adding string representation for nodes * Hiding MondrianTree from users visibility * Fixing import on Mondrian Tree example Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * tests * merge fix * merge fix * docstring fixes --------- Co-authored-by: AlexandreChaussard <alexandre.chaussard@telecom-sudparis.eu> Co-authored-by: Alexandre Chaussard <78101027+AlexandreChaussard@users.noreply.github.com> Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> Co-authored-by: Kenza Ben jelloun <kenza.ben_jelloun@telecom-sudparis.eu> Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com>
* AMFRegressor (#1166) * AMF Classifier & Mondrian Tree Classifier implementation * [Pull request Update] - Adding a "mondrian" folder in the "tree" folder for better file structure - Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py" * [Pull Request] - Removing the "__repr__" method of AMF - Removing the @Setter and @getter - Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end * Updating docstring * [Pull request] - Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input - `predict_proba_one` outputs a dictionary of scores with matching labels * [Fix] Reability Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Fix] Language Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Fix] Language Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Fix] math package implementation usage Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Pull request] - Leaving `__all__` in alphabetical order for the classifiers - Removing type parameters in the description of `log_2_sum` of math utils - Replacing java-like getters and setters by python-like properties and setter * - Adding support for random state (seed) - Replacing Overflow from infinity to maximum possible float (so it makes computations still possible) * [Ignoring testing environment] * Fixing style & typos Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * [Pull request] - Fixing import order in __init__ file of ensemble - Using LaTeX formulation in AMFClassifier description - Making all nodes related methods private (it shouldn't be used outside) - Docstring syntax update and fixes - Importing river.base instead of typing module for better readability - Adding a short description to the MondrianTreeClassifier - Renaming MondrianTreeLeaf into MondrianLeaf - Reordering functions in 
MondrianTreeClassifier for better readability * Pre-commit clean up * Pre-commit clean up * [MyPy issue] - Trying to fix the left-right issue uppercast (that shouldn't be a problem normally, but mypy keeps being unhappy) - Fixing assignment issue to the parent during upward procedure - Fixing type assignment to the root branch of the tree - Fixing arg-type for list of intensities - Fixing arg-type issue with current samples proceeding - Fixing dirichlet arg-type issue - Fixing some typing issues - Removing call-overload as int in the memories features range list - Correcting output of predict function * Fixing MyPy issues (detyping) * suggestions and style issues fix * addingnecessary files, classes and methods for regressor * minor import modifications * minor list to typing.List and dict to typing.Dict modifs * minor modifs to pass tests * minor changes * changing names * Fixing predict function to support the "model not trained" situation instead of raising an exception * more style suggestions * testing * regressor fix * fixing docstring * [Pull request Update] - Fixing some TODOs from Mastelini suggestions - Factorizing a bit of code from nodes that should be shared with regressor - Removing branch structure as of now for future changes * Removing all "array-like" structure for full dict support * Pre-commit hookups fixes * regressor fix * Delete tests.py * [Pull request] - Adding suggestions from Mastelini on keys usage - Removing useless initialization of scores in the MondrianTreeClassifier * bug fix * fix conflicts * refactored, but has bugs * remove mypy skip * tests * tests * cleanup * better, but not fixed * minor fix * [Fixes] - Fixing scoring bug (no propagation of counts) - Removing unused parameters in docs - Replacing type union of Python 3.10 in 3.9 annotations - Adding little description for MondrianBranch * Pre-commit hookups fixes * fix some tests * Reworking intensities * fix remaining tests and remove duplicated method call * [Pull request] - 
Adding examples for AMF & Mondrian Tree Classifiers - Reordering __init__ in alphabetical order - Cleaning the comments - Adding string representation for nodes * Hiding MondrianTree from users visibility * Fixing import on Mondrian Tree example Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> * tests * merge fix * merge fix * docstring fixes --------- Co-authored-by: AlexandreChaussard <alexandre.chaussard@telecom-sudparis.eu> Co-authored-by: Alexandre Chaussard <78101027+AlexandreChaussard@users.noreply.github.com> Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br> Co-authored-by: Kenza Ben jelloun <kenza.ben_jelloun@telecom-sudparis.eu> Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com> * fix exponential sampling * lint * mypy * AMF Reg * documentation * expose exponential sampling * fix mypy * release notes * Update docs/releases/unreleased.md Co-authored-by: Max Halford <maxhalford25@gmail.com> * Update river/forest/aggregated_mondrian_forest.py Co-authored-by: Max Halford <maxhalford25@gmail.com> * Update river/utils/random.py Co-authored-by: Max Halford <maxhalford25@gmail.com> * fix docstrings * add missing import --------- Co-authored-by: kenzabenjelloun <74252706+kenzabenjelloun@users.noreply.github.com> Co-authored-by: AlexandreChaussard <alexandre.chaussard@telecom-sudparis.eu> Co-authored-by: Alexandre Chaussard <78101027+AlexandreChaussard@users.noreply.github.com> Co-authored-by: Kenza Ben jelloun <kenza.ben_jelloun@telecom-sudparis.eu> Co-authored-by: Max Halford <maxhalford25@gmail.com>
Hi! I am working on the AMF Regressor. This is what has been done so far; I tried to keep the same structure as the classifier. There is still a bug somewhere that I can't seem to find. Thank you for your help!