AMFRegressor #1166

kenzabenjelloun · 2023-01-22T17:53:33Z

Hi! I am working on the AMF Regressor. This is what has been done so far; I tried to keep the same structure as the classifier. There is still a bug somewhere that I can't seem to find. Thank you for your help!

MaxHalford · 2023-01-22T18:09:40Z

Hey there, great work! I hope you still managed to find time to enjoy your weekend :)

I'm guessing Saulo will be ok reviewing this PR, although I know he's a bit busy at the moment.

One first comment: since we merged Alexandre's PR, we reorganized things and created a new forest module. Your fork is missing these changes. At the root of the directory, you can run `make rebase` to pull in the changes from the main branch. You might have some conflicts to resolve, which you will have to do manually. Let me know if this isn't clear. This is actually a great way to learn about git conflicts if this is your first time.

smastelini · 2023-01-23T14:33:00Z

Hi @kenzabenjelloun, thanks for the PR!

As @MaxHalford mentioned, there are some conflicts to solve. Once everything is in place, I will start reviewing the code :D

river/ensemble/__init__.py

smastelini

Hi @kenzabenjelloun, thanks for bringing the AMF Regressor! I can't wait to put this model to the test :)

I have left some comments in your code, as well as some questions for further clarification.

Some time ago, you mentioned there was a bug that affected performance. Were you able to track this down?

river/ensemble/aggregated_mondrian_forest.py

river/forest/aggregated_mondrian_forest.py

smastelini · 2023-01-26T23:01:27Z

river/forest/aggregated_mondrian_forest.py

+    split_pure
+        Controls if nodes that contains only sample of the same class should be
+        split ("pure" nodes). Default is `False`, namely pure nodes are not split,
+        but `True` can be sometimes better.


I am not sure this applies to regression, since there are no classes.

smastelini · 2023-01-26T23:03:40Z

river/forest/aggregated_mondrian_forest.py

+    Note
+    ----
+    All the parameters of ``AMFRegressor`` become **read-only** after the first call
+    to ``partial_fit``.


I'm also not sure this applies in this case.

river/forest/aggregated_mondrian_forest.py

smastelini · 2023-01-26T23:13:20Z

river/tree/mondrian/mondrian_tree_nodes.py

+        super().__init__(*args, **kwargs)
+
+        self.n_samples = 0
+        self.mean = 0.0


You can use an instance of stats.Mean() instead, to avoid hardcoding the mean update.
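A minimal sketch of what this suggestion could look like. The `RunningMean` class below is only a stand-in that mimics the `update`/`get` interface of River's `stats.Mean`, so the snippet runs without River installed; `LeafSketch` and its method names are hypothetical and are not taken from the PR.

```python
class RunningMean:
    """Stand-in mimicking the update/get interface of river.stats.Mean."""

    def __init__(self):
        self.n = 0.0
        self.mean = 0.0

    def update(self, x, w=1.0):
        # Incremental mean update: no list of past targets is stored.
        self.n += w
        self.mean += (x - self.mean) * w / self.n

    def get(self):
        return self.mean


class LeafSketch:
    """Hypothetical regression node that delegates target statistics.

    Instead of keeping `n_samples` and `mean` as raw attributes and
    updating them by hand, the node owns one statistic object.
    """

    def __init__(self):
        # With River available, this would be stats.Mean() instead.
        self.target_mean = RunningMean()

    def update_target(self, y):
        self.target_mean.update(y)

    def predict(self):
        return self.target_mean.get()


leaf = LeafSketch()
for y in (1.0, 2.0, 6.0):
    leaf.update_target(y)

print(leaf.predict())
```

The point of the delegation is that the node no longer carries its own update formula, so the sample count and the mean cannot drift apart if one of them is modified elsewhere.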

river/tree/mondrian/mondrian_tree_regressor.py

river/tree/mondrian/mondrian_tree_nodes.py

smastelini · 2023-02-05T12:46:19Z

Hi @kenzabenjelloun, was there any progress in finding the performance bug?

I just fixed a conflict with the main branch and resolved some comments you already addressed.

smastelini · 2023-07-03T14:48:32Z

Hi @kenzabenjelloun and @MaxHalford. I've changed the target branch of this PR to a new branch. If you agree, I will merge the PR and take on the remaining work.

kenzabenjelloun · 2023-07-03T17:11:37Z

Hi @smastelini! Yes, perfect, thank you.

* AMF Classifier & Mondrian Tree Classifier implementation
* [Pull request Update]
  - Adding a "mondrian" folder in the "tree" folder for better file structure
  - Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py"
* [Pull Request]
  - Removing the "__repr__" method of AMF
  - Removing the @Setter and @getter
  - Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end
* Updating docstring
* [Pull request]
  - Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input
  - `predict_proba_one` outputs a dictionary of scores with matching labels
* [Fix] Reability
  Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
* [Fix] Language
  Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
* [Fix] Language
  Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
* [Fix] math package implementation usage
  Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
* [Pull request]
  - Leaving `__all__` in alphabetical order for the classifiers
  - Removing type parameters in the description of `log_2_sum` of math utils
  - Replacing java-like getters and setters by python-like properties and setter
* (untitled)
  - Adding support for random state (seed)
  - Replacing Overflow from infinity to maximum possible float (so it makes computations still possible)
* [Ignoring testing environment]
* Fixing style & typos
  Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
* [Pull request]
  - Fixing import order in __init__ file of ensemble
  - Using LaTeX formulation in AMFClassifier description
  - Making all nodes related methods private (it shouldn't be used outside)
  - Docstring syntax update and fixes
  - Importing river.base instead of typing module for better readability
  - Adding a short description to the MondrianTreeClassifier
  - Renaming MondrianTreeLeaf into MondrianLeaf
  - Reordering functions in MondrianTreeClassifier for better readability
* Pre-commit clean up
* Pre-commit clean up
* [MyPy issue]
  - Trying to fix the left-right issue uppercast (that shouldn't be a problem normally, but mypy keeps being unhappy)
  - Fixing assignment issue to the parent during upward procedure
  - Fixing type assignment to the root branch of the tree
  - Fixing arg-type for list of intensities
  - Fixing arg-type issue with current samples proceeding
  - Fixing dirichlet arg-type issue
  - Fixing some typing issues
  - Removing call-overload as int in the memories features range list
  - Correcting output of predict function
* Fixing MyPy issues (detyping)
* suggestions and style issues fix
* addingnecessary files, classes and methods for regressor
* minor import modifications
* minor list to typing.List and dict to typing.Dict modifs
* minor modifs to pass tests
* minor changes
* changing names
* Fixing predict function to support the "model not trained" situation instead of raising an exception
* more style suggestions
* testing
* regressor fix
* fixing docstring
* [Pull request Update]
  - Fixing some TODOs from Mastelini suggestions
  - Factorizing a bit of code from nodes that should be shared with regressor
  - Removing branch structure as of now for future changes
* Removing all "array-like" structure for full dict support
* Pre-commit hookups fixes
* regressor fix
* Delete tests.py
* [Pull request]
  - Adding suggestions from Mastelini on keys usage
  - Removing useless initialization of scores in the MondrianTreeClassifier
* bug fix
* fix conflicts
* refactored, but has bugs
* remove mypy skip
* tests
* tests
* cleanup
* better, but not fixed
* minor fix
* [Fixes]
  - Fixing scoring bug (no propagation of counts)
  - Removing unused parameters in docs
  - Replacing type union of Python 3.10 in 3.9 annotations
  - Adding little description for MondrianBranch
* Pre-commit hookups fixes
* fix some tests
* Reworking intensities
* fix remaining tests and remove duplicated method call
* [Pull request]
  - Adding examples for AMF & Mondrian Tree Classifiers
  - Reordering __init__ in alphabetical order
  - Cleaning the comments
  - Adding string representation for nodes
* Hiding MondrianTree from users visibility
* Fixing import on Mondrian Tree example
  Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
* tests
* merge fix
* merge fix
* docstring fixes

---------

Co-authored-by: AlexandreChaussard <alexandre.chaussard@telecom-sudparis.eu>
Co-authored-by: Alexandre Chaussard <78101027+AlexandreChaussard@users.noreply.github.com>
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
Co-authored-by: Kenza Ben jelloun <kenza.ben_jelloun@telecom-sudparis.eu>
Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com>

* AMFRegressor (#1166): repeats the full squashed history of the commit message above, from "AMF Classifier & Mondrian Tree Classifier implementation" through "docstring fixes", with the same Co-authored-by trailers
* fix exponential sampling
* lint
* mypy
* AMF Reg
* documentation
* expose exponential sampling
* fix mypy
* release notes
* Update docs/releases/unreleased.md
  Co-authored-by: Max Halford <maxhalford25@gmail.com>
* Update river/forest/aggregated_mondrian_forest.py
  Co-authored-by: Max Halford <maxhalford25@gmail.com>
* Update river/utils/random.py
  Co-authored-by: Max Halford <maxhalford25@gmail.com>
* fix docstrings
* add missing import

---------

Co-authored-by: kenzabenjelloun <74252706+kenzabenjelloun@users.noreply.github.com>
Co-authored-by: AlexandreChaussard <alexandre.chaussard@telecom-sudparis.eu>
Co-authored-by: Alexandre Chaussard <78101027+AlexandreChaussard@users.noreply.github.com>
Co-authored-by: Kenza Ben jelloun <kenza.ben_jelloun@telecom-sudparis.eu>
Co-authored-by: Max Halford <maxhalford25@gmail.com>

AlexandreChaussard and others added 30 commits December 30, 2022 16:51

AMF Classifier & Mondrian Tree Classifier implementation

79566eb

[Pull request Update]

80371f6

- Adding a "mondrian" folder in the "tree" folder for better file structure - Using "random.choices" instead of the "sample_discrete" functions in "utils.py", and removing "sample_discrete" from the "utils.py"

[Pull Request]

9b91b9d

- Removing the "__repr__" method of AMF - Removing the @Setter and @getter - Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end

Updating docstring

01af4a2

[Pull request]

c5fe718

- Making `learn_one` and `predict_proba_one` accepting all kinds of supported labels for `y` as input - `predict_proba_one` outputs a dictionary of scores with matching labels

[Fix] Reability

545ffaa

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

[Fix] Language

6a93dea

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

[Fix] Language

c0466c5

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

[Fix] math package implementation usage

ecdfd2c

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

[Pull request]

ad27aae

- Leaving `__all__` in alphabetical order for the classifiers - Removing type parameters in the description of `log_2_sum` of math utils - Replacing java-like getters and setters by python-like properties and setter

Merge branch 'main' into main

5bccff8

- Adding support for random state (seed)

5cff1e3

- Replacing Overflow from infinity to maximum possible float (so it makes computations still possible)

[Ignoring testing environment]

ff2e8f8

Fixing style & typos

f64894c

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

Merge branch 'main' of https://github.com/AlexandreChaussard/river

0956cb8

Pre-commit clean up

614da35

Pre-commit clean up

f98b578

Fixing MyPy issues (detyping)

0e00a65

suggestions and style issues fix

d8eac6a

Suggestions by @smastelini

e8258b3

suggestions and style issues fix

addingnecessary files, classes and methods for regressor

d394624

minor import modifications

0fe0d49

minor list to typing.List and dict to typing.Dict modifs

155d3ea

minor modifs to pass tests

36e655b

minor changes

6a59bb2

changing names

33bb2a2

Fixing predict function to support the "model not trained" situation …

99d16e8

…instead of raising an exception

Merging suggestions

10c7ae0

AlexandreChaussard and others added 12 commits January 15, 2023 22:22

Fixing some PyTests

dcf8dda

Reworking intensities

1f83869

fix remaining tests and remove duplicated method call

55366d8

Fixing feature shuffle issue (ordering in features)

02a1396

fix remaining tests and remove duplicated method call

Merge branch 'online-ml:main' into main

e7f506c

[Pull request]

bc33142

- Adding examples for AMF & Mondrian Tree Classifiers - Reordering __init__ in alphabetical order - Cleaning the comments - Adding string representation for nodes

Hiding MondrianTree from users visibility

83f96c9

Fixing import on Mondrian Tree example

b7457fc

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

tests

1327d82

tests

84f234b

update according to main

86db9b9

removing tests file

0043a71

kenzabenjelloun requested review from MaxHalford and smastelini as code owners January 22, 2023 17:53

format change according to latest classifier update in river

e662c3b

MaxHalford reviewed Jan 24, 2023

river/ensemble/__init__.py (outdated, resolved)

Kenza Ben jelloun added 2 commits January 24, 2023 18:13

merge fix

3ebd4fb

merge fix

b64ef0e

smastelini requested changes Jan 26, 2023

Kenza Ben jelloun and others added 2 commits January 29, 2023 19:51

docstring fixes

ac568bf

Merge branch 'main' into regressor

0125d42

smastelini changed the base branch from main to amf-reg July 3, 2023 14:47

smastelini merged commit c16f393 into online-ml:amf-reg Jul 3, 2023
