AMFRegressor (#1166)
* AMF Classifier & Mondrian Tree Classifier implementation

* [Pull request Update]
- Adding a "mondrian" folder in the "tree" folder for better file structure
- Using "random.choices" instead of the "sample_discrete" function in "utils.py", and removing "sample_discrete" from "utils.py"

* [Pull Request]
- Removing the "__repr__" method of AMF
- Removing the @setter and @getter
- Removing the "loss" parameter of the classifiers since only the "log-loss" is being used in the end

* Updating docstring

* [Pull request]
- Making `learn_one` and `predict_proba_one` accept all kinds of supported labels for `y` as input
- `predict_proba_one` outputs a dictionary of scores with matching labels

* [Fix] Readability

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Fix] Language

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Fix] Language

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Fix] math package implementation usage

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Pull request]
- Leaving `__all__` in alphabetical order for the classifiers
- Removing type parameters in the description of `log_2_sum` of math utils
- Replacing Java-like getters and setters with Python-like properties and setters

* - Adding support for a random state (seed)
- Replacing overflow to infinity with the largest possible float (so computations remain possible)

* [Ignoring testing environment]

* Fixing style & typos

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* [Pull request]
- Fixing import order in __init__ file of ensemble
- Using LaTeX formulation in AMFClassifier description
- Making all node-related methods private (they shouldn't be used outside)
- Docstring syntax update and fixes
- Importing river.base instead of typing module for better readability
- Adding a short description to the MondrianTreeClassifier
- Renaming MondrianTreeLeaf into MondrianLeaf
- Reordering functions in MondrianTreeClassifier for better readability

* Pre-commit clean up

* Pre-commit clean up

* [MyPy issue]
- Trying to fix the left/right upcast issue (it shouldn't normally be a problem, but mypy keeps complaining)
- Fixing assignment issue to the parent during upward procedure
- Fixing type assignment to the root branch of the tree
- Fixing arg-type for list of intensities
- Fixing arg-type issue with current samples proceeding
- Fixing dirichlet arg-type issue
- Fixing some typing issues
- Removing call-overload as int in the memories features range list
- Correcting output of predict function

* Fixing MyPy issues (detyping)

* suggestions and style issues fix

* adding necessary files, classes and methods for the regressor

* minor import modifications

* minor `list` to `typing.List` and `dict` to `typing.Dict` modifications

* minor modifs to pass tests

* minor changes

* changing names

* Fixing predict function to support the "model not trained" situation instead of raising an exception

* more style suggestions

* testing

* regressor fix

* fixing docstring

* [Pull request Update]
- Fixing some TODOs from Mastelini suggestions
- Factorizing a bit of code from nodes that should be shared with regressor
- Removing branch structure as of now for future changes

* Removing all "array-like" structure for full dict support

* Pre-commit hookups fixes

* regressor fix

* Delete tests.py

* [Pull request]
- Adding suggestions from Mastelini on keys usage
- Removing useless initialization of scores in the MondrianTreeClassifier

* bug fix

* fix conflicts

* refactored, but has bugs

* remove mypy skip

* tests

* tests

* cleanup

* better, but not fixed

* minor fix

* [Fixes]
- Fixing scoring bug (no propagation of counts)
- Removing unused parameters in docs
- Replacing Python 3.10 type unions with 3.9-compatible annotations
- Adding little description for MondrianBranch

* Pre-commit hookups fixes

* fix some tests

* Reworking intensities

* fix remaining tests and remove duplicated method call

* [Pull request]
- Adding examples for AMF & Mondrian Tree Classifiers
- Reordering __init__ in alphabetical order
- Cleaning the comments
- Adding string representation for nodes

* Hiding MondrianTree from user visibility

* Fixing import on Mondrian Tree example

Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>

* tests

* merge fix

* merge fix

* docstring fixes

---------

Co-authored-by: AlexandreChaussard <alexandre.chaussard@telecom-sudparis.eu>
Co-authored-by: Alexandre Chaussard <78101027+AlexandreChaussard@users.noreply.github.com>
Co-authored-by: Saulo Martiello Mastelini <mastelini@usp.br>
Co-authored-by: Kenza Ben jelloun <kenza.ben_jelloun@telecom-sudparis.eu>
Co-authored-by: Saulo Martiello Mastelini <saulomastelini@gmail.com>
6 people committed Jul 6, 2023
1 parent 10a2028 commit 0386737
Showing 5 changed files with 697 additions and 5 deletions.
113 changes: 112 additions & 1 deletion river/forest/aggregated_mondrian_forest.py
@@ -4,7 +4,7 @@
import random

from river import base
from river.tree.mondrian import MondrianTreeClassifier
from river.tree.mondrian import MondrianTreeClassifier, MondrianTreeRegressor


class AMFLearner(base.Ensemble, abc.ABC):
@@ -217,3 +217,114 @@ def predict_proba_one(self, x):
@property
def _multiclass(self):
return True


class AMFRegressor(AMFLearner, base.Regressor):
"""Aggregated Mondrian Forest regressor for online learning.

This algorithm is truly online, in the sense that a single pass is performed, and that
predictions can be produced anytime.

Each node in a tree predicts according to the average of the labels it contains. The
prediction for a sample is computed as the aggregated predictions of all the subtrees
along the path leading to the leaf node containing the sample. The aggregation weights
are exponential weights with learning rate ``step``, based on the least-squares loss,
when ``use_aggregation`` is ``True``.
This computation is performed exactly thanks to a context tree weighting algorithm.
More details can be found in the paper cited in the references below.

The final prediction is the average of the predictions of each of the
``n_estimators`` trees in the forest.

Parameters
----------
n_estimators
The number of trees in the forest.
step
Step-size for the aggregation weights.
use_aggregation
Controls if aggregation is used in the trees. It is highly recommended to
leave it as `True`.
split_pure
Controls whether nodes that contain only samples with the same target value should be
split ("pure" nodes). Default is `False`, meaning pure nodes are not split, but
`True` can sometimes be better.
seed
Random seed for reproducibility.

Note
----
All the parameters of ``AMFRegressor`` become **read-only** after the first call
to ``learn_one``.

References
----------
[^1]: J. Mourtada, S. Gaïffas and E. Scornet, *AMF: Aggregated Mondrian Forests for Online Learning*, arXiv:1906.10529, 2019
"""

def __init__(
self,
n_estimators: int = 10,
step: float = 1.0,
use_aggregation: bool = True,
split_pure: bool = False,
seed: int | None = None,
):

super().__init__(
n_estimators=n_estimators,
step=step,
loss="least-squares",
use_aggregation=use_aggregation,
split_pure=split_pure,
seed=seed,
)

self.iteration = 0

def _initialize_trees(self):
"""Initialize the forest."""

self.data: list[MondrianTreeRegressor] = []
for _ in range(self.n_estimators):
# Each tree must have its own stochastic scheme, otherwise the randomness would break
# Hence a new seed is derived for each tree from the given seed, deterministically
seed = self._rng.randint(0, 9999999)

tree = MondrianTreeRegressor(
self.step,
self.use_aggregation,
self.split_pure,
self.iteration,
seed,
)
self.data.append(tree)
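
The seed-derivation idea above can be sketched on its own: one master RNG deterministically produces a seed per tree, so the forest stays reproducible while each tree gets distinct randomness. This is a minimal illustrative sketch, not the library's API (`derive_seeds` is a hypothetical helper name):

```python
import random


def derive_seeds(master_seed, n_estimators):
    """One deterministic seed per tree, all derived from a single master seed."""
    rng = random.Random(master_seed)
    # Same range as in _initialize_trees above
    return [rng.randint(0, 9999999) for _ in range(n_estimators)]


# Same master seed -> same per-tree seeds -> a reproducible forest
assert derive_seeds(42, 5) == derive_seeds(42, 5)
print(derive_seeds(42, 5))
```

Seeding each tree directly with the master seed instead would make every tree identical, defeating the purpose of the ensemble.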

def learn_one(self, x, y):
# Checking if the forest has been created
if not self.is_trained():
self._initialize_trees()

# we fit all the trees using the new sample
for tree in self:
tree.learn_one(x, y)

self.iteration += 1

return self

def predict_one(self, x):

# Checking that the model has been trained once at least
if not self.is_trained():
return None

prediction = 0
for tree in self:
tree.use_aggregation = self.use_aggregation
prediction += tree.predict_one(x)
prediction = prediction / self.n_estimators

return prediction
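
The forest-level prediction in `predict_one` above is a plain average of the per-tree predictions. Here is a minimal self-contained sketch of that aggregation step, with a hypothetical `StubTree` standing in for `MondrianTreeRegressor` (names are illustrative, not the library's API):

```python
class StubTree:
    """Illustrative stand-in for a Mondrian tree regressor."""

    def __init__(self, offset):
        self.offset = offset

    def predict_one(self, x):
        # A real Mondrian tree would aggregate node predictions along the
        # path to the leaf containing x; here we fake a simple prediction.
        return x["value"] + self.offset


def forest_predict_one(trees, x):
    """Average the predictions of every tree in the forest."""
    prediction = 0.0
    for tree in trees:
        prediction += tree.predict_one(x)
    return prediction / len(trees)


trees = [StubTree(offset) for offset in (-1.0, 0.0, 1.0)]
print(forest_predict_one(trees, {"value": 5.0}))  # offsets cancel out: 5.0
```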
5 changes: 3 additions & 2 deletions river/tree/mondrian/__init__.py
@@ -3,12 +3,13 @@
implementations for the Mondrian trees.
Note that this module is not exposed in the tree module, and is instead used by the
AMFClassifier class in the ensemble module.
AMFClassifier and AMFRegressor classes in the ensemble module.
"""
from __future__ import annotations

from .mondrian_tree import MondrianTree
from .mondrian_tree_classifier import MondrianTreeClassifier
from .mondrian_tree_regressor import MondrianTreeRegressor

__all__ = ["MondrianTree", "MondrianTreeClassifier"]
__all__ = ["MondrianTree", "MondrianTreeClassifier", "MondrianTreeRegressor"]
1 change: 1 addition & 0 deletions river/tree/mondrian/mondrian_tree_classifier.py
@@ -464,6 +464,7 @@ def predict_proba_one(self, x):

# If the tree hasn't seen any sample, then it should return
# the default empty dict

if not self._is_initialized:
return {}

151 changes: 149 additions & 2 deletions river/tree/mondrian/mondrian_tree_nodes.py
Expand Up @@ -288,9 +288,9 @@ def update_downwards(
Parameters
----------
x
Sample to proceed (as a list).
Sample to proceed.
y
Class of the sample x_t.
Class of the sample x.
dirichlet
Dirichlet parameter of the tree.
use_aggregation
@@ -366,3 +366,150 @@ class MondrianBranchClassifier(MondrianNodeClassifier, MondrianBranch):

def __init__(self, parent, time, depth, feature, threshold, *children):
super().__init__(parent, time, depth, feature, threshold, *children)


class MondrianNodeRegressor(MondrianNode):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)

self.n_samples = 0
self.mean = 0.0

def replant(self, leaf: MondrianNodeRegressor, copy_all: bool = False):
"""Transfer information from a leaf to a new branch."""
self.weight = leaf.weight # type: ignore
self.log_weight_tree = leaf.log_weight_tree # type: ignore
self.mean = leaf.mean

if copy_all:
self.memory_range_min = leaf.memory_range_min
self.memory_range_max = leaf.memory_range_max
self.n_samples = leaf.n_samples

def predict(self) -> base.typing.RegTarget:
"""Return the prediction of the node."""
return self.mean

def loss(self, sample_value: base.typing.RegTarget) -> float:
"""Compute the squared loss of the node's prediction for a given target value.

Parameters
----------
sample_value
The observed target value.
"""

r = self.predict() - sample_value
return r * r / 2

def update_weight(
self,
sample_value: base.typing.RegTarget,
use_aggregation: bool,
step: float,
) -> float:
"""Update the weight of the node given a target value and the method used.

Parameters
----------
sample_value
Target value of a given sample.
use_aggregation
Whether to use aggregation or not during computation (given by the tree).
step
Step parameter of the tree.
"""

loss_t = self.loss(sample_value)
if use_aggregation:
self.weight -= step * loss_t
return loss_t
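
The weight handled by `update_weight` lives in the log domain: subtracting `step * loss` from the log-weight is the classical exponential-weights update w ← w · exp(−step · loss). A minimal self-contained sketch of that update, with illustrative names (not the library's API):

```python
import math


def update_log_weight(log_weight, prediction, target, step):
    """Exponential-weights update carried out in the log domain."""
    r = prediction - target
    loss = r * r / 2  # least-squares loss, as in MondrianNodeRegressor.loss
    return log_weight - step * loss, loss


log_w = 0.0  # effective weight exp(0) = 1
log_w, loss = update_log_weight(log_w, prediction=2.0, target=1.0, step=1.0)
# loss = 0.5, so the effective weight becomes exp(-0.5)
print(loss, math.exp(log_w))
```

Working in the log domain avoids numerical underflow when many small weights are multiplied along a path.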

def update_downwards(
self,
x,
sample_value: base.typing.RegTarget,
use_aggregation: bool,
step: float,
do_update_weight: bool,
):
"""Update the node when running a downward procedure updating the tree.

Parameters
----------
x
Sample to process.
sample_value
Target value of the sample x.
use_aggregation
Whether to use aggregation or not.
step
Step parameter of the tree.
do_update_weight
Whether to update the weight of the node as well.
"""

# Updating the range of the feature values known by the node
# If it is the first sample, we copy the features vector into the min and max range
if self.n_samples == 0:
for feature in x:
x_f = x[feature]
self.memory_range_min[feature] = x_f
self.memory_range_max[feature] = x_f
# Otherwise, we update the range
else:
for feature in x:
x_f = x[feature]
if x_f < self.memory_range_min[feature]:
self.memory_range_min[feature] = x_f
if x_f > self.memory_range_max[feature]:
self.memory_range_max[feature] = x_f

# One more sample in the node
self.n_samples += 1

if do_update_weight:
self.update_weight(sample_value, use_aggregation, step)

# Update the running mean of the targets in the node; n_samples already
# includes the new sample, so this matches the batch average exactly
self.mean += (sample_value - self.mean) / self.n_samples
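
The node's mean is maintained incrementally, one sample at a time. The standard online update `mean += (value - mean) / n` (with `n` counting the new sample) reproduces the batch average without storing past targets, which a quick self-contained check confirms:

```python
def running_mean(values):
    """Incrementally maintained mean, one sample at a time."""
    mean, n = 0.0, 0
    for v in values:
        n += 1
        mean += (v - mean) / n  # standard online mean update
    return mean


data = [3.0, 1.0, 4.0, 1.0, 5.0]
# Agrees with the batch mean up to floating-point rounding
assert abs(running_mean(data) - sum(data) / len(data)) < 1e-12
print(running_mean(data))
```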


class MondrianLeafRegressor(MondrianNodeRegressor, MondrianLeaf):
"""Mondrian Tree Regressor leaf node.

Parameters
----------
parent
Parent node.
time
Split time of the node.
depth
The depth of the leaf.
"""

def __init__(self, parent, time, depth):
super().__init__(parent, time, depth)


class MondrianBranchRegressor(MondrianNodeRegressor, MondrianBranch):
"""Mondrian Tree Regressor branch node.

Parameters
----------
parent
Parent node of the branch.
time
Split time characterizing the branch.
depth
Depth of the branch in the tree.
feature
Feature used by the branch's split.
threshold
Split threshold of the branch.
*children
Child nodes of the branch.
"""

def __init__(self, parent, time, depth, feature, threshold, *children):
super().__init__(parent, time, depth, feature, threshold, *children)