Add mars.learn.ensemble.{BlockwiseVotingClassifier, BlockwiseVotingRegressor} #2390

Merged
merged 2 commits into from
Aug 26, 2021
25 changes: 25 additions & 0 deletions docs/source/reference/learn/reference.rst
@@ -76,6 +76,22 @@ Matrix Decomposition
decomposition.PCA
decomposition.TruncatedSVD

.. _ensemble_ref:

Ensemble Methods
================

.. automodule:: mars.learn.ensemble
   :no-members:
   :no-inherited-members:

.. currentmodule:: mars.learn

.. autosummary::
   :toctree: generated/

   ensemble.BlockwiseVotingClassifier
   ensemble.BlockwiseVotingRegressor

.. _metrics_ref:

@@ -98,6 +114,15 @@ Classification metrics
metrics.auc
metrics.roc_curve

Regression metrics
------------------

.. autosummary::
   :toctree: generated/

   metrics.r2_score


Pairwise metrics
----------------

2 changes: 1 addition & 1 deletion mars/_version.py
@@ -16,7 +16,7 @@
import os
from typing import NamedTuple, Optional

version_info = (0, 8, 0, 'a2')
version_info = (0, 8, 0, 'a3')
_num_index = max(idx if isinstance(v, int) else 0
for idx, v in enumerate(version_info))
__version__ = '.'.join(map(str, version_info[:_num_index + 1])) + \
4 changes: 4 additions & 0 deletions mars/dataframe/core.py
@@ -1303,7 +1303,9 @@ def between(self, left, right, inclusive="both"):
--------
>>> import mars.dataframe as md
>>> s = md.Series([2, 0, 4, 8, np.nan])

Boundary values are included by default:

>>> s.between(1, 4).execute()
0 True
1 False
@@ -1313,6 +1315,7 @@ def between(self, left, right, inclusive="both"):
dtype: bool

With `inclusive` set to ``"neither"`` boundary values are excluded:

>>> s.between(1, 4, inclusive="neither").execute()
0 True
1 False
@@ -1322,6 +1325,7 @@ def between(self, left, right, inclusive="both"):
dtype: bool

`left` and `right` can be any scalar value:

>>> s = md.Series(['Alice', 'Bob', 'Carol', 'Eve'])
>>> s.between('Anna', 'Daniel').execute()
0 False
56 changes: 55 additions & 1 deletion mars/learn/base.py
@@ -43,7 +43,7 @@ def score(self, X, y, sample_weight=None, session=None, run_kwargs=None):

Returns
-------
score : float
score : Tensor
Mean accuracy of self.predict(X) wrt. y.
"""
from .metrics import accuracy_score
@@ -52,6 +52,60 @@ def score(self, X, y, sample_weight=None, session=None, run_kwargs=None):
return result


class RegressorMixin:
    """Mixin class for all regression estimators in scikit-learn."""
    _estimator_type = "regressor"

    def score(self, X, y, sample_weight=None):
        """Return the coefficient of determination :math:`R^2` of the
        prediction.

        The coefficient :math:`R^2` is defined as :math:`(1 - \\frac{u}{v})`,
        where :math:`u` is the residual sum of squares ``((y_true - y_pred)
        ** 2).sum()`` and :math:`v` is the total sum of squares ``((y_true -
        y_true.mean()) ** 2).sum()``. The best possible score is 1.0 and it
        can be negative (because the model can be arbitrarily worse). A
        constant model that always predicts the expected value of `y`,
        disregarding the input features, would get a :math:`R^2` score of
        0.0.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Test samples. For some estimators this may be a precomputed
            kernel matrix or a list of generic objects instead with shape
            ``(n_samples, n_samples_fitted)``, where ``n_samples_fitted``
            is the number of samples used in the fitting for the estimator.

        y : array-like of shape (n_samples,) or (n_samples, n_outputs)
            True values for `X`.

        sample_weight : array-like of shape (n_samples,), default=None
            Sample weights.

        Returns
        -------
        score : Tensor
            :math:`R^2` of ``self.predict(X)`` wrt. `y`.

        Notes
        -----
        The :math:`R^2` score used when calling ``score`` on a regressor uses
        ``multioutput='uniform_average'`` from version 0.23 to keep consistent
        with default value of :func:`~sklearn.metrics.r2_score`.
        This influences the ``score`` method of all the multioutput
        regressors (except for
        :class:`~sklearn.multioutput.MultiOutputRegressor`).
        """

        from .metrics import r2_score
        y_pred = self.predict(X)
        return r2_score(y, y_pred, sample_weight=sample_weight)

    def _more_tags(self):  # noqa: R0201  # pylint: disable=no-self-use
        return {'requires_y': True}


class BaseEstimator(SklearnBaseEstimator):
    def _validate_data(self, X, y=None, reset=True,
                       validate_separately=False, **check_params):
16 changes: 16 additions & 0 deletions mars/learn/ensemble/__init__.py
@@ -0,0 +1,16 @@
# Copyright 1999-2021 Alibaba Group Holding Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from ._blockwise import BlockwiseVotingClassifier, BlockwiseVotingRegressor