493 add mondrian cp #504

Merged
merged 86 commits into from
Sep 3, 2024
Changes from 74 commits
45a65c1
ADD: initia Mondrian class
vincentblot28 Jul 25, 2024
c7c209a
ENH: add docstring to class
vincentblot28 Jul 25, 2024
0ead65c
ADD: typing docstring and linting
vincentblot28 Aug 1, 2024
3a6fa2d
TST: first test for mondrian
vincentblot28 Aug 1, 2024
103ace5
FIX: define not allowed method insteand of allowed
vincentblot28 Aug 1, 2024
258b2d1
TST: test for bad cv and mapie estimator
vincentblot28 Aug 1, 2024
ecd452b
FIX: use model predict instead of mapie prediciton in predict
vincentblot28 Aug 2, 2024
5e06b31
TST: bad groups, predict_proba, alpha none
vincentblot28 Aug 2, 2024
f9687cf
TST: check groups can be lists
vincentblot28 Aug 2, 2024
7763f5b
FIX: linting
vincentblot28 Aug 2, 2024
b764605
TST: same reuslts as classical if only one group
vincentblot28 Aug 2, 2024
d5015ad
FIX: typing
vincentblot28 Aug 2, 2024
7577ffc
ADD: docstring to tests
vincentblot28 Aug 5, 2024
ec47e49
FIX: linting
vincentblot28 Aug 5, 2024
2dbb7c0
FIX: checks for NCS were not working
vincentblot28 Aug 5, 2024
06fb35e
FIX: topk name anddistinction between task for valid estimators
vincentblot28 Aug 5, 2024
9c41479
FIX: replace isinstance by type to avoid confusion with child class
vincentblot28 Aug 5, 2024
986e2c1
FIX: indent in test in docstring
vincentblot28 Aug 5, 2024
d39af29
FIX: typing
vincentblot28 Aug 5, 2024
098230e
UPD: update history.rst
vincentblot28 Aug 5, 2024
70c351b
Merge branch 'master' into 493-add-mondrian-cp
vincentblot28 Aug 5, 2024
ca74087
FIX: typing
vincentblot28 Aug 5, 2024
32eb959
Merge branch '493-add-mondrian-cp' of github.com:scikit-learn-contrib…
vincentblot28 Aug 5, 2024
dd1fe50
FIX: typing
vincentblot28 Aug 5, 2024
44f7476
ADD: documentation
vincentblot28 Aug 6, 2024
aaa7f32
DOC: fix latex and add figure to mondrian
vincentblot28 Aug 6, 2024
56ea922
FIX: change image name
vincentblot28 Aug 6, 2024
1e2ccb5
ENH: rewrite quantile in italic
vincentblot28 Aug 6, 2024
c0532e4
FIX: typo in docstring
vincentblot28 Aug 8, 2024
53fb8b2
ENH: put public emthods at the begining of the file
vincentblot28 Aug 8, 2024
791d750
ENH: add in docstring that groups must be integers
vincentblot28 Aug 8, 2024
325c2a9
ENH remove MapieCalibrator
vincentblot28 Aug 9, 2024
ad8faab
ENH: remove MapieMultilabelClassifier
vincentblot28 Aug 9, 2024
f9b79e2
UPD: test with calibration and multilabel as wrong methods
vincentblot28 Aug 9, 2024
0651841
NEH: change kwargs to predcit_params and fit_params
vincentblot28 Aug 9, 2024
3b26142
ENH: rename Mondrian to MondrianCP
vincentblot28 Aug 9, 2024
48ebe09
UPD: class docstring with constraints
vincentblot28 Aug 9, 2024
dc5a371
FIX: Call MondrianCP in docstring test
vincentblot28 Aug 9, 2024
6646a07
ENH: add single method for cehck group length
vincentblot28 Aug 9, 2024
7ecd6a8
ENH: define output shape outside of the loop
vincentblot28 Aug 9, 2024
884c341
FIX: typing for n classes
vincentblot28 Aug 9, 2024
e32c8a0
ENH rename _check_mapie_classifier in _check_cv
vincentblot28 Aug 9, 2024
05d74a6
ENH: move check_alpha at begninning of predict
vincentblot28 Aug 9, 2024
518f78b
FIX: definiiton of n_classes
vincentblot28 Aug 9, 2024
cc48cb1
ENH remove old tests
vincentblot28 Aug 9, 2024
96e3358
FIX: coveage with frong fit_params in fit_params in tests
vincentblot28 Aug 9, 2024
d0842bb
Update mapie/tests/test_mondrian.py
vincentblot28 Aug 19, 2024
85fe875
Update mapie/tests/test_mondrian.py
vincentblot28 Aug 19, 2024
6f4b06c
Update mapie/tests/test_mondrian.py
vincentblot28 Aug 19, 2024
8f44c33
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
f4a0a45
Update doc/theoretical_description_mondrian.rst
vincentblot28 Aug 19, 2024
2ac857e
Update doc/theoretical_description_mondrian.rst
vincentblot28 Aug 19, 2024
94415c1
Update HISTORY.rst
vincentblot28 Aug 19, 2024
381a8ec
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
2aa9728
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
e58300b
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
791abba
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
999eb25
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
9c85ecb
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
c0646db
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
b4d5dd8
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
b9a9ca7
Update mapie/mondrian.py
vincentblot28 Aug 19, 2024
6330872
FIX: linting and docstring
vincentblot28 Aug 19, 2024
b4b9934
STY: skip lines in fit definition
vincentblot28 Aug 19, 2024
0e65abc
STY: docstring style
vincentblot28 Aug 19, 2024
5844fbe
ENH: test test_same_results_if_only_one_group for multiple values of …
vincentblot28 Aug 20, 2024
ccc1e2d
FIX: minor typo
vincentblot28 Aug 20, 2024
4b51a0a
ADD: mondrian to API.rst
vincentblot28 Aug 20, 2024
70f6f34
DOC: add tutorial notebook
vincentblot28 Aug 20, 2024
147142d
ADD: mondrian tutorial to index.rst
vincentblot28 Aug 20, 2024
8773dfc
UPD: odc
vincentblot28 Aug 21, 2024
aa47dae
UPD: doc
vincentblot28 Aug 21, 2024
cc6e39c
ADD: readme file for mondrian
vincentblot28 Aug 21, 2024
2acb98d
ADD: readme
vincentblot28 Aug 21, 2024
5ba97d6
UPD: use copy model to prefit
Sep 2, 2024
d7e88c5
FIX: lint problem with group at None
Sep 2, 2024
5979dcb
ENH: check group lenght in check fit params
vincentblot28 Sep 2, 2024
d6aa546
ENH: rename groups into partition
vincentblot28 Sep 2, 2024
2fa047d
ENH: rename groups into partition in tests
vincentblot28 Sep 2, 2024
9b85022
FIX: test in Mondrian docstring
vincentblot28 Sep 2, 2024
36b03c9
FIX: alpha value in docstring test
vincentblot28 Sep 2, 2024
ed374e6
DEL: delete unused notebook
vincentblot28 Sep 2, 2024
7672b2e
TST: test that estimator don't fail if given many alphas
vincentblot28 Sep 2, 2024
39c5c06
FIX: legend inside plot in tuto + rename group into partition in tuto
vincentblot28 Sep 2, 2024
f937bc8
ENH: increase figure size for tuto
vincentblot28 Sep 2, 2024
12e71f3
FIX: sections titles in tutorial
vincentblot28 Sep 3, 2024
1 change: 1 addition & 0 deletions HISTORY.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ History
0.8.x (2024-xx-xx)
------------------

* Add Mondrian Conformal Prediction for regression and classification
* Add `** predict_params` in fit and predict method for Mapie Regression
* Update the ts-changepoint notebook with the tutorial
* Change import related to conformity scores into ts-changepoint notebook
1 change: 1 addition & 0 deletions doc/Makefile
Expand Up @@ -52,6 +52,7 @@ clean:
-rm -rf examples_classification/
-rm -rf examples_multilabel_classification/
-rm -rf examples_calibration/
-rm -rf examples_mondrian/
-rm -rf generated/*
-rm -rf modules/generated/*

10 changes: 10 additions & 0 deletions doc/api.rst
Expand Up @@ -108,3 +108,13 @@ Resampling

   subsample.BlockBootstrap
   subsample.Subsample


Mondrian
==========

.. autosummary::
   :toctree: generated/
   :template: class.rst

   mondrian.MondrianCP
6 changes: 4 additions & 2 deletions doc/conf.py
Expand Up @@ -316,13 +316,15 @@
  "../examples/regression",
  "../examples/classification",
  "../examples/multilabel_classification",
- "../examples/calibration"
+ "../examples/calibration",
+ "../examples/mondrian",
  ],
  "gallery_dirs": [
  "examples_regression",
  "examples_classification",
  "examples_multilabel_classification",
- "examples_calibration"
+ "examples_calibration",
+ "examples_mondrian",
  ],
"doc_module": "mapie",
"backreferences_dir": os.path.join("generated"),
Binary file added doc/images/mondrian.png
8 changes: 8 additions & 0 deletions doc/index.rst
Expand Up @@ -49,6 +49,14 @@
   examples_multilabel_classification/1-quickstart/plot_tutorial_multilabel_classification
   notebooks_multilabel_classification

.. toctree::
   :maxdepth: 2
   :hidden:
   :caption: MONDRIAN

   theoretical_description_mondrian
   examples_mondrian/1-quickstart/plot_main-tutorial-mondrian-regression

.. toctree::
   :maxdepth: 2
   :hidden:
46 changes: 46 additions & 0 deletions doc/theoretical_description_mondrian.rst
@@ -0,0 +1,46 @@
.. title:: Theoretical Description Mondrian : contents

.. _theoretical_description_mondrian:

#######################
Theoretical Description
#######################

Mondrian conformal prediction (MCP) [1] is a method for building prediction sets with a group-conditional
coverage guarantee. The guarantee is:

.. math::
    P \{Y_{n+1} \in \hat{C}_{n, \alpha}(X_{n+1}) | G_{n+1} = g\} \geq 1 - \alpha

where :math:`G_{n+1}` is the group of the new test point :math:`X_{n+1}` and :math:`g`
is any group in the set of groups :math:`\mathcal{G}`.
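
As an illustration of what the guarantee above means in practice, the following sketch measures empirical coverage separately for each group. The helper name `coverage_by_group` and the toy arrays are hypothetical and are not part of MAPIE; the intervals are assumed to be given as lower and upper bounds.

```python
import numpy as np


def coverage_by_group(y_true, lower, upper, groups):
    """Return {group: fraction of points whose interval contains y_true}."""
    y_true, lower, upper, groups = map(
        np.asarray, (y_true, lower, upper, groups)
    )
    # A point is covered when its true value lies inside its interval.
    covered = (lower <= y_true) & (y_true <= upper)
    return {
        int(g): float(covered[groups == g].mean())
        for g in np.unique(groups)
    }


# Toy data: two groups of two points each (illustrative values only).
y_true = np.array([0.0, 1.0, 2.0, 3.0])
lower = np.array([-1.0, 1.5, 1.0, 2.5])
upper = np.array([1.0, 2.5, 3.0, 3.5])
groups = np.array([0, 0, 1, 1])

result = coverage_by_group(y_true, lower, upper, groups)
print(result)  # {0: 0.5, 1: 1.0}
```

A group-conditional guarantee asks every entry of this dictionary to be at least :math:`1 - \alpha` on average, rather than only the overall mean.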

MCP can be used with any split conformal predictor and is particularly useful when one has prior
knowledge about existing groups, whether or not that information is directly included in the features
of the data.
In a classification setting, the groups can be defined as the predicted classes of the data. Doing so
ensures that the coverage guarantee is satisfied for each predicted class.

To achieve the group-conditional coverage guarantee, MCP simply partitions the data
according to the groups and then applies the split conformal predictor to each group separately.
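
The procedure described above can be sketched from scratch as follows. This is a minimal illustration under stated assumptions, not the `MondrianCP` implementation: it assumes a regression setting, absolute residuals as the conformity score, and held-out calibration residuals; all function names are hypothetical.

```python
import math

import numpy as np


def conformal_quantile(scores, alpha):
    """Score of rank ceil((n + 1)(1 - alpha)), or +inf if the rank exceeds n."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    rank = math.ceil((n + 1) * (1 - alpha))
    return float(scores[rank - 1]) if rank <= n else math.inf


def fit_group_quantiles(residuals, groups, alpha):
    """Calibrate one quantile per group from absolute residuals."""
    scores = np.abs(np.asarray(residuals, dtype=float))
    groups = np.asarray(groups)
    return {
        int(g): conformal_quantile(scores[groups == g], alpha)
        for g in np.unique(groups)
    }


def predict_intervals(y_pred, groups, quantiles):
    """Build [y_pred - q_g, y_pred + q_g] using each point's group quantile."""
    y_pred = np.asarray(y_pred, dtype=float)
    widths = np.array([quantiles[int(g)] for g in groups])
    return np.column_stack([y_pred - widths, y_pred + widths])


# Synthetic calibration residuals: group 1 is ten times noisier than group 0.
rng = np.random.default_rng(0)
cal_groups = np.repeat([0, 1], 500)
cal_residuals = rng.normal(0, np.array([0.1, 1.0])[cal_groups])

quantiles = fit_group_quantiles(cal_residuals, cal_groups, alpha=0.1)
intervals = predict_intervals(
    np.array([0.0, 0.0, 0.0]), np.array([0, 1, 1]), quantiles
)
print(quantiles)
print(intervals)
```

Because each group is calibrated on its own scores, the noisier group receives wider intervals, whereas a single global quantile would over-cover the quiet group and under-cover the noisy one.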

The quantile of each group is defined as:

.. math::
    \widehat{q}^{(g)} = \mathrm{Quantile}\left(s_1, ..., s_{n^{(g)}}, \frac{\lceil (n^{(g)} + 1)(1-\alpha)\rceil}{n^{(g)}} \right)

where :math:`s_1, ..., s_{n^{(g)}}` are the conformity scores of the calibration points in group :math:`g` and :math:`n^{(g)}`
is the number of calibration points in group :math:`g`.
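
To make the quantile level concrete, here is a small worked computation of the corrected level for a few group sizes. The function name `quantile_level` is hypothetical and only mirrors the fraction in the formula above.

```python
import math


def quantile_level(n_g, alpha):
    """Corrected quantile level ceil((n_g + 1)(1 - alpha)) / n_g."""
    return math.ceil((n_g + 1) * (1 - alpha)) / n_g


for n_g in (5, 50, 5000):
    level = quantile_level(n_g, alpha=0.1)
    # A level above 1 means the group is too small for a finite quantile.
    status = "finite quantile" if level <= 1 else "no finite quantile"
    print(n_g, round(level, 4), status)
```

With :math:`\alpha = 0.1`, a group of 5 points gives a level of 1.2, so no finite quantile exists and the prediction set for that group is unbounded. This is the practical cost of calibrating per group: each group needs enough points on its own.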

The following figure (from [1]) explains the process of Mondrian conformal prediction:

.. image:: images/mondrian.png
   :width: 600
   :align: center

References
----------

[1] Vladimir Vovk, David Lindsay, Ilia Nouretdinov, and Alex Gammerman.
Mondrian confidence machine.
Technical report, Royal Holloway University of London, 2003
6 changes: 6 additions & 0 deletions examples/mondrian/1-quickstart/README.rst
@@ -0,0 +1,6 @@
.. _mondrian_examples_1:

1. Quickstart examples
----------------------

The following examples present the main functionalities of MAPIE through basic quickstart regression problems.
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@
r"""
=============================================
Tutorial for tabular regression with Mondrian
=============================================

In this tutorial, we compare the prediction intervals estimated by MAPIE on a
simple, one-dimensional, ground truth function using classical (split)
conformal prediction and Mondrian conformal prediction.
The function is a sinusoid with added noise, and the data is
split into 10 groups. The goal is to estimate prediction intervals
for new data points and to compare their coverage group by group.
Throughout this tutorial, we will answer the following questions:


- How to use MAPIE to estimate prediction intervals for a regression problem?
- How to use Mondrian conformal prediction intervals for regression?
- How to compare the coverage of the prediction intervals by groups?
"""

import os
import warnings

import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

from mapie.metrics import regression_coverage_score_v2
from mapie.mondrian import MondrianCP
from mapie.regression import MapieRegressor

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
warnings.filterwarnings("ignore")


##############################################################################
# 1. Create the noisy dataset with 10 groups, each of those groups having
# a different level of noise.
# -------------------------------------------------------------------
Collaborator


Not urgent, but if you have the time, the titles are not displayed properly -> you will have to see what's wrong or if it's intentional

Collaborator Author


done



n_points = 100000
np.random.seed(0)
X = np.linspace(0, 10, n_points).reshape(-1, 1)
group_size = n_points // 10
groups_list = []
for i in range(10):
    groups_list.append(np.array([i] * group_size))
groups = np.concatenate(groups_list)

noise_0_1 = np.random.normal(0, 0.1, group_size)
noise_1_2 = np.random.normal(0, 0.5, group_size)
noise_2_3 = np.random.normal(0, 1, group_size)
noise_3_4 = np.random.normal(0, .4, group_size)
noise_4_5 = np.random.normal(0, .2, group_size)
noise_5_6 = np.random.normal(0, .3, group_size)
noise_6_7 = np.random.normal(0, .6, group_size)
noise_7_8 = np.random.normal(0, .7, group_size)
noise_8_9 = np.random.normal(0, .8, group_size)
noise_9_10 = np.random.normal(0, .9, group_size)

y = np.concatenate(
    [
        np.sin(X[groups == 0, 0] * 2) + noise_0_1,
        np.sin(X[groups == 1, 0] * 2) + noise_1_2,
        np.sin(X[groups == 2, 0] * 2) + noise_2_3,
        np.sin(X[groups == 3, 0] * 2) + noise_3_4,
        np.sin(X[groups == 4, 0] * 2) + noise_4_5,
        np.sin(X[groups == 5, 0] * 2) + noise_5_6,
        np.sin(X[groups == 6, 0] * 2) + noise_6_7,
        np.sin(X[groups == 7, 0] * 2) + noise_7_8,
        np.sin(X[groups == 8, 0] * 2) + noise_8_9,
        np.sin(X[groups == 9, 0] * 2) + noise_9_10,
    ], axis=0
)


##############################################################################
# We plot the dataset with the groups as colors.


plt.scatter(X, y, c=groups)
plt.show()


##############################################################################
# 2. Split the dataset into a training set, a calibration set, and a test set.


X_train_temp, X_test, y_train_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
groups_train_temp, groups_test, _, _ = train_test_split(
    groups, y, test_size=0.2, random_state=0
)
X_cal, X_train, y_cal, y_train = train_test_split(
    X_train_temp, y_train_temp, test_size=0.5, random_state=0
)
groups_cal, groups_train, _, _ = train_test_split(
    groups_train_temp, y_train_temp, test_size=0.5, random_state=0
)
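
The split above keeps `X`, `y`, and the groups aligned by repeating the same `random_state` across separate calls. As a possible alternative, `train_test_split` accepts any number of arrays and splits them with the same shuffle, so a single call is enough. The sketch below uses a small toy dataset (not the tutorial data) to show that the three arrays stay row-aligned.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 points, 4 groups of 5 consecutive points (illustrative only).
X = np.arange(20, dtype=float).reshape(-1, 1)
y = X[:, 0] * 2
groups = np.repeat(np.arange(4), 5)

# One call splits all three arrays with the same permutation.
X_temp, X_test, y_temp, y_test, groups_temp, groups_test = train_test_split(
    X, y, groups, test_size=0.2, random_state=0
)

# Each test row still carries its own target and group label.
print(np.array_equal(y_test, X_test[:, 0] * 2))
print(np.array_equal(groups_test, (X_test[:, 0] // 5).astype(int)))
```

This removes the need for the discarded `_` outputs and makes it harder to desynchronize the arrays by changing one seed but not the other.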


##############################################################################
# We plot the training set, the calibration set, and the test set.


f, ax = plt.subplots(1, 3, figsize=(15, 5))
ax[0].scatter(X_train, y_train, c=groups_train)
ax[0].set_title("Train set")
ax[1].scatter(X_cal, y_cal, c=groups_cal)
ax[1].set_title("Calibration set")
ax[2].scatter(X_test, y_test, c=groups_test)
ax[2].set_title("Test set")
plt.show()


##############################################################################
# 3. Fit a random forest regressor on the training set.


rf = RandomForestRegressor(n_estimators=100)
rf.fit(X_train, y_train)


##############################################################################
# 4. Fit a MapieRegressor and a MondrianCP on the calibration set.


mapie_regressor = MapieRegressor(rf, cv="prefit")
mondrian_regressor = MondrianCP(MapieRegressor(rf, cv="prefit"))
mapie_regressor.fit(X_cal, y_cal)
mondrian_regressor.fit(X_cal, y_cal, groups=groups_cal)


##############################################################################
# 5. Predict the prediction intervals on the test set with both methods.


_, y_pss_split = mapie_regressor.predict(X_test, alpha=.1)
_, y_pss_mondrian = mondrian_regressor.predict(
    X_test, groups=groups_test, alpha=.1
)


##############################################################################
# 6. Compare the coverage by groups, plot both methods side by side.


coverages = {}
for group in np.unique(groups_test):
    coverages[group] = {}
    coverages[group]["split"] = regression_coverage_score_v2(
        y_test[groups_test == group], y_pss_split[groups_test == group]
    )
    coverages[group]["mondrian"] = regression_coverage_score_v2(
        y_test[groups_test == group], y_pss_mondrian[groups_test == group]
    )


# Plot the coverage by groups, plot both methods side by side
plt.bar(
    np.arange(len(coverages)) * 2,
    [float(coverages[group]["split"]) for group in coverages],
    label="Split"
)
plt.bar(
    np.arange(len(coverages)) * 2 + 1,
    [float(coverages[group]["mondrian"]) for group in coverages],
    label="Mondrian"
)
plt.xticks(
    np.arange(len(coverages)) * 2 + .5,
    [f"Group {group}" for group in coverages],
    rotation=45
)
plt.hlines(0.9, -1, 21, label="90% coverage", color="black", linestyle="--")
plt.ylabel("Coverage")
plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
plt.show()
Collaborator


You might need to fix fig size before (text is cut off at the moment).

Screenshot 2024-08-26 at 16 33 48

Collaborator Author


done

4 changes: 4 additions & 0 deletions examples/mondrian/README.rst
@@ -0,0 +1,4 @@
.. _mondrian_examples:

Mondrian examples
=======================