Implementation for learning curves #155

Merged (18 commits, Apr 6, 2021)
Changes from 6 commits
27 changes: 27 additions & 0 deletions Makefile
@@ -0,0 +1,27 @@
help:
@echo "venv - Create virtual environment and install all necessary dependencies."
@echo "venv_ext - Create virtual environment and install all necessary dependencies"
@echo " and the dependencies needed for examples with external code."
@echo "test - Run tests. Requires virtual env set up."

venv:
python3 -m venv venv ;\
. venv/bin/activate ;\
pip3 install --upgrade pip ;\
pip3 install cython ;\
pip3 install -r requirements.txt ;\
pip3 install -e .

venv_ext:
python3 -m venv venv ;\
. venv/bin/activate ;\
pip3 install --upgrade pip ;\
pip3 install cython ;\
pip3 install -r requirements.txt ;\
pip3 install -r examples/external/requirements_external.txt ;\
pip3 install -e .

test:
flake8 moabb ;\
. venv/bin/activate ;\
python -m unittest moabb.tests
7 changes: 7 additions & 0 deletions examples/external/README.md
@@ -0,0 +1,7 @@
# External examples

Examples in this folder require additional dependencies
that are not contained in the `requirements.txt` file.
You can either check manually which dependencies are
needed for each example or install all external dependencies
from `requirements_external.txt`.
147 changes: 147 additions & 0 deletions examples/external/plot_learning_curve_p300_external.py
@@ -0,0 +1,147 @@
"""
===========================
Within Session P300 with learning curve
===========================

This Example shows how to perform a within session analysis while also
creating learning curves for a P300 dataset.
Additionally, we will evaluate external code. Make sure to have tdlda installed, which
can be found in requirements_external.txt

We will compare three pipelines :

- Riemannian Geometry
- Jumping Means based Linear Discriminant Analysis
- Time-Decoupled Linear Discriminant Analysis

We will use the P300 paradigm, which uses the AUC as metric.
"""
# Authors: Jan Sosulski
#
# License: BSD (3-clause)

from sklearn.pipeline import make_pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from pyriemann.tangentspace import TangentSpace
from pyriemann.estimation import XdawnCovariances
from moabb.evaluations import WithinSessionEvaluation
from moabb.paradigms import P300
from moabb.datasets import BNCI2014009
import moabb
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
from tdlda import Vectorizer as JumpingMeansVectorizer
from tdlda import TimeDecoupledLda

# getting rid of the warnings about the future (we don't care!)
warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", category=RuntimeWarning)


moabb.set_log_level("info")

##############################################################################
# Create pipelines
# ----------------
#
# Pipelines must be a dict of sklearn pipelines.
processing_sampling_rate = 128
pipelines = {}

# we have to do this because the classes are called 'Target' and 'NonTarget'
# but the evaluation function uses a LabelEncoder, transforming them
# to 0 and 1
labels_dict = {"Target": 1, "NonTarget": 0}

# Riemannian geometry based classification
pipelines["RG + LDA"] = make_pipeline(
XdawnCovariances(nfilter=5, estimator="lwf", xdawn_estimator="scm"),
TangentSpace(),
LDA(solver="lsqr", shrinkage="auto"),
)

# Simple LDA pipeline using averaged feature values in certain time intervals
jumping_mean_ivals = [
[0.10, 0.139],
[0.14, 0.169],
[0.17, 0.199],
[0.20, 0.229],
[0.23, 0.269],
[0.27, 0.299],
[0.30, 0.349],
[0.35, 0.409],
[0.41, 0.449],
[0.45, 0.499],
]
jmv = JumpingMeansVectorizer(
fs=processing_sampling_rate, jumping_mean_ivals=jumping_mean_ivals
)

pipelines["JM + LDA"] = make_pipeline(jmv, LDA(solver="lsqr", shrinkage="auto"))

# The time-decoupled LDA classifier needs information about the number of
# channels and time intervals
c = TimeDecoupledLda(N_channels=16, N_times=10)
# TD-LDA needs to know about the used jumping means intervals
c.preproc = jmv
pipelines["JM + TD-LDA"] = make_pipeline(jmv, c)


##############################################################################
# Evaluation
# ----------
#
# We define the paradigm (P300) and use the BNCI2014009 dataset for it.
# The evaluation will return a dataframe containing AUCs for each permutation
# and dataset size.

paradigm = P300(resample=processing_sampling_rate)
dataset = BNCI2014009()
# Remove the slicing of the subject list to evaluate multiple subjects
dataset.subject_list = dataset.subject_list[0:1]
datasets = [dataset]
overwrite = True # set to True if we want to overwrite cached results
data_size = dict(policy="ratio", value=np.geomspace(0.02, 1, 6))
# When the training data is sparse, perform more permutations than when we have a lot of data
n_perms = np.floor(np.geomspace(20, 2, len(data_size["value"]))).astype(int)
print(n_perms)
# Guarantee reproducibility
np.random.seed(7536298)
evaluation = WithinSessionEvaluation(
paradigm=paradigm,
datasets=datasets,
data_size=data_size,
n_perms=n_perms,
suffix="examples_lr",
overwrite=overwrite,
)


results = evaluation.process(pipelines)
# %%
##############################################################################
# Plot Results
# ----------------
#
# Here we plot the results.

fig, ax = plt.subplots(facecolor="white", figsize=[8, 4])

n_subs = len(dataset.subject_list)

if n_subs > 1:
r = results.groupby(["pipeline", "subject", "data_size"]).mean().reset_index()
else:
r = results

sns.pointplot(data=r, x="data_size", y="score", hue="pipeline", ax=ax, palette="Set1")

errbar_meaning = "subjects" if n_subs > 1 else "permutations"
title_str = f"Errorbar shows Mean-CI across {errbar_meaning}"
ax.set_xlabel("Amount of training samples")
ax.set_ylabel("ROC AUC")
ax.set_title(title_str)
fig.tight_layout()
plt.show()
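As a side note on the evaluation settings used above, the geometric spacing of the training-set ratios and permutation counts can be checked in isolation. This is a minimal sketch of the same NumPy calls, independent of MOABB (the printed values in the comments are approximate):

```python
import numpy as np

# Six training-set ratios, geometrically spaced between 2% and 100%
ratios = np.geomspace(0.02, 1, 6)

# More permutations for small training sets, fewer for large ones
n_perms = np.floor(np.geomspace(20, 2, len(ratios))).astype(int)

print(ratios)   # approx. [0.02, 0.044, 0.096, 0.209, 0.457, 1.0]
print(n_perms)  # [20 12  7  5  3  2]
```

Because `np.floor` is applied, each intermediate permutation count is rounded down, so the sparsest training set gets 10x as many permutations as the full one.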
1 change: 1 addition & 0 deletions examples/external/requirements_external.txt
@@ -0,0 +1 @@
-e git://github.com/jsosulski/tdlda#egg=tdlda
116 changes: 116 additions & 0 deletions examples/plot_learning_curve_motor_imagery.py
@@ -0,0 +1,116 @@
"""
===========================
Within Session Motor Imagery with learning curve
===========================

This Example show how to perform a within session motor imagery analysis on the
very popular dataset 2a from the BCI competition IV.

We will compare two pipelines :

- CSP + LDA
- Riemannian Geometry + Logistic Regression

We will use the LeftRightImagery paradigm. this will restrict the analysis
to two classes (left hand versus righ hand) and use AUC as metric.
"""
# Authors: Alexandre Barachant <alexandre.barachant@gmail.com>
#
# License: BSD (3-clause)

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from mne.decoding import CSP
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

import moabb
from moabb.datasets import BNCI2014001
from moabb.evaluations import WithinSessionEvaluation
from moabb.paradigms import LeftRightImagery


moabb.set_log_level("info")

##############################################################################
# Create pipelines
# ----------------
#
# Pipelines must be a dict of sklearn pipelines.
#
# The CSP implementation from MNE is used. We selected 8 CSP components, as
# usually done in the literature.
#
# The Riemannian geometry pipeline consists of covariance estimation, tangent
# space mapping and finally a logistic regression for the classification.

pipelines = {}

pipelines["CSP + LDA"] = make_pipeline(CSP(n_components=8), LDA(solver="lsqr", shrinkage="auto"))

pipelines["RG + LR"] = make_pipeline(
Covariances(), TangentSpace(), LogisticRegression(solver="lbfgs")
)

##############################################################################
# Evaluation
# ----------
#
# We define the paradigm (LeftRightImagery) and the dataset (BNCI2014001).
# The evaluation will return a dataframe containing a single AUC score for
# each subject / session of the dataset, and for each pipeline.
#
# Results are saved into the database, so that if you add a new pipeline, the
# evaluation will not run again unless a parameter has changed. Results can
# be overwritten if necessary.

paradigm = LeftRightImagery()
dataset = BNCI2014001()
dataset.subject_list = dataset.subject_list[:1]
datasets = [dataset]
overwrite = True # set to True if we want to overwrite cached results
# Evaluate for a specific number of training samples per class
data_size = dict(policy="per_class", value=np.array([5, 10, 30, 50]))
# When the training data is sparse, perform more permutations than when we have a lot of data
n_perms = np.floor(np.geomspace(20, 2, len(data_size["value"]))).astype(int)
evaluation = WithinSessionEvaluation(
paradigm=paradigm,
datasets=datasets,
suffix="examples",
overwrite=overwrite,
data_size=data_size,
n_perms=n_perms,
)

results = evaluation.process(pipelines)

print(results.head())
# %%
##############################################################################
# Plot Results
# ----------------
#
# Here we plot the results.

fig, ax = plt.subplots(facecolor="white", figsize=[8, 4])

n_subs = len(dataset.subject_list)

if n_subs > 1:
r = results.groupby(["pipeline", "subject", "data_size"]).mean().reset_index()
else:
r = results

sns.pointplot(data=r, x="data_size", y="score", hue="pipeline", ax=ax, palette="Set1")

errbar_meaning = "subjects" if n_subs > 1 else "permutations"
title_str = f"Errorbar shows Mean-CI across {errbar_meaning}"
ax.set_xlabel("Amount of training samples")
ax.set_ylabel("ROC AUC")
ax.set_title(title_str)
fig.tight_layout()
plt.show()
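The plotting code above averages scores per subject before plotting, so that with several subjects the errorbars reflect variability across subjects rather than across permutations. A minimal sketch of that groupby step with made-up scores (column names match the results dataframe; the numbers are purely illustrative):

```python
import pandas as pd

# Hypothetical results: two subjects, one pipeline, two data sizes,
# two permutations each (scores are invented for illustration)
results = pd.DataFrame(
    {
        "pipeline": ["CSP + LDA"] * 8,
        "subject": [1, 1, 1, 1, 2, 2, 2, 2],
        "data_size": [5, 5, 10, 10, 5, 5, 10, 10],
        "score": [0.60, 0.62, 0.70, 0.72, 0.50, 0.52, 0.64, 0.66],
    }
)

# Average over permutations: one row per (pipeline, subject, data_size)
r = results.groupby(["pipeline", "subject", "data_size"]).mean().reset_index()

# Subject 1 at data_size 5 now has a single score: (0.60 + 0.62) / 2 = 0.61
print(r)
```

After this step, each subject contributes exactly one point per data size, so the confidence interval drawn by `sns.pointplot` summarizes between-subject spread.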