Show illustration of `StandardScaler()` in lecture notebook #446

ArturoAmorQ · 2021-08-27T15:03:18Z

Using the notebook introduced in #432, we find that Q4 and Q5 from M1.2 have scores under 0.65. Both questions ask about the effect of `StandardScaler()` on data, but one is visual (Q4) and the other (Q5) is not.
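For reference, the transformation both questions are about can be written feature by feature. This is the standard definition of standardization; here $x_{ij}$ is sample $i$ of feature $j$, $n$ is the number of samples, and $\sigma_j$ is the population standard deviation (no $n-1$ correction), which is what `StandardScaler` uses:

$$
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},
\qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
\qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
$$

Because each feature is only shifted and stretched independently, the point cloud is recentered at the origin and each axis is rescaled, but it is not rotated and the correlation between features is unchanged.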

To address this issue and improve the intuition behind scaling, I would suggest showing an illustration similar to this one in this notebook:

We could even add the code I used to create that image, so that people can experiment on their own (the disadvantage being that it would make this particular notebook too heavy...):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

plt.rcParams["figure.figsize"] = (10, 7)
plt.rcParams.update({"font.size": 16})

# Two 2D blobs with explicit centers (`center_box` is only used when
# centers are generated at random, so it is omitted here)
centers = [[0, 2], [3, 0.5]]
X, _ = make_blobs(n_samples=100, n_features=2, centers=centers,
                  cluster_std=0.5, shuffle=True, random_state=0)


def plot_with_axes_through_origin(data):
    """Scatter-plot 2D data with the x and y axes drawn through (0, 0)."""
    fig, ax = plt.subplots()
    ax.scatter(data[:, 0], data[:, 1])
    ax.spines["left"].set_position("zero")
    ax.spines["right"].set_color("none")
    ax.spines["bottom"].set_position("zero")
    ax.spines["top"].set_color("none")
    plt.show()


# Original training points
plot_with_axes_through_origin(X)

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Same points after scaling
plot_with_axes_through_origin(X_scaled)
```
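Since Q5 is the non-visual variant, a numerical check could complement the plots. The following is a minimal sketch, reusing the same toy data settings as the snippet above, that checks `StandardScaler` against subtracting each column's mean and dividing by its population standard deviation. `X_manual` is an illustrative name, not something from the original code.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Same toy data as the plotting snippet (two 2D blobs)
X, _ = make_blobs(n_samples=100, centers=[[0, 2], [3, 0.5]],
                  cluster_std=0.5, random_state=0)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Manual standardization, column by column: subtract the mean,
# then divide by the population standard deviation (NumPy's default ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # expected: True

# The fitted statistics are stored on the scaler
print(scaler.mean_, scaler.scale_)

# After scaling, each feature has mean ~0 and standard deviation ~1
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))

# The transform is affine per feature, so it can be undone
# (up to floating-point error)
print(np.allclose(scaler.inverse_transform(X_scaled), X))  # expected: True
```

This makes explicit that the scaler works on each feature separately, which is why the scaled cloud in the second plot is recentered and stretched along each axis rather than reshaped.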

What do you think?


## August 31st, 2021

### Gael

- TODO: Jeremy's renewal, Chiara's replacement, Mathis's consulting gig

### Olivier

- input feature names: main PR [#18010](scikit-learn/scikit-learn#18010) that links into sub PRs
  - remaining (need review): [#20853](scikit-learn/scikit-learn#20853) (found a bug in `OvOClassifier.n_features_in_`)
- reviewing `get_feature_names_out`: [#18444](scikit-learn/scikit-learn#18444)
- next: give feedback to Chiara on ARM wheel building [#20711](scikit-learn/scikit-learn#20711) (needed for the release)
- next: assist Adrin with the release process
- next: investigate the regression in loky that blocks the cloudpickle release [#432](cloudpipe/cloudpickle#432)
- next: come back to Intel to write a technical roadmap for a possible collaboration

### Julien

- Was on holidays
- Planned week @ Nexedi, Lille, from September 13th to 17th
- Reviewed PRs
  - [`#20567`](scikit-learn/scikit-learn#20567) Common Private Loss module
  - [`#18310`](scikit-learn/scikit-learn#18310) ENH Add option to centered ICE plots (cICE)
- Other PRs prior to holidays
  - [`#20254`](scikit-learn/scikit-learn#20254)
  - Adapted benchmarks on `pdist_aggregation` to test #20254 against sklearnex
  - Adapting PR for `fast_euclidean` and `fast_sqeuclidean` on user-facing APIs
- Next: comparing against scipy's
- Next: Having feedback on [#20254](scikit-learn/scikit-learn#20254) would also help
- Next: I need to block time to study Cython code.

### Mathis

- `sklearn_benchmarks`
  - Adapting benchmark script to run on Margaret
  - Fix issue with profiling files too big to be deployed on GitHub Pages
  - Ensure deterministic benchmark results
  - Working on declarative pipeline specification
- Next: run long HPO benchmarks on Margaret

### Arturo

- Finished MOOC!
- Finished filling [Loïc's notes](https://notes.inria.fr/rgSzYtubR6uSOQIfY9Fpvw#) to find questions with a score under 60% (Issue [#432](INRIA/scikit-learn-mooc#432))
  - started addressing easy-to-fix questions, resulting in GitLab MRs [#21](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/21) and [#22](https://gitlab.inria.fr/learninglab/mooc-scikit-learn/mooc-scikit-learn-coordination/-/merge_requests/22)
  - currently working on expanding the notes up to 70%
- Continued cross-linking forum posts with issues in GitHub, resulting in [#444](INRIA/scikit-learn-mooc#444), [#445](INRIA/scikit-learn-mooc#445), [#446](INRIA/scikit-learn-mooc#446), [#447](INRIA/scikit-learn-mooc#447) and [#448](INRIA/scikit-learn-mooc#448)

### Jérémie

- back from holidays, catching up
- Mathis' benchmarks
- trying to find what's going on with ASV benchmarks (asv should display the versions of all build and runtime dependencies for each run)

### Guillaume

- back from holidays
- Next:
  - release with Adrin
  - check the PR and issue trackers

### TODO / Next

- Expand Loïc's notes up to 70% (Arturo)
- Create presentation to discuss my experience doing the MOOC (Arturo)
- Help with the scikit-learn release (Olivier, Guillaume)
- HR: Jeremy's renewal, Chiara's replacement (Gael)
- Mathis's consulting gig (Olivier, Gael, Mathis)

ArturoAmorQ changed the title ~~Show illustration of StandardScaler() in lecture notebook~~ Show illustration of `StandardScaler()` in lecture notebook Aug 27, 2021

ArturoAmorQ pushed a commit to ArturoAmorQ/scikit-learn-mooc that referenced this issue Sep 13, 2021

Fix INRIA#446

9f45230

glemaitre mentioned this issue Sep 16, 2021

Add illustration of the effect of scaling data #454

Merged

lesteve closed this as completed in #454 Sep 23, 2021
