WIP: Add MNIST classification example notebook #442
Conversation
@ulupo Please don't make modifications before it is ready. Merging and resolving conflicts on Jupyter notebooks is hard enough. I will let you know once it's ready for your review! Thanks :)
@gtauzin Noted. I was just reacting to a request for review, apologies!
Oops, sorry, I was not aware I had made a request. No worries!
@ulupo: I am now getting the dataset from OpenML and have adapted the notebook to the plotting API. I have also added some large-scale feature generation and a grid search :) Can you tell me what you think about the content I suggest?
Just a few minor comments for now. I will look at the more important things tomorrow!
I'm generally happy with the content and I'll be happy to help refine the presentation too. The notebook does a great job of showing how to construct a highly nontrivial pipeline with a great number of different features created using TDA.
My main comment on the content is the following: I wonder if we could be a bit more sophisticated towards the end by showing how to use scikit-learn tools for feature importance/feature selection, as can be found e.g. here or here. Currently, a form of feature selection is illustrated at the end, but it seems to amount to testing a subset of the 672 univariate models (an RF on a single persistent-entropy feature) to see which univariate model is best; one could then rank them by validation performance and include only the top N in a final multivariate model, which one would then retrain. But features which are highly correlated might perform similarly well, so this feature selection would not necessarily optimize for a "globally good" list of complementary features. More generally, the user might wonder how to perform feature selection on the multivariate problem directly.
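For illustration, a minimal sketch of the kind of joint (multivariate) selection I have in mind, using scikit-learn's `SelectFromModel`. The feature matrix and labels here are random placeholders, not the notebook's actual TDA features; names like `X_tda` are my own:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X_tda = rng.normal(size=(100, 50))  # placeholder for the TDA feature matrix
y = rng.integers(0, 10, size=100)   # placeholder MNIST labels

# Fit one forest on ALL features jointly, then keep only the columns
# whose importance is at or above the median importance. This ranks
# features in the context of each other, not one univariate model at a time.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
X_selected = selector.fit_transform(X_tda, y)
print(X_selected.shape)
```

One could equally use `RFE` or `SequentialFeatureSelector`, which more directly optimize for a complementary subset.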
examples/MNIST_classification.ipynb
Outdated
"metadata": {},
"outputs": [],
"source": [
"feature_union_filtrations.fit(X_train[:20])\n",
I imagine taking 20 samples is just for illustrative purposes.
examples/MNIST_classification.ipynb
Outdated
" for n_iterations in n_iterations_dilation_list] \\\n",
" + [SignedDistanceFiltration(n_iterations=n_iterations) \n",
" for n_iterations in n_iterations_signed_list] \\\n",
" + ['passthrough']\n",
We should remember to comment on the meaning of the passthrough option here, i.e. explain that it just captures the persistent homology of a binary image, which really is just ordinary homology.
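To make the point concrete, a small sketch (not using giotto-tda) of what "homology of a binary image" means in degree 0: the Betti number β₀ is simply the count of connected components, which `scipy.ndimage.label` computes directly. The toy image below is my own example:

```python
import numpy as np
from scipy import ndimage

# A binary image with two connected components (4-connectivity):
# one blob in the top-left, one in the bottom-right.
image = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
])
_, n_components = ndimage.label(image)
print(n_components)  # 2 — this is beta_0 of the binary image
```

The 'passthrough' branch recovers exactly this kind of information, with all births at the same filtration value.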
examples/MNIST_classification.ipynb
Outdated
"diagram_steps = [[Binarizer(threshold=0.4), \n",
" filtration, \n",
" CubicalPersistence(homology_dimensions=[0, 1]), \n",
" Scaler(metric='bottleneck')] \n",
I find it a little strange that the scaler alone improves results substantially. Normally, I'd expect a scaler to be followed by a filter; but if it isn't, can't the model weights take care of the different scales between homology dimensions?
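A quick sanity check of that intuition for tree-based models specifically: rescaling a single feature column by a positive constant should not change a random forest's predictions, since splits depend only on the ordering of values. Synthetic data and names below are my own sketch, not from the notebook:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Exaggerate the scale of one column, as if one homology dimension's
# features lived on a very different scale.
X_scaled = X.copy()
X_scaled[:, 1] *= 1000.0
clf_scaled = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_scaled, y)

# With the same seed, the fitted trees split at the same (rescaled)
# thresholds, so the predictions coincide.
same = bool((clf.predict(X) == clf_scaled.predict(X_scaled)).all())
print(same)
```

So if the downstream model is a forest, any benefit from `Scaler` presumably comes from its effect on the diagram-level features themselves, not from per-column scale.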
examples/MNIST_classification.ipynb
Outdated
"]\n",
"\n",
"#\n",
"feature_union = make_union(*[PersistenceEntropy()] + [Amplitude(**metric, order=None) \n",
Isn't there a missing pair of brackets here? I.e. should this not be:
feature_union = make_union(*[[PersistenceEntropy()] + [Amplitude(**metric, order=None)
for metric in metric_list]])
?
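For what it's worth, a minimal check of how `*`-unpacking interacts with list concatenation inside a call: the expression after `*` is evaluated first, so the original one-bracket form already unpacks the full concatenated list. The toy function below is my own illustration:

```python
def f(*args):
    return args

# *expr in a call evaluates expr first, so this unpacks [1, 2, 3].
result = f(*[1] + [x for x in (2, 3)])
print(result)  # (1, 2, 3)
```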
Signed-off-by: Guillaume Tauzin guillaumetauzin.ut@gmail.com
Types of changes
Description
Add the full-blown MNIST ML example
Checklist
- I have run flake8 to check my Python changes.
- I have run pytest to check this on Python tests.