Enabling DeepSMILE on large encoded datasets #637

vale-salvatelli · 2022-01-19T15:56:29Z

This PR contains two changes needed to run DeepSMILE on a large dataset when using the InnerEye SSL checkpoint (or any other high-dimensional encoder):

- An option to encode in chunks, which prevents out-of-memory (OOM) errors during encoding.
- An option to load the cached encoded dataset on the CPU, which prevents OOM errors when loading from the cache.
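
For readers unfamiliar with the two options, here is a minimal sketch of the general idea in PyTorch. It is not the code from this PR: `encode_in_chunks`, the toy encoder, and the in-memory cache are hypothetical stand-ins, and the only library calls assumed are `torch.split`, `torch.save`, and `torch.load` with `map_location`.

```python
import io

import torch
from torch import nn


def encode_in_chunks(encoder: nn.Module, tiles: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """Encode a bag of tiles chunk by chunk to bound peak memory (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    encoder.eval()
    features = []
    with torch.no_grad():
        # torch.split yields consecutive slices of at most chunk_size tiles.
        for chunk in torch.split(tiles, chunk_size, dim=0):
            # Move each result to the CPU so encoder outputs do not pile up on the GPU.
            features.append(encoder(chunk).cpu())
    return torch.cat(features, dim=0)


# Toy stand-in for a real encoder: 10 tiles of shape 3x8x8 mapped to 16 features each.
torch.manual_seed(0)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))
tiles = torch.rand(10, 3, 8, 8)
chunked = encode_in_chunks(encoder, tiles, chunk_size=4)

# Cache round trip: map_location="cpu" keeps the loaded tensors in host memory,
# even if they were saved from a GPU.
buffer = io.BytesIO()
torch.save(chunked, buffer)
buffer.seek(0)
cached = torch.load(buffer, map_location="cpu")
```

Chunking trades some throughput for a bounded memory footprint: a smaller `chunk_size` lowers peak memory at the cost of more forward passes.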

It also changes how PNG images are loaded throughout the histopathology pipeline to make loading faster (see https://hi-ml.readthedocs.io/en/latest/loading_images.html).

Please follow the guidelines for PRs contained here. Checklist:

- [x] Ensure that your PR is small, and implements one change.
- [x] Add unit tests for all functions that you introduced or modified.
- [x] Run PyCharm's code cleanup tools on your Python files.
- [ ] Link the correct GitHub issue for tracking.
- [x] Update the Changelog file: Describe your change in terms of Added/Changed/Removed/... in the "Upcoming" section.
- [ ] When merging your PR, replace the default merge message with a description of your PR, and if needed a motivation why that change was required.

ant0nsc

Looking good, just a few minor comments

InnerEye/ML/Histopathology/datamodules/base_module.py

InnerEye/ML/configs/histo_configs/classification/BaseMIL.py

InnerEye/ML/configs/histo_configs/classification/DeepSMILEPanda.py

InnerEye/ML/deep_learning_config.py

InnerEye/ML/Histopathology/datamodules/base_module.py

CHANGELOG.md

Tests/ML/histopathology/models/test_transforms.py

vale-salvatelli added 20 commits December 20, 2021 16:49

fixing PandaInnereSSLMIL

3729a8f

updating checkpoint downloader

2d927a3

checkpoint for inference found

b8fd6c8

fixing outputs paths

7c8e7d2

fixing test

39829b8

update changelog

6b9ab94

Merge branch 'main' into vsalva/use_panda_checkpoint

137dc99

implementing PR feedback, thanks Anton and Daniel

c932ef7

typo

7d8d86c

updating to latest CRCk checkpoint, new augmentations

8dc5a48

moving checkpoint ids to file

9a010c6

Merge branch 'main' into vsalva/use_panda_checkpoint

bc21f96

first draft

62e3831

extend test

ffa0dc2

all works with large batch size

eff2349

making cpu memory an option

5290945

clean up chunk size parameter

80c6447

add TODO

97ae348

fixing conflicts

da8b44f

update changelog

cd94e9f

vale-salvatelli changed the title ~~Enabling DeepSMILE on large datasets~~ WIP: Enabling DeepSMILE on large datasets Jan 19, 2022

vale-salvatelli added 4 commits January 19, 2022 17:25

making clarity on cachemode vs precache mode

2498652

fix typo

a5ce7f8

update test after refactoring

505635c

update test after refactoring

c891dde

vale-salvatelli changed the title ~~WIP: Enabling DeepSMILE on large datasets~~ Enabling DeepSMILE on large datasets Jan 19, 2022

vale-salvatelli changed the title ~~Enabling DeepSMILE on large datasets~~ Enabling DeepSMILE on large encoded datasets Jan 19, 2022

remove typo in tests

d1d7b27

vale-salvatelli requested review from ant0nsc and dccastro January 20, 2022 08:53

ant0nsc reviewed Jan 20, 2022

vale-salvatelli and others added 4 commits January 20, 2022 09:45

change optional type

e59fef2

Update InnerEye/ML/configs/histo_configs/classification/DeepSMILEPand…

7644da5

…a.py Co-authored-by: Anton Schwaighofer <antonsc@microsoft.com>

Update InnerEye/ML/configs/histo_configs/classification/DeepSMILEPand…

7cc9c8b

…a.py Co-authored-by: Anton Schwaighofer <antonsc@microsoft.com>

Update InnerEye/ML/configs/histo_configs/classification/BaseMIL.py

4550982

Co-authored-by: Anton Schwaighofer <antonsc@microsoft.com>

dccastro reviewed Jan 20, 2022

vale-salvatelli added 11 commits January 20, 2022 14:32

change load_image function

7461666

Merge branch 'vsalva/chunk_encoding' of https://github.com/microsoft/…

953244d

…InnerEye-DeepLearning into vsalva/chunk_encoding

revert some changes to avoid inconsistencies in type

3e06759

implement PR feedback

2249ac4

making realistic test cases in test_tile_id_coverage

198ab9b

minor fixes

afe1d14

fix test and extend location cases

c3e1eef

remove generic to cuda

62354a6

fix naming error GPU

14c9746

trying adding some reproducibility to failing test

91dfabc

Merge branch 'main' into vsalva/chunk_encoding

d9757c9

ant0nsc approved these changes Jan 25, 2022

Tests/ML/histopathology/models/test_transforms.py

dccastro approved these changes Jan 25, 2022

Merge branch 'main' into vsalva/chunk_encoding

ffb9535

vale-salvatelli merged commit fb258d5 into main Jan 25, 2022

vale-salvatelli deleted the vsalva/chunk_encoding branch January 25, 2022 10:31
