✅ Reuse models and datasets in tests #641

blaginin · 2023-07-17T15:46:26Z

This PR addresses part of issue #603 by improving model weights and datasets in tests. Previously, weights for each model were downloaded to f"{tmp_path}/weights.pth" and replaced each other. The only available dataset, Kather, was downloaded into TIATOOLBOX_HOME, but the entire home directory was deleted during test execution. This approach has several drawbacks:

Tests are extremely slow. On every test run, it downloads ~8 GB of weight files which never change.
Tests cannot be run in parallel as they depend on the same f"{tmp_path}/weights.pth" file.
Tests heavily load TIA's infrastructure. For each test run, the server has to transfer almost 10 GB of static assets. This might work while the number of contributors is small, but what happens when it increases?
Tests were inconsistent with their models caching. Why, for example, PatchPredictor cached their pretrained_models, but UNetModel did not?

Now, all weights are downloaded to TIATOOLBOX_HOME and cached between runs. Storing more data in memory (+8 GB) makes tests much faster, especially if you're not in Coventry or have slow internet. In rare cases where you specifically need to test behaviour without cache, you can set TIATOOLBOX_HOME to a temporary directory.

# Conflicts: # tests/models/test_arch_mapde.py # tests/models/test_arch_micronet.py # tests/models/test_patch_predictor.py # tiatoolbox/models/architecture/__init__.py

codecov · 2023-07-17T16:20:33Z

Codecov Report

Merging #641 (4297c99) into develop (e8b3d91) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 4297c99 differs from pull request most recent head 096bc31. Consider uploading reports for the commit 096bc31 to get more accurate results

@@           Coverage Diff            @@
##           develop     #641   +/-   ##
========================================
  Coverage    99.78%   99.78%           
========================================
  Files           65       65           
  Lines         7072     7076    +4     
  Branches      1395     1397    +2     
========================================
+ Hits          7057     7061    +4     
  Misses           7        7           
  Partials         8        8

Impacted Files	Coverage Δ
tiatoolbox/models/dataset/info.py	`100.00% <ø> (ø)`
tiatoolbox/models/architecture/__init__.py	`100.00% <100.00%> (ø)`

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

tiatoolbox/models/architecture/__init__.py

blaginin · 2023-07-18T20:46:14Z

@shaneahmed, I reviewed the PR with Ruff and fixed the error. Regarding the execution time, the first time you run the code, it doesn't have any effect as it will be downloading files into the cache. However, the second time, you will see a speedup.

On my local machine with high-speed internet, the difference is 13.75 minutes vs 10.89 minutes. However, if I use 4G, the difference becomes more noticeable: 1 hour 10 minutes vs 36 minutes.

shaneahmed

Thanks @blaginin

Adds the ability to run tests using several workers using [pytest-xdist](https://github.com/pytest-dev/pytest-xdist), significantly improving processing time. For example, on M1 Max (no CUDA), processing time dropped **from 14 minutes to 4 minutes 💨💨💨.** <img width="1614" alt="image" src="https://github.com/TissueImageAnalytics/tiatoolbox/assets/19199204/fbb607b0-3bf1-48c3-b14a-be4acf2b1ec3"> However, this optimization comes at a cost. Previously, tests depended on serial execution. For example, segmentation and prediction methods used to rely on "output" as a folder to store intermediate results. If many functions modified this folder at the same time, the result would be unpredictable. To address this, I made some tweaks alongside #641 and #673 so that functions will not depend on each other. If we merge this pull request, we will need to start checking that new tests are ready for parallel execution. **Depends on #641 and #673

@TomastpPereira

## 1.5.0 (2023-12-15) ### Major Updates and Feature Improvements - Adds the bokeh visualization tool. #684 - The tool allows a user to launch a server on their machine to visualise whole slide images, overlay the results of deep learning algorithms or to select a patch from whole slide image and run TIAToolbox deep learning engines. - This tool powers the TIA demos server. For details please see https://tiademos.dcs.warwick.ac.uk/. - Extends Annotation to Support Init from WKB #639 - Adds `IOConfig` for NuClick in `pretrained_model.yaml` #709 - Adds functions to save the TIAToolbox Engine outputs to Zarr and AnnotationStore files. #724 - Adds Support for QuPath Annotation Imports #721 ### Changes to API - Adds `model.to(device)` and `model.load_model_from_file()` functionality to make it compatible with PyTorch API. #733 - Replaces `pretrained` with `weights` to make the engines compatible with the new PyTorch API. #621 - Adds support for high-level imports for various utility functions and classes such as `WSIReader`, `PatchPredictor` and `imread` #606, #607, - Adds `tiatoolbox.typing` for type hints. #619 - Fixes incorrect file size saved by `save_tiles`, issue with certain WSIs raised by @TomastpPereira - TissueMasker transform now returns mask instead of a list. #748 - Fixes #732 ### Bug Fixes and Other Changes - Fixes `pixman` incompability error on Colab #601 - Removes `shapely.speedups`. The module no longer has any affect in Shapely >=2.0. #622 - Fixes errors in the slidegraph example notebook #608 - Fixes bugs in WSI Registration #645, #670, #693 - Fixes the situation where PatchExtractor.get_coords() can return patch coords which lie fully outside the bounds of a slide. #712 - Fixes #710 - Fixes #738 raised by @xiachenrui ### Development related changes - Replaces `flake8` and `isort` with `ruff` #625, #666 - Adds `mypy` checks to `root` and `utils` package. This will be rolled out in phases to other modules. #723 - Adds a module to detect file types using magic number/signatures #616 - Uses `poetry` for version updates instead of `bump2version`. #638 - Removes `setup.cfg` and uses `pyproject.toml` for project configurations. - Reduces runtime for some unit tests e.g., #627, #630, #631, #629 - Reuses models and datasets in tests on GitHub actions by utilising cache #641, #644 - Set up parallel tests locally #671 **Full Changelog:** v1.4.0...v1.5.0 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: mostafajahanifar <74412979+mostafajahanifar@users.noreply.github.com> Co-authored-by: John Pocock <John-P@users.noreply.github.com> Co-authored-by: DavidBAEpstein <David.Epstein@warwick.ac.uk> Co-authored-by: David Epstein <22086916+DavidBAEpstein@users.noreply.github.com> Co-authored-by: Ruqayya Awan <18444369+ruqayya@users.noreply.github.com> Co-authored-by: Mark Eastwood <20169086+measty@users.noreply.github.com> Co-authored-by: adamshephard <39619155+adamshephard@users.noreply.github.com> Co-authored-by: adamshephard <adam.shephard@warwick.ac.uk> Co-authored-by: Abdol <a@fkrtech.com> Co-authored-by: Jiaqi-Lv <60471431+Jiaqi-Lv@users.noreply.github.com> Co-authored-by: Abishek <abishekraj6797@gmail.com> Co-authored-by: Dmitrii Blaginin <blaginin@mbp.lan>

@TomastpPereira

## 1.5.0 (2023-12-15) ### Major Updates and Feature Improvements - Adds the bokeh visualization tool. #684 - The tool allows a user to launch a server on their machine to visualise whole slide images, overlay the results of deep learning algorithms or to select a patch from whole slide image and run TIAToolbox deep learning engines. - This tool powers the TIA demos server. For details please see https://tiademos.dcs.warwick.ac.uk/. - Extends Annotation to Support Init from WKB #639 - Adds `IOConfig` for NuClick in `pretrained_model.yaml` #709 - Adds functions to save the TIAToolbox Engine outputs to Zarr and AnnotationStore files. #724 - Adds Support for QuPath Annotation Imports #721 ### Changes to API - Adds `model.to(device)` and `model.load_model_from_file()` functionality to make it compatible with PyTorch API. #733 - Replaces `pretrained` with `weights` to make the engines compatible with the new PyTorch API. #621 - Adds support for high-level imports for various utility functions and classes such as `WSIReader`, `PatchPredictor` and `imread` #606, #607, - Adds `tiatoolbox.typing` for type hints. #619 - Fixes incorrect file size saved by `save_tiles`, issue with certain WSIs raised by @TomastpPereira - TissueMasker transform now returns mask instead of a list. #748 - Fixes #732 ### Bug Fixes and Other Changes - Fixes `pixman` incompability error on Colab #601 - Removes `shapely.speedups`. The module no longer has any affect in Shapely >=2.0. #622 - Fixes errors in the slidegraph example notebook #608 - Fixes bugs in WSI Registration #645, #670, #693 - Fixes the situation where PatchExtractor.get_coords() can return patch coords which lie fully outside the bounds of a slide. #712 - Fixes #710 - Fixes #738 raised by @xiachenrui ### Development related changes - Replaces `flake8` and `isort` with `ruff` #625, #666 - Adds `mypy` checks to `root` and `utils` package. This will be rolled out in phases to other modules. #723 - Adds a module to detect file types using magic number/signatures #616 - Uses `poetry` for version updates instead of `bump2version`. #638 - Removes `setup.cfg` and uses `pyproject.toml` for project configurations. - Reduces runtime for some unit tests e.g., #627, #630, #631, #629 - Reuses models and datasets in tests on GitHub actions by utilising cache #641, #644 - Set up parallel tests locally #671 **Full Changelog:** v1.4.0...v1.5.0

blaginin added 2 commits July 17, 2023 16:45

✅Reuse models and datasets in tests

0d77e99

Merge branch 'dev' into tests-model-weights

e62a48a

# Conflicts: # tests/models/test_arch_mapde.py # tests/models/test_arch_micronet.py # tests/models/test_patch_predictor.py # tiatoolbox/models/architecture/__init__.py

blaginin self-assigned this Jul 17, 2023

✅ Add missing tests

5a2d5ed

blaginin marked this pull request as ready for review July 18, 2023 10:26

Merge branch 'develop' into tests-model-weights

4297c99

blaginin requested a review from shaneahmed July 18, 2023 10:45

shaneahmed reviewed Jul 18, 2023

View reviewed changes

tiatoolbox/models/architecture/__init__.py Show resolved Hide resolved

🎨 Remove unused import

096bc31

shaneahmed added the dev tools Changes/Updates in Development tools label Jul 19, 2023

shaneahmed approved these changes Jul 19, 2023

View reviewed changes

shaneahmed merged commit 3614f61 into develop Jul 19, 2023

shaneahmed deleted the tests-model-weights branch July 19, 2023 10:57

shaneahmed added this to the Release v1.5.0 milestone Jul 19, 2023

blaginin mentioned this pull request Jul 31, 2023

💚 Parallel tests #671

Merged

shaneahmed mentioned this pull request Dec 15, 2023

🔖 Release 1.5.0 #757

Merged

shaneahmed mentioned this pull request Dec 15, 2023

🔖 Release 1.5.0 #761

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

✅ Reuse models and datasets in tests #641

✅ Reuse models and datasets in tests #641

blaginin commented Jul 17, 2023 •

edited

Loading

codecov bot commented Jul 17, 2023 •

edited

Loading

blaginin commented Jul 18, 2023

shaneahmed left a comment

✅ Reuse models and datasets in tests #641

✅ Reuse models and datasets in tests #641

Conversation

blaginin commented Jul 17, 2023 • edited Loading

codecov bot commented Jul 17, 2023 • edited Loading

Codecov Report

blaginin commented Jul 18, 2023

shaneahmed left a comment

Choose a reason for hiding this comment

blaginin commented Jul 17, 2023 •

edited

Loading

codecov bot commented Jul 17, 2023 •

edited

Loading