Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚡ Faster Filtering in Patch Dataset #571

Merged
merged 10 commits into from
Apr 6, 2023
Merged

Conversation

measty
Copy link
Collaborator

@measty measty commented Mar 22, 2023

  • API Change for PatchExtractor.filter_coordinates. Instead of coordinate_resolution, coordinate_units, & mask_resolution, it takes slide_shape and a callable function func to validate the coordinates as input. New implementation is about ~200 times fast.

I've been using WSIPatchDataset for something recently, and noticed that it uses filter_coordinates method of PatchExtractor, instead of the faster filter_coordinates_fast. I can't see any reason why it shouldn't use the faster version - I am guessing that its just a case of, when the faster version was added, WSIPatchDataset was never updated to use it.

So, this PR updates it to use the faster method, and also includes a few other small improvements like allowing the min_mask_ratio argument to be used in the WSIPatchDataset in the same way it is available for patch extractor, so that behaviour between PatchExtractor and WSIPatchDataset is consistent.

Changes in this PR:

  1. WSIPatchDataset now uses filter_coordinates_fast, consistent with PatchExtractor.
  2. filter_coordinates_fast now uses the relative scale between the wsi dimensions at the requested patch resolution, and the mask dimensions, as the scale factor. This is units-agnostic and gives consistent results regardless of mode. Also doesn't make an even lower-res thumbnail of an already low-res mask as it did previously, which was unnecessary and could have had some wierd effects with masks which were already very low-res.
  3. the documented post_proc argument now exists and will be used if given. Changed the documentation to reflect its functionality (ie its just called as is on the patch before it is returned)
  4. Allows min_mask_ratio to be used in WSIPatchDataset, consistent with usage in PatchExtractor.

Rough performance comparison:
~0.6 sec using filter_coordinates_fast
~4 min using filter_coordinates
consistent with what was reported in original PR which added filter_coordinates_fast to PatchExtractor.

The PR also replicates the ability to provide a custom 'filter func' from filter_coordinates into filter_coordinates_fast, so in theory we can now get rid of filter_coordinates as it does nothing filter_coordinates_fast doesn't do.

@measty measty changed the title faster filtering in patch dataset ENH: faster filtering in patch dataset Mar 22, 2023
@measty measty marked this pull request as draft March 22, 2023 09:45
@John-P John-P changed the title ENH: faster filtering in patch dataset ⚡ ENH: Faster Filtering in Patch Dataset Mar 22, 2023
@John-P John-P added the enhancement New feature or request label Mar 22, 2023
@codecov
Copy link

codecov bot commented Mar 22, 2023

Codecov Report

Merging #571 (0635e0d) into develop (41b3c66) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #571      +/-   ##
===========================================
- Coverage    99.63%   99.63%   -0.01%     
===========================================
  Files           62       62              
  Lines         6626     6609      -17     
  Branches      1081     1073       -8     
===========================================
- Hits          6602     6585      -17     
  Misses          15       15              
  Partials         9        9              
Impacted Files Coverage Δ
tiatoolbox/models/dataset/classification.py 100.00% <100.00%> (ø)
tiatoolbox/tools/patchextraction.py 100.00% <100.00%> (ø)
tiatoolbox/tools/registration/wsi_registration.py 99.50% <100.00%> (ø)
tiatoolbox/wsicore/wsireader.py 99.42% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@measty measty marked this pull request as ready for review March 30, 2023 19:43
@blaginin
Copy link
Collaborator

It feels like a good idea to mark filter_coordinates as deprecated to remove it in the future versions

@measty
Copy link
Collaborator Author

measty commented Mar 31, 2023

It feels like a good idea to mark filter_coordinates as deprecated to remove it in the future versions

We were considering just removing it and renaming the current filter_coordinates_fast to be filter_coordinates, though depreciation may be a good option instead. Perhaps @shaneahmed could comment on which approach would be preferable?

@shaneahmed
Copy link
Member

It feels like a good idea to mark filter_coordinates as deprecated to remove it in the future versions

We were considering just removing it and renaming the current filter_coordinates_fast to be filter_coordinates, though depreciation may be a good option instead. Perhaps @shaneahmed could comment on which approach would be preferable?

I would suggest to replace it as this was the initial intention. filter_coordinates_fast was not intended for use.

@shaneahmed shaneahmed changed the title ⚡ ENH: Faster Filtering in Patch Dataset ⚡ Faster Filtering in Patch Dataset Apr 6, 2023
Copy link
Member

@shaneahmed shaneahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @measty

@shaneahmed shaneahmed merged commit dd9ca06 into develop Apr 6, 2023
@shaneahmed shaneahmed deleted the faster-patch-dataset branch April 6, 2023 14:48
@shaneahmed shaneahmed added this to the Release v1.4.0 milestone Apr 10, 2023
This was referenced May 5, 2023
shaneahmed added a commit that referenced this pull request May 5, 2023
## 1.4.0 (2023-04-24)

### Major Updates and Feature Improvements

- Adds Python 3.11 support \[experimental\] #500
  - Python 3.11 is not fully supported by `pytorch` pytorch/pytorch#86566 and `openslide` openslide/openslide-python#188
- Removes Python 3.7 support
  - This allows upgrading all the dependencies which were dependent on an older version of Python.
- Adds Neighbourhood Querying Support To AnnotationStore #540
  - This enables easy and efficient querying of annotations within a neighbourhood of other annotations.
- Adds `MultiTaskSegmentor` engine #424
- Fixes an issue with stain augmentation to apply augmentation to only tissue regions.
  - #546 contributed by @navidstuv
- Filters logger output to stdout instead of stderr.
  - Fixes #255
- Allows import of some modules at higher level for improved usability
  - `WSIReader` can now be imported as `from tiatoolbox.wsicore import WSIReader`
  - `WSIMeta` can now be imported as `from tiatoolbox.wsicore import WSIMeta`
  - `HoVerNet`, `HoVerNetPlus`, `IDaRS`, `MapDe`, `MicroNet`, `NuClick`, `SCCNN` can now be imported as \`from tiatoolbox.models import HoVerNet, HoVerNetPlus, IDaRS, MapDe, MicroNet, NuClick, SCCNN
- Improves `PatchExtractor` performance. Updates `WSIPatchDataset` to be consistent. #571
- Updates documentation for `License` for clarity on source code and model weights license.

### Changes to API

- Updates SCCNN architecture to make it consistent with other models. #544

### Bug Fixes and Other Changes

- Fixes Parsing Missing Omero Version NGFF Metadata #568
  - Fixes #535 raised by @benkamphaus
- Fixes reading of DICOM WSIs at the correct level #564
  - Fixes #529
- Fixes `scipy`, `matplotlib`, `scikit-image` deprecated code
- Fixes breaking changes in `DICOMWSIReader` to make it compatible with latest `wsidicom` version. #539, #580
- Updates `shapely` dependency to version >=2.0.0 and fixes any breaking changes.
- Fixes bug with `DictionaryStore.bquery` and `geometry=None`, i.e. only a where predicate given.
  - Partly Fixes #532 raised by @blaginin
- Fixes local tests for Windows/Linux
- Fixes `flake8`, `deepsource` errors.
- Uses `logger` instead of `warnings` and `print` statements to properly log runs.

### Development related changes

- Upgrades dependencies which are dependent on Python 3.7
- Moves `requirements*.txt` files to `requirements` folder
- Removes `tox`
- Uses `pyproject.toml` for `bdist_wheel`, `pytest` and `isort`
- Adds `joblib` and `numba` as dependencies.
shaneahmed added a commit that referenced this pull request May 5, 2023
## 1.4.0 (2023-04-24)

### Major Updates and Feature Improvements

- Adds Python 3.11 support \[experimental\] #500
  - Python 3.11 is not fully supported by `pytorch` pytorch/pytorch#86566 and `openslide` openslide/openslide-python#188
- Removes Python 3.7 support
  - This allows upgrading all the dependencies which were dependent on an older version of Python.
- Adds Neighbourhood Querying Support To AnnotationStore #540
  - This enables easy and efficient querying of annotations within a neighbourhood of other annotations.
- Adds `MultiTaskSegmentor` engine #424
- Fixes an issue with stain augmentation to apply augmentation to only tissue regions.
  - #546 contributed by @navidstuv
- Filters logger output to stdout instead of stderr.
  - Fixes #255
- Allows import of some modules at higher level for improved usability
  - `WSIReader` can now be imported as `from tiatoolbox.wsicore import WSIReader`
  - `WSIMeta` can now be imported as `from tiatoolbox.wsicore import WSIMeta`
  - `HoVerNet`, `HoVerNetPlus`, `IDaRS`, `MapDe`, `MicroNet`, `NuClick`, `SCCNN` can now be imported as \`from tiatoolbox.models import HoVerNet, HoVerNetPlus, IDaRS, MapDe, MicroNet, NuClick, SCCNN
- Improves `PatchExtractor` performance. Updates `WSIPatchDataset` to be consistent. #571
- Updates documentation for `License` for clarity on source code and model weights license.

### Changes to API

- Updates SCCNN architecture to make it consistent with other models. #544

### Bug Fixes and Other Changes

- Fixes Parsing Missing Omero Version NGFF Metadata #568
  - Fixes #535 raised by @benkamphaus
- Fixes reading of DICOM WSIs at the correct level #564
  - Fixes #529
- Fixes `scipy`, `matplotlib`, `scikit-image` deprecated code
- Fixes breaking changes in `DICOMWSIReader` to make it compatible with latest `wsidicom` version. #539, #580
- Updates `shapely` dependency to version >=2.0.0 and fixes any breaking changes.
- Fixes bug with `DictionaryStore.bquery` and `geometry=None`, i.e. only a where predicate given.
  - Partly Fixes #532 raised by @blaginin
- Fixes local tests for Windows/Linux
- Fixes `flake8`, `deepsource` errors.
- Uses `logger` instead of `warnings` and `print` statements to properly log runs.

### Development related changes

- Upgrades dependencies which are dependent on Python 3.7
- Moves `requirements*.txt` files to `requirements` folder
- Removes `tox`
- Uses `pyproject.toml` for `bdist_wheel`, `pytest` and `isort`
- Adds `joblib` and `numba` as dependencies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants