-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
⚡ Faster Filtering in Patch Dataset #571
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #571 +/- ##
===========================================
- Coverage 99.63% 99.63% -0.01%
===========================================
Files 62 62
Lines 6626 6609 -17
Branches 1081 1073 -8
===========================================
- Hits 6602 6585 -17
Misses 15 15
Partials 9 9
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
It feels like a good idea to mark |
We were considering just removing it and renaming the current filter_coordinates_fast to be filter_coordinates, though depreciation may be a good option instead. Perhaps @shaneahmed could comment on which approach would be preferable? |
I would suggest to replace it as this was the initial intention. |
…s/tiatoolbox into faster-patch-dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @measty
## 1.4.0 (2023-04-24) ### Major Updates and Feature Improvements - Adds Python 3.11 support \[experimental\] #500 - Python 3.11 is not fully supported by `pytorch` pytorch/pytorch#86566 and `openslide` openslide/openslide-python#188 - Removes Python 3.7 support - This allows upgrading all the dependencies which were dependent on an older version of Python. - Adds Neighbourhood Querying Support To AnnotationStore #540 - This enables easy and efficient querying of annotations within a neighbourhood of other annotations. - Adds `MultiTaskSegmentor` engine #424 - Fixes an issue with stain augmentation to apply augmentation to only tissue regions. - #546 contributed by @navidstuv - Filters logger output to stdout instead of stderr. - Fixes #255 - Allows import of some modules at higher level for improved usability - `WSIReader` can now be imported as `from tiatoolbox.wsicore import WSIReader` - `WSIMeta` can now be imported as `from tiatoolbox.wsicore import WSIMeta` - `HoVerNet`, `HoVerNetPlus`, `IDaRS`, `MapDe`, `MicroNet`, `NuClick`, `SCCNN` can now be imported as \`from tiatoolbox.models import HoVerNet, HoVerNetPlus, IDaRS, MapDe, MicroNet, NuClick, SCCNN - Improves `PatchExtractor` performance. Updates `WSIPatchDataset` to be consistent. #571 - Updates documentation for `License` for clarity on source code and model weights license. ### Changes to API - Updates SCCNN architecture to make it consistent with other models. #544 ### Bug Fixes and Other Changes - Fixes Parsing Missing Omero Version NGFF Metadata #568 - Fixes #535 raised by @benkamphaus - Fixes reading of DICOM WSIs at the correct level #564 - Fixes #529 - Fixes `scipy`, `matplotlib`, `scikit-image` deprecated code - Fixes breaking changes in `DICOMWSIReader` to make it compatible with latest `wsidicom` version. #539, #580 - Updates `shapely` dependency to version >=2.0.0 and fixes any breaking changes. - Fixes bug with `DictionaryStore.bquery` and `geometry=None`, i.e. only a where predicate given. - Partly Fixes #532 raised by @blaginin - Fixes local tests for Windows/Linux - Fixes `flake8`, `deepsource` errors. - Uses `logger` instead of `warnings` and `print` statements to properly log runs. ### Development related changes - Upgrades dependencies which are dependent on Python 3.7 - Moves `requirements*.txt` files to `requirements` folder - Removes `tox` - Uses `pyproject.toml` for `bdist_wheel`, `pytest` and `isort` - Adds `joblib` and `numba` as dependencies.
## 1.4.0 (2023-04-24) ### Major Updates and Feature Improvements - Adds Python 3.11 support \[experimental\] #500 - Python 3.11 is not fully supported by `pytorch` pytorch/pytorch#86566 and `openslide` openslide/openslide-python#188 - Removes Python 3.7 support - This allows upgrading all the dependencies which were dependent on an older version of Python. - Adds Neighbourhood Querying Support To AnnotationStore #540 - This enables easy and efficient querying of annotations within a neighbourhood of other annotations. - Adds `MultiTaskSegmentor` engine #424 - Fixes an issue with stain augmentation to apply augmentation to only tissue regions. - #546 contributed by @navidstuv - Filters logger output to stdout instead of stderr. - Fixes #255 - Allows import of some modules at higher level for improved usability - `WSIReader` can now be imported as `from tiatoolbox.wsicore import WSIReader` - `WSIMeta` can now be imported as `from tiatoolbox.wsicore import WSIMeta` - `HoVerNet`, `HoVerNetPlus`, `IDaRS`, `MapDe`, `MicroNet`, `NuClick`, `SCCNN` can now be imported as \`from tiatoolbox.models import HoVerNet, HoVerNetPlus, IDaRS, MapDe, MicroNet, NuClick, SCCNN - Improves `PatchExtractor` performance. Updates `WSIPatchDataset` to be consistent. #571 - Updates documentation for `License` for clarity on source code and model weights license. ### Changes to API - Updates SCCNN architecture to make it consistent with other models. #544 ### Bug Fixes and Other Changes - Fixes Parsing Missing Omero Version NGFF Metadata #568 - Fixes #535 raised by @benkamphaus - Fixes reading of DICOM WSIs at the correct level #564 - Fixes #529 - Fixes `scipy`, `matplotlib`, `scikit-image` deprecated code - Fixes breaking changes in `DICOMWSIReader` to make it compatible with latest `wsidicom` version. #539, #580 - Updates `shapely` dependency to version >=2.0.0 and fixes any breaking changes. - Fixes bug with `DictionaryStore.bquery` and `geometry=None`, i.e. only a where predicate given. - Partly Fixes #532 raised by @blaginin - Fixes local tests for Windows/Linux - Fixes `flake8`, `deepsource` errors. - Uses `logger` instead of `warnings` and `print` statements to properly log runs. ### Development related changes - Upgrades dependencies which are dependent on Python 3.7 - Moves `requirements*.txt` files to `requirements` folder - Removes `tox` - Uses `pyproject.toml` for `bdist_wheel`, `pytest` and `isort` - Adds `joblib` and `numba` as dependencies.
PatchExtractor.filter_coordinates
. Instead ofcoordinate_resolution
,coordinate_units
, &mask_resolution
, it takesslide_shape
and a callable functionfunc
to validate the coordinates as input. New implementation is about ~200 times fast.I've been using WSIPatchDataset for something recently, and noticed that it uses filter_coordinates method of PatchExtractor, instead of the faster filter_coordinates_fast. I can't see any reason why it shouldn't use the faster version - I am guessing that its just a case of, when the faster version was added, WSIPatchDataset was never updated to use it.
So, this PR updates it to use the faster method, and also includes a few other small improvements like allowing the min_mask_ratio argument to be used in the WSIPatchDataset in the same way it is available for patch extractor, so that behaviour between PatchExtractor and WSIPatchDataset is consistent.
Changes in this PR:
Rough performance comparison:
~0.6 sec using filter_coordinates_fast
~4 min using filter_coordinates
consistent with what was reported in original PR which added filter_coordinates_fast to PatchExtractor.
The PR also replicates the ability to provide a custom 'filter func' from filter_coordinates into filter_coordinates_fast, so in theory we can now get rid of filter_coordinates as it does nothing filter_coordinates_fast doesn't do.