Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🆕 Integrate Foundation Models Available VIA timm: UNI, Prov-GigaPath, H-optimus-0 #856

Conversation

GeorgeBatch
Copy link
Contributor

Making a pull request as discussed in issue #855


Copied from the issue:

I think it would be useful to integrate pre-trained foundation models from other labs into tiatoolbox.models.architecture.vanilla.py.

Currently, the _get_architecture() function allows the use of models from torchvision.models.

But another function _get_timm_architecture() could be made to incorporate foundation models which are available from timm with weights on HuggingFace Hub. All the models from timm that I've used require users to sign the licence agreement with the authors, so the licencing question seems to be solved itself since there is no way users will get access to the model weights just through Tiatoolbox without getting the access request approved by the authors first.

@GeorgeBatch
Copy link
Contributor Author

Only added UNI and Prov-GigaPath for now. Will add more after initial comments.

I do not like that TimmBackbone pretty much repeats the CNNBackbone. The only difference is the absence of global average pooling in TimmBackbone.

@GeorgeBatch
Copy link
Contributor Author

I found this file for testing: https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/tests/models/test_arch_vanilla.py

There might be a problem regarding memory and compute resources when running some of the larger feature extractors through GitHub actions, e.g. Prov-GigaPath needs a considerable amount of memory just to be loaded.

@shaneahmed
Copy link
Member

I found this file for testing: https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/tests/models/test_arch_vanilla.py

There might be a problem regarding memory and compute resources when running some of the larger feature extractors through GitHub actions, e.g. Prov-GigaPath needs a considerable amount of memory just to be loaded.

As this is just testing the functionality and not loading weights, I hope this would work.

@shaneahmed
Copy link
Member

shaneahmed commented Sep 3, 2024

Only added UNI and Prov-GigaPath for now. Will add more after initial comments.

I do not like that TimmBackbone pretty much repeats the CNNBackbone. The only difference is the absence of global average pooling in TimmBackbone.

In that case, you can inherit CNNBackbone and just define the functions which change like we are doing here

class PatchPredictor(EngineABC):
for PatchPredictor. The PatchPredictor uses all the functionalities of EngineABC other than the ones defined explicitly.

@shaneahmed shaneahmed changed the title Integrate foundation models available through timm: UNI, Virchow, Hibou, H-optimus-0, etc. 🆕 Integrate Foundation Models Available VIA timm: UNI, Virchow, Hibou, H-optimus-0 Sep 3, 2024
@shaneahmed shaneahmed added this to the Release v1.6.0 milestone Sep 18, 2024
@shaneahmed shaneahmed added the enhancement New feature or request label Sep 18, 2024
@shaneahmed
Copy link
Member

I have updated this branch to make sure that tests pass on Ubuntu-24 before we merge it with develop.

GeorgeBatch and others added 4 commits September 24, 2024 16:41
- remove explicit assert statement for `timm` version
- add `timm` version into in if statement for prov-gigapath
- add comment about `timm` version for timm-based models

Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>
was not removed while accepting suggested changes
@GeorgeBatch
Copy link
Contributor Author

I think we should add a notebook to show how to

  1. Extract and save metadata from your slides
  2. Make and save thumbnails and masks for the WSI-s
  3. Extract features using the masks saved before using either CNNBackbone or TimmBackbone

This is precisely what I wanted to know how to do, and there was no end-to-end example notebook.


Alternatively, we can add code for saving a mask into the Masking Notebook, which currently does not show how to save the masks:
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/03-tissue-masking.ipynb

wsi = WSIReader.open(slide_path)
mask = wsi.tissue_mask(resolution=1.25, units="power")
mask_thumbnail = mask.slide_thumbnail(resolution=1.25, units="power",)
mask_thumbnail_path = os.path.join(f"{slide_name}_mask.png")
imwrite(mask_thumbnail_path, np.uint8(mask_thumbnail * 255))

And add model = TimmBackbone("UNI") as an alternative to model = CNNBackbone("resnet50") in Slide Graph pipeline since it has the feature extraction code. Note, in Slide Graph notebooks, masks are assumed to be already saved.
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/inference-pipelines/slide-graph.ipynb

@GeorgeBatch
Copy link
Contributor Author

There are 3 more families of models that can be integrated, but they require their own files to be created: Hibou -b and -L, Phikon v1 and v2, Virchow v1 and v2

Should I add those files and call them from _get_timm_network()?

Hibou requires to trust remote code when creating the model, which I do not really like.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@GeorgeBatch GeorgeBatch force-pushed the enhance-add-timm-feature-extractors branch from 4a7ad04 to 5c93e9a Compare November 5, 2024 07:42
@GeorgeBatch
Copy link
Contributor Author

@shaneahmed, I think this branch is ready to be merged into develop.

I added code for saving a mask into the Masking Notebook, which currently does not show how to save the masks:
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/03-tissue-masking.ipynb

I also added a comment showing that TimmBackbone could be used as an alternative:

model = CNNBackbone("resnet50")  # TimmBackbone(backbone="UNI", pretrained=True)

in both slide graph notebooks:
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/inference-pipelines/slide-graph.ipynb
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/full-pipelines/slide-graph.ipynb

Copy link
Member

@shaneahmed shaneahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgeBatch This would be very helpful.

@shaneahmed shaneahmed merged commit c980eec into TissueImageAnalytics:develop Nov 15, 2024
3 checks passed
@GeorgeBatch GeorgeBatch deleted the enhance-add-timm-feature-extractors branch November 15, 2024 21:16
@shaneahmed shaneahmed mentioned this pull request Dec 10, 2024
shaneahmed added a commit that referenced this pull request Dec 12, 2024
## TIAToolbox v1.6.0 (2024-12-12)

### Major Updates and Feature Improvements

- **Foundation Models Support via `timm` API** (#856, contributed by @GeorgeBatch)
  - Introduced `TimmBackbone` for running additional PyTorch Image Models.
  - Tested models include `UNI`, `Prov-GigaPath`, and `H-optimus-0`.
  - Added an example notebook demonstrating feature extraction with foundation models.
  - `timm` added as a dependency.
- **Performance Enhancements with `torch.compile`** (#716)
  - Improved performance on newer GPUs using `torch.compile`.
- **Multichannel Input Support in `WSIReader`** (#742)
- **AnnotationStore Filtering for Patch Extraction** (#822)
- **Python 3.12 Support**
- **Deprecation of Python 3.8 Support**
- **CLI Response Time Improvements** (#795)

### API Changes

- **Device Specification Update** (#882)
  - Replaced `has_gpu` with `device` for specifying GPU or CPU usage, aligning with PyTorch's `Model.to()` functionality.
- **Windows Compatibility Enhancement** (#769)
  - Replaced `POWER` with explicit multiplication.

### Bug Fixes and Other Changes

- **TIFFWSIReader Bound Reading Adjustment** (#777)
  - Fixed `read_bound` to use adjusted bounds.
  - Reduced code complexity in `WSIReader` (#814).
- **Annotation Rendering Fixes** (#813)
  - Corrected rendering of annotations with holes.
- **Non-Tiled TIFF Support in `WSIReader`** (#807, contributed by @GeorgeBatch)
- **HoVer-Net Documentation Update** (#751)
  - Corrected class output information.
- **Citation File Fix for `cffconvert`** (#869, contributed by @Alon-Alexander)
- **Bokeh Compatibility Updates**
  - Updated `bokeh_app` for compatibility with `bokeh>=3.5.0`.
  - Switched from `size` to `radius` for `bokeh>3.4.0` compatibility (#796).
- **JSON Extraction Fixes** (#772)
  - Restructured SQL expression construction for JSON properties with dots in keys.
- **VahadaneExtractor Warning** (#871)
  - Added warning due to changes in `scikit-learn>0.23.0` dictionary learning (#382).
- **PatchExtractor Error Message Refinement** (#883)
- **Immutable Output Fix in `WSIReader`** (#850)

### Development-Related Changes

- **Mypy Checks Added**
  - Applied to `utils`, `tools`, `data`, `annotation`, and `cli/common`.
- **ReadTheDocs PDF Build Deprecation**
- **Formatter Update**
  - Replaced `black` with `ruff-format`.
- **Dependency Removal**
  - Removed `jinja2`.
- **Test Environment Update**
  - Updated to `Ubuntu 24.04`.
- **Conda Environment Workflow Update**
  - Implemented `micromamba` setup.
- **Codecov Reporting Fix** (#811)
  **Full Changelog:** v1.5.1...v1.6.0

---------

Co-authored-by: John Pocock <John-P@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adam Shephard <39619155+adamshephard@users.noreply.github.com>
Co-authored-by: Mark Eastwood <20169086+measty@users.noreply.github.com>
Co-authored-by: Mostafa Jahanifar <74412979+mostafajahanifar@users.noreply.github.com>
Co-authored-by: Simon Graham <20071401+simongraham@users.noreply.github.com>
Co-authored-by: Abdol A <u2271662@live.warwick.ac.uk>
Co-authored-by: Jiaqi-Lv <60471431+Jiaqi-Lv@users.noreply.github.com>
Co-authored-by: Dmitrii Blaginin <blaginin@mbp.lan>
Co-authored-by: behnazelhaminia <30952176+behnazelhaminia@users.noreply.github.com>
Co-authored-by: George Batchkala <46561186+GeorgeBatch@users.noreply.github.com>
Co-authored-by: vqdang <24943262+vqdang@users.noreply.github.com>
Co-authored-by: Jiaqi Lv <lvjiaqi9@gmail.com>
Co-authored-by: Alon Alexander <alon008@gmail.com>
@shaneahmed shaneahmed mentioned this pull request Dec 12, 2024
shaneahmed added a commit that referenced this pull request Dec 12, 2024
## TIAToolbox v1.6.0 (2024-12-12)

### Major Updates and Feature Improvements

- **Foundation Models Support via `timm` API** (#856, contributed by @GeorgeBatch)
  - Introduced `TimmBackbone` for running additional PyTorch Image Models.
  - Tested models include `UNI`, `Prov-GigaPath`, and `H-optimus-0`.
  - Added an example notebook demonstrating feature extraction with foundation models.
  - `timm` added as a dependency.
- **Performance Enhancements with `torch.compile`** (#716)
  - Improved performance on newer GPUs using `torch.compile`.
- **Multichannel Input Support in `WSIReader`** (#742)
- **AnnotationStore Filtering for Patch Extraction** (#822)
- **Python 3.12 Support**
- **Deprecation of Python 3.8 Support**
- **CLI Response Time Improvements** (#795)

### API Changes

- **Device Specification Update** (#882)
  - Replaced `has_gpu` with `device` for specifying GPU or CPU usage, aligning with PyTorch's `Model.to()` functionality.
- **Windows Compatibility Enhancement** (#769)
  - Replaced `POWER` with explicit multiplication.

### Bug Fixes and Other Changes

- **TIFFWSIReader Bound Reading Adjustment** (#777)
  - Fixed `read_bound` to use adjusted bounds.
  - Reduced code complexity in `WSIReader` (#814).
- **Annotation Rendering Fixes** (#813)
  - Corrected rendering of annotations with holes.
- **Non-Tiled TIFF Support in `WSIReader`** (#807, contributed by @GeorgeBatch)
- **HoVer-Net Documentation Update** (#751)
  - Corrected class output information.
- **Citation File Fix for `cffconvert`** (#869, contributed by @Alon-Alexander)
- **Bokeh Compatibility Updates**
  - Updated `bokeh_app` for compatibility with `bokeh>=3.5.0`.
  - Switched from `size` to `radius` for `bokeh>3.4.0` compatibility (#796).
- **JSON Extraction Fixes** (#772)
  - Restructured SQL expression construction for JSON properties with dots in keys.
- **VahadaneExtractor Warning** (#871)
  - Added warning due to changes in `scikit-learn>0.23.0` dictionary learning (#382).
- **PatchExtractor Error Message Refinement** (#883)
- **Immutable Output Fix in `WSIReader`** (#850)

### Development-Related Changes

- **Mypy Checks Added**
  - Applied to `utils`, `tools`, `data`, `annotation`, and `cli/common`.
- **ReadTheDocs PDF Build Deprecation**
- **Formatter Update**
  - Replaced `black` with `ruff-format`.
- **Dependency Removal**
  - Removed `jinja2`.
- **Test Environment Update**
  - Updated to `Ubuntu 24.04`.
- **Conda Environment Workflow Update**
  - Implemented `micromamba` setup.
- **Codecov Reporting Fix** (#811)
  **Full Changelog:** v1.5.1...v1.6.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate foundation models available through timm: UNI, Virchow, Hibou, H-optimus-0, etc.
2 participants