Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🆕 Integrate Foundation Models Available VIA timm: UNI, Prov-GigaPath, H-optimus-0 #856

Conversation

GeorgeBatch
Copy link
Contributor

Making a pull request as discussed in issue #855


Copied from the issue:

I think it would be useful to integrate pre-trained foundation models from other labs into tiatoolbox.models.architecture.vanilla.py.

Currently, the _get_architecture() function allows the use of models from torchvision.models.

But another function _get_timm_architecture() could be made to incorporate foundation models which are available from timm with weights on HuggingFace Hub. All the models from timm that I've used require users to sign the licence agreement with the authors, so the licencing question seems to be solved itself since there is no way users will get access to the model weights just through Tiatoolbox without getting the access request approved by the authors first.

@GeorgeBatch
Copy link
Contributor Author

Only added UNI and Prov-GigaPath for now. Will add more after initial comments.

I do not like that TimmBackbone pretty much repeats the CNNBackbone. The only difference is the absence of global average pooling in TimmBackbone.

@GeorgeBatch
Copy link
Contributor Author

I found this file for testing: https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/tests/models/test_arch_vanilla.py

There might be a problem regarding memory and compute resources when running some of the larger feature extractors through GitHub actions, e.g. Prov-GigaPath needs a considerable amount of memory just to be loaded.

@shaneahmed
Copy link
Member

I found this file for testing: https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/tests/models/test_arch_vanilla.py

There might be a problem regarding memory and compute resources when running some of the larger feature extractors through GitHub actions, e.g. Prov-GigaPath needs a considerable amount of memory just to be loaded.

As this is just testing the functionality and not loading weights, I hope this would work.

@shaneahmed
Copy link
Member

shaneahmed commented Sep 3, 2024

Only added UNI and Prov-GigaPath for now. Will add more after initial comments.

I do not like that TimmBackbone pretty much repeats the CNNBackbone. The only difference is the absence of global average pooling in TimmBackbone.

In that case, you can inherit CNNBackbone and just define the functions which change like we are doing here

class PatchPredictor(EngineABC):
for PatchPredictor. The PatchPredictor uses all the functionalities of EngineABC other than the ones defined explicitly.

@shaneahmed shaneahmed changed the title Integrate foundation models available through timm: UNI, Virchow, Hibou, H-optimus-0, etc. 🆕 Integrate Foundation Models Available VIA timm: UNI, Virchow, Hibou, H-optimus-0 Sep 3, 2024
@shaneahmed shaneahmed added this to the Release v1.6.0 milestone Sep 18, 2024
@shaneahmed shaneahmed added the enhancement New feature or request label Sep 18, 2024
@shaneahmed
Copy link
Member

I have updated this branch to make sure that tests pass on Ubuntu-24 before we merge it with develop.

GeorgeBatch and others added 4 commits September 24, 2024 16:41
- remove explicit assert statement for `timm` version
- add `timm` version into in if statement for prov-gigapath
- add comment about `timm` version for timm-based models

Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>
was not removed while accepting suggested changes
GeorgeBatch and others added 3 commits October 25, 2024 16:44
- As the requirements specify `timm>=1.0.3`. There is no need to check this separately.

Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>
1. Split `test_functional` into `test_engine` and `test_full_inference`
2. In `test_full_inference` use `@pytest.mark.parametrize` for `CNNBackbone` and `TimmBackbone` instead of making 2 copies of the function
@GeorgeBatch
Copy link
Contributor Author

I think we should add a notebook to show how to

  1. Extract and save metadata from your slides
  2. Make and save thumbnails and masks for the WSI-s
  3. Extract features using the masks saved before using either CNNBackbone or TimmBackbone

This is precisely what I wanted to know how to do, and there was no end-to-end example notebook.


Alternatively, we can add code for saving a mask into the Masking Notebook, which currently does not show how to save the masks:
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/03-tissue-masking.ipynb

wsi = WSIReader.open(slide_path)
mask = wsi.tissue_mask(resolution=1.25, units="power")
mask_thumbnail = mask.slide_thumbnail(resolution=1.25, units="power",)
mask_thumbnail_path = os.path.join(f"{slide_name}_mask.png")
imwrite(mask_thumbnail_path, np.uint8(mask_thumbnail * 255))

And add model = TimmBackbone("UNI") as an alternative to model = CNNBackbone("resnet50") in Slide Graph pipeline since it has the feature extraction code. Note, in Slide Graph notebooks, masks are assumed to be already saved.
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/inference-pipelines/slide-graph.ipynb

@GeorgeBatch
Copy link
Contributor Author

There are 3 more families of models that can be integrated, but they require their own files to be created: Hibou -b and -L, Phikon v1 and v2, Virchow v1 and v2

Should I add those files and call them from _get_timm_network()?

Hibou requires to trust remote code when creating the model, which I do not really like.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@GeorgeBatch GeorgeBatch force-pushed the enhance-add-timm-feature-extractors branch from 4a7ad04 to 5c93e9a Compare November 5, 2024 07:42
@GeorgeBatch
Copy link
Contributor Author

@shaneahmed, I think this branch is ready to be merged into develop.

I added code for saving a mask into the Masking Notebook, which currently does not show how to save the masks:
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/03-tissue-masking.ipynb

I also added a comment showing that TimmBackbone could be used as an alternative:

model = CNNBackbone("resnet50")  # TimmBackbone(backbone="UNI", pretrained=True)

in both slide graph notebooks:
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/inference-pipelines/slide-graph.ipynb
https://github.com/TissueImageAnalytics/tiatoolbox/blob/develop/examples/full-pipelines/slide-graph.ipynb

Copy link
Member

@shaneahmed shaneahmed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgeBatch This would be very helpful.

@shaneahmed shaneahmed merged commit c980eec into TissueImageAnalytics:develop Nov 15, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate foundation models available through timm: UNI, Virchow, Hibou, H-optimus-0, etc.
2 participants