
Nnunet ensemble #198

Merged: 21 commits from nnunet_ensemble into main on Aug 13, 2024

Conversation

@scarere (Collaborator) commented Jul 25, 2024

PR Type

[Feature | Fix | Documentation | Other ]

Short Description

This PR introduces major updates to the prediction and evaluation pipelines for nnunet models, most notably a script that performs inference, lesion extraction and picai_eval evaluation end to end.

  • Updates to logging to make everything look nice
  • Included information on how long each step takes to run
    • This will be useful for identifying the bottleneck when inference datasets get large
  • Updated the predict script to return probabilities, annotations and case identifiers
  • Updated the eval script to predict from probabilities and extract detection maps manually
  • Added a predict_and_eval script that combines everything (see the sketch after this list)
    • Performs inference for an arbitrary number of nnunet models
    • Ensembles the predicted probabilities from different checkpoints and nnunet configs
    • Extracts lesion detection maps and predicted annotations/segmentations
    • Computes picai_eval metrics from the detection maps
    • Allows optional saving of probabilities, detection maps, annotations/segmentations and evaluation result metrics
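
For context, here is a minimal sketch of the ensemble-then-evaluate step described above. It assumes the per-model softmax probabilities are already in memory and follows the usage of picai_eval and report_guided_annotation shown in their READMEs; the names model_probs, annotations and the foreground-channel index are placeholders, not the actual interface of the new script.

```python
import numpy as np
from picai_eval import evaluate
from report_guided_annotation import extract_lesion_candidates


def ensemble_and_evaluate(model_probs: list, annotations: list):
    # model_probs: one array per model/config, each shaped
    # (num_cases, num_classes, *spatial_dims). Ensemble by averaging
    # the predicted probabilities across models.
    ensembled = np.mean(np.stack(model_probs, axis=0), axis=0)

    # Extract a lesion detection map per case from the foreground
    # probability channel (assumed here to be channel 1 for a binary task).
    detection_maps = [extract_lesion_candidates(case[1])[0] for case in ensembled]

    # Compute picai_eval metrics from detection maps and ground-truth annotations.
    return evaluate(y_det=detection_maps, y_true=annotations)
```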

Notes

I would like to mention that in the second half of working on this, the code had to become a lot more involved than I would have liked. I think this is a good solution for now: it is robust, generalizes to multiple use cases (e.g. different numbers of models, multiclass segmentation, etc.) and just works. In the future I would love to refactor this somehow and make it shorter and cleaner. I think there will be an opportunity to refactor and delete code across the entire picai research folder at some point, but I didn't want to expand the scope of this PR, since refactoring would potentially be much more involved under the hood of the various APIs being used and would take a lot of time. Additionally, I'm waiting on some of these packages to fix issues or make improvements to their own APIs (I create an issue and they normally just ask me to do the PR for them lol).

That being said, if you see some easy simplifications I'd love to know. I'm really in deep on this one, so my head hurts when trying to reimagine it. This was a frustrating PR because I don't love the compromises I had to make.

  • Currently everything needs to be loaded into memory simultaneously.
    • nnunet uses multiprocessing precisely to avoid having to hold the inference dataset in memory
    • However, we currently need everything in memory in order to do the other steps without saving and loading intermediate files
    • A great future improvement would be to run the whole pipeline one sample at a time with multiprocessing, and then add our own arguments/flags to save each individual sample. However, this would involve refactoring everything and partially redoing nnunet's inference code
      • That said, I'm already partially overriding nnunet's main inference method to get it to return more information
  • I'd love to add some convenient file loading functionality in the future
    • picai_eval has some cool functionality in its file loading that allows filenames to have 'postfixes'. Basically it learns to ignore a string such as '_probs' at the end of a filename when looking for case identifiers
    • I'd love for there to be an option to specify an nnunet dataset id as an alternative to the current method of direct folder paths
    • Right now we need a dataset json as an input; it might be worth removing this dependency, since the only info we need from the dataset json is the file endings and the labels for the different classes.
  • I think I'd like to make the config files python instead of yaml. I wasn't sure how to do this before, but I'm realizing now I can do it using importlib (see the sketch below). The main benefit is that the user can have variables in the config file to avoid typing the same path over and over again.
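
As a rough illustration of that last point, here is a minimal sketch of loading a Python config file via importlib. The file layout and the convention that the config module defines a dict named config are hypothetical, not something implemented in this PR.

```python
import importlib.util
from pathlib import Path


def load_python_config(config_path: str) -> dict:
    # Load the config file as a module so it can use variables, f-strings, etc.
    path = Path(config_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # The config file is assumed to define a dict named 'config', e.g.:
    #   base_dir = "/path/to/experiment"
    #   config = {"probs_dir": f"{base_dir}/probs", "labels_dir": f"{base_dir}/labels"}
    return module.config
```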

@scarere (Collaborator, Author) commented Jul 25, 2024

The static code check failure was due to a new vulnerability found in pytorch versions below 2.2.0. I tried updating the dependencies, but poetry still resolves torch to version 2.1.2 even though the pyproject.toml allows any version above 2.1.1. I'm assuming some other dependency is preventing us from upgrading to a newer torch version, so I just ignored the vulnerability for now.

@scarere (Collaborator, Author) commented Jul 25, 2024

As for the smoke tests, it has something to do with the metrics, but I didn't change anything in fl4health; all the changes were restricted to the research folder, so I don't know why the smoke tests are failing. Also, I'm not sure if there's a way to see which smoke tests are failing, but here is the error I get when I run them myself.

Screenshot from 2024-07-25 11-09-44

@scarere (Collaborator, Author) commented Jul 25, 2024

Ok, not sure what's going on, because the smoke tests seem to be running now. I tried running them locally again and got a different error. I've updated my dependencies and did a git pull, so everything should be the same, but when I run python -m tests.smoke_tests.run_smoke_tests I get an error.

Screenshot from 2024-07-25 11-18-54

@scarere (Collaborator, Author) commented Jul 25, 2024

Ok, strange that they all passed now, but I'll take it.

@scarere (Collaborator, Author) commented Jul 25, 2024

Also, I was able to remove the ignores for the mlflow vulnerabilities, so I guess that issue was solved.

@jewelltaylor (Collaborator) left a comment


All in all, some great work! I have comments mostly around small details, clarifications and documentation requests.

I also echo Masoom's point about supporting evaluation without requiring the entire test set in RAM. As you mention, it seems like a lot of this would be transferable. It's up to you whether you want to tackle that in this PR or start another branch off this one for a future PR adding that functionality. If there is a good reason to still support the RAM-based evaluation we can keep it in; otherwise it may be simpler and less code to support solely the disk-based evaluation.
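
For reference, a rough sketch of what the disk-based variant could look like, assuming each case's detection map is written to disk right after prediction and evaluation is then done from folders with picai_eval's evaluate_folder helper as described in its README. The directory names and save format are placeholders, not part of this PR.

```python
from pathlib import Path

import numpy as np
from report_guided_annotation import extract_lesion_candidates

det_dir = Path("detection_maps")  # hypothetical output folder
det_dir.mkdir(exist_ok=True)


def save_detection_map(case_id: str, foreground_probs: np.ndarray) -> None:
    # Only one case's probabilities need to be in RAM at a time.
    detection_map = extract_lesion_candidates(foreground_probs)[0]
    np.save(det_dir / f"{case_id}.npy", detection_map)


# Once all cases are written, evaluate straight from disk, e.g.:
# from picai_eval import evaluate_folder
# metrics = evaluate_folder(y_det_dir=str(det_dir), y_true_dir="path/to/annotations")
```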

@scarere (Collaborator, Author) commented Aug 2, 2024

Ok, modifying things to use the disk ended up being a bit more work than I thought. What we have works and is a nice quick-and-dirty method. We can discuss a big refactor for a future PR. I will look into the comments John made to see if any of them still apply.

@jewelltaylor (Collaborator) left a comment


LGTM!

scarere merged commit dba0d74 into main on Aug 13, 2024 (6 checks passed)
scarere deleted the nnunet_ensemble branch on August 13, 2024 at 13:03