
Nnunet ensemble #198

Merged: 21 commits from nnunet_ensemble into main on Aug 13, 2024

Conversation

@scarere (Collaborator) commented Jul 25, 2024

PR Type

[Feature | Fix | Documentation | Other ]

Short Description

This PR introduces major updates to the prediction and evaluation pipelines for nnunet models, most notably a script that performs inference, lesion extraction and picai_eval evaluation end to end.

  • Updates to logging to make everything look nice
  • Included information on how long each step takes to run
    • This will be useful for identifying the bottleneck when inference datasets get large
  • Updated the predict script to return probabilities, annotations and case identifiers
  • Updated the eval script to predict from probabilities and extract detection maps manually
  • Added a predict_and_eval script that combines everything (see the sketch after this list)
    • Performs inference for an arbitrary number of nnunet models
    • Ensembles the predicted probabilities from different checkpoints and nnunet configs
    • Extracts lesion detection maps and predicted annotations/segmentations
    • Computes picai_eval metrics from the detection maps
    • Allows optional saving of probabilities, detection maps, annotations/segmentations and evaluation result metrics
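
For context, here is a minimal sketch of the ensemble-then-evaluate step described above. It assumes the per-model softmax probabilities are already in memory and follows the usage of picai_eval and report_guided_annotation shown in their READMEs; the names model_probs, annotations and the foreground-channel index are placeholders, not the actual interface of the new script.

```python
import numpy as np
from picai_eval import evaluate
from report_guided_annotation import extract_lesion_candidates


def ensemble_and_evaluate(model_probs: list, annotations: list):
    # model_probs: one array per model/config, each shaped
    # (num_cases, num_classes, *spatial_dims). Ensemble by averaging
    # the predicted probabilities across models.
    ensembled = np.mean(np.stack(model_probs, axis=0), axis=0)

    # Extract a lesion detection map per case from the foreground
    # probability channel (assumed here to be channel 1 for a binary task).
    detection_maps = [extract_lesion_candidates(case[1])[0] for case in ensembled]

    # Compute picai_eval metrics from detection maps and ground-truth annotations.
    return evaluate(y_det=detection_maps, y_true=annotations)
```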

Notes

I would like to mention that in the second half of working on this, the code had to become a lot more involved than I would have liked. I think this is a good solution for now: it is robust, generalizes to multiple use cases (e.g. different numbers of models, multiclass segmentation, etc.) and just works. In the future I would love to refactor this somehow and make it shorter and cleaner. I think there will be an opportunity to refactor and delete code across the entire picai research folder at some point, but I didn't want to expand the scope of this PR, since refactoring would potentially be much more involved under the hood of the various APIs being used and would take a lot of time. Additionally, I'm waiting on some of these packages to fix issues or make improvements to their own APIs (I create an issue and they normally just ask me to do the PR for them lol).

That being said, if you see some easy simplifications I'd love to know. I'm really in deep on this one, so my head hurts when trying to reimagine it. This was a frustrating PR because I don't love the compromises I had to make.

  • Currently everything needs to be loaded into memory simultaneously.
    • nnunet uses multiprocessing precisely to avoid having to hold the inference dataset in memory
    • However, we currently need everything in memory in order to do the other steps without saving and loading intermediate files
    • A great future improvement would be to run the whole pipeline one sample at a time with multiprocessing, and then add our own arguments/flags to save each individual sample. However, this would involve refactoring everything and partially redoing nnunet's inference code
      • That said, I'm already partially overriding nnunet's main inference method to get it to return more information
  • I'd love to add some convenient file loading functionality in the future
    • picai_eval has some cool functionality in its file loading that allows filenames to have 'postfixes'. Basically it learns to ignore a string such as '_probs' at the end of a filename when looking for case identifiers
    • I'd love for there to be an option to specify an nnunet dataset id as an alternative to the current method of direct folder paths
    • Right now we need a dataset json as an input; it might be worth removing this dependency, since the only info we need from the dataset json is the file endings and the labels for the different classes.
  • I think I'd like to make the config files python instead of yaml. I wasn't sure how to do this before, but I'm realizing now I can do it using importlib (see the sketch below). The main benefit is that the user can have variables in the config file to avoid typing the same path over and over again.
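
As a rough illustration of that last point, here is a minimal sketch of loading a Python config file via importlib. The file layout and the convention that the config module defines a dict named config are hypothetical, not something implemented in this PR.

```python
import importlib.util
from pathlib import Path


def load_python_config(config_path: str) -> dict:
    # Load the config file as a module so it can use variables, f-strings, etc.
    path = Path(config_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # The config file is assumed to define a dict named 'config', e.g.:
    #   base_dir = "/path/to/experiment"
    #   config = {"probs_dir": f"{base_dir}/probs", "labels_dir": f"{base_dir}/labels"}
    return module.config
```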

@scarere (Collaborator, Author) commented Jul 25, 2024

The static code check failure was due to a new vulnerability found in pytorch versions below 2.2.0. I tried updating the dependencies, but poetry still resolves torch to version 2.1.2 even though the pyproject.toml allows any version above 2.1.1. I'm assuming some other dependency is preventing us from upgrading to a newer torch version, so I just ignored the vulnerability for now.

@scarere (Collaborator, Author) commented Jul 25, 2024

As for the smoke tests, it has something to do with the metrics, but I didn't change anything in fl4health; all the changes were restricted to the research folder, so I don't know why the smoke tests are failing. Also, I'm not sure if there's a way to see which smoke tests are failing, but here is the error I get when I run them myself.

Screenshot from 2024-07-25 11-09-44

@scarere (Collaborator, Author) commented Jul 25, 2024

Ok, not sure what's going on, because the smoke tests seem to be running now. I tried running them locally again and got a different error. I've updated my dependencies and did a git pull, so everything should be the same, but when I run python -m tests.smoke_tests.run_smoke_tests I get an error.

Screenshot from 2024-07-25 11-18-54

@scarere (Collaborator, Author) commented Jul 25, 2024

Ok, strange that they all passed now, but I'll take it.

@scarere (Collaborator, Author) commented Jul 25, 2024

Also, I was able to remove the ignores for the mlflow vulnerabilities, so I guess that issue was solved.

@jewelltaylor (Collaborator) left a comment


All in all, some great work! I have comments mostly around small details, clarifications and documentation requests.

I also echo Masoom's point about supporting evaluation without requiring the entire test set in RAM. As you mention, it seems like a lot of this would be transferable. It's up to you whether you want to tackle that in this PR or start another branch off this one for a future PR adding that functionality. If there is a good reason to still support the RAM-based evaluation we can keep it in; otherwise it may be simpler and less code to support solely the disk-based evaluation.
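
For reference, a rough sketch of what the disk-based variant could look like, assuming each case's detection map is written to disk right after prediction and evaluation is then done from folders with picai_eval's evaluate_folder helper as described in its README. The directory names and save format are placeholders, not part of this PR.

```python
from pathlib import Path

import numpy as np
from report_guided_annotation import extract_lesion_candidates

det_dir = Path("detection_maps")  # hypothetical output folder
det_dir.mkdir(exist_ok=True)


def save_detection_map(case_id: str, foreground_probs: np.ndarray) -> None:
    # Only one case's probabilities need to be in RAM at a time.
    detection_map = extract_lesion_candidates(foreground_probs)[0]
    np.save(det_dir / f"{case_id}.npy", detection_map)


# Once all cases are written, evaluate straight from disk, e.g.:
# from picai_eval import evaluate_folder
# metrics = evaluate_folder(y_det_dir=str(det_dir), y_true_dir="path/to/annotations")
```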

@scarere (Collaborator, Author) commented Aug 2, 2024

Ok, modifying things to use the disk ended up being a bit more work than I thought. What we have works and is a nice quick-and-dirty method. We can discuss a big refactor for a future PR. I will look into the comments John made to see if any of them still apply.

@jewelltaylor (Collaborator) left a comment


LGTM!

scarere merged commit dba0d74 into main on Aug 13, 2024 (6 checks passed)
scarere deleted the nnunet_ensemble branch on August 13, 2024 at 13:03