Add ONNX exporter #1826

Status: Open, wants to merge 14 commits into base: master


@MathijsdeBoer (Contributor) commented Dec 1, 2023

Purpose

Hey there! Others and I have occasionally wondered whether it is possible to export trained nnU-Nets to ONNX for deployment or other sharing. To this end, I've put together an exporter using PyTorch's built-in ONNX exporter.

In the past, I managed to get an exporter working for v1, but it no longer works because v2 changed some things around. Nevertheless, the information exported from v1 let us build a very minimalist copy of the entire nnU-Net pipeline that produced outputs faster than the v1 inference pipeline, and with less overhead.

Implementation

I've repurposed bits and pieces of code from around the project (mainly nnunetv2.inference.predict_from_raw_data.py) to load a network, set its parameters and gather any information the user might need to build their own ONNX pipeline. The command works roughly like the nnUNetv2_export_model_to_zip command, but instead exports an .onnx file for each fold and configuration combination. Each .onnx file should be accompanied by a .json file with some basic information the user might need to build their own ONNX pipeline.
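For orientation, the core of such an exporter can be a single call to PyTorch's `torch.onnx.export` per fold and configuration, plus a small sidecar JSON. The sketch below rests on several assumptions: `network` is an already-loaded `torch.nn.Module` on the CPU that returns a single logits tensor (deep supervision switched off), the dummy input uses the configuration's patch size, opset 17 is an arbitrary choice, and the JSON keys are hypothetical examples, not the fields this PR actually writes.

```python
import json
from pathlib import Path


def build_sidecar(patch_size, num_input_channels, labels):
    """Collect metadata a downstream pipeline needs (illustrative keys only)."""
    return {
        "patch_size": [int(p) for p in patch_size],
        "num_input_channels": int(num_input_channels),
        "labels": dict(labels),
        "input_name": "input",
        "output_name": "output",
    }


def write_sidecar(json_path, info):
    Path(json_path).write_text(json.dumps(info, indent=2))


def export_fold(network, onnx_path, patch_size, num_input_channels, opset_version=17):
    """Export one trained fold; assumes `network` returns a single logits tensor."""
    import torch  # imported lazily: only the export step needs PyTorch

    network.eval()
    # Dummy input fixes the traced shapes; it must live on the same device as network (CPU here)
    dummy_input = torch.randn(1, num_input_channels, *patch_size)
    with torch.no_grad():
        torch.onnx.export(
            network,
            dummy_input,
            str(onnx_path),
            input_names=["input"],
            output_names=["output"],
            # allow a variable batch size at inference time
            dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
            opset_version=opset_version,
        )


# Usage with hypothetical values:
info = build_sidecar((128, 128, 128), 1, {"background": 0, "tumor": 1})
```

Keeping the JSON helpers free of PyTorch means a deployment pipeline can read the sidecar without installing the training stack.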

Finally, a big warning box printed before the export begins makes it perfectly clear that any ONNX pipelining is the sole responsibility of the end user, not of the nnU-Net maintainers:

######################################################
!!!!!!!!!!!!!!!!!!!!!!!!WARNING!!!!!!!!!!!!!!!!!!!!!!!
######################################################
You are responsible for creating the ONNX pipeline
yourself.

This script will only export the model
weights to an onnx file, and some basic information
about the model. You will have to create the ONNX
pipeline yourself.

See
https://pytorch.org/tutorials/beginner/onnx/export_simple_model_to_onnx_tutorial.html#execute-the-onnx-model-with-onnx-runtime
for some documentation on how to do this.
######################################################

Motivation

While I can't speak for others' reasons, our main reason for wanting an ONNX format is deploying trained models to an inference server. Because of server costs, we might want to use some of ONNX's excellent optimization techniques for faster inference, or move away from a Python-based pipeline in favor of compiled languages like C, C++ or Rust. Finally, many CUDA+Python Docker containers produce gigantic images (for example, the Docker image I use to train nnU-Nets is 24GB when exported), which take an extremely long time to boot up on servers.

A lot of clinicians would really like to see neural networks applied in their practice, and quite a few companies are willing to make that investment. However, server costs are still a large burden, so offering a way to export an nnU-Net to a common format, even without support beyond the exporter, will hopefully enable wider adoption outside of pure research.

@thangngoc89 (Contributor)
@MathijsdeBoer I think all the points you've written above are valid when applying nnU-Net in a production environment.
I haven't had the chance to run this locally yet, so I'm asking here: what tasks remain for this PR?
Does this still require nnU-Net's codebase (sliding window inference, pre- and post-processing)?

I think Python will remain a required dependency for the foreseeable future; however, removing PyTorch would already be a huge win.

@MathijsdeBoer (Contributor, Author)
@thangngoc89 I've since had the chance to run the exporter locally, and it successfully produces an ONNX file! I managed to set up a very basic pipeline (no TTA, no parallelism, no ensembling, very naive sliding window, etc.), and it produces a very similar prediction in about a minute on my local Quadro RTX 5000 machine. For comparison, that is about as long as our A100 server takes to create a prediction with the current nnU-Net codebase (albeit with all the additional features included).

Some caveats to the exporter, though: due to floating-point differences, the exported model will never be 100% identical to the original, so outputs will vary ever so slightly. Usually this won't be noticeable, except in edge cases where the model is uncertain.
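One way to put a number on "very similar" is to compare the two runtimes' logits directly and count the voxels whose argmax label changes. This is a minimal NumPy sketch. `torch_logits` and `onnx_logits` stand for the outputs of PyTorch and ONNX Runtime on the same input, shaped (classes, *spatial), and the tolerances are illustrative assumptions rather than values used by nnU-Net.

```python
import numpy as np


def compare_outputs(torch_logits: np.ndarray, onnx_logits: np.ndarray,
                    rtol: float = 1e-3, atol: float = 1e-4) -> dict:
    """Compare two logit arrays of shape (C, *spatial) produced by different runtimes."""
    if torch_logits.shape != onnx_logits.shape:
        raise ValueError(f"shape mismatch: {torch_logits.shape} vs {onnx_logits.shape}")
    abs_diff = np.abs(torch_logits.astype(np.float64) - onnx_logits.astype(np.float64))
    # Label maps: the class axis is axis 0
    labels_a = np.argmax(torch_logits, axis=0)
    labels_b = np.argmax(onnx_logits, axis=0)
    return {
        "max_abs_diff": float(abs_diff.max()),
        "allclose": bool(np.allclose(torch_logits, onnx_logits, rtol=rtol, atol=atol)),
        # Fraction of voxels whose predicted label differs; these tend to be uncertain voxels
        "label_disagreement": float(np.mean(labels_a != labels_b)),
    }


# Synthetic example: a tiny perturbation mimics floating-point round-off between runtimes
rng = np.random.default_rng(0)
a = rng.normal(size=(3, 8, 8, 8)).astype(np.float32)
b = a + rng.normal(scale=1e-6, size=a.shape).astype(np.float32)
report = compare_outputs(a, b)
```

A near-zero `label_disagreement` with a small `max_abs_diff` is consistent with round-off only. A large disagreement points to a real pipeline difference.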

As for your questions:

Todo:
Not much is left to do on the code itself; I've successfully exported a model to ONNX with the code currently in the PR. I might take a few minutes to clean up the config.json export step so it also includes input channels and output labels. But that's mostly a nice-to-have, as anyone who can export the model also has access to the dataset.json file.
One thing I haven't been able to figure out is why the instance norm layers don't switch to train=False when calling model.eval(), which might cause a slight discrepancy in the exported model.
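One possible explanation, offered as an assumption rather than a confirmed diagnosis: PyTorch's `nn.InstanceNorm3d` defaults to `track_running_stats=False`, and in that case it normalizes with statistics computed from each input in both training and evaluation mode. The layer then has no stored statistics for `model.eval()` to switch to, so its output should be the same either way, which would also fit the exporter treating these layers as train-like. The NumPy sketch below reproduces that arithmetic (without affine parameters) to show that each sample is normalized using only its own statistics.

```python
import numpy as np


def instance_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize x of shape (N, C, *spatial) per sample and per channel.

    Mirrors an InstanceNorm layer with track_running_stats=False and no affine
    parameters: mean and variance come from x itself, not from stored state.
    """
    spatial_axes = tuple(range(2, x.ndim))
    mean = x.mean(axis=spatial_axes, keepdims=True)
    var = x.var(axis=spatial_axes, keepdims=True)  # biased variance, as in InstanceNorm
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=3.0, size=(2, 4, 6, 6, 6))
y = instance_norm(x)
# Each (sample, channel) slice of y is now approximately zero-mean and unit-variance,
# and normalizing a single sample alone gives the same result as normalizing it in a batch.
```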

Beyond the code, all that's left is for someone more intimately familiar with the codebase to double-check my solution and make sure I'm not misusing or abusing anything. Perhaps I'm not loading the model correctly? And finally, to see whether there is any interest in this functionality at all.

Requiring nnUNet codebase:
After export is complete, the user won't necessarily need the nnU-Net codebase anymore to run inference. It'll be up to the user to build their own pipeline, using whatever languages and libraries they need. That said, they could also choose to keep using the nnU-Net codebase for this.

Dependencies:
Python, PyTorch and nnUNet are only requirements for the export step, but not beyond that. It isn't my intention to move nnUNet away from PyTorch with this PR, as it offers a lot of benefits for training and ease of use. My only intention is for users to be able to easily export their models to do with as they see fit, beyond the scope of the nnUNet project.

@thangngoc89 (Contributor)
@MathijsdeBoer thank you for a very detailed answer.

> Python, PyTorch and nnUNet are only requirements for the export step, but not beyond that. It isn't my intention to move nnUNet away from PyTorch with this PR, as it offers a lot of benefits for training and ease of use. My only intention is for users to be able to easily export their models to do with as they see fit, beyond the scope of the nnUNet project.

Definitely. I meant not needing PyTorch at inference time.

I will try to test this PR locally and provide feedback soon.

@MathijsdeBoer MathijsdeBoer marked this pull request as ready for review December 4, 2023 15:06
@FabianIsensee FabianIsensee self-assigned this Dec 5, 2023
@FabianIsensee (Member)
Hey @MathijsdeBoer, thank you so much for all this work! Exporting to ONNX is certainly something we should support. One of the reasons I have never done that so far is that I don't want to be responsible for people not being able to reproduce our results with their ONNX files. After all, they would need to reimplement a lot of things around the model: cropping, resampling, intensity normalization, sliding window inference with proper logit aggregation, resizing of the probabilities to the original data shape, conversion to segmentation, and reverting the cropping. There are many opportunities for errors.
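To make the "sliding window inference with proper logit aggregation" step concrete, here is a minimal NumPy sketch of the general technique. Overlapping tiles of `patch_size` cover the image, each tile's logits are weighted by a Gaussian importance map that down-weights tile borders, and the accumulated sum is divided by the accumulated weights. The step fraction of 0.5 and the sigma of 1/8 of the patch size are assumptions modelled on what we believe are nnU-Net v2's defaults, so check `nnunetv2/inference` for the exact values. `predict_fn` is a hypothetical stand-in for the exported network, and the image is assumed to be padded to at least the patch size.

```python
import itertools
import numpy as np


def gaussian_importance_map(patch_size, sigma_scale=1.0 / 8):
    """Separable Gaussian peaked at the patch centre; tile borders get low weight."""
    weight = np.ones(patch_size, dtype=np.float64)
    for axis, size in enumerate(patch_size):
        coords = np.arange(size) - (size - 1) / 2.0
        profile = np.exp(-(coords ** 2) / (2 * (size * sigma_scale) ** 2))
        shape = [1] * len(patch_size)
        shape[axis] = size
        weight = weight * profile.reshape(shape)  # broadcast the 1D profile along one axis
    return weight / weight.max()


def tile_starts(image_len, patch_len, step_fraction=0.5):
    """Evenly spaced tile starts covering [0, image_len) with roughly step_fraction overlap."""
    if image_len < patch_len:
        raise ValueError("pad the image to at least the patch size first")
    num_steps = int(np.ceil((image_len - patch_len) / (patch_len * step_fraction))) + 1
    if num_steps == 1:
        return [0]
    actual_step = (image_len - patch_len) / (num_steps - 1)
    return [int(np.round(actual_step * i)) for i in range(num_steps)]


def sliding_window_predict(image, patch_size, predict_fn, num_classes, step_fraction=0.5):
    """image: (C, *spatial); predict_fn maps (C, *patch) to (num_classes, *patch) logits."""
    spatial = image.shape[1:]
    weight = gaussian_importance_map(patch_size)
    logits_sum = np.zeros((num_classes, *spatial), dtype=np.float64)
    weight_sum = np.zeros(spatial, dtype=np.float64)
    starts_per_axis = [tile_starts(n, p, step_fraction) for n, p in zip(spatial, patch_size)]
    for starts in itertools.product(*starts_per_axis):
        window = tuple(slice(s, s + p) for s, p in zip(starts, patch_size))
        tile_logits = predict_fn(image[(slice(None), *window)])
        logits_sum[(slice(None), *window)] += tile_logits * weight
        weight_sum[window] += weight
    return logits_sum / weight_sum  # weighted average of overlapping tile predictions


# Usage: a voxel-wise toy "network" makes the aggregated result easy to check
rng = np.random.default_rng(0)
image = rng.normal(size=(1, 20, 24, 18))
toy_net = lambda tile: np.concatenate([tile, -tile], axis=0)  # hypothetical stand-in for a CNN
logits = sliding_window_predict(image, (8, 8, 8), toy_net, num_classes=2)
```

With a real CNN the tile predictions disagree near tile borders, and the Gaussian weighting is what favors each tile's more reliable centre.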

I am happy to accept this PR once you are happy with the config.json file etc. Just ping me when everything is complete.

One more thing (and please let me know if this doesn't make sense; I am not very familiar with ONNX and its optimizations): shouldn't it be possible to use the ONNX-optimized model in the regular nnU-Net inference code as well? Have you experimented with that?
Best,
Fabian

@MathijsdeBoer (Contributor, Author)
Hey Fabian, no worries. It didn't end up being a massive amount of work, as PyTorch has a built-in exporter! Most of the time went into reading documentation and the nnU-Net core code to see whether I could reuse as many existing functions as possible. That way, any updates to the core code should keep the exporter up to date as well, barring any fundamental rewrites, of course.

> One of the reasons I have never done that so far is because I don't want to be responsible for people not being able to reproduce our results with their onnx files.

Yep, I recall seeing you mention something like that a few times before. I could make the warning text a lot stronger, and add a disclaimer for any future support. Maybe something like:

######################################################
!!!!!!!!!!!!!!!!!!!!!!!!WARNING!!!!!!!!!!!!!!!!!!!!!!!
######################################################
Exported models are provided as-is, without any
guarantees, warranties and/or support from MIC-DKFZ,
any associated persons and/or other entities.

You will bear sole responsibility for the proper
use of the exported models.

You are responsible for creating and validating
the ONNX pipeline yourself. To this end we provide
the .onnx file, and a config.json containing any
details you might need.
######################################################

Of course, there will always be someone asking questions about this, but this will at least make it very clear that they shouldn't expect any hand-holding.

One thing that might be nice, however, is a short document with a step-by-step overview of how inference works, optionally with pointers to where each step lives in the code. It would serve as a general reference for pipeline implementations.

As for being happy with the config file, I think I've managed to include everything someone might need to accurately rebuild the pipeline for all possible export configurations. The only thing I haven't been able to test is what happens if someone has a label with multiple values. Fortunately, since users are free to implement their ONNX pipelines however they like, there will be plenty of room to hardcode things for their particular needs, so it might not be a big problem after all.

I think that if you want to use ONNX in the existing nnU-Net pipeline, you'd have to decouple the nnUNetPredictor from the nn.Module and simplify it down to its basic steps. ONNX Runtime doesn't work with torch.Tensor; its Python API takes numpy.ndarray inputs and returns numpy.ndarray outputs instead. Other than that, the general steps are very similar!

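```python
import numpy as np
import onnxruntime as ort
import torch

# preprocess_data(), get_model(), postprocess_pred(), device and model_filepath
# are placeholders for your own pipeline code, not nnU-Net functions.

# ...
# PyTorch
data = preprocess_data()                   # numpy array of shape (1, C, *spatial)
data = torch.from_numpy(data).to(device)   # .to() returns a new tensor, so reassign it

torch_model = get_model()
torch_model.eval()
with torch.no_grad():
    pred = torch_model(data)

pred = pred.detach().cpu().numpy()         # back to numpy for postprocessing
postprocess_pred(pred)

# ...
# ONNX
data = preprocess_data()

ort_model = ort.InferenceSession(
    model_filepath,
    # CPU is used as a fallback if CUDA is unavailable; listing it makes that explicit
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# ONNX models use named inputs/outputs. The exporter simply calls the input "input",
# but looking the name up is more general. A model exported in float32 expects float32.
ort_inputs = {ort_model.get_inputs()[0].name: data.astype(np.float32)}
# In our case, identical to:
# ort_inputs = {"input": data.astype(np.float32)}

pred = ort_model.run(None, ort_inputs)[0]  # None = return all outputs; take the first

postprocess_pred(pred)
# ...
```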

There are a few more things one can do before starting inference sessions, such as ahead-of-time (offline) optimizations. These should make the model faster and/or smaller, but might still impose a slight performance penalty, so these trade-offs are up to the end user.
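For the offline optimizations mentioned above, ONNX Runtime can apply its graph optimizations once and save the optimized graph through `SessionOptions.optimized_model_filepath`. This is a minimal sketch. The file paths and the choice of `ORT_ENABLE_EXTENDED` are assumptions to adapt, and quantization (a separate, lossy kind of optimization) is not covered.

```python
def optimize_offline(model_path: str, optimized_path: str, use_cuda: bool = True):
    """Run ONNX Runtime's graph optimizations once and save the optimized model.

    Later sessions can load `optimized_path` directly instead of re-optimizing.
    """
    import onnxruntime as ort  # imported lazily so this module loads without ORT installed

    options = ort.SessionOptions()
    # EXTENDED includes the basic optimizations (e.g. constant folding,
    # redundant-node elimination) plus more complex node fusions
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    options.optimized_model_filepath = optimized_path
    providers = (["CUDAExecutionProvider", "CPUExecutionProvider"]
                 if use_cuda else ["CPUExecutionProvider"])
    # Creating the session triggers the optimization and writes optimized_path
    return ort.InferenceSession(model_path, sess_options=options, providers=providers)
```

ONNX Runtime's documentation advises running offline optimization with the same execution providers and on the same kind of hardware as the target machine, since an extended-optimized model can depend on both.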

@Sharpz7 commented Nov 21, 2024

Hey @MathijsdeBoer,

I was wondering where you had gotten with this? Happy to give a helping hand with merge conflicts if you are short of time.

As well as that, technical questions:

  • Do you have the ability to merge the folds into a single ensemble model before exporting to ONNX?
  • Could you give more details on what preprocess_data() and postprocess_pred() actually do here? I have given this my best shot for the past 4-5 hours, but my results are still slightly off (I am a software engineer by background). Currently I normalise with input_image = (input_image - input_image.mean()) / max(input_image.std(), 1e-8) and postprocess with np.argmax. I imagine I am missing a lot or am entirely wrong here, but I struggled to find more than that in nnU-Net's codebase.

[Screenshots: result using predictor.predict_from_list_of_npy_arrays; result using ONNX]

@thangngoc89 (Contributor) commented Nov 21, 2024

@Sharpz7

> do you have the ability to merge the folds into a single ensemble model before exporting to onnx

I don't think ONNX supports this.

> Could you give more details on what preprocess_data() and postprocess_pred actually are here?

For the source of truth, check nnU-Net's current prediction code here:
https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/inference/predict_from_raw_data.py

  • The preprocessing involves resampling and normalization.
  • Inference includes sliding window prediction (with Gaussian weighting), test-time augmentation, and ensembling of all folds.
  • The postprocessing involves resampling to the original spacing and, depending on the postprocessing plan, keeping only the largest connected components.
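The test-time augmentation and fold-ensembling steps in the list above happen outside any single exported network. This minimal NumPy sketch assumes each fold is exported as its own model and wrapped in a callable that maps an input of shape (C, *spatial) to logits of shape (K, *spatial). It averages logits over all flip combinations of the input and over folds, then applies softmax and argmax. Averaging logits rather than probabilities, and mirroring over all three spatial axes, are assumptions modelled on nnU-Net v2's predictor; the fold callables below are hypothetical toys.

```python
import itertools
import numpy as np


def mirrored_predict(image, model_fn, mirror_axes=(0, 1, 2)):
    """Average logits over all flip combinations of the given spatial axes (TTA)."""
    logits = model_fn(image)
    # Spatial axis i is array axis i + 1, because axis 0 holds channels/classes
    combos = [c for r in range(1, len(mirror_axes) + 1)
              for c in itertools.combinations([a + 1 for a in mirror_axes], r)]
    for axes in combos:
        flipped_logits = model_fn(np.flip(image, axis=axes))
        logits = logits + np.flip(flipped_logits, axis=axes)  # flip back before summing
    return logits / (len(combos) + 1)


def ensemble_segmentation(image, fold_fns, mirror_axes=(0, 1, 2)):
    """Average TTA logits over folds, then convert to probabilities and labels."""
    logits = np.mean([mirrored_predict(image, f, mirror_axes) for f in fold_fns], axis=0)
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerically stable softmax
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=0, keepdims=True)
    return probs, np.argmax(probs, axis=0)


# Usage with two toy "folds" (voxel-wise functions, so flipping does not change them)
fold_a = lambda x: np.concatenate([x, -x], axis=0)
fold_b = lambda x: np.concatenate([2 * x, -2 * x], axis=0)
image = np.random.default_rng(1).normal(size=(1, 6, 5, 4))
probs, seg = ensemble_segmentation(image, [fold_a, fold_b])
```

The resulting probabilities are still at the preprocessed spacing; resampling to the original spacing and any connected-component filtering come afterwards, as listed above.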

> Currently I use normalisation as input_image = (input_image - input_image.mean()) / max(input_image.std(), 1e-8)

You can see normalization explanation here https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/explanation_normalization.md

The 0.5 and 99.5 percentile cutoff values can be extracted from the plans file.
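As a rough illustration of the linked explanation, CT normalization clips intensities to the 0.5 and 99.5 percentiles of the training set's foreground intensities and then z-scores with the dataset-wide foreground mean and standard deviation. For other modalities the default is per-image z-scoring, optionally restricted to a mask, which is close to the formula quoted above. In this minimal sketch, the plans-file key `foreground_intensity_properties_per_channel` and its field names are assumptions based on nnU-Net v2 plans files, so verify them against your own plans.json. The example statistics are made up.

```python
import json
import numpy as np


def ct_normalize(image, props):
    """CT-style normalization using dataset-wide foreground statistics."""
    clipped = np.clip(image, props["percentile_00_5"], props["percentile_99_5"])
    return (clipped - props["mean"]) / max(props["std"], 1e-8)


def zscore_normalize(image, mask=None):
    """Per-image z-scoring, optionally using statistics from inside a mask only."""
    values = image[mask] if mask is not None else image
    return (image - values.mean()) / max(values.std(), 1e-8)


def load_intensity_props(plans_path, channel=0):
    # ASSUMPTION: key layout of nnU-Net v2 plans files; check your plans.json
    with open(plans_path) as f:
        plans = json.load(f)
    return plans["foreground_intensity_properties_per_channel"][str(channel)]


# Made-up statistics, for illustration only
props = {"mean": 100.0, "std": 50.0, "percentile_00_5": -1000.0, "percentile_99_5": 1000.0}
ct = np.array([-3000.0, 0.0, 100.0, 5000.0])
normalized = ct_normalize(ct, props)  # → [-22., -2., 0., 18.]
```

As far as we can tell from nnU-Net v2's default preprocessor, normalization runs after cropping and before resampling, so applying it in the same order should help match native results.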

@Sharpz7 commented Nov 21, 2024

Thanks @thangngoc89

@FabianIsensee, at least for me, the reason I am going down this route is that, as an ML engineer trying to implement this work, I have my own pre-processing and post-processing of the data. Having two of each, one fully integrated into your pipeline and mine not, just adds complexity. If you can think of a better solution for me, let me know.

I have been unable to reproduce the native results, so I am going to drop my attempts for now (I got as far as confirming that I have replicated the pre-processing up to the normalisation stage). I imagine I am going to run into more headaches integrating nnU-Net into a wider system anyway, so I might be back :))

It is also not immediately clear to me how much of the "model" is inside the ONNX file. I suspect at least some of the preprocessing is, because I can feed ONNX normalised or unnormalised images and get the same output.

It would be great if we could get this PR finished with an example, at least for the default preprocessing and postprocessing workflows (i.e. z-norm + crop + padding for when the window is larger than the cropped image).
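The crop and pad steps mentioned above are mostly bookkeeping: record where the image was cropped and how much padding was added, then undo both on the predicted segmentation. This minimal NumPy sketch crops to the bounding box of nonzero voxels and zero-pads evenly on both sides, in the spirit of nnU-Net's cropping and padding. The helper names are hypothetical, the details (such as filling background outside the crop) are assumptions, and resampling, which happens between these steps in the real pipeline, is omitted.

```python
import numpy as np


def crop_to_nonzero(image):
    """image: (C, *spatial). Crop to the bounding box of voxels nonzero in any channel."""
    nonzero = np.any(image != 0, axis=0)
    coords = np.argwhere(nonzero)
    if coords.size == 0:  # empty image: keep everything
        return image, tuple(slice(0, n) for n in nonzero.shape)
    lower = coords.min(axis=0)
    upper = coords.max(axis=0) + 1
    bbox = tuple(slice(lo, hi) for lo, hi in zip(lower, upper))
    return image[(slice(None), *bbox)], bbox


def pad_to_min_size(image, min_size):
    """Zero-pad spatial axes up to min_size (e.g. the patch size), split across both sides."""
    pads = [(0, 0)]  # never pad the channel axis
    for n, m in zip(image.shape[1:], min_size):
        diff = max(m - n, 0)
        pads.append((diff // 2, diff - diff // 2))
    padded = np.pad(image, pads, mode="constant", constant_values=0)
    unpad = tuple(slice(before, before + n) for (before, _), n in zip(pads[1:], image.shape[1:]))
    return padded, unpad


def revert_crop_and_pad(seg, unpad, bbox, original_shape):
    """Remove the padding, then paste the segmentation back into a background-filled volume."""
    full = np.zeros(original_shape, dtype=seg.dtype)
    full[bbox] = seg[unpad]
    return full


# Usage: a small bright cube inside an otherwise empty volume
image = np.zeros((1, 10, 12, 8))
image[0, 3:6, 4:9, 2:5] = 1.0
cropped, bbox = crop_to_nonzero(image)               # shape (1, 3, 5, 3)
padded, unpad = pad_to_min_size(cropped, (4, 4, 4))  # shape (1, 4, 5, 4)
seg = (padded[0] > 0).astype(np.uint8)               # stand-in for the network's segmentation
restored = revert_crop_and_pad(seg, unpad, bbox, image.shape[1:])
```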
