Too much time and ram while saving the inference results #2014
Comments
Today I tried to run inference on a NIfTI of size 1000x1000x1000 with (0.2, 0.2, 0.2) pixel spacing.
Hi @FabianIsensee, do you have any idea about this?
You can try to split the bigger volume into multiple smaller patches. For example, you can split it into 9, 16 or 25 patches. Then you run inference on each patch separately and finally aggregate the results into a segmentation of the bigger volume.
There is no set standard for the volumes to be tested. There are also multiple models predicting different parts of the volume. Therefore, overlap between patches will be very difficult, and distortion at the margins will be inevitable. Despite this, do you recommend splitting the volume into small patches, or would you have a different suggestion?
Overlap between patches is not difficult; nnUNet already does this. You just need to patchify with overlap once more in order to reduce RAM usage and to speed up the inference.
Thanks for the reply! Do you mean physically splitting the test NIfTI file into 9 (or 16 or 25) pieces, or is there a simple way to do this in nnUNet? Can I split it into smaller patches by changing the "patch_size" parameter in the JSON? What determines the inference and saving time of a NIfTI? Why does a NIfTI of size 801x801x458 with (0.4, 0.4, 0.4) pixel spacing take longer to run and save than a NIfTI of size 1000x1000x1000 with (0.2, 0.2, 0.2) pixel spacing?
I suggest physically splitting the test NIfTI file (you can use the patchly library).
nnUNet does 3 things for inference:
It depends on the target spacing with which the nnUNet model was trained, because each case is resampled to that spacing. Alternatively, the 1000x1000x1000 case may have been cropped to a smaller size because it is zero at the margins.
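To make the resampling point concrete, here is a minimal sketch of how the voxel count changes when a case is resampled to a target spacing. The target spacing of (0.4, 0.4, 0.4) is purely an assumption for illustration; the spacing the models in this thread were actually trained with is not stated. The helper names are hypothetical and not part of nnUNet.

```python
def resampled_shape(shape, spacing, target_spacing):
    """Shape of a volume after resampling from `spacing` to `target_spacing`."""
    return tuple(int(round((s / t) * n)) for n, s, t in zip(shape, spacing, target_spacing))


def voxel_count(shape):
    """Total number of voxels in a volume of the given shape."""
    total = 1
    for n in shape:
        total *= n
    return total


# ASSUMPTION: hypothetical target spacing, chosen only to illustrate the effect.
target = (0.4, 0.4, 0.4)
big_fine = resampled_shape((1000, 1000, 1000), (0.2, 0.2, 0.2), target)
small_coarse = resampled_shape((801, 801, 458), (0.4, 0.4, 0.4), target)
print(big_fine, voxel_count(big_fine))
print(small_coarse, voxel_count(small_coarse))
```

Under this assumed target spacing, the 1000x1000x1000 case ends up with fewer voxels than the 801x801x458 case, which would be consistent with the smaller file taking longer.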
OK, I will try to split a file.
I looked at the patchly library, but I didn't understand it.
You split the images into patches and then save them as NIfTI files. After predicting on all the patches, you aggregate the segmentation results into full-sized images.
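The split, predict, and aggregate workflow can be sketched in plain NumPy as below. This is not patchly's API and not how nnUNet aggregates internally; it is a simplified illustration in which `predict_fn` is a hypothetical stand-in for "save the patch, run nnUNet on it, load the label map". To limit margin distortion, each patch only contributes its interior: half of the overlap is trimmed on every face that touches a neighbouring patch.

```python
import itertools

import numpy as np


def axis_starts(length, patch, overlap):
    """Start indices so that patches of size `patch` cover an axis of `length`."""
    if not 0 <= overlap < patch:
        raise ValueError("overlap must satisfy 0 <= overlap < patch")
    if patch >= length:
        return [0]
    starts = list(range(0, length - patch, patch - overlap))
    starts.append(length - patch)  # last patch sits flush with the border
    return starts


def iter_patches(shape, patch_size, overlap):
    """Yield one tuple of slices per patch of the overlapping grid."""
    per_axis = [axis_starts(n, p, overlap) for n, p in zip(shape, patch_size)]
    for corner in itertools.product(*per_axis):
        yield tuple(slice(c, min(c + p, n)) for c, p, n in zip(corner, patch_size, shape))


def predict_in_patches(volume, predict_fn, patch_size, overlap):
    """Run `predict_fn` on each patch and stitch the trimmed label maps together."""
    out = np.zeros(volume.shape, dtype=np.uint8)
    lo, hi = overlap // 2, overlap - overlap // 2
    for slices in iter_patches(volume.shape, patch_size, overlap):
        labels = predict_fn(volume[slices])  # hypothetical per-patch model call
        dst, src = [], []
        for s, n in zip(slices, volume.shape):
            a = s.start + (lo if s.start > 0 else 0)  # trim faces with a neighbour
            b = s.stop - (hi if s.stop < n else 0)
            dst.append(slice(a, b))
            src.append(slice(a - s.start, b - s.start))
        out[tuple(dst)] = labels[tuple(src)]
    return out


# Toy usage: a threshold stands in for the real model.
rng = np.random.default_rng(0)
vol = rng.random((20, 17, 9))
seg = predict_in_patches(vol, lambda p: (p > 0.5).astype(np.uint8), (8, 8, 8), overlap=4)
```

With a real network the stitched result will not be identical to a full-volume prediction, since each patch sees less context; a larger overlap trades extra compute for fewer seam artifacts.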
I tried splitting the NIfTI file into 27 patches and running inference on them.
To reduce RAM usage you can decrease the number of processes used for preprocessing and segmentation export (see
Setting the number of processes to 1 reduced RAM usage, and the system no longer crashes.
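A back-of-envelope estimate suggests why every additional export process is expensive. The sketch below assumes that each process holds one float32 value per class per voxel for the case it is exporting; that is an assumption about the memory layout, not something confirmed in this thread, and the relevant shape is the one after resampling, which may differ from the file's shape. The function names are hypothetical.

```python
def prediction_bytes(shape, num_classes, bytes_per_value=4):
    """Bytes needed for one per-class value per voxel (float32 by default)."""
    voxels = 1
    for n in shape:
        voxels *= n
    return voxels * num_classes * bytes_per_value


def export_ram_gib(shape, num_classes, num_processes=1, bytes_per_value=4):
    """Rough upper bound in GiB if every export process holds one full array."""
    return num_processes * prediction_bytes(shape, num_classes, bytes_per_value) / 2**30


# Shapes and class counts taken from the cases reported in this issue.
for classes in (32, 5):
    for procs in (1, 3):
        gib = export_ram_gib((801, 801, 458), classes, procs)
        print(f"{classes} classes, {procs} process(es): ~{gib:.1f} GiB")
```

Intermediate copies made during resampling and argmax come on top of this figure, so the estimate is best read as a lower bound on peak usage per process.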
I have the same issue. The export time is about 35 s and the GPU inference time is only about 8 s. After disabling TTA, the GPU inference time drops to 1 s, but the export time is still 35 s. So the performance bottleneck is the export.
It indeed seems like you have issues with the segmentation export. If you increase the number of workers, the export is of course faster, but you risk running out of RAM. Large 3D volumes are always tricky to work with. Regarding your issue with the overlapping patches, the nnUNetPredictor takes the following default arguments:
Here you can set the tile_step_size to a value higher than 0.5 (but at most 1) to reduce the overlap between the patches. Higher overlap usually leads to better segmentation performance, though. If you set use_mirroring = False (disabling test-time augmentation), your inference will be much faster, again at the cost of segmentation performance.
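For the question of how many sliding-window steps to expect, the following is a minimal sketch of the usual sliding-window arithmetic. It assumes the progress bar advances once per window and that the image shape is the one after resampling and cropping; the patch size used in the example is hypothetical and should be replaced by the value from the trained model's plans. The function names are illustrative, not nnUNet's own.

```python
import math


def steps_per_axis(image_len, patch_len, tile_step_size=0.5):
    """Number of window positions along one axis."""
    if not 0 < tile_step_size <= 1:
        raise ValueError("tile_step_size must be in (0, 1]")
    if image_len <= patch_len:
        return 1  # a single (padded) window covers the whole axis
    target_step = patch_len * tile_step_size  # desired stride in voxels
    return math.ceil((image_len - patch_len) / target_step) + 1


def count_windows(image_shape, patch_size, tile_step_size=0.5):
    """Total number of sliding-window positions over the volume."""
    total = 1
    for n, p in zip(image_shape, patch_size):
        total *= steps_per_axis(n, p, tile_step_size)
    return total


# ASSUMPTION: hypothetical 128^3 patch size on a 256^3 resampled volume.
half_overlap = count_windows((256, 256, 256), (128, 128, 128), tile_step_size=0.5)
no_overlap = count_windows((256, 256, 256), (128, 128, 128), tile_step_size=1.0)
print(half_overlap, no_overlap)
```

In this toy case, moving from a step size of 0.5 to 1.0 cuts the number of windows from 27 to 8, which illustrates how strongly the overlap setting affects GPU time.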
Closing. Feel free to reopen if you still have questions!
Hey @YUjh0729 This is hard to judge from afar, but my first guess would be that your local GPU does not have sufficient VRAM and fails for that particular FLARE case. nnUNet always tries to perform as many operations as possible on the GPU (storing the whole image there instead of just the patches) in order to increase speed. If this fails, it falls back to using the GPU just for the individual patches and keeps the image on the CPU; this, however, takes longer. This is also when you get this error message. You can set "perform_everything_on_device=False" in the Predictor to go with the second option immediately. Best, Max
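The three settings mentioned in this thread (tile_step_size, use_mirroring, perform_everything_on_device) can be collected as in the sketch below. The helper names are hypothetical. The import path `nnunetv2.inference.predict_from_raw_data` is an assumption about an nnU-Net v2 installation and is only touched inside `build_predictor`, so the first helper runs without nnUNet installed.

```python
def predictor_kwargs(keep_image_on_cpu=True, disable_tta=False, tile_step_size=0.5):
    """Collect the predictor options discussed in this thread."""
    if not 0 < tile_step_size <= 1:
        raise ValueError("tile_step_size must be in (0, 1]")
    return {
        "tile_step_size": tile_step_size,  # larger value = less overlap
        "use_mirroring": not disable_tta,  # False disables test-time augmentation
        "perform_everything_on_device": not keep_image_on_cpu,
    }


def build_predictor(**kwargs):
    """Create the predictor; requires nnU-Net v2 to be installed."""
    # ASSUMPTION: module path as found in an nnU-Net v2 installation.
    from nnunetv2.inference.predict_from_raw_data import nnUNetPredictor

    return nnUNetPredictor(**kwargs)


# Faster, lower-memory settings at some cost in segmentation quality.
options = predictor_kwargs(keep_image_on_cpu=True, disable_tta=True, tile_step_size=0.75)
print(options)
# predictor = build_predictor(**options)  # uncomment where nnU-Net v2 is available
```

Other constructor arguments are left at their defaults here; check them against the installed version before relying on this.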
Hi @mrokuss
@YUjh0729 check my implementation: https://github.com/pooya-mohammadi/nnUNet
@pooya-mohammadi
Hello!
I have trained two full-resolution models.
The first model has 32 classes and the second has 5 classes.
When I run inference, it sometimes takes far too long.
Cases:
512x512x246 NIfTI:
32-class model: 4 seconds for inference, 3 minutes to save results (112 steps in tqdm)
5-class model: 1:29 minutes for inference, 20 seconds to save results (245 steps in tqdm)
801x801x458 NIfTI:
32-class model: 10:17 minutes for inference, 2:30 minutes to save results (1100 steps in tqdm)
5-class model: 11:12 minutes for inference, 53 minutes to save results (2080 steps in tqdm)
What explains the difference?
The model with fewer classes takes more time.
How can I calculate the number of tqdm steps for each model and NIfTI?
Is there a way to get the results in a format other than NIfTI, for example JSON?
Saving on the CPU is too slow and uses too much RAM.
Sometimes it uses all the RAM and the system crashes.
I have 220 GB of RAM, a Tesla V100, and a 46-thread processor.
However, saving the results uses only one thread.
What can I do to reduce the inference time?