Are there detailed results that I could read? #31
Comments
Hi Fabian,

Thank you for your comments, and we greatly admire your work on nnU-Net. Also, thank you for patiently answering the many questions from us (Shivam).

With the widespread success of nnU-Net, we hypothesize that initializing these models with good starting points, particularly by learning representations from large-scale medical images via self-supervision, will boost nnU-Net performance, especially for applications with limited annotation.

Regarding your comments: indeed, when training from scratch and fine-tuning Models Genesis N times each, the best fine-tuning score is not necessarily higher than the best from-scratch score. In the competition, however, the best model among multiple local runs is selected and submitted. So far, we have not claimed in our papers that fine-tuning nnU-Net outperforms training nnU-Net from scratch. Instead, our paper demonstrates that fine-tuning Models Genesis leads to more stable performance with a higher average score across multiple runs. Recently, we ran an experiment on LiTS/Task03 with 3 runs. Here are the results of liver tumor segmentation evaluated on the validation set:
So far, we have pre-trained nnU-Net on the LUNA 2016 dataset only, meaning that the pre-trained model saw none of the images in those target datasets. Given that the nnU-Net configuration differs from task to task, we are not sure whether pre-training the architecture individually for each task would be an efficient solution. We think semi-supervised learning, as you suggested, is worth trying.

Thanks again for your comments, and we would greatly appreciate any further comments and suggestions that you may have.

Zongwei & Shivam
Hi Zongwei & Shivam,

I can see how the different architectures for each dataset may be prohibitive, but there are ways of working around that. I am not quite sure how you implemented it, but you can try to just transfer the

Let me know if you have any questions.
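One common way to work around differing per-task architectures is to copy only the parameters whose names and shapes line up between the pretrained checkpoint and the freshly configured network, leaving everything else at its fresh initialization. Below is a minimal PyTorch sketch of that idea; it is our own illustration, not code from nnU-Net or Models Genesis, and the checkpoint layout and names are assumptions.

```python
import torch

def transfer_matching_weights(pretrained_ckpt_path, target_network):
    """Copy parameters from a pretrained checkpoint into target_network wherever
    the parameter name exists and the tensor shape matches; everything else keeps
    its fresh initialization."""
    # The 'state_dict' key is an assumption about how the checkpoint was saved.
    checkpoint = torch.load(pretrained_ckpt_path, map_location="cpu")
    pretrained_state = checkpoint.get("state_dict", checkpoint)

    target_state = target_network.state_dict()
    transferred = {
        name: tensor
        for name, tensor in pretrained_state.items()
        if name in target_state and tensor.shape == target_state[name].shape
    }

    target_state.update(transferred)  # overwrite only the compatible entries
    target_network.load_state_dict(target_state)
    print(f"transferred {len(transferred)}/{len(target_state)} parameter tensors")
    return target_network
```

With this kind of partial transfer, task-specific layers such as a segmentation head with a different number of output channels simply keep their random initialization.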
You had me worried with your nnU-Net results repro :-D I was afraid I had broken something because you were unable to get the same tumor Dice score I was initially reporting. So I reran the trainings for 3D fullres (which is what you seem to be using) and got 64.54 before and 65.53 after postprocessing (5-fold CV with the same splits as used for the number reported in our paper). So everything is fine.
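For context on the "before/after postprocessing" numbers: nnU-Net's automatic postprocessing decides per label, based on the cross-validation results, whether suppressing all but the largest connected component improves the Dice. A small sketch of that component-suppression step, using SciPy as our own illustration rather than nnU-Net's actual implementation:

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(binary_mask):
    """Suppress all but the largest connected component of a binary mask.
    Whether this is applied at all is decided automatically by nnU-Net;
    the function itself is only an illustration of the operation."""
    labeled, num_components = ndimage.label(binary_mask)
    if num_components <= 1:
        return binary_mask
    # Size of each component (component ids start at 1).
    sizes = ndimage.sum(binary_mask, labeled, np.arange(1, num_components + 1))
    largest_id = 1 + int(np.argmax(sizes))
    return labeled == largest_id
```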
Hi FabianIsensee,
Hi, |
Hi Fabian,

The reproduced results that we shared with you (62.61% +- 0.51%) are evaluated with 5-fold CV, which should correspond to "0.6372" in Table F.6 of https://arxiv.org/pdf/1904.08128.pdf. Unfortunately, as of now, we are not able to reproduce a Dice score of 64.54% before and 65.53% after postprocessing with the 3d_fullres configuration. For your reference, we have attached some of our configurations.
May we ask you to fine-tune our released pre-trained nnU-Net weights under your environment setup and see whether there is a performance gain? In addition, it would be really helpful if you could provide the mean and standard deviation of the validation performance if you have multiple runs of training nnU-Net from scratch. We have noticed that performance fluctuates, especially when training under varying environment configurations. Hence, stabilizing and elevating the overall performance is the primary purpose of developing Models Genesis.

Thanks,
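The statistic being requested is just the mean and sample standard deviation of the per-run validation Dice, in the same "62.61% +- 0.51%" style quoted above. A small Python sketch of that aggregation; the function name is ours and the example values are made-up placeholders, not results from this thread:

```python
import numpy as np

def summarize_runs(dice_scores):
    """Mean and sample standard deviation of the validation Dice over repeated
    trainings, matching the '62.61% +- 0.51%' style of numbers in this thread."""
    scores = np.asarray(dice_scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)

# Usage with made-up placeholder values (not results from this issue):
mean, std = summarize_runs([0.626, 0.631, 0.621])
print(f"{mean * 100:.2f}% +- {std * 100:.2f}%")
```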
Hi,
Hi there,
I would really like to read some additional details about nnU-Net and Models Genesis. So far you seem to have taken first place in Task03, but it is difficult to see whether that is a significant result (given that I am getting some variation when running the same training several times, which will also translate into different test set performances). Overall, your submission on the Decathlon is below ours, indicating that the pretraining may not be beneficial on all tasks. This makes it difficult to really estimate the impact of your pre-training strategy.
Specifically, I would be interested in how much your pretrained models help in semi-supervised learning. Say you take all non-LiTS/Task03 datasets with livers in them (Task03 is essentially LiTS; BCV Abdomen, KiTS, Pancreas (?), ...) and run Models Genesis on them for pretraining: how well does your pretrained nnU-Net perform when fine-tuned on 10, 20, 50, etc. LiTS cases for the LiTS task? Can you beat the nnU-Net baseline by a significant margin if you use all these additional datasets for pretraining?
Best,
Fabian
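To make the proposed comparison concrete, here is a rough outline of the data-efficiency experiment described above, in hypothetical Python; `train_and_evaluate` is a placeholder for a full training plus validation run, not an existing nnU-Net or Models Genesis API:

```python
import random

def train_and_evaluate(cases, init_weights=None):
    """Placeholder for a full training on `cases` (optionally initialized with
    pretrained weights) followed by evaluation; returns a tumor Dice score.
    Not a real nnU-Net or Models Genesis API."""
    raise NotImplementedError("plug in the actual training/evaluation pipeline here")

def data_efficiency_experiment(lits_cases, pretrained_weights, subset_sizes=(10, 20, 50)):
    """For each annotation budget, train once from scratch and once from the
    pretrained initialization on the same randomly drawn subset of LiTS cases."""
    results = {}
    for n in subset_sizes:
        subset = random.sample(lits_cases, n)
        results[n] = {
            "from_scratch": train_and_evaluate(subset, init_weights=None),
            "fine_tuned": train_and_evaluate(subset, init_weights=pretrained_weights),
        }
    return results
```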