
Loading network weights with wrong state_dict fails silently #1508

Closed
matt3o opened this issue Jul 28, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@matt3o
Contributor

matt3o commented Jul 28, 2023

Describe the bug
I spent a few hours wondering why my network wouldn't work and which of my transforms were broken, only to find out that the loading of the network did not work. MONAILabel uses the "model" key for the state_dict by default, whereas my code uses the "net" key.
When running MONAILabel, however, the network is just initialized randomly and loading the weights fails completely silently.

The easiest solution would be to change the default value of load_strict to True in BasicInferTask.
This would make a lot of sense for beginners. MONAILabel hides away the specifics of loading the network weights; the least I would expect is an error message telling me something may have gone wrong. In this case, however, it is certain that something went wrong, so it would be even better for the program to fail, imo.
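To illustrate the failure mode, here is a minimal PyTorch sketch (not MONAILabel's actual loading code; the checkpoint layout and fallback logic are only assumptions): when the loader looks up the wrong key and loads with strict=False, every key is mismatched, nothing is copied, and no exception is raised.

```python
import torch
import torch.nn as nn

# Stand-in network; the real model and checkpoint layout are assumptions.
net = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# Training code saved the weights under the "net" key.
torch.save({"net": net.state_dict()}, "model.pt")

# A loader expecting the "model" key falls back to the raw checkpoint dict.
checkpoint = torch.load("model.pt")
state_dict = checkpoint.get("model", checkpoint)  # wrong key -> whole dict

fresh = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# strict=False: every parameter is "missing", "net" is "unexpected",
# nothing is loaded, and no error is raised.
result = fresh.load_state_dict(state_dict, strict=False)
print(result.missing_keys)     # ['0.weight', '0.bias', '1.weight', '1.bias']
print(result.unexpected_keys)  # ['net']

# strict=True would raise a RuntimeError here and surface the mismatch.
```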

@matt3o matt3o changed the title Loading network weights with wrong state_dict silently fails Loading network weights with wrong state_dict fails silently Jul 28, 2023
@tangy5
Collaborator

tangy5 commented Jul 28, 2023

Thanks for the suggestions.
This could be an enhancement for the future. The key for loading model weights is not always "state_dict", so this can be optimized.
The reason load_strict does not default to True is that, most of the time, users may not have all the weights matched; False enables loading part of the pretrained weights. For this feature, we may need an argument that can adjust the configuration.

Thanks again for the suggestions. If you get time, you could raise a PR for your suggestions; I think they would be useful to all users. Otherwise we will put these items on the TODO list and enhance them in the normal dev plan. Thank you.
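As a rough sketch of such a configurable argument (a hypothetical helper, not MONAILabel's actual API; the "model" key convention is an assumption), strict=False can still report which weights were and were not matched instead of staying silent:

```python
import torch
import torch.nn as nn

def load_weights(model: nn.Module, path: str, load_strict: bool = False):
    """Hypothetical helper: a load_strict flag plus a warning on partial loads."""
    checkpoint = torch.load(path, map_location="cpu")
    state_dict = checkpoint.get("model", checkpoint)  # assumed key convention

    # strict=True raises on any mismatch; strict=False loads whatever matches
    # and reports the rest via the returned NamedTuple.
    result = model.load_state_dict(state_dict, strict=load_strict)
    if result.missing_keys or result.unexpected_keys:
        print(
            f"Warning: weights only partially loaded "
            f"(missing={result.missing_keys}, unexpected={result.unexpected_keys})"
        )
    return model
```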

@matt3o
Contributor Author

matt3o commented Jul 29, 2023

Thanks for the quick response @tangy5. Hm yeah, I get that for many models you want that value to be False. Why not set it to True in BasicInferTask and then to False in every sample app/model where it is actually needed?
For me, as a newbie trying to add a new model, this was a major pain.

I can add a PR; however, I'd need to know which of the two options you prefer:

  1. After loading the checkpoint, go through the keys and make sure the desired one is in the checkpoint's state_dict. If not, print an error message but continue the program. This would only modify BasicInferTask._get_network(). No change in the previous behaviour apart from an error message stating that the weights might not have been loaded correctly, or only partially (see the sketch after this list).
  2. Set the default value of load_strict to True in BasicInferTask and change the value to False in all available sample apps where it is not set so far.
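A rough sketch of what option 1 could look like (illustrative names only; the real BasicInferTask._get_network() and its parameters may differ):

```python
import logging

import torch

logger = logging.getLogger(__name__)

def get_network_weights(network, model_path, model_state_dict="model", load_strict=False):
    """Illustrative sketch of option 1, not the actual MONAILabel code."""
    checkpoint = torch.load(model_path, map_location="cpu")

    if model_state_dict not in checkpoint:
        # Option 1: warn loudly but keep the previous behaviour.
        logger.error(
            f"Key '{model_state_dict}' not found in checkpoint "
            f"(available keys: {list(checkpoint.keys())}); "
            "weights may not have been loaded correctly or only partially."
        )
        state_dict = checkpoint
    else:
        state_dict = checkpoint[model_state_dict]

    network.load_state_dict(state_dict, strict=load_strict)
    return network
```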

@tangy5
Collaborator

tangy5 commented Jul 30, 2023

I think the two items you proposed make sense; feel free to raise a PR. You could add "-s" to your commit command, and our team can edit and test your branch; we have CI/CD integration, unit tests, and env builds. I will also try to help if anything is missing or needs discussion.

Thanks again.

@tangy5 tangy5 added the enhancement New feature or request label Aug 8, 2023
@matt3o
Contributor Author

matt3o commented Aug 29, 2023

I'll close this issue now since the code for it was already merged.

However @tangy5, it would be great if you could merge the second part as well: #1533
The first PR #1521 was merged before it was fully finished, since I was still waiting for feedback. As it is, the dev branch will probably break other models unnecessarily, which is not what I wanted to achieve with the PR.

@matt3o matt3o closed this as completed Aug 29, 2023