Composer crashes when attempting to load sharded checkpoint #998
Comments
Hello @growlix, are you running this in If so, this issue was fixed in mosaicml/composer#2907 and released in v0.19.0, so you should upgrade your composer version.
Thank you so much, @hanlint! We are running in
@hanlint, we tried composer 0.19.0 but we are still hitting the issue. Is there any change to the config that we need to make?
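Since the suggested fix depends on the installed version, it can help to confirm which composer version the training environment actually resolves. The following is a minimal sketch, not an official tool: the helper names `parse_version`, `has_fix`, and `installed_composer_version` are hypothetical, and it assumes composer was installed from the `mosaicml` distribution on PyPI. Pre-release suffixes are ignored, so a `0.19.0.dev0` build would be reported as having the fix.

```python
from importlib import metadata


def parse_version(text):
    """Turn a string like '0.19.0' into (0, 19, 0), padded to three fields.

    Parsing stops at the first field that does not start with a digit,
    so suffixes such as '.dev0' are ignored (a simplification).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def has_fix(installed, minimum="0.19.0"):
    """Return True if `installed` is at least the release named in the thread."""
    return parse_version(installed) >= parse_version(minimum)


def installed_composer_version():
    """Look up the installed version, or None if the package is missing.

    Assumption: the distribution name is 'mosaicml'.
    """
    try:
        return metadata.version("mosaicml")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_composer_version()
    if found is None:
        print("composer (mosaicml) is not installed in this environment")
    else:
        print(found, "includes the fix" if has_fix(found) else "predates v0.19.0")
```

Running this inside the same environment (or container) that launches training rules out a stale install being picked up instead of the upgraded one.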
When attempting to load a sharded checkpoint, we (@prigoyal and I) hit the following error:
Environment
To reproduce
Steps to reproduce the behavior:
Set fsdp_config.state_dict: sharded in the config.
Set load_path to the directory containing the checkpoint files.
Expected behavior
The checkpoint should be loaded and the model should continue training and/or evaluating.
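The two settings from the reproduction steps can be sketched as a plain Python dict, together with a small sanity check to run before launching. This is only an illustration under stated assumptions: `check_sharded_load_config` is a hypothetical helper, the key names mirror the wording of this issue rather than a verified config schema, and the directory check assumes `load_path` is a local path (a remote URI would need a different check).

```python
from pathlib import Path


def check_sharded_load_config(cfg):
    """Return a list of problems found in the config; empty means it looks fine.

    Assumptions: key names follow the issue text, and load_path is local.
    """
    problems = []

    # Step 1 of the reproduction: sharded state dict under fsdp_config.
    state_dict = cfg.get("fsdp_config", {}).get("state_dict")
    if state_dict != "sharded":
        problems.append(f"fsdp_config.state_dict is {state_dict!r}, expected 'sharded'")

    # Step 2: load_path points at the directory holding the checkpoint files.
    load_path = cfg.get("load_path")
    if not load_path:
        problems.append("load_path is not set")
    elif not Path(load_path).is_dir():
        problems.append(f"load_path {load_path!r} is not an existing directory")

    return problems


# Hypothetical example config; the path is a placeholder, not from the issue.
example_cfg = {
    "fsdp_config": {"state_dict": "sharded"},
    "load_path": "/path/to/checkpoint_dir",
}

if __name__ == "__main__":
    for problem in check_sharded_load_config(example_cfg):
        print("config problem:", problem)
```

A check like this only confirms the config matches the reproduction steps; it does not exercise the checkpoint loading code path itself.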
Additional context