--save_state doesn't produce anything? #1921
Comments
Interesting. No, nothing like that. It just prints the first part, "saving checkpoint C:..............". Does one have to set anything specific after --save_state? I thought that using --save_state would save the state after every epoch. I'm going to test --save_state_on_train_end now, which I just read about in the README, but it would still be nice if the state were saved with every checkpoint generated.
Double-checking the code for saving states suggests that a state is only saved when a checkpoint is also saved. That is, if you set it to save every N steps or every N epochs and have save_state set to true, it will save a state along with each checkpoint, which you can then resume training from. Without knowing your exact settings, I can only assume that you are missing the save-every-N-steps or save-every-N-epochs option. Lines 1032 to 1044 in 6e3c1d0
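To make the rule above concrete, here is a minimal Python sketch of the decision as described in this comment. It is not the actual code from the repository: the function name `state_saved_after_epoch` and its parameters are hypothetical and only model "a state is written only when a periodic checkpoint is also written".

```python
from typing import Optional


def state_saved_after_epoch(
    epoch: int,
    save_state: bool,
    save_every_n_epochs: Optional[int],
) -> bool:
    """Hypothetical model of the behaviour described above.

    A training state is written only if a checkpoint is due at this epoch
    AND save_state is enabled. With no save interval configured, no
    periodic checkpoint is due, so no state is written either.
    """
    # Assumption: epochs are counted from 1 and a checkpoint is due
    # whenever the epoch number is a multiple of the interval.
    checkpoint_due = (
        save_every_n_epochs is not None
        and save_every_n_epochs > 0
        and epoch % save_every_n_epochs == 0
    )
    return save_state and checkpoint_due


# save_state on, but no interval set: no state at any epoch.
without_interval = [e for e in range(1, 7) if state_saved_after_epoch(e, True, None)]

# save_state on, checkpoint every 2 epochs: a state accompanies each checkpoint.
with_interval = [e for e in range(1, 7) if state_saved_after_epoch(e, True, 2)]
```

Under these assumptions, `without_interval` stays empty, which matches the symptom in the original report, while `with_interval` contains every second epoch.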
When I train LoRAs with Kohya, I want to be able to resume my training in case I need to pause it for some reason. I've been experimenting with the --save_state option, but it doesn't do anything. Nothing gets created in the --output_dir I set.
Am I missing something? I thought a folder with the relevant items would be created alongside each safetensors file, no?
Also, once I get this to work, do I use the --resume option like this: --resume /folder/with/resume/files ?