loss=nan on 1660 SUPER 6GB #293
Comments
Have you tried mixed precision fp32?
Thanks for your response.
Just choose "no" instead of fp16 for mixed_precision.
Hey, I just stumbled on this thread with the same problem. I have a regular GTX 1660 6GB. I also run into VRAM issues if I do not go with fp16. I see this thread was closed 5 days ago. Was there a resolution? I suppose "not planned" suggests that there wasn't. I just wanted to confirm. In the meantime I've managed to just train some LoRAs on Colab.
I got the same NaN problem on my GTX 1660 6GB. I traced it and found the call that causes it: the returns are NaN on the 1660, but when I ran it on an A5000 the returns are correct. I think there are some errors in CUDA or the GTX 1660.
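A minimal sketch of one way to do this kind of tracing, assuming forward hooks are used to flag NaN outputs; the toy model below is a placeholder for illustration, not anything from this thread:

import torch
import torch.nn as nn

# Register a hook on every submodule and print the name of any module whose
# output contains NaN when run under fp16 on the GPU.
def add_nan_hooks(model: nn.Module):
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and torch.isnan(output).any():
            print(f"NaN produced by {module.__class__.__name__}")
    for m in model.modules():
        m.register_forward_hook(hook)

# Placeholder model; in practice this would be the U-Net or text encoder being debugged.
model = nn.Sequential(nn.Linear(8, 8), nn.GELU(), nn.Linear(8, 1)).half().cuda()
add_nan_hooks(model)
_ = model(torch.randn(4, 8, dtype=torch.float16, device="cuda"))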
@kohya-ss
Sorry for the late reply. I think you can add the fix immediately after the relevant line. If this fix works, I will add an option to enable this. Please let me know the result!
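For reference, a minimal sketch of one plausible fix along these lines, assuming it is the same workaround the webui applies to GTX 16xx cards (enabling cudnn benchmarking for compute-capability-7.5 GPUs); the snippet and its placement in train_network.py are assumptions, not something confirmed in this thread:

import torch

# Workaround used by the AUTOMATIC1111 webui: enabling the cudnn benchmark option
# lets some cards, notably Turing GTX 16xx (compute capability 7.5), run fp16
# without producing NaNs.
if torch.cuda.is_available() and any(
    torch.cuda.get_device_capability(i) == (7, 5) for i in range(torch.cuda.device_count())
):
    torch.backends.cudnn.benchmark = True

Gating it on compute capability 7.5 matches the note further down that the change noticeably slows down non-Turing cards.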
Yes, this works perfectly, thank you very much for your help!
That's good! I will add an option to enable it.
Original author of the webui PR here: it causes some noticeable slowdown on non-Turing cards. Also, holy cow, you can LoRA on 6 GB VRAM? Or is it a modded card?
Hey,
I have an NVIDIA GeForce 1660 SUPER 6GB card, and I wanted to train LoRA models with it.
This is my configuration:
accelerate launch --num_cpu_threads_per_process 4 train_network.py --network_module="networks.lora" --pretrained_model_name_or_path=/mnt/models/animefull-final-pruned.ckpt --vae=/mnt/models/animevae.pt --train_data_dir=/mnt/datasets/character --output_dir=/mnt/out --output_name=character --caption_extension=.txt --shuffle_caption --prior_loss_weight=1 --network_alpha=128 --resolution=512 --enable_bucket --min_bucket_reso=320 --max_bucket_reso=768 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=0.0001 --text_encoder_lr=0.00005 --max_train_epochs=20 --mixed_precision=fp16 --save_precision=fp16 --use_8bit_adam --xformers --save_every_n_epochs=1 --save_model_as=safetensors --clip_skip=2 --flip_aug --color_aug --face_crop_aug_range="2.0,4.0" --network_dim=128 --max_token_length=225 --lr_scheduler=constant
The train directory's name is 3_Concept1, so 3 repetitions are used.
The script does not throw any errors, but loss=nan and corrupted unets are produced.
I've tried setting mixed_precision to no, but then I run out of VRAM.
I've also tried disabling xformers, but again, I run out of VRAM.
I've compiled xformers myself, using
pip install ninja && MAX_JOBS=4 pip install -v .
Also tried several other xformers versions, like 0.0.16 and the one suggested in the README.
Tried both CUDA 11.6 and 11.7.
Python version: 3.10.6
PyTorch version: torch==1.12.1+cu116 torchvision==0.13.1+cu116
Any help is much appreciated!
Thank you!