What the hell is wrong with the repo, getting all weird images, don't care the steps.😣 #230

ZeroCool22 · 2023-04-19T23:04:29Z

Describe the bug

I installed everything following this guide: https://pastebin.com/uE1WcSxD

The only steps I did differently are:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
sudo apt-get install cuda=11.8.0-1

Everything seems to work fine during training, but when I create images with AUTO's GUI (I tried InvokeAI too, same issue), all the images look weird.

In the next image, you can see the training data used and the resulting images when using AUTO's GUI:

💡And this is not a matter of step count: I tried different step counts (1000 - 4500) and always got the same weird images.

I used this script to do the conversion to ckpt: https://pastebin.com/ct6mTzAA

But as I said before, it's not a conversion script problem, because I used the model in Diffusers format in InvokeAI and got the same weird results.

If someone could tell me what is wrong with the repo, that would be great.

Reproduction

Training process console:

(diffusers) zerocool@DESKTOP-MMG43AJ:~/github/diffusers/examples/dreambooth$ ./my_training.sh
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/accelerator.py:249: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/transformers/modeling_utils.py:429: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/zerocool/anaconda3/envs/diffusers did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib: did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('runwayml/stable-diffusion-v1-5')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 118
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so...
/home/zerocool/anaconda3/envs/diffusers/lib/python3.9/site-packages/diffusers/configuration_utils.py:203: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
  deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Caching latents: 100%|██████████████████████████████████████████████████████████████████| 35/35 [00:05<00:00,  6.10it/s]
04/19/2023 19:46:10 - INFO - __main__ - ***** Running training *****
04/19/2023 19:46:10 - INFO - __main__ -   Num examples = 35
04/19/2023 19:46:10 - INFO - __main__ -   Num batches each epoch = 35
04/19/2023 19:46:10 - INFO - __main__ -   Num Epochs = 58
04/19/2023 19:46:10 - INFO - __main__ -   Instantaneous batch size per device = 1
04/19/2023 19:46:10 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
04/19/2023 19:46:10 - INFO - __main__ -   Gradient Accumulation steps = 1
04/19/2023 19:46:10 - INFO - __main__ -   Total optimization steps = 2000
Steps:   1%|▌                                                    | 19/2000 [00:33<45:12,  1.37s/it, loss=0.167, lr=1e-6]

Logs

No response

System Info

diffusers version: 0.15.0.dev0
Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python version: 3.9.16
PyTorch version (GPU?): 2.0.0+cu118 (True)
Huggingface_hub version: 0.13.4
Transformers version: 4.28.1
Accelerate version: 0.18.0
xFormers version: 0.0.18
Using GPU in script?: 1080 TI
Using distributed or parallel set-up in script?:


ViddleShtix · 2023-07-10T21:20:56Z

Same here, but only when using 8-bit Adam. Without it, it works perfectly fine. It seems like 8-bit Adam causes the model to overfit very quickly. I've been trying to find a workaround for days but haven't gotten anywhere, so now I'm just playing around with steps and learning rate until something works.

2kpr · 2023-08-01T04:40:33Z

> Same here, but only when using 8-bit Adam. Without it, it works perfectly fine. It seems like 8-bit Adam causes the model to overfit very quickly. I've been trying to find a workaround for days but haven't gotten anywhere, so now I'm just playing around with steps and learning rate until something works.

Check your bitsandbytes version. There is a good chance you have a version above 0.35.0; if so, downgrade bitsandbytes to 0.35.0 and train again. There have been known issues with every bitsandbytes release above 0.35.0 since the end of 2022 when using AdamW8bit and related 8-bit optimizers.

kohya-ss/sd-scripts#523
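
For anyone who wants to check this quickly, here is a minimal sketch that reads the installed bitsandbytes version and compares it against 0.35.0. The 0.35.0 threshold comes from the comment above and is not independently verified here. The helper names (`parse_version`, `is_newer_than_known_good`, `installed_bitsandbytes_version`) are made up for illustration and are not part of bitsandbytes or diffusers.

```python
from importlib import metadata

# Threshold taken from the comment above; an assumption, not a verified fact.
LAST_KNOWN_GOOD = (0, 35, 0)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Parsing stops at the first component that does not start with a digit,
    so suffixes such as '.post2' are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than_known_good(version_text):
    """True if the given version is above the 0.35.0 threshold."""
    return parse_version(version_text) > LAST_KNOWN_GOOD


def installed_bitsandbytes_version():
    """Return the installed bitsandbytes version string, or None if absent."""
    try:
        return metadata.version("bitsandbytes")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bitsandbytes_version()
    if found is None:
        print("bitsandbytes is not installed in this environment")
    elif is_newer_than_known_good(found):
        print(f"bitsandbytes {found} is above 0.35.0; a downgrade may be worth trying")
    else:
        print(f"bitsandbytes {found} is at or below 0.35.0")
```

Run it inside the same conda environment used for training. If it reports a newer version, `pip install bitsandbytes==0.35.0` pins the suggested release; retraining afterwards is needed to see whether the images improve.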

ZeroCool22 added the bug label Apr 19, 2023

2kpr mentioned this issue Aug 1, 2023

Wrong indented lines cause bugs for a long time bitsandbytes-foundation/bitsandbytes#659

Closed
