
[Bug]: Linux: SDXL-based models fail to load, PyTorch error #15566

Open
prmbittencourt opened this issue Apr 18, 2024 · 6 comments
Labels
asking-for-help-with-local-system-issues This issue is asking for help related to local system; please offer assistance

Comments

@prmbittencourt

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

Whenever I select an SDXL model from the dropdown list at the top of the page, including the SDXL base model, it fails to load. The terminal output shows the following error: AttributeError: module 'torch' has no attribute 'float8_e4m3fn'.

Steps to reproduce the problem

  1. Launch the WebUI.
  2. Click the "down" arrow below "Stable Diffusion checkpoint" at the top left of the page.
  3. Select an SDXL model from the dropdown list.
  4. After a few seconds of processing, the error is printed to the terminal output and the selection reverts to the previously selected model.

What should have happened?

The model should load.

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

sysinfo-2024-04-18-15-34.json

Console logs

################################################################
Launching launch.py...
################################################################
Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
Version: v1.9.0
Commit hash: adadb4e3c7382bf3e4f7519126cd6c70f4f8557b
Launching Web UI with arguments: --skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --medvram-sdxl
2024-04-18 12:28:22.419346: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
==============================================================================
You are running torch 2.0.1+rocm5.4.2.
The program is tested to work with torch 2.1.2.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================
*** "Disable all extensions" option was set, will only load built-in extensions ***
Loading weights [fbc31a67aa] from /opt/stable-diffusion-web-ui/models/Stable-diffusion/instruct-pix2pix-00-22000.safetensors
Running on local URL:  http://127.0.0.1:7860
Creating model from config: /opt/stable-diffusion-web-ui/configs/instruct-pix2pix.yaml
LatentDiffusion: Running in eps-prediction mode
Applying attention optimization: sub-quadratic... done.
Model loaded in 2.1s (load weights from disk: 0.5s, create model: 0.2s, apply weights to model: 1.1s, calculate empty prompt: 0.2s).

To create a public link, set `share=True` in `launch()`.
Startup time: 17.6s (import torch: 2.6s, import gradio: 1.1s, setup paths: 10.3s, other imports: 0.4s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 2.2s).
Loading model sd_xl_base_1.0.safetensors [31e35c80fc] (2 out of 2)
Loading weights [31e35c80fc] from /opt/stable-diffusion-web-ui/models/Stable-diffusion/sd_xl_base_1.0.safetensors
Creating model from config: /opt/stable-diffusion-web-ui/repositories/generative-models/configs/inference/sd_xl_base.yaml
changing setting sd_model_checkpoint to sd_xl_base_1.0.safetensors [31e35c80fc]: AttributeError
Traceback (most recent call last):
  File "/opt/stable-diffusion-web-ui/modules/options.py", line 165, in set
    option.onchange()
  File "/opt/stable-diffusion-web-ui/modules/call_queue.py", line 13, in f
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/stable-diffusion-web-ui/modules/initialize_util.py", line 181, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: sd_models.reload_model_weights()), call=False)
                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/stable-diffusion-web-ui/modules/sd_models.py", line 860, in reload_model_weights
    sd_model = reuse_model_from_already_loaded(sd_model, checkpoint_info, timer)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/stable-diffusion-web-ui/modules/sd_models.py", line 826, in reuse_model_from_already_loaded
    load_model(checkpoint_info)
  File "/opt/stable-diffusion-web-ui/modules/sd_models.py", line 748, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "/opt/stable-diffusion-web-ui/modules/sd_models.py", line 448, in load_model_weights
    module.to(torch.float8_e4m3fn)
              ^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch' has no attribute 'float8_e4m3fn'
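
The traceback ends at `module.to(torch.float8_e4m3fn)`: that dtype only exists in PyTorch 2.1 and newer, so the 2.0.1 build installed here raises AttributeError when it is accessed. A minimal sketch of a defensive probe (a hypothetical helper, not the webui's actual code) would look like:

```python
# Hypothetical guard, not webui code: probe the installed torch build for
# the fp8 dtype before casting, and fall back when it is absent.
def pick_fp8_dtype(torch_mod, fallback=None):
    """Return torch_mod.float8_e4m3fn if this build has it (PyTorch >= 2.1),
    otherwise the given fallback dtype."""
    return getattr(torch_mod, "float8_e4m3fn", fallback)
```

On a torch 2.0.1 build this returns the fallback instead of raising the AttributeError shown above.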

Additional information

SD1.5 models work. Tested on fully up-to-date EndeavourOS.

@prmbittencourt prmbittencourt added the bug-report Report of a bug, yet to be confirmed label Apr 18, 2024
@w-e-w
Collaborator

w-e-w commented Apr 18, 2024

hint

==============================================================================
You are running torch 2.0.1+rocm5.4.2.
The program is tested to work with torch 2.1.2.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================

@w-e-w w-e-w added asking-for-help-with-local-system-issues This issue is asking for help related to local system; please offer assistance and removed bug-report Report of a bug, yet to be confirmed labels Apr 18, 2024
@prmbittencourt
Author

Hi, thanks for your input. I ran the script with --reinstall-torch and am now on Torch 2.2.2+rocm5.7. Loading the SDXL model works, but every time I generate an image I get the following error:

==========================================================================================
A tensor with all NaNs was produced in VAE.
Web UI will now convert VAE into 32-bit float and retry.
To disable this behavior, disable the 'Automatically revert VAE to 32-bit floats' setting.
To always start with 32-bit VAE, use --no-half-vae commandline flag.
==========================================================================================

I'm not sure if it's related to the original problem or not.
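
The behaviour that message describes can be sketched as a retry loop (the `decode` callable and the pure-Python NaN check below are illustrative stand-ins; the real webui operates on torch tensors):

```python
import math

def decode_with_fp32_retry(decode, latents):
    # Mimics webui's "revert VAE to 32-bit floats" behaviour: if the
    # half-precision decode produces any NaN, redo it in full precision.
    out = decode(latents, precision="fp16")
    if any(math.isnan(x) for x in out):
        out = decode(latents, precision="fp32")
    return out
```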

@w-e-w
Collaborator

w-e-w commented Apr 19, 2024

that is not an error message

a message is not the same as an error message
it is telling you what's happening
if it were actually an error, you wouldn't be saying "the SDXL model works, but every time..."


what VAE are you using? sdxl-vae-fp16-fix?
if you are not using the NaN-fix one, then IIRC it is quite likely to get NaNs in the VAE
download https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl.vae.safetensors and place it in your VAE dir
select it when using SDXL in settings / quick settings, or configure XL models to use this VAE via the card icon on the checkpoints tab


I never tried fp8 myself so I can't be sure,
but if you want my guess: if fp8 is used for the VAE, then I suspect it will only increase the chance of NaNs

@prmbittencourt
Author

Toggling the fp8 option seems to have fixed it.

@kode54

kode54 commented May 1, 2024

Doesn't SDXL require Python 3.10 and not anything newer?

@w-e-w
Collaborator

w-e-w commented May 1, 2024

3.10 is what we test webui on
it doesn't necessarily mean that it won't work with other versions

but if you're using a different version you might run into issues, e.g. your Python version is too new and a package hasn't been updated for that version yet
or, on the other hand, the Python version can be too old and a package we use is no longer available for that version
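
The point above can be sketched as a startup check (the `(3, 10)` target comes from the comment; the message wording is illustrative, not webui's actual output):

```python
import sys

TESTED = (3, 10)  # the Python version webui is tested against

def python_version_note(version=None):
    """Warn (without failing) when running on an untested Python."""
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) != TESTED:
        return f"Python {major}.{minor} is untested; webui is tested on 3.10"
    return "running the tested Python version"
```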
