
[Bug]: Linux: SDXL-based models fail to load, PyTorch error #15566

Open
prmbittencourt opened this issue Apr 18, 2024 · 6 comments
Labels
asking-for-help-with-local-system-issues This issue is asking for help related to local system; please offer assistance

Comments

@prmbittencourt

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

Whenever I select an SDXL model from the dropdown list at the top of the page, including the SDXL base model, it fails to load. The terminal output shows the following error: AttributeError: module 'torch' has no attribute 'float8_e4m3fn'.

Steps to reproduce the problem

  1. Launch the WebUI.
  2. Click the "down" arrow below "Stable Diffusion checkpoint" at the top left of the page.
  3. Select an SDXL model from the dropdown list.
  4. After a few seconds of processing, the error is printed to the terminal output and the selection reverts to the previously selected model.

What should have happened?

The model should load.

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

sysinfo-2024-04-18-15-34.json

Console logs

################################################################
Launching launch.py...
################################################################
Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
Version: v1.9.0
Commit hash: adadb4e3c7382bf3e4f7519126cd6c70f4f8557b
Launching Web UI with arguments: --skip-torch-cuda-test --upcast-sampling --opt-sub-quad-attention --medvram-sdxl
2024-04-18 12:28:22.419346: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
==============================================================================
You are running torch 2.0.1+rocm5.4.2.
The program is tested to work with torch 2.1.2.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================
*** "Disable all extensions" option was set, will only load built-in extensions ***
Loading weights [fbc31a67aa] from /opt/stable-diffusion-web-ui/models/Stable-diffusion/instruct-pix2pix-00-22000.safetensors
Running on local URL:  http://127.0.0.1:7860
Creating model from config: /opt/stable-diffusion-web-ui/configs/instruct-pix2pix.yaml
LatentDiffusion: Running in eps-prediction mode
Applying attention optimization: sub-quadratic... done.
Model loaded in 2.1s (load weights from disk: 0.5s, create model: 0.2s, apply weights to model: 1.1s, calculate empty prompt: 0.2s).

To create a public link, set `share=True` in `launch()`.
Startup time: 17.6s (import torch: 2.6s, import gradio: 1.1s, setup paths: 10.3s, other imports: 0.4s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 2.2s).
Loading model sd_xl_base_1.0.safetensors [31e35c80fc] (2 out of 2)
Loading weights [31e35c80fc] from /opt/stable-diffusion-web-ui/models/Stable-diffusion/sd_xl_base_1.0.safetensors
Creating model from config: /opt/stable-diffusion-web-ui/repositories/generative-models/configs/inference/sd_xl_base.yaml
changing setting sd_model_checkpoint to sd_xl_base_1.0.safetensors [31e35c80fc]: AttributeError
Traceback (most recent call last):
  File "/opt/stable-diffusion-web-ui/modules/options.py", line 165, in set
    option.onchange()
  File "/opt/stable-diffusion-web-ui/modules/call_queue.py", line 13, in f
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/stable-diffusion-web-ui/modules/initialize_util.py", line 181, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: sd_models.reload_model_weights()), call=False)
                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/stable-diffusion-web-ui/modules/sd_models.py", line 860, in reload_model_weights
    sd_model = reuse_model_from_already_loaded(sd_model, checkpoint_info, timer)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/stable-diffusion-web-ui/modules/sd_models.py", line 826, in reuse_model_from_already_loaded
    load_model(checkpoint_info)
  File "/opt/stable-diffusion-web-ui/modules/sd_models.py", line 748, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "/opt/stable-diffusion-web-ui/modules/sd_models.py", line 448, in load_model_weights
    module.to(torch.float8_e4m3fn)
              ^^^^^^^^^^^^^^^^^^^
AttributeError: module 'torch' has no attribute 'float8_e4m3fn'
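
The traceback ends at `module.to(torch.float8_e4m3fn)`: that dtype only exists in PyTorch 2.1 and newer, so the 2.0.1 build installed here raises AttributeError when it is accessed. A minimal sketch of a defensive probe (a hypothetical helper, not the webui's actual code) would look like:

```python
# Hypothetical guard, not webui code: probe the installed torch build for
# the fp8 dtype before casting, and fall back when it is absent.
def pick_fp8_dtype(torch_mod, fallback=None):
    """Return torch_mod.float8_e4m3fn if this build has it (PyTorch >= 2.1),
    otherwise the given fallback dtype."""
    return getattr(torch_mod, "float8_e4m3fn", fallback)
```

On a torch 2.0.1 build this returns the fallback instead of raising the AttributeError shown above.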

Additional information

SD1.5 models work. Tested on fully up-to-date EndeavourOS.

@prmbittencourt prmbittencourt added the bug-report Report of a bug, yet to be confirmed label Apr 18, 2024
@w-e-w
Collaborator

w-e-w commented Apr 18, 2024

hint

==============================================================================
You are running torch 2.0.1+rocm5.4.2.
The program is tested to work with torch 2.1.2.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================

@w-e-w w-e-w added asking-for-help-with-local-system-issues This issue is asking for help related to local system; please offer assistance and removed bug-report Report of a bug, yet to be confirmed labels Apr 18, 2024
@prmbittencourt
Author

Hi, thanks for your input. I ran the script with --reinstall-torch and am now on Torch 2.2.2+rocm5.7. Loading the SDXL model works, but every time I generate an image I get the following error:

==========================================================================================
A tensor with all NaNs was produced in VAE.
Web UI will now convert VAE into 32-bit float and retry.
To disable this behavior, disable the 'Automatically revert VAE to 32-bit floats' setting.
To always start with 32-bit VAE, use --no-half-vae commandline flag.
==========================================================================================

I'm not sure if it's related to the original problem or not.
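
The behaviour that message describes can be sketched as a retry loop (the `decode` callable and the pure-Python NaN check below are illustrative stand-ins; the real webui operates on torch tensors):

```python
import math

def decode_with_fp32_retry(decode, latents):
    # Mimics webui's "revert VAE to 32-bit floats" behaviour: if the
    # half-precision decode produces any NaN, redo it in full precision.
    out = decode(latents, precision="fp16")
    if any(math.isnan(x) for x in out):
        out = decode(latents, precision="fp32")
    return out
```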

@w-e-w
Collaborator

w-e-w commented Apr 19, 2024

that is not an error message

a message is not the same as an error message
it is telling you what's happening
if it were actually an error, you wouldn't be saying "the SDXL model works, but every time..."


what VAE are you using? sdxl-vae-fp16-fix?
if you are not using the NaN-fix one, then IIRC it is quite likely to get NaNs in the VAE
download https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl.vae.safetensors and place it in your VAE dir
select it when using SDXL in settings / quick settings, or configure XL models to use this VAE via the card icon on the checkpoints tab


I never tried fp8 myself so I can't be sure,
but if you want my guess: if fp8 is used for the VAE, then I suspect it will only increase the chance of NaNs

@prmbittencourt
Author

Toggling the fp8 option seems to have fixed it.

@kode54

kode54 commented May 1, 2024

Doesn't SDXL require Python 3.10 and not anything newer?

@w-e-w
Collaborator

w-e-w commented May 1, 2024

3.10 is what we test webui on
it doesn't necessarily mean that it won't work with other versions

but if you're using a different version you might run into issues, e.g. your Python version is too new and a package hasn't been updated for that version yet
or, on the other hand, the Python version can be too old and a package we use is no longer available for that version
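
The point above can be sketched as a startup check (the `(3, 10)` target comes from the comment; the message wording is illustrative, not webui's actual output):

```python
import sys

TESTED = (3, 10)  # the Python version webui is tested against

def python_version_note(version=None):
    """Warn (without failing) when running on an untested Python."""
    major, minor = (version or sys.version_info)[:2]
    if (major, minor) != TESTED:
        return f"Python {major}.{minor} is untested; webui is tested on 3.10"
    return "running the tested Python version"
```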
