
AMD Segmentation Fault #1288

Open
CobeyH opened this issue Dec 8, 2023 · 25 comments
Labels
bug (AMD) Something isn't working (AMD specific)

Comments

@CobeyH commented Dec 8, 2023

Describe the problem
I am running Ubuntu with an AMD GPU. I configured my environment variables and set up rocminfo as suggested in issue #1079.

The web page now launches successfully and it no longer shows an error that the GPU isn't detected. However, when I enter a text or image prompt and click the "Generate" button, a segmentation fault occurs.

System Info
System: Ubuntu 22.04.3
CPU: AMD Ryzen 5 3600
GPU: AMD RX 6750XT
Python: 3.10.13
Environment: Venv

HCC_AMDGPU_TARGET=gfx1031
HSA_OVERRIDE_GFX_VERSION=10.3.2
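
For reference, the gfx target the override should correspond to can be read straight from rocminfo (a one-line sketch; the agent name varies by card):

rocminfo | grep -m1 "gfx"    # e.g. prints "Name: gfx1031" for this RX 6750 XT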

Full Console Log
Update failed.
authentication required but no callback set
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.10.13 (main, Aug 25 2023, 13:20:03) [GCC 9.4.0]
Fooocus version: 2.1.824
Running on local URL: http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 15903 MB
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon RX 6750 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/cobey/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/cobey/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/cobey/repos/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/cobey/repos/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.79 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 4950368496917309143
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[1] 10757 segmentation fault (core dumped) python entry_with_update.py

@NL-TCH commented Dec 9, 2023

I got exactly the same on an RX 5700 XT.

python entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.11.6 (main, Oct  3 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)]
Fooocus version: 2.1.824
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 8176 MB, total RAM 31833 MB
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon RX 5700 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/user/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/user/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/user/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/user/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.57 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 7295514245041223923
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)

@Khoraji commented Dec 10, 2023

Same fault, core dumped, on a 5700 XT.

@L226 commented Dec 10, 2023

Same here, followed #1079 successfully.

However, I'm using the integrated Radeon graphics of my Ryzen 7 Pro 5850U. I tried with and without --use-split-cross-attention.

Ubuntu 22.04.3, kernel 6.1.66
AMD Ryzen 7 Pro 5850U
AMD Radeon Graphics
48 GB RAM

python entry_with_update.py --preset realistic --use-split-cross-attention
Update failed.
authentication required but no callback set
Update succeeded.
[System ARGV] ['entry_with_update.py', '--preset', 'realistic', '--use-split-cross-attention']
Loaded preset: /home/user/genai/Fooocus/presets/realistic.json
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Fooocus version: 2.1.824
Running on local URL:  http://127.0.0.1:7866

To create a public link, set `share=True` in `launch()`.
Total VRAM 4096 MB, total RAM 43960 MB
Trying to enable lowvram mode because your GPU seems to have 4GB or less. If you don't want this use: --normalvram
Set vram state to: LOW_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon Graphics : native
VAE dtype: torch.float32
Using split optimization for cross attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: /home/user/genai/Fooocus/models/checkpoints/realisticStockPhoto_v10.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/user/genai/Fooocus/models/checkpoints/realisticStockPhoto_v10.safetensors].
Loaded LoRA [/home/user/genai/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for UNet [/home/user/genai/Fooocus/models/checkpoints/realisticStockPhoto_v10.safetensors] with 788 keys at weight 0.25.
Loaded LoRA [/home/user/genai/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for CLIP [/home/user/genai/Fooocus/models/checkpoints/realisticStockPhoto_v10.safetensors] with 264 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 2.21 seconds
App started successful. Use the app with http://127.0.0.1:7866/ or 127.0.0.1:7866
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 6293613909801716834
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] ship on fire, dramatic, intricate, elegant, highly detailed, extremely new, professional, cinematic, artistic, sharp focus, color light, winning, romantic, smart, cute, epic, creative, cool, loving, attractive, pretty, charming, complex, amazing, passionate, charismatic, colorful, coherent, iconic, fine, vibrant, incredible, beautiful, awesome, pure
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] ship on fire, full color, cinematic, stunning, highly detailed, formal, serious, determined, elegant, professional, artistic, emotional, pretty, attractive, smart, charming, best, dramatic, sharp focus, beautiful, cute, modern, futuristic, surreal, iconic, fine detail, colorful, ambient light, dynamic, amazing, symmetry, intricate, elite, magical
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 8.59 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Segmentation fault (core dumped)

@Khoraji commented Dec 10, 2023 via email

@galvani4987

I'm running Linux Mint, fully updated, on an AMD Ryzen 5600G + RX 5600 XT 6 GB + 32 GB DDR4, and I get the exact same segmentation fault (core dumped).
I have tested a bunch of arguments and environment variables, but no luck. I installed ROCm 5.7, but every test gives a different error message and ends in failure, so I went back to the start and to this thread.
I hope someone figures it out. Thanks a lot everyone; this is great, and we are pretty close to making it work... I hope.

@galvani4987 commented Dec 11, 2023

This has been published by lllyasviel: #1327
I enlarged my swapfile to 64G using this tutorial: https://linuxhandbook.com/increase-swap-ubuntu/
Then I reinstalled Fooocus from scratch and ran it. About a minute after I hit Generate, it gets stuck at "[Fooocus] Preparing Fooocus text #1 ..." and then segfaults.
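
For anyone following along, the tutorial's steps boil down to roughly this sketch (assuming an Ubuntu-style system and a swap file at /swapfile; the size and path are just examples):

sudo swapoff -a                    # disable existing swap
sudo fallocate -l 64G /swapfile    # allocate the new swap file
sudo chmod 600 /swapfile           # restrict permissions
sudo mkswap /swapfile              # format it as swap
sudo swapon /swapfile              # enable it
# to persist across reboots, append this line to /etc/fstab:
# /swapfile none swap sw 0 0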

@Robin-qwerty

I have the same issue. I'm running Arch with an RX 6750 XT, 32 GB RAM, and 40 GB swap.

(fooocus_env) [root@ArchLinuxRobin Fooocus]# python entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.10.10 (main, Mar  5 2023, 22:26:53) [GCC 12.2.1 20230201]
Fooocus version: 2.1.835
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 31955 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6750 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
Base model loaded: /root/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/root/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/root/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/root/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.24 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 7930202201705363266
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)
(fooocus_env) [root@ArchLinuxRobin Fooocus]#

All I get to see in the browser is 'Waiting for task to start ...'

And my memory is barely used

@wnm210 commented Dec 18, 2023

This has been published by lllyasviel: #1327 I did enlarge my swapfile to 64G using this tutorial: https://linuxhandbook.com/increase-swap-ubuntu/ Reinstalled Fooocus from scratch and ran it. About a minute or so after i hit Generate it gets stuck in "[Fooocus] Preparing Fooocus text #1 ..." Then it segfaults.

Same here, and it's stuck.

@L226 commented Dec 19, 2023

Tried increasing swap (in my case, disabling the existing 1G swap partition and creating/activating a new 40G swap file, with cache pressure = 100 and swappiness = 60); it still segfaults:

...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1152, 896)
Preparation time: 10.88 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Segmentation fault (core dumped)

Looking at swap usage, it didn't really use anything; RAM utilization also looked pretty low.

Running strace on the process showed some odd lookups, so I guess the AMD integration still needs work or I need to reinstall some packages, e.g.:

[pid ****] access("/usr/local/games/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
[pid ****] access("/snap/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)
[pid ****] access("/snap/bin/amdgcn-amd-amdhsa-ld.lld", R_OK|X_OK) = -1 ENOENT (No such file or directory)

I will try to look more deeply into it after the break.
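
As a quick sanity check, it may be worth verifying whether that linker binary exists anywhere (a sketch assuming a standard /opt/rocm layout; the name is exactly the one the strace output above was probing for):

command -v amdgcn-amd-amdhsa-ld.lld          # is it anywhere on PATH?
ls /opt/rocm*/llvm/bin/ld.lld 2>/dev/null    # usual location of the lld linker in a ROCm install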

@eVen-gits

Getting segfault as well. I don't think it's a RAM issue (128GB).

Kernel: 6.6.7-4-MANJARO 
Uptime: 1 day, 22 hours, 59 mins 
Packages: 1184 (pacman), 11 (flatpak) 
Shell: bash 5.2.21 
Resolution: 3840x1600 
DE: Plasma 5.27.10 
WM: kwin 
Theme: [Plasma], Breeze [GTK2/3] 
Icons: [Plasma], breeze [GTK2/3] 
Terminal: konsole 
CPU: AMD Ryzen 5 5600X (12) @ 3.700GHz 
GPU: AMD ATI Radeon RX 5600 OEM/5600 XT / 5700/5700 XT 
Memory: 19460MiB / 128710MiB

@WYOhellboy

Also getting segmentation fault:
CPU: AMD Ryzen 7 2700x
RAM: 48GB
GPU: AMD Radeon RX 7800xt
Swap: 55GB
Using Manjaro with Gnome as DE.

@mashb1t added the "bug" and "help wanted" labels on Dec 29, 2023
@klassiker

Got segfaults as well, but managed to fix it. Here is what I found:

With whl/rocm5.6, I got a plain segfault with no information. Excerpt from strace right before the segfault:

strace -ff python entry_with_update.py --preset realistic
.....
[pid  XXXX] ioctl(6, AMDKFD_IOC_MAP_MEMORY_TO_GPU, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_QUEUE, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_EVENT, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_EVENT, ...) = 0
[pid  XXXX] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x40} ---

After trying whl/nightly/rocm5.7, I got a little more error information:

[pid  XXXX] ioctl(6, AMDKFD_IOC_MAP_MEMORY_TO_GPU, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_QUEUE, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_EVENT, ...) = 0
[pid  XXXX] ioctl(6, AMDKFD_IOC_CREATE_EVENT, ...) = 0
[pid  XXXX] futex(..., FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid  XXXX] write(2, "Exception in thread Thread-2 (wo"..., 39Exception in thread Thread-2 (worker):
...
RuntimeError: HIP error: invalid device function

After finding ROCm/ROCm#2536 and trying strace -ff python -c 'import torch; torch.rand(3,3).to(torch.device("cuda"))', the same error appeared.

Using export HSA_OVERRIDE_GFX_VERSION=11.0.0 (for gfx1100, as reported by rocminfo), both the simple test and entry_with_update.py run successfully. The segfault happened for me at the same locations, either on startup using the realistic preset or when clicking Generate without a preset, so I guess it's the same issue as here.

For debugging, the output of rocminfo | grep Name might help. Also try all of whl/rocm5.6, whl/nightly/rocm5.6, and whl/nightly/rocm5.7 with the simple PyTorch command in a clean environment (env -i bash), exporting HSA_OVERRIDE_GFX_VERSION to the appropriate value for your GPU. Verify you are using the correct GPU if you have an iGPU, and check whether strace shows the error at the same location.
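
Put together, that debugging recipe looks roughly like this sketch (the override value 11.0.0 matches gfx1100 and is only an example; substitute the value for your own GPU as reported by rocminfo):

rocminfo | grep -i "Name"                 # find your gfx target
export HSA_OVERRIDE_GFX_VERSION=11.0.0    # example for gfx1100; adjust for your GPU
python -c 'import torch; print(torch.rand(3,3).to(torch.device("cuda")))'
# if it still crashes, trace the failing call:
strace -ff python -c 'import torch; torch.rand(3,3).to(torch.device("cuda"))'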

I guess #627 is related.

Hope this helps.

@merlinblack

After reinstalling the dependencies today, I can run this without needing any env vars to override anything.
python -c 'import torch; torch.rand(3,3).to(torch.device("cuda"))'

However, I still get a segfault after clicking 'Generate':

#> python entry_with_update.py
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.11.7 (main, Dec 18 2023, 00:00:00) [GCC 13.2.1 20231205 (Red Hat 13.2.1-6)]
Fooocus version: 2.1.862
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 32035 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6700 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: /home/nigel/prog/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/nigel/prog/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [/home/nigel/prog/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/nigel/prog/Fooocus/models/checkpoints/juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.68 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 8321946732629474494
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)

It does take a moment to crash. Watching my memory usage, both VRAM and RAM go up a little on startup, but no higher after clicking Generate.

@AstroJMo

I have a 7950X and a 7900 XTX. I disabled integrated graphics in my BIOS and no longer get the segmentation fault. Running test-rocm.py showed that I had two ROCm devices, and I read on another forum that this can cause problems. It seems that was true, for me at least.
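
For reference, a quick way to see how many devices PyTorch picks up (a sketch; HIP_VISIBLE_DEVICES is the ROCm-side way to hide the iGPU without touching the BIOS):

python -c 'import torch; print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])'
# if an iGPU is listed alongside the discrete card, it can be masked with e.g.:
# export HIP_VISIBLE_DEVICES=0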

@PiotrCe commented Jan 14, 2024

I'm using:
Ubuntu 22.04.3 LTS
RX 5700 XT

my rocminfo output:

Agent 2
  Name: gfx1010
  Uuid: GPU-XX
  Marketing Name: AMD Radeon RX 5700 XT
  Vendor Name: AMD
  Feature: KERNEL_DISPATCH
  Profile: BASE_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 128(0x80)
  Queue Min Size: 64(0x40)
  Queue Max Size: 131072(0x20000)
  Queue Type: MULTI
  Node: 1
  Device Type: GPU

I had the Segmentation fault (core dumped) while using Miniconda3. After switching to Anaconda, this error never appeared again. Now when I run HSA_OVERRIDE_GFX_VERSION=10.3.0 python entry_with_update.py, the app starts, and after clicking "Generate" I get:

[Fooocus Model Management] Moving model(s) has taken 1.49 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 3497165507932006909
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
:0:rocdevice.cpp :2692: 2014655231 us: [pid:6187 tid:0x7fac53fff640] Callback: Queue 0x7fa9bdf00000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Aborted (core dumped)

@Laurent-VueJS commented Jan 17, 2024

I have a 7950x and 7900 xtx. I disabled integrated graphics in my bios and I no longer get the segmentation fault. Running the test-rocm.py was showing that I had two rocm devices. I read on another forum that this might cause problems. Seems it was true for me at least.

Just my 2 cents: for info, I use the iGPU of the Ryzen 9 7900X, and I always get a segmentation fault (or other errors) even though this (i)GPU is the only one. So multiple GPUs might not be the problem, but the iGPU might well be. I have seen in AMD's specs that iGPUs are not officially supported by ROCm :-( NB: on Windows (with DirectML) I can sometimes generate one picture on the iGPU, but only in "Extreme Speed" mode, which uses about 40 GB of VRAM (my limit). Other settings use more than 40 GB, and the process stops when I reach that limit (probably due to a memory leak?).

@Schweeeeeeeeeeeeeeee

Same problem

@ttio2tech (Contributor)

My 5700 XT can run Fooocus without issue, although it's slow (2 minutes per image in Extreme Speed mode, 3 minutes per image in Speed mode). I also made a video: https://youtu.be/HgGZyNRA1Ns

@mashb1t (Collaborator) commented Feb 22, 2024

@CobeyH is this issue still present for you using the latest version of Fooocus or can it be closed?

@mashb1t added the "question" label and removed the "help wanted" label on Feb 22, 2024
@Schweeeeeeeeeeeeeeee

Still present:
$ python entry_with_update.py --preset realistic
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--preset', 'realistic']
Python 3.11.7 (main, Jan 29 2024, 16:03:57) [GCC 13.2.1 20230801]
Fooocus version: 2.1.865
Loaded preset: /home/boobs/Fooocus/presets/realistic.json
Running on local URL: http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
Total VRAM 12272 MB, total RAM 31235 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6700 XT : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: /home/boobs/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/boobs/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors].
Loaded LoRA [/home/boobs/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for UNet [/home/boobs/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors] with 788 keys at weight 0.25.
Loaded LoRA [/home/boobs/Fooocus/models/loras/SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for CLIP [/home/boobs/Fooocus/models/checkpoints/realisticStockPhoto_v20.safetensors] with 264 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Segmentation fault (core dumped)

@mashb1t added the "bug (AMD)" label and removed the "question" and "bug" labels on Feb 23, 2024
@hqnicolas

Running here without any problem:
https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd
For RX 6000 cards, use:
HSA_OVERRIDE_GFX_VERSION=10.3.0

@Schweeeeeeeeeeeeeeee

Running here without any problem: https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd For RX 6000 cards, use: HSA_OVERRIDE_GFX_VERSION=10.3.0

How would I use HSA_OVERRIDE_GFX_VERSION=10.3.0?

@Laurent-VueJS

HSA_OVERRIDE_GFX_VERSION=xxxx must be placed before the command every time, on a single line (or you can make it permanent in your environment variables; Google can tell you how :-) ). Note that the number depends on your card model; the most common values are 10.3.0 and 11.0.0. Look up your card on the internet to be sure (or just try the two most common settings; there's a 99% chance one will work). NB: for me, I tried the correct value and it still fails. Apparently ROCm does not support some older or integrated AMD GPUs like mine (see the list of supported models on the ROCm page). But CPU works very well, and my other PC with an Nvidia GPU also works very well. I love Fooocus :-)
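
Concretely, the two options look like this (10.3.0 is just the example value from above; substitute the version matching your card):

# one-off, per invocation:
HSA_OVERRIDE_GFX_VERSION=10.3.0 python entry_with_update.py

# or persistent for the current shell (add the export line to ~/.bashrc to make it permanent):
export HSA_OVERRIDE_GFX_VERSION=10.3.0
python entry_with_update.py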

@Tedris commented May 2, 2024

I am getting the same on Ubuntu with an RX 5700 and 40 GB of swap; it gets stuck on "Preparing Fooocus text #1" before coming back with a segfault.

It works fine on Windows, but I wanted to see if it would run faster on Linux.

@mashb1t mentioned this issue on May 4, 2024
@mikwee commented Jul 9, 2024

I'm on Fedora; my GPU is a Radeon RX 6600, my CPU an Intel(R) Core(TM) i5-4690, and I have 16GB of RAM. After I click "Generate", it takes a long time and then segfaults. I increased my swap size to 40GB (a 32GB file added to an 8GB partition), restarted, and nothing changed. My console output is pretty much identical, but I'll copy-paste it anyway:

Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py']
Python 3.10.14 (main, Jun  3 2024, 17:19:22) [GCC 14.1.1 20240522 (Red Hat 14.1.1-4)]
Fooocus version: 2.4.3
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 8176 MB, total RAM 15917 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 AMD Radeon RX 6600 : native
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
IMPORTANT: You are using gradio version 3.41.2, however version 4.29.0 is available, please upgrade.
--------
Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: /home/testuser/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
VAE loaded: None
Request to load LoRAs [('sd_xl_offset_example-lora_1.0.safetensors', 0.1)] for model [/home/testuser/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [/home/testuser/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/testuser/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 1.66 seconds
Started worker with PID 4277
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] CLIP Skip = 2
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 7406799653888165672
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
Segmentation fault (core dumped)

Hope this gets solved soon!
