Corrupted images on Vulkan backend #439

Open
wbruna opened this issue Oct 19, 2024 · 7 comments · Fixed by ggerganov/llama.cpp#10496 · May be fixed by #509

Comments

@wbruna

wbruna commented Oct 19, 2024

I'm getting either corrupted or inconsistent images with the Vulkan backend, for any resolution other than 512x512.

My system is a Linux PC with a Ryzen 3400G, running an almost-vanilla Debian 12 (with the distro's graphics stack). All of the following tests use these options:

--type f16 --lora-model-dir ./LoRA --model ./SD/dreamshaper_8.safetensors --prompt 'a fantasy character, detailed background, colorful<lora:lcm-lora-sdv1-5:1>' --cfg-scale 1.0 --sampling-method lcm --steps 4 --rng cuda --seed 42 -b 1 --color

plus a script that alternates the resolution and the compiled binary (Vulkan or CPU backend).
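
A minimal sketch of such an alternation script (the ./sd-vulkan and ./sd-cpu binary names are placeholders for the two builds; the real script may differ):

#!/bin/sh
# ./sd-vulkan and ./sd-cpu are placeholder names for the Vulkan and CPU builds.
# Each resolution is rendered twice with Vulkan and once on the CPU, with
# identical options and seed, so the outputs can be compared directly.
run_sd() {  # $1 = binary, $2 = width, $3 = height, $4 = output file
    "$1" --type f16 --lora-model-dir ./LoRA --model ./SD/dreamshaper_8.safetensors \
        --prompt 'a fantasy character, detailed background, colorful<lora:lcm-lora-sdv1-5:1>' \
        --cfg-scale 1.0 --sampling-method lcm --steps 4 --rng cuda --seed 42 -b 1 --color \
        -W "$2" -H "$3" -o "$4"
}
for res in 320x512 384x384 448x448 512x512; do
    w=${res%x*}; h=${res#*x}
    run_sd ./sd-vulkan "$w" "$h" "${res}_vulkan1.png"
    run_sd ./sd-vulkan "$w" "$h" "${res}_vulkan2.png"
    run_sd ./sd-cpu    "$w" "$h" "${res}_cpu.png"
done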

320x512:

[images: 320x512_vulkan1 | 320x512_vulkan2 | 320x512_cpu]

The Vulkan images look OK-ish (for such a small resolution, anyway), but the same seed should produce the same image, and these two runs differ. The CPU render also looks quite different.

Second test, 384x384; similar behavior (changes between Vulkan 1 and 2 may not be apparent on the thumbnail):

[images: 384x384_vulkan1 | 384x384_vulkan2 | 384x384_cpu]

The third test, 448x448, gets weird:

[images: 448x448_vulkan1 | 448x448_vulkan2 | 448x448_cpu]

At first, I blamed my PC drivers. But then, the 512x512 test:

[images: 512x512_vulkan1 | 512x512_vulkan2 | 512x512_cpu]

Looks absolutely fine, and identical between Vulkan and CPU.

In summary:

  • 512x512 works fine;
  • any other resolution produces inconsistent images between runs;
  • some resolutions introduce artifacts.

(related: #122 )

@stduhpf
Contributor

stduhpf commented Oct 19, 2024

Good catch. Using the same prompt, I get similar behavior to yours, though I don't get anything as dramatic as your 448x448 results (I only get variations of your "Vulkan 1" images, no matter how many times I try).

I might try to investigate what's going on, but I'm not confident I'll figure it out.

@stduhpf
Contributor

stduhpf commented Oct 19, 2024

@0cc4m Do you have a clue?

@stduhpf
Contributor

stduhpf commented Oct 19, 2024

This also happens with images bigger than 512x512 if the resolution isn't a multiple of 128...

@0cc4m

0cc4m commented Oct 25, 2024

Thank you for the detailed report, I'll look into it when I find some time.

jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this issue Nov 25, 2024
Fix bad calculation of the end of the range. Add a backend test that
covers the bad case (taken from stable diffusion).

Fixes leejet/stable-diffusion.cpp#439.
@jeffbolznv

I used GGML_VULKAN_CHECK_RESULTS to narrow down that group_norm was failing, reproduced it with a backend test, and pushed a fix to ggerganov/llama.cpp#10496.

BTW, GGML_VULKAN_CHECK_RESULTS is really helpful, but it looks like it may get harder to build with the recent backend split.
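
For anyone who wants to reproduce this kind of debugging, a sketch of how the check can be enabled at build time (this assumes the vendored ggml exposes the same GGML_VULKAN_CHECK_RESULTS CMake option as llama.cpp, which compares each Vulkan op result against the CPU backend at runtime):

# Assumption: stable-diffusion.cpp's bundled ggml honors GGML_VULKAN_CHECK_RESULTS
# the same way llama.cpp does; SD_VULKAN enables the Vulkan backend.
cmake -B build -DSD_VULKAN=ON -DGGML_VULKAN_CHECK_RESULTS=ON
cmake --build build --config Release

Comparing every op against the CPU backend slows generation down a lot, so it is mainly useful for pinpointing which op diverges, as done here.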

0cc4m pushed a commit to ggerganov/llama.cpp that referenced this issue Nov 26, 2024
Fix bad calculation of the end of the range. Add a backend test that
covers the bad case (taken from stable diffusion).

Fixes leejet/stable-diffusion.cpp#439.
@0cc4m

0cc4m commented Nov 26, 2024

> BTW, GGML_VULKAN_CHECK_RESULTS is really helpful, but it looks like it may get harder to build with the recent backend split.

Yeah, I built it to narrow down these kinds of model issues. I hope we can keep it around.

ggerganov pushed a commit to ggerganov/ggml that referenced this issue Dec 3, 2024
Fix bad calculation of the end of the range. Add a backend test that
covers the bad case (taken from stable diffusion).

Fixes leejet/stable-diffusion.cpp#439.
@wbruna
Author

wbruna commented Dec 3, 2024

I manually applied 56d8a95 on a local build, and it seems to fix this issue. Thanks!

stduhpf linked a pull request Dec 4, 2024 that will close this issue
ggerganov pushed commits referencing this issue to ggerganov/whisper.cpp on Dec 5 and Dec 8, 2024; arthw pushed one to arthw/llama.cpp on Dec 20, 2024; and a github-actions bot pushed one to martin-steinegger/ProstT5-llama on Dec 30, 2024. All carry the same fix ("Fix bad calculation of the end of the range", plus a backend test covering the bad case taken from stable diffusion).