Add TencentARC PhotoMaker support #179
Conversation
…sformer when batch size > 1 (to be investigated)
…ion transformer; getting better results
Oops, that option is only for older GPUs without tensor cores. I have removed it and also cleaned up the comments in pmid.hpp.
Great! I'll find time in the next few days to review your changes.
blocks["self_attn"] = std::shared_ptr<GGMLBlock>(new MultiheadAttention2(d_model, n_head));
blocks["self_attn"] = std::shared_ptr<GGMLBlock>(new MultiheadAttention(d_model, n_head, true, atten1));
blocks["layer_norm1"] = std::shared_ptr<GGMLBlock>(new LayerNorm(d_model));
blocks["layer_norm2"] = std::shared_ptr<GGMLBlock>(new LayerNorm(d_model));
Saw this while skimming: you generally want to use std::make_shared<LayerNorm>(d_model) instead. It offers a couple of small optimizations and cleans up the code (the result implicitly converts to std::shared_ptr<GGMLBlock>).
Fair point, but these constructs are not from this PR and they already exist in many other places. Maybe a separate PR can address them. Thanks for reviewing.
Yeah, it should be looked at after this merges.
I have reviewed all the architectural changes and made some improvements. Everything is okay, but there are some code portions that I think can be reused. I will finish the modifications shortly, and then we can merge. Thank you for your amazing work.
@@ -145,7 +145,7 @@ SD_API sd_image_t* txt2img(sd_ctx_t* sd_ctx,
                           float control_strength,
                           float style_strength,
                           bool normalize_input,
-                          std::vector<sd_image_t*>& input_id_images);
+                          const char* input_id_images_path);
Personally, I'm not a fan of using paths here instead of images. But I do think we should provide a convenience function that loads an image from a path, since that is a common use case (though not the only one).
Yes, for the API, passing image data in memory is more elegant than directly passing a file path. I will take some time later to explore how to improve that.
Sounds good. Thanks for the code-reuse updates. Let me know if you find that CLIPVisionModel can be customized/shared so that CLIPVisionModel2 can be removed.
@bssrdf I have completed the code reuse for the CLIPVisionModel and LoraModel. I have tested it on my machine, and everything is working well. Could you please take some time to test it as well? If there are no issues, I will proceed to merge this pull request.
Thanks, @leejet. Appreciate your time consolidating CLIPVisionModel and LoraModel.
@bssrdf I tried your latest commit 16c9da0 and my latest commit df28af9. Looking at the logs, there doesn't seem to be much difference in VRAM usage between the two. Could it be that other processes on your system were using a significant amount of VRAM at the time, leaving insufficient memory available?
I added a parameter:
[INFO ] stable-diffusion.cpp:414 - total params memory size = 7182.38MB (VRAM 7182.38MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(VRAM)
[INFO ] stable-diffusion.cpp:414 - total params memory size = 7182.38MB (VRAM 4994.54MB, RAM 2187.84MB): clip 1564.36MB(RAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(RAM)
@leejet, yes, the segfault and OOM error were caused by my WSL instance; restarting it fixed the issue. My PhotoMaker test runs OK, except for a "double free or corruption (fasttop)" error at the end. I think it is ready for the final merge. Many thanks.
Great! Thank you for your contribution.
Hi, this is an implementation of TencentARC PhotoMaker. I am putting it in draft mode for now to get feedback.
The results are not quite as good as the official version, and I have to turn on VAE tiling because of the 8 GB GPU memory limit; VAE tiling has visible seam issues. ID fidelity has since been improved and an option to offload the VAE latent decode to the CPU was added. Quality is now much better.
Note: this requires an upstream fix in GGML.