Add TencentARC PhotoMaker support #179

Merged on Mar 12, 2024 (65 commits)

bssrdf
Contributor

@bssrdf bssrdf commented Feb 20, 2024

Hi, this is an implementation of TencentARC PhotoMaker. I am putting it in draft mode for now to get feedback.

The results are not quite as good as the official version. I have to turn on VAE tiling because of the 8 GB GPU memory limit, and VAE tiling produces visible seams.

ID fidelity has been improved, and an option to offload VAE latent decoding to the CPU was added. Quality is now much better.

Note: this requires an upstream fix in GGML.

bssrdf added 30 commits February 2, 2024 18:36
…sformer when batch size > 1 (to be investigated)
@bssrdf
Contributor Author

bssrdf commented Feb 27, 2024

@bssrdf DGGML_CUDA_FORCE_MMQ Is this option strictly necessary?

pmid.h has a lot of commented code.

Oops, that option is only needed for older GPUs without tensor cores. I have removed it and also cleaned up the comments in pmid.hpp.
Thanks for reviewing.

@leejet
Owner

leejet commented Feb 27, 2024

Great! I'll find time in the next few days to review your changes.

blocks["self_attn"] = std::shared_ptr<GGMLBlock>(new MultiheadAttention2(d_model, n_head));

blocks["self_attn"] = std::shared_ptr<GGMLBlock>(new MultiheadAttention(d_model, n_head, true, atten1));

blocks["layer_norm1"] = std::shared_ptr<GGMLBlock>(new LayerNorm(d_model));
blocks["layer_norm2"] = std::shared_ptr<GGMLBlock>(new LayerNorm(d_model));
Contributor

Saw this while skimming.
You generally want to use std::make_shared<LayerNorm>(d_model) instead; it offers a few small optimizations and cleans up the code. (The result converts implicitly to std::shared_ptr<GGMLBlock>.)
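
To illustrate the suggestion, here is a minimal sketch. GGMLBlock and LayerNorm below are simplified, hypothetical stand-ins rather than the project's real classes, and build_blocks is a name made up for this example. It fills two map entries, one in each style, to show that the make_shared result can be assigned to a std::shared_ptr<GGMLBlock> slot without an explicit cast.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the project's block types; only the shape matters.
struct GGMLBlock {
    virtual ~GGMLBlock() = default;
    virtual std::string name() const = 0;
};

struct LayerNorm : GGMLBlock {
    explicit LayerNorm(int d_model) : d_model(d_model) {}
    std::string name() const override {
        return "LayerNorm(" + std::to_string(d_model) + ")";
    }
    int d_model;
};

// Fills two entries, one per style, to show they are interchangeable.
std::map<std::string, std::shared_ptr<GGMLBlock>> build_blocks(int d_model) {
    std::map<std::string, std::shared_ptr<GGMLBlock>> blocks;
    // Existing style: `new` plus an explicit shared_ptr<GGMLBlock> wrapper.
    blocks["layer_norm1"] = std::shared_ptr<GGMLBlock>(new LayerNorm(d_model));
    // Suggested style: shared_ptr<LayerNorm> converts to shared_ptr<GGMLBlock>.
    blocks["layer_norm2"] = std::make_shared<LayerNorm>(d_model);
    assert(blocks.size() == 2);
    return blocks;
}
```

With make_shared, implementations usually place the object and its reference-count block in a single allocation, which is where the small optimization comes from.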

Contributor Author

Fair point, but these constructs are not from this PR and they already exist in many other places. Maybe a separate PR can address them. Thanks for reviewing.

Contributor

Yeah, this should be looked at after this merges.

@leejet
Owner

leejet commented Mar 3, 2024

I have reviewed the entire architectural changes and made some improvements. Everything is okay, but there are some code portions that I think can be reused. I will finish the modifications shortly, and then we can merge. Thank you for your amazing work.

@@ -145,7 +145,7 @@ SD_API sd_image_t* txt2img(sd_ctx_t* sd_ctx,
 float control_strength,
 float style_strength,
 bool normalize_input,
-std::vector<sd_image_t*> &input_id_images);
+const char* input_id_images_path);
Contributor

Personally, I'm not a fan of using paths here instead of images.
But I do think we should provide a convenience function that loads an image from a path, since that is a common use case (though not the only one).

Owner

Yes, for the API, passing image data in memory is more elegant than directly passing a file path. I will take some time later to explore how to improve that.
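
One possible shape for the layering discussed here, as a rough sketch only: the core function takes image data in memory, and a thin convenience wrapper accepts paths. Every name below (Image, ImageLoader, use_id_images, use_id_images_from_paths) is hypothetical and not part of the library's API. The decoder is passed in as a callback so the sketch stays self-contained and does not assume any particular image library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory image; not the library's sd_image_t.
struct Image {
    uint32_t width = 0;
    uint32_t height = 0;
    uint32_t channels = 0;
    std::vector<uint8_t> pixels;  // width * height * channels bytes
};

// Hypothetical decoder signature: returns true and fills `out` on success.
using ImageLoader = std::function<bool(const std::string& path, Image& out)>;

// Core entry point: works purely on image data already in memory.
// As a placeholder for real work it only counts the non-empty images.
std::size_t use_id_images(const std::vector<Image>& id_images) {
    std::size_t usable = 0;
    for (const Image& img : id_images) {
        if (!img.pixels.empty()) ++usable;
    }
    return usable;
}

// Convenience wrapper: decodes each path, then forwards to the in-memory API.
// Paths that fail to load are skipped.
std::size_t use_id_images_from_paths(const std::vector<std::string>& paths,
                                     const ImageLoader& load) {
    assert(load);  // an empty std::function would throw when called
    std::vector<Image> images;
    for (const std::string& path : paths) {
        Image img;
        if (load(path, img)) images.push_back(std::move(img));
    }
    return use_id_images(images);
}
```

Callers that already hold decoded images use the core function directly, while command-line style callers go through the wrapper. The file-system dependency then stays out of the core API.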

@bssrdf
Contributor Author

bssrdf commented Mar 3, 2024

I have reviewed the entire architectural changes and made some improvements. Everything is okay, but there are some code portions that I think can be reused. I will finish the modifications shortly, and then we can merge. Thank you for your amazing work.

Sounds good. Thanks for the code-reuse updates. Let me know if you find that CLIPVisionModel can be customized or shared so that CLIPVisionModel2 can be removed.

@leejet
Owner

leejet commented Mar 9, 2024

@bssrdf I have completed the code reuse for the CLIPVisionModel and LoraModel. I have tested it on my machine, and everything is working well. Could you please take some time to test it as well? If there are no issues, I will proceed to merge this pull request.

@bssrdf
Contributor Author

bssrdf commented Mar 9, 2024

@bssrdf I have completed the code reuse for the CLIPVisionModel and LoraModel. I have tested it on my machine, and everything is working well. Could you please take some time to test it as well? If there are no issues, I will proceed to merge this pull request.

Thanks, @leejet. I appreciate your time consolidating CLIPVisionModel and LoraModel.
My first attempt at running PhotoMaker crashed with a segfault and a CUDA out-of-memory error while building PhotoMaker's LoRA model:

ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1, VMM: yes
[INFO ] stable-diffusion.cpp:158  - loading model from '../models/RealVisXL_V3.0.safetensors'
[INFO ] model.cpp:705  - load ../models/RealVisXL_V3.0.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:181  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:187  - Stable Diffusion weight type: f16
[WARN ] stable-diffusion.cpp:193  - !!!It looks like you are using SDXL model. If you find that the generated images are completely black, try specifying SDXL VAE FP16 Fix with the --vae parameter. You can find it here: https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors
[INFO ] stable-diffusion.cpp:233  - VAE Autoencoder: Using CPU backend
[INFO ] model.cpp:705  - load ../models/photomaker-v1.safetensors using safetensors format
[INFO ] lora.hpp:38   - loading LoRA from '../models/photomaker-v1.safetensors'
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 354.38 MiB on device 0: cudaMalloc failed: out of memory
[ERROR] ggml_extend.hpp:873  - lora alloc params backend buffer failed
[ERROR] model.cpp:1390 - read tensor data failed: '../models/photomaker-v1.safetensors'
Segmentation fault

@leejet
Owner

leejet commented Mar 10, 2024

@bssrdf I tried your latest commit 16c9da0 and my latest commit df28af9. Looking at the logs, there doesn't seem to be much difference in VRAM usage between the two. Could it be that there were other processes on your system using a significant amount of VRAM at that time and causing insufficient availability?

16c9da0

[INFO ] stable-diffusion.cpp:157  - loading model from '..\..\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:676  - load ..\..\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:742  - init from '..\..\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:168  - loading vae from '..\..\stable-diffusion-webui\models\VAE\sdxl_vae-fp16-fix.safetensors'
[INFO ] model.cpp:676  - load ..\..\stable-diffusion-webui\models\VAE\sdxl_vae-fp16-fix.safetensors using safetensors format
[DEBUG] model.cpp:742  - init from '..\..\stable-diffusion-webui\models\VAE\sdxl_vae-fp16-fix.safetensors'
[INFO ] stable-diffusion.cpp:181  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:187  - Stable Diffusion weight type: f16
[DEBUG] stable-diffusion.cpp:188  - ggml tensor size = 432 bytes
[DEBUG] ggml_extend.hpp:881  - clip params backend buffer size =  1564.36 MB (713 tensors)
[DEBUG] ggml_extend.hpp:881  - unet params backend buffer size =  4900.07 MB (1680 tensors)
[DEBUG] ggml_extend.hpp:881  - vae params backend buffer size =  94.47 MB (140 tensors)
[INFO ] model.cpp:676  - load ..\models\photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:742  - init from '..\models\photomaker-v1.safetensors'
[INFO ] pmid.hpp:405  - loading LoRA from '..\models\photomaker-v1.safetensors'
[DEBUG] model.cpp:1315 - loading tensors from ..\models\photomaker-v1.safetensors
[DEBUG] ggml_extend.hpp:881  - lora_pmid params backend buffer size =  354.38 MB (10240 tensors)
[DEBUG] model.cpp:1315 - loading tensors from ..\models\photomaker-v1.safetensors
[DEBUG] pmid.hpp:450  - finished loaded lora
[INFO ] stable-diffusion.cpp:266  - loading stacked ID embedding (PHOTOMAKER) model file from '..\models\photomaker-v1.safetensors'
[INFO ] model.cpp:676  - load ..\models\photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:742  - init from '..\models\photomaker-v1.safetensors'
[DEBUG] ggml_extend.hpp:881  - pmid params backend buffer size =  1243.48 MB (407 tensors)

df28af9

[INFO ] stable-diffusion.cpp:158  - loading model from '..\..\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors'
[INFO ] model.cpp:705  - load ..\..\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from '..\..\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors'
[INFO ] stable-diffusion.cpp:169  - loading vae from '..\..\stable-diffusion-webui\models\VAE\sdxl_vae-fp16-fix.safetensors'
[INFO ] model.cpp:705  - load ..\..\stable-diffusion-webui\models\VAE\sdxl_vae-fp16-fix.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from '..\..\stable-diffusion-webui\models\VAE\sdxl_vae-fp16-fix.safetensors'
[INFO ] stable-diffusion.cpp:181  - Stable Diffusion XL
[INFO ] stable-diffusion.cpp:187  - Stable Diffusion weight type: f16
[DEBUG] stable-diffusion.cpp:188  - ggml tensor size = 432 bytes
[DEBUG] ggml_extend.hpp:878  - clip params backend buffer size =  1564.36 MB (713 tensors)
[DEBUG] ggml_extend.hpp:878  - unet params backend buffer size =  4900.07 MB (1680 tensors)
[DEBUG] ggml_extend.hpp:878  - vae params backend buffer size =  94.47 MB (140 tensors)
[INFO ] model.cpp:705  - load ..\models\photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from '..\models\photomaker-v1.safetensors'
[INFO ] lora.hpp:38   - loading LoRA from '..\models\photomaker-v1.safetensors'
[DEBUG] model.cpp:1343 - loading tensors from ..\models\photomaker-v1.safetensors
[DEBUG] ggml_extend.hpp:878  - lora params backend buffer size =  354.38 MB (10240 tensors)
[DEBUG] model.cpp:1343 - loading tensors from ..\models\photomaker-v1.safetensors
[DEBUG] lora.hpp:74   - finished loaded lora
[INFO ] stable-diffusion.cpp:264  - loading stacked ID embedding (PHOTOMAKER) model file from '..\models\photomaker-v1.safetensors'
[INFO ] model.cpp:705  - load ..\models\photomaker-v1.safetensors using safetensors format
[DEBUG] model.cpp:771  - init from '..\models\photomaker-v1.safetensors'
[DEBUG] ggml_extend.hpp:878  - pmid params backend buffer size =  1243.48 MB (407 tensors)

@leejet
Owner

leejet commented Mar 10, 2024

I added a parameter, --clip-on-cpu, to run CLIP on the CPU. This should reduce VRAM usage by ~2GB for PhotoMaker.
Without --clip-on-cpu:

[INFO ] stable-diffusion.cpp:414  - total params memory size = 7182.38MB (VRAM 7182.38MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(VRAM)

With --clip-on-cpu:

[INFO ] stable-diffusion.cpp:414  - total params memory size = 7182.38MB (VRAM 4994.54MB, RAM 2187.84MB): clip 1564.36MB(RAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(RAM)
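
The totals in the two log lines above can be checked with a little arithmetic. The sketch below uses hypothetical helper names (Component, Totals, sum_placement, photomaker_components) and copies the per-component sizes from those log lines. It assumes, as the second log line shows, that the flag moves both clip and pmid to RAM while unet and vae stay in VRAM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One model component and where its parameters live.
struct Component {
    std::string name;
    double size_mb;
    bool on_gpu;
};

struct Totals {
    double vram_mb = 0.0;
    double ram_mb = 0.0;
};

// Sums component sizes into VRAM and RAM buckets.
Totals sum_placement(const std::vector<Component>& parts) {
    Totals t;
    for (const Component& c : parts) {
        assert(c.size_mb >= 0.0);
        (c.on_gpu ? t.vram_mb : t.ram_mb) += c.size_mb;
    }
    return t;
}

// Sizes copied from the log lines; clip_on_cpu moves clip and pmid to RAM.
std::vector<Component> photomaker_components(bool clip_on_cpu) {
    return {
        {"clip", 1564.36, !clip_on_cpu},
        {"unet", 4900.07, true},
        {"vae", 94.47, true},
        {"pmid", 623.48, !clip_on_cpu},
    };
}
```

With the flag off, everything sums to 7182.38 MB of VRAM. With it on, VRAM drops to 4900.07 + 94.47 = 4994.54 MB and RAM holds 1564.36 + 623.48 = 2187.84 MB, which matches the roughly 2 GB saving mentioned above.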

@bssrdf
Contributor Author

bssrdf commented Mar 11, 2024

@bssrdf I tried your latest commit 16c9da0 and my latest commit df28af9. Looking at the logs, there doesn't seem to be much difference in VRAM usage between the two. Could it be that there were other processes on your system using a significant amount of VRAM at that time and causing insufficient availability?

@leejet, yes, the segfault and OOM error were caused by my WSL instance. Restarting it fixed the issue. My test of PhotoMaker runs OK, except for a "double free or corruption (fasttop)" error at the end. I think it is ready for the final merge. Many thanks.

@leejet
Owner

leejet commented Mar 12, 2024

Great! Thank you for your contribution.

@leejet leejet merged commit a469688 into leejet:master Mar 12, 2024
9 checks passed
@bssrdf bssrdf mentioned this pull request Mar 22, 2024

5 participants