Add TencentARC PhotoMaker support #179
Conversation
…sformer when batch size > 1 (to be investigated)
…ion transformer; getting better results
Oops, that option is only for older GPUs without tensor cores. I have removed it and also cleaned up the comments in pmid.hpp.
Great! I'll find time in the next few days to review your changes.
blocks["self_attn"] = std::shared_ptr<GGMLBlock>(new MultiheadAttention2(d_model, n_head));
blocks["self_attn"] = std::shared_ptr<GGMLBlock>(new MultiheadAttention(d_model, n_head, true, atten1));
blocks["layer_norm1"] = std::shared_ptr<GGMLBlock>(new LayerNorm(d_model));
blocks["layer_norm2"] = std::shared_ptr<GGMLBlock>(new LayerNorm(d_model));
Saw this while skimming: you generally want to use std::make_shared<LayerNorm>(d_model) instead. It offers a couple of small optimizations and cleans up the code (the result implicitly converts to std::shared_ptr<GGMLBlock>).
Fair point, but these constructs are not from this PR and they already exist in many other places. Maybe a separate PR can address them. Thanks for reviewing.
Yeah, it should be looked at after this merges.
I have reviewed all the architectural changes and made some improvements. Everything is okay, but there are some code portions that I think can be reused. I will finish the modifications shortly, and then we can merge. Thank you for your amazing work.
@@ -145,7 +145,7 @@ SD_API sd_image_t* txt2img(sd_ctx_t* sd_ctx,
                           float control_strength,
                           float style_strength,
                           bool normalize_input,
-                          std::vector<sd_image_t*>& input_id_images);
+                          const char* input_id_images_path);
Personally, I'm not a fan of using paths here instead of images. But I do think we should provide a convenience function that loads an image from a path, since that is a common use case (though not the only one).
Yes, for the API, passing image data in memory is more elegant than directly passing a file path. I will take some time later to explore how to improve that.
Sounds good. Thanks for the code-reuse updates. Let me know if you find that CLIPVisionModel can be customized/shared so that CLIPVisionModel2 can be removed.
@bssrdf I have completed the code reuse for the CLIPVisionModel and LoraModel. I have tested it on my machine, and everything is working well. Could you please take some time to test it as well? If there are no issues, I will proceed to merge this pull request.
Thanks, @leejet. Appreciate your time consolidating CLIPVisionModel and LoraModel.
@bssrdf I tried your latest commit 16c9da0 and my latest commit df28af9. Looking at the logs, there doesn't seem to be much difference in VRAM usage between the two. Could it be that other processes on your system were using a significant amount of VRAM at the time, leaving insufficient memory available?
I added a parameter:
[INFO ] stable-diffusion.cpp:414 - total params memory size = 7182.38MB (VRAM 7182.38MB, RAM 0.00MB): clip 1564.36MB(VRAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(VRAM)
[INFO ] stable-diffusion.cpp:414 - total params memory size = 7182.38MB (VRAM 4994.54MB, RAM 2187.84MB): clip 1564.36MB(RAM), unet 4900.07MB(VRAM), vae 94.47MB(VRAM), controlnet 0.00MB(VRAM), pmid 623.48MB(RAM)
@leejet, yes, the segfault and OOM error were caused by my WSL instance; restarting it fixed the issue. My PhotoMaker test runs OK, except for a "double free or corruption (fasttop)" error at the end. I think it is ready for the final merge. Many thanks.
Great! Thank you for your contribution.
Hi, this is an implementation of TencentARC PhotoMaker. I am putting it in draft mode for now to get feedback.
The results are not quite as good as the official version, and I have to turn on VAE tiling because of the 8 GB GPU memory limit; VAE tiling has visible seam issues. ID fidelity has since been improved and an option to offload the VAE latent decode to the CPU was added. Quality is now much better.
Note: this requires an upstream fix in GGML.