Add InstructPix2Pix pipeline support. #625
Conversation
Force-pushed from ecd57f2 to 0eea205
Hi! I'd like to highlight a point about InstructPix2Pix unet inference. Since it feeds three inputs to the unet (for text guidance and image guidance), I can't use a static batch size with data parallel on 2 devices, so I'm passing `dynamic_batch_size=True` (splitting the batch 2:1). But that is a setting for all models in the pipeline, so it is a suboptimal solution. What do you think is better here: introduce a new parameter that allows setting dynamic batching exclusively for the unet? Also, do we need to address this anywhere else in the code tied to unet export?
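For reference, a minimal sketch of the export call being discussed — the checkpoint and input shapes are illustrative, and passing `data_parallel_mode` to `from_pretrained` is an assumption based on the options mentioned below, not something confirmed in this thread:

```python
from optimum.neuron import NeuronStableDiffusionInstructPix2PixPipeline

# dynamic_batch_size=True lets the compiled unet accept the tripled batch
# that classifier-free guidance produces (text-conditioned, image-conditioned,
# unconditional), split 2:1 across the two Neuron cores
pipe = NeuronStableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",   # illustrative checkpoint
    export=True,
    dynamic_batch_size=True,
    data_parallel_mode="unet",      # assumption: place only the unet on both cores
    **{"batch_size": 1, "height": 512, "width": 512},
)
```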
Hi @asntr,
Thanks for contributing, the PR looks awesome (a pleasure to review a PR with tests, snippet, comments, docstrings!!!).
Do not have much to change, just some small nits for this PR.

And for the case of classifier-free guidance, indeed it's the first time we meet a case where we don't just double but triple the input batch size. I think it would make sense to address it during compilation to avoid using dynamic batching: identify whether the task is pix2pix, and modify the batch size for compiling the unet according to how we place the models in the pipeline on the Neuron cores. E.g.:

- If we only leverage one core (`data_parallel_mode=="none"`) or place the whole pipeline on both Neuron cores (`data_parallel_mode=="all"`), then the batch size for compiling the unet shall be `3*batch_size`.
- If we place just the unet on both Neuron cores, then the batch size for compiling the unet shall be either `3*batch_size`, if `torch_neuronx.data_parallel` is able to split inputs with an odd batch size into 2 chunks (need to test, haven't tried yet but I think so), or `(3*batch_size + batch_size%2) // 2`, truncating the output `noise_pred` during the inference runtime (see the sketch after this list).
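A hypothetical sketch of that rule — the function name and the `data_parallel_mode` values are assumptions taken from the discussion above, not an existing API:

```python
def unet_compile_batch_size(batch_size: int, data_parallel_mode: str) -> int:
    """Batch size to compile the unet with when classifier-free guidance
    triples the runtime input batch (the InstructPix2Pix case)."""
    if data_parallel_mode in ("none", "all"):
        # One core, or the whole pipeline replicated on both cores:
        # the compiled unet sees the full tripled batch.
        return 3 * batch_size
    if data_parallel_mode == "unet":
        # The unet alone is split across 2 cores: pad the tripled batch to an
        # even total when batch_size is odd (e.g. 3 -> 4 for batch_size=1),
        # compile for half of it per core, and truncate the extra noise_pred
        # row at inference time.
        return (3 * batch_size + batch_size % 2) // 2
    raise ValueError(f"Unknown data_parallel_mode: {data_parallel_mode}")
```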
Hi @asntr, what do you think of the changes to the compilation that I suggested? Let me know if you are interested in working on this!
If you prefer to get this PR merged first, could you rebase your branch? There was a fix to the styling tool today; with it, the CI shall be good.
Hi @JingyaHuang! Sorry for the delayed response. I was thinking that choosing the batch size depending on the task touches more than just this pipeline. For example, the t2i pipeline doesn't allow using cfg when you are not using dynamic batching. So, maybe it is indeed a task for a next PR, and I can totally work on it!
I also have this example of inference output on INF2 for my snippet. Should I update the docs with this example (and create a PR into documentation-images)?
Sounds great, thanks @asntr! Ping me if you need any help.
Yeah please do! The image looks great! We could put it under the sdxl section: https://github.com/huggingface/optimum-neuron/blob/main/docs/source/tutorials/stable_diffusion.mdx#stable-diffusion-xl-turbo Thank you!
Hi @JingyaHuang, I placed the docs under the sdxl section. I also opened this PR: https://huggingface.co/datasets/optimum/documentation-images/discussions/4 Let me know if you're happy with this. Thanks!
Thanks @asntr for adding the doc, it looks great! Let's just wait for the CIs to finish to get this PR merged. (Trainium and INF1 CIs might fail, but that's totally irrelevant and we can ignore them.)
What does this PR do?
Fixes: #624
Adds support for loading and compiling the InstructPix2Pix pipeline using Neuron.
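A minimal usage sketch of the added pipeline — the checkpoint, image URL, prompt, and guidance values are illustrative, not taken from this PR:

```python
import requests
from PIL import Image

from optimum.neuron import NeuronStableDiffusionInstructPix2PixPipeline

# Compile once with fixed input shapes; dynamic batching accommodates the
# tripled unet batch produced by classifier-free guidance.
pipe = NeuronStableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",  # illustrative checkpoint
    export=True,
    dynamic_batch_size=True,
    **{"batch_size": 1, "height": 512, "width": 512},
)

# Fetch an input image to edit (illustrative URL).
url = "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((512, 512))

# image_guidance_scale is the InstructPix2Pix-specific knob; together with
# guidance_scale it is what triples the unet batch at inference time.
edited = pipe(
    "make it a snowy winter scene",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
    guidance_scale=7.5,
).images[0]
edited.save("edited.png")
```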