Inquiry regarding the Mask-Guided Coordination Scheme #6

MunchkinChen · 2024-04-03T05:55:59Z

Hello 👋

Thank you for your amazing work!

I have a few questions concerning your paper, specifically the Mask-Guided Coordination scheme (Section 4.3).

Is the mask-guided coordination scheme also implemented during "appearance editing"?
Is masked attention applied in the spatial self-attention block or the temporal self-attention block, or both?
When and where is masked attention applied in terms of denoising timestep $t$ and attention layer $l$?
Is it only during content preservation ($t>t_0, l>l_0$) and, respectively, structure control ($t<t_2, l>l_2$)? In other words, is the $V$ (resp. $Q, K$) in formula (6) taken from the reconstruction branch?
For the mask $M$, do you use the same mask for all video frames (if so, could you elaborate on how this mask is generated?), or do you concatenate all the frame masks?

P.S. What exact source prompt did you use to generate the results in Figure 1? I tried 'A raccoon is playing guitar', but it didn't quite reproduce the cartoonish style and detailed background of your demo.

Your guidance on these queries would be immensely valuable, many thanks!
