Inquiry regarding the Mask-Guided Coordination Scheme #6

Open
MunchkinChen opened this issue Apr 3, 2024 · 0 comments

Hello 👋

Thank you for your amazing work!

I have a few questions about your paper, specifically the Mask-Guided Coordination scheme (Section 4.3):

  1. Is the mask-guided coordination scheme also implemented during "appearance editing"?
  2. Is masked attention applied in the spatial self-attention block or the temporal self-attention block, or both?
  3. When and where is masked attention applied in terms of denoising timestep $t$ and attention layer $l$?
    Is it only during content preservation $t > t_0, l > l_0$ (resp. structure control $t < t_2, l > l_2$)? In other words, is the $V$ (resp. $Q, K$) in formula (6) from the reconstruction branch?
  4. For the mask $M$, do you use the same mask for all video frames (if so, could you elaborate on how this mask is generated?), or do you concatenate all the frame masks?

P.S. What's the exact source prompt you used to generate the results in Figure 1? I tried 'A raccoon is playing guitar', but it didn't quite nail the cartoonish, detailed-background vibe of your demo.

Your guidance on these queries would be immensely valuable. Many thanks!
