Add AttentionProcessor support #949
Conversation
Interesting! What steps are needed to integrate attention maps into the training process? I'm willing to run some tests on my datasets.
@ThereforeGames It works, but it's still a little rough. If you can figure out the following steps, you could probably try it out. https://github.com/rockerBOO/sd-scripts/tree/daam_sampling This branch is for the sampling. In the sample file:
Thanks - I imagine even those rough heatmaps could have a benefit on the model's understanding of object relationships. Will have to give this a try!
Hello! Sorry for the late reply. This is a great suggestion, and I am very interested in your ideas! Unfortunately I don't have time to go into details, but please let me write down my basic thoughts first. I am very sorry, but overall I am not a big fan of Diffusers' design policy: Diffusers is trying to be a generic library with very broad support, and that adds a lot of complexity to their code. I like to keep the sd-scripts repository as simple as possible. But, of course, I also understand the effectiveness of AttentionProcessor. So I think we need to carefully consider how to keep the repository simple while staying interoperable with Diffusers. Please give me time to think about how to address this issue.
Thank you @kohya-ss. I appreciate your kind words. I was in the middle of working with that library to get this working, and the changes I initially proposed were just enough to get it working. After working with the library and augmenting it to work with a CompVis-like library, I found it was assuming too many things existed. I have since reworked the library completely to better support different types of libraries, so most of the changes I initially proposed won't be necessary. If we look at the current sd-scripts version, we are already doing an attention processor in the forward, so my proposal has been reduced significantly. The current proposal would be adequate for my needs. The most limited version of what I'd need is for the naming in the forward to support the HF Diffusers names.

Thank you for taking the time to look at this proposal.
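The minimal naming compatibility described above might be sketched as follows. This is a stand-in, not sd-scripts' actual `CrossAttention` (the attention math is simplified, single-head, and the class layout is illustrative); the point is only that a forward can accept both the CompVis-style `context` keyword and the Diffusers-style `encoder_hidden_states` keyword:

```python
import torch
import torch.nn as nn


class CrossAttention(nn.Module):
    """Sketch: a forward accepting both CompVis and Diffusers argument names.

    The projections and attention math are a simplified stand-in, not
    sd-scripts' real implementation.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

    def forward(self, x, context=None, encoder_hidden_states=None, **kwargs):
        # Accept either name: Diffusers callers pass encoder_hidden_states,
        # CompVis-style callers pass context; fall back to self-attention.
        if context is None:
            context = encoder_hidden_states if encoder_hidden_states is not None else x
        q, k, v = self.to_q(x), self.to_k(context), self.to_v(context)
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        return torch.softmax(scores, dim=-1) @ v
```

With this, a Diffusers-aware library and the existing sd-scripts call sites can invoke the same module without translation at the call site.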
Thank you for the update! I think the implementation is very nice and simple. However, I still wonder whether it might be advantageous to leave the existing code as-is and only call the AttnProcessor if and when one is set, so that some compatibility testing against the existing code can be omitted. In addition, if we did that, the name translation might not be necessary. Of course, I think the PR is already great and ready to merge, but I would like to carefully test the behavior of this code for backward compatibility.
I went through and simplified it to the approach you hinted at here; see #961. I believe this is the simplest approach, and it captures all of the properties for the forward arguments. No worries either way. I'm happy with the process we went through to reduce this down. Thank you again for your time.
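The "only call the AttnProcessor if and when one is set" idea can be sketched roughly like this. The names `processor` and `set_processor` are illustrative, not necessarily sd-scripts' actual API, and the attention math is a simplified stand-in; the point is that the original code path runs untouched unless a processor has been attached:

```python
import torch
import torch.nn as nn


class CrossAttention(nn.Module):
    """Sketch: dispatch to an optional processor, else the original path.

    When no processor is set, the existing forward runs unchanged, so
    backward compatibility does not depend on the new code path.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.processor = None  # None -> untouched original code path

    def set_processor(self, processor):
        self.processor = processor

    def forward(self, x, context=None, **kwargs):
        if self.processor is not None:
            # Diffusers-style call: processor(attn_module, hidden_states, ...)
            return self.processor(self, x, encoder_hidden_states=context, **kwargs)
        # Existing behavior, unchanged when no processor is set.
        context = x if context is None else context
        q, k, v = self.to_q(x), self.to_k(context), self.to_v(context)
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        return torch.softmax(scores, dim=-1) @ v
```

This also makes the name translation unnecessary on the default path, since only processors ever see the `encoder_hidden_states` keyword.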
Background: I was working on adding DAAM attention-mapping heatmaps into the sampling pipeline (and possibly into the training process), both for my own purposes and to propose to you. To do this I needed to add AttentionProcessor-like support, plus many small changes, so that DAAM could use its Diffusers interface to work with the Kohya original_unet and sampling pipelines.
Diffusers made some changes and added an AttentionProcessor, which abstracts out the attention mechanism and lets you plug in your own processor. The DAAM library hooks into this processor to map cross-attention into a heatmap. https://github.com/castorini/daam/blob/v0.1.0/daam/trace.py#L281-L282
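Conceptually, a processor in the spirit of DAAM's hook looks something like the sketch below. It loosely follows the Diffusers AttnProcessor calling convention but is not DAAM's actual code: head splitting, masking, and the output projection are omitted, and the class name is made up. The key idea is that the processor computes attention normally while also keeping the softmax probabilities for later heatmap aggregation:

```python
import torch


class HeatmapCaptureProcessor:
    """Illustrative attention processor that records attention maps.

    Called with the attention module itself plus hidden states, it computes
    attention and stores the softmax probabilities so they can later be
    aggregated into per-token heatmaps (the DAAM idea, heavily simplified).
    """

    def __init__(self):
        self.maps = []  # one (batch, queries, keys) tensor per call

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, **kwargs):
        context = hidden_states if encoder_hidden_states is None else encoder_hidden_states
        q = attn.to_q(hidden_states)
        k = attn.to_k(context)
        v = attn.to_v(context)
        probs = torch.softmax(q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5), dim=-1)
        self.maps.append(probs.detach())  # keep for heatmap aggregation
        return probs @ v
```

For cross-attention, `probs` has one column per text token, so accumulating these maps over steps and layers yields a per-word spatial heatmap.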
This PR also pulls in some code from the Diffusers Attention class to support the attention manipulations that DAAM requires.
I added the dropout architecture back in because DAAM looks for the last node, and its absence caused an error. We could make it `nn.Identity` instead if that's faster. The `encoder_hidden_states` argument was being used on the Diffusers side, which I believe corresponds to the `context` term, so I mapped that over. This allows other libraries to pass Diffusers-like variables or the original `context`, and both should work similarly; this happens through a lot of `**kwargs`-like mappings. Ultimately I made these changes to limit any further refactoring toward a Diffusers-like architecture and to better allow sd-scripts to work with Diffusers. I know this is a lot of changes and additions; I can refactor it down if necessary. Thank you!