[Llama FA2] Re-add _expand_attention_mask and clean a couple things #27074
Conversation
@ArthurZucker could you give this a quick review? It would make the Bart FA PR much easier to continue and should also fix the BetterTransformer problem with optimum.
Of course!
```python
# Deprecated shim re-added for backward compatibility (downstream libraries
# such as optimum still import it from modeling_llama).
def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
    warnings.warn(
        "Calling `transformers.models.llama.modeling_llama._expand_mask` is deprecated and will be "
        "removed in v4.37. Use `transformers.models.llama.modeling_llama.AttnMaskConverter._expand_mask`"
    )
    return AttnMaskConverter._expand_mask(mask=mask, dtype=dtype, tgt_len=tgt_len)
```
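For readers following along, here is a minimal, self-contained sketch of what this expansion conceptually does, based on the long-standing `_expand_mask` pattern in transformers (not necessarily the exact body of `AttnMaskConverter._expand_mask`): a 2D padding mask of shape `(batch, src_len)` becomes a 4D additive mask of shape `(batch, 1, tgt_len, src_len)` whose padded positions hold the dtype's most negative value.

```python
# Illustrative sketch only; not the exact implementation inside AttnMaskConverter.
from typing import Optional

import torch


def expand_mask_sketch(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None) -> torch.Tensor:
    bsz, src_len = mask.size()
    tgt_len = tgt_len if tgt_len is not None else src_len
    # Broadcast (bsz, src_len) -> (bsz, 1, tgt_len, src_len).
    expanded = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
    # Turn 1/0 "keep/pad" entries into an additive mask: 0 where attended,
    # the most negative representable value where padded.
    inverted = 1.0 - expanded
    return inverted.masked_fill(inverted.to(torch.bool), torch.finfo(dtype).min)


mask = torch.tensor([[1, 1, 0]])
print(expand_mask_sketch(mask, torch.float32).shape)  # torch.Size([1, 1, 3, 3])
```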
Nice! We should probably do the same for falcon and mistral as well
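For reference, the same treatment for Falcon (and analogously Mistral) could look roughly like the sketch below. This is a hypothetical mirror of the Llama shim above, not the actual Falcon code; the deprecation message and the `AttentionMaskConverter` import path are assumptions (in this PR's diff the class appears as `AttnMaskConverter` inside modeling_llama).

```python
# Hypothetical sketch of an equivalent shim for modeling_falcon.py, mirroring
# the Llama one above. The import path and message wording are assumptions.
import warnings
from typing import Optional

import torch

from transformers.modeling_attn_mask_utils import AttentionMaskConverter


def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
    warnings.warn(
        "Calling `_expand_mask` directly is deprecated and will be removed in a future version. "
        "Use the attention mask converter instead."
    )
    return AttentionMaskConverter._expand_mask(mask=mask, dtype=dtype, tgt_len=tgt_len)
```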
I think in optimum only the llama mask utils are imported: https://github.com/huggingface/optimum/blob/313e1bd0de2b44aaa71797464f1e8b6a041a6f18/optimum/bettertransformer/models/attention.py#L25
ok 👍🏻
[Llama FA2] Re-add _expand_attention_mask and clean a couple things (huggingface#27074)
* clean
* clean llama
* fix more
* make style
* Apply suggestions from code review
* Apply suggestions from code review
* Update src/transformers/models/llama/modeling_llama.py
* Update src/transformers/models/llama/modeling_llama.py
* Apply suggestions from code review
* finish
* make style
What does this PR do?
This PR cleans up the attention mask converter a bit more, corrects some docstrings, removes outdated comments, and deprecates `_expand_attention_mask` (rather than removing it outright) to fix optimum.
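For context, keeping the deprecated name around means downstream imports, such as optimum's BetterTransformer code linked above, keep resolving instead of raising an `ImportError`. A hypothetical illustration of that kind of usage (the exact symbols optimum imports may differ, and this assumes a transformers version where the alias still exists, i.e. before its planned removal in v4.37):

```python
# Hypothetical downstream usage the shim protects; symbol names are illustrative
# and this only works on transformers versions that still ship the alias.
import torch
from transformers.models.llama.modeling_llama import _expand_mask

padding_mask = torch.tensor([[1, 1, 1, 0]])           # (batch, src_len), 1 = attend, 0 = pad
additive = _expand_mask(padding_mask, torch.float32)  # emits the deprecation warning above
print(additive.shape)                                 # torch.Size([1, 1, 4, 4])
```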