[Tokenizer] Unify tokenizer _pad #9280
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #9280     +/-   ##
===========================================
+ Coverage    52.94%   52.96%   +0.02%
===========================================
  Files          657      657
  Lines       106533   106384     -149
===========================================
- Hits         56404    56351      -53
+ Misses       50129    50033      -96

☔ View full report in Codecov by Sentry.
LGTM
Force-pushed from 7f3b3c8 to 867ad0b
PR types
Function optimization
PR changes
APIs
Description
Unify the tokenizer _pad function:
- Move the padding of attention_mask (shape [1, seq_len, seq_len]) into the tokenizer base _pad.
- Move the padding of attn_mask_startend_row_indices into the tokenizer base _pad.
The error-range verification for [FlashMask] Add FlashMask for Qwen2 #9264 is based on this PR.