[LLM INFER] Optimize fuse some kernels in postprocess #9201
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@             Coverage Diff             @@
##            develop    #9201     +/-   ##
===========================================
- Coverage    52.92%   52.90%   -0.03%
===========================================
  Files          661      661
  Lines       107069   106936    -133
===========================================
- Hits         56670    56571     -99
+ Misses       50399    50365     -34

View full report in Codecov by Sentry.
LGTM
for (int i = tid; i < bad_words_length; i += blockDim.x) {
  const int64_t bad_words_token_id = bad_words_list[i];
  if (bad_words_token_id >= length || bad_words_token_id < 0) continue;
  logits_now[bad_words_token_id] = -1e10;
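If -1e10 is hard-coded here, then TypeName should be restricted to Float32 or Bfloat16, and Float16 must not be passed in. But the operator is registered for all of these types, which creates an overflow risk. Although the network currently forces a cast to Float32, this is easy for users to get wrong.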
Could this be changed so that the initial value is set at a precision that depends on the input type?
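I think the more reasonable approach is to support every input type; but as a simpler fix, registering the operator only for specific precisions would also work.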
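For reference, the type-dependent option discussed above could look roughly like the following host-side C++ sketch. The largest finite Float16 magnitude is 65504, so -1e10 cannot be represented in that type, whereas `std::numeric_limits<T>::lowest()` is always finite for `T`. `MaskValue` and `BanBadWords` are hypothetical names, not functions from this PR, and the sketch only covers standard C++ floating-point types; the half-precision types used in the real CUDA kernel would presumably need their own specializations.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper (not from the PR): the most negative finite value of T,
// used as the masking value instead of a hard-coded -1e10.
template <typename T>
constexpr T MaskValue() {
  return std::numeric_limits<T>::lowest();
}

// Host-side analogue of the bad-words loop in the reviewed kernel:
// every in-range banned token id has its logit pushed to the mask value.
template <typename T>
void BanBadWords(std::vector<T>& logits,
                 const std::vector<int64_t>& bad_words) {
  const int64_t length = static_cast<int64_t>(logits.size());
  for (int64_t id : bad_words) {
    if (id >= length || id < 0) continue;  // skip out-of-range ids
    logits[id] = MaskValue<T>();
  }
}
```

The alternative raised in the thread, registering the operator only for the precisions where -1e10 is safe, would avoid the template helper entirely at the cost of rejecting Float16 inputs.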
PR types
Performance optimization
PR changes
Others
Description
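1. Fuse the get_padding_offset and remove_padding kernels.
2. Fuse the stop_generation_multi_ends_v2 and update_inputs kernels with some of the preceding operations.
3. Fuse the set_value_by_flags_and_idx_v2 and set_stop_value_multi_ends_v2 kernels.
Test code has been added for all of them, and precision has been aligned at the operator level.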
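As a rough illustration of what item 1 might amount to, the host-side C++ sketch below computes per-sequence padding offsets and packs the unpadded tokens in a single pass rather than two. The kernels themselves are not shown in this thread, so the data layout (a row-major `[batch, max_len]` token buffer plus per-sequence lengths) and the names `Packed` and `PackTokens` are assumptions made for the example, not the PR's actual interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: packed tokens plus, for each sequence,
// the number of padding slots that precede it in the padded layout.
struct Packed {
  std::vector<int64_t> ids;
  std::vector<int> cum_offsets;
};

// Single pass over the batch: record the running padding count and copy
// the valid tokens of each sequence, instead of doing the two steps in
// separate passes (the assumed unfused behaviour).
Packed PackTokens(const std::vector<int64_t>& padded,
                  const std::vector<int>& seq_lens,
                  int max_len) {
  Packed out;
  int pad_so_far = 0;
  for (std::size_t b = 0; b < seq_lens.size(); ++b) {
    out.cum_offsets.push_back(pad_so_far);
    for (int i = 0; i < seq_lens[b]; ++i) {
      out.ids.push_back(padded[b * max_len + i]);
    }
    pad_so_far += max_len - seq_lens[b];
  }
  return out;
}
```

On a GPU the benefit of this kind of fusion would typically come from one kernel launch and one read of the sequence lengths instead of two, though the measured gain is not stated in this PR.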