[LLM Inference] add --use_fake_parameter option for ptq fake scales and fix compute error of total_max_length #8955
Conversation
Thanks for your contribution!
Codecov Report

```
@@            Coverage Diff             @@
##           develop    #8955      +/-   ##
===========================================
+ Coverage    54.79%   54.82%   +0.02%
===========================================
  Files          636      636
  Lines        99876    99970      +94
===========================================
+ Hits         54732    54807      +75
- Misses       45144    45163      +19
```
```diff
@@ -124,7 +124,7 @@ class PredictorArgument:

     @property
     def total_max_length(self):
-        return self.src_length + self.max_length
+        return 8192  # Maximum sequence length.
```
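To make the behavioral change concrete, here is a minimal, hypothetical sketch of the property before and after this diff. The class shape and the default values are assumptions for illustration only, not the actual PaddleNLP `PredictorArgument` definition.

```python
from dataclasses import dataclass


@dataclass
class PredictorArgument:
    # Assumed illustrative defaults; not taken from the PR.
    src_length: int = 1024
    max_length: int = 2048

    @property
    def total_max_length(self) -> int:
        # Before this PR, the value was derived from the two lengths:
        #     return self.src_length + self.max_length
        # After this PR, it is treated as an independent model
        # hyperparameter fixed at 8192.
        return 8192


args = PredictorArgument()
print(args.total_max_length)  # 8192, regardless of src_length/max_length
```

The effect of the fix is that `total_max_length` no longer shrinks or grows with `src_length` and `max_length`; it acts as a standalone cap on the total sequence length.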
why?
This is similar to a model hyperparameter; the previous understanding was wrong, so this corrects it.
LGTM
[LLM Inference] add --use_fake_parameter option for ptq fake scales and fix compute error of total_max_length (PaddlePaddle#8955)

* update some code
* update
* update
* update
* update tune_cublaslt_gemm demo
* fix step in tune_cublaslt_gemm
PR types: Others

PR changes: Others
Description