
[LLM Inference] add --use_fake_parameter option for ptq fake scales and fix compute error of total_max_length #8955

Merged
merged 6 commits on Aug 19, 2024

Conversation

@yuanlehome (Collaborator) commented Aug 17, 2024

PR types

Others

PR changes

Others

Description

  • Add an --use_fake_parameter option for PTQ fake scales (see the sketch after this list).
  • Fix the src_length / max_length handling; they no longer need to be specified when exporting a static-graph model.
  • Fix models such as llama3 and qwen2 emitting extra trailing "#" characters in their output.
  • Update the tune_cublaslt_gemm example and add llama3.1 / qwen2 tuning examples.
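The headline flag, --use_fake_parameter, lets the PTQ export path run end-to-end before real calibration scales exist. A minimal sketch of the idea, assuming per-channel weight scales and per-tensor activation scales (build_fake_ptq_scales and the layer names are hypothetical, not the PR's actual implementation):

```python
import numpy as np

def build_fake_ptq_scales(weight_shapes, default_scale=1.0):
    """Return placeholder (fake) scales so a PTQ export path can run
    end-to-end before calibration-derived scales are available."""
    scales = {}
    for name, (in_dim, out_dim) in weight_shapes.items():
        # Per-channel weight scale: one value per output channel.
        scales[f"{name}.weight_scale"] = np.full((out_dim,), default_scale, dtype="float32")
        # Per-tensor activation scale: a single value.
        scales[f"{name}.act_scale"] = np.float32(default_scale)
    return scales

# Usage: stand in for real calibration scales during static-graph export.
fake = build_fake_ptq_scales({"llama.layers.0.qkv_proj": (4096, 12288)})
```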


paddle-bot bot commented Aug 17, 2024

Thanks for your contribution!


codecov bot commented Aug 17, 2024

Codecov Report

Attention: Patch coverage is 0% with 73 lines in your changes missing coverage. Please review.

Project coverage is 54.82%. Comparing base (e0d2809) to head (0cb99af).
Report is 225 commits behind head on develop.

Files with missing lines                                Patch %   Lines
...dlenlp/experimental/transformers/llama/modeling.py    0.00%    38 Missing ⚠️
paddlenlp/experimental/transformers/utils.py             0.00%    34 Missing ⚠️
paddlenlp/utils/llm_utils.py                             0.00%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8955      +/-   ##
===========================================
+ Coverage    54.79%   54.82%   +0.02%     
===========================================
  Files          636      636              
  Lines        99876    99970      +94     
===========================================
+ Hits         54732    54807      +75     
- Misses       45144    45163      +19     

☔ View full report in Codecov by Sentry.

@yuanlehome yuanlehome changed the title [LLM Inference] update some code [LLM Inference] add --use_fake_parameter option for ptq fake scales and fix compute error of total_max_length Aug 17, 2024
@@ -124,7 +124,7 @@ class PredictorArgument:

     @property
     def total_max_length(self):
-        return self.src_length + self.max_length
+        return 8192  # Maximum sequence length.
Contributor commented:

why?

Collaborator (Author) replied:

It is like a hyperparameter of the model; our earlier understanding was wrong, so this corrects it.
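In other words, total_max_length is read as the model's fixed context window rather than a value derived from the request. A hedged illustration of that reading (only the property mirrors the diff above; the defaults and the assert are hypothetical, not the PR's code):

```python
from dataclasses import dataclass

@dataclass
class PredictorArgument:
    src_length: int = 1024  # prompt length
    max_length: int = 1024  # generated length

    @property
    def total_max_length(self):
        # A fixed context window of the model (a hyperparameter),
        # no longer derived as src_length + max_length.
        return 8192

args = PredictorArgument(src_length=4096, max_length=2048)
# Hypothetical validation: prompt plus generation must fit the window.
assert args.src_length + args.max_length <= args.total_max_length
```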

@DesmonDay (Contributor) left a comment:

LGTM

@wawltor wawltor merged commit 71b3be3 into PaddlePaddle:develop Aug 19, 2024
9 of 12 checks passed
Mangodadada pushed a commit to Mangodadada/PaddleNLP that referenced this pull request Sep 10, 2024
[LLM Inference] add --use_fake_parameter option for ptq fake scales and fix compute error of total_max_length (PaddlePaddle#8955)

* update some code

* update

* update

* update

* update tune_cublaslt_gemm demo

* fix step in tune_cublaslt_gemm
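The final commit fixes the step used when enumerating GEMM shapes in the tune_cublaslt_gemm demo. A hypothetical sketch of such an enumeration, assuming the demo sweeps the M dimension with a fixed step (the function below is illustrative, not the demo's code):

```python
def m_candidates(m_min, m_max, step):
    """Yield the M dimensions to benchmark. Using m_max + 1 as the
    range bound keeps m_max itself when it falls on a step boundary."""
    yield from range(m_min, m_max + 1, step)

# Example: benchmark M = 32, 64, ..., 8192 for one fixed (N, K) pair.
for m in m_candidates(32, 8192, 32):
    pass  # invoke the cuBLASLt GEMM tuning for shape (m, n, k) here
```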