
update llm infer docs #9314

Merged · 2 commits merged into PaddlePaddle:develop on Oct 25, 2024

Conversation

yuanlehome (Collaborator)

PR types: Others

PR changes: Docs

Description: update llm infer docs


paddle-bot bot commented Oct 24, 2024

Thanks for your contribution!


codecov bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.41%. Comparing base (6211e3d) to head (d5ad497).
Report is 8 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9314      +/-   ##
===========================================
- Coverage    53.44%   52.41%   -1.04%     
===========================================
  Files          664      661       -3     
  Lines       109935   108376    -1559     
===========================================
- Hits         58757    56801    -1956     
- Misses       51178    51575     +397     


@@ -94,6 +95,8 @@ PaddleNLP provides a variety of parameters for configuring the inference model and optimizing inference performance

- `block_attn`: whether to use Block Attention for inference; defaults to False. Block Attention is designed and implemented following the ideas of PageAttention: while keeping high-performance inference and dynamic insertion, it allocates cache KV storage dynamically, which greatly reduces GPU memory usage and improves inference throughput.

- `append_attn`: building on the Block Attention implementation, Append Attention further optimizes the Attention module by drawing on the FlashInfer implementation and adds high-performance C4 support, greatly improving inference performance.
Collaborator
It would be better to clarify the relationship between the two options: are they mutually exclusive, or can they be stacked?

It would also help to describe the advantages of append_attn and the scenarios where it is a better fit. Right now the text only says it draws on FlashInfer, but users may not know what FlashInfer is.

Collaborator Author

They are mutually exclusive; append_attn should be suitable in all scenarios, as it is an upgraded version of block_attn.

Collaborator

Please add that to the docs; the main goal is to make it easy for users to understand.

Collaborator Author

done
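To make the relationship concrete, a minimal sketch of how a user would enable Append Attention; the command mirrors the predictor invocation from the diff below, and `--block_attn` and `--append_attn` should not be passed together:

```shell
# Append Attention supersedes Block Attention; enable exactly one of the two flags.
python ./predict/predictor.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --dtype bfloat16 --mode dynamic --inference_model 1 --append_attn 1
```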

@@ -43,27 +43,27 @@ BF16 inference

```shell
# Dynamic graph inference
# before this PR:
python ./predict/predictor.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --dtype bfloat16 --mode dynamic --inference_model 1 --block_attn 1
# after this PR:
python ./predict/predictor.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --dtype bfloat16 --mode dynamic --inference_model 1 --append_attn 1
```
Collaborator
The docs are user-facing, so please also add CI coverage for the append_attn functionality, to avoid cases where users cannot run the commands.

Collaborator Author

There are some issues with the CI; I'm debugging them and will submit it in the next PR.
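For illustration, a minimal sketch of the kind of smoke test being requested, assuming a plain shell check is acceptable in CI; the script wrapper and success message are hypothetical, and only the predictor command itself comes from this PR:

```shell
#!/usr/bin/env bash
# Hypothetical CI smoke test: verify the documented append_attn command runs end to end.
set -euo pipefail

python ./predict/predictor.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --dtype bfloat16 --mode dynamic --inference_model 1 --append_attn 1

echo "append_attn dynamic-graph inference completed successfully"
```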

ZHUI (Collaborator) left a comment

LGTM

yuanlehome merged commit 81f5ab5 into PaddlePaddle:develop on Oct 25, 2024
10 of 12 checks passed

3 participants