build(ascend): add Dockerfile for ascend aarch64 910B #2278
I got this error when trying to import
The |
deeplink.framework supports 8.0.RC1.alpha003; other versions have not been tested yet.
@yunfwe The currently supported models are llama2-7b, internlm2-7b, and mixtral-8x7b. You can refer to the following script for static inference; the chat version is still under development:
import deeplink_ext
import lmdeploy
from lmdeploy import PytorchEngineConfig

if __name__ == "__main__":
    # Run the PyTorch engine on the Ascend device with a single card (tp=1).
    backend_config = PytorchEngineConfig(tp=1, cache_max_entry_count=0.3,
                                         device_type="ascend")
    pipe = lmdeploy.pipeline("internlm/internlm2-chat-7b",
                             backend_config=backend_config)
    # Prompt in Chinese: "What good food is there in Shanghai?"
    question = ["上海有什么美食?"]
    response = pipe(question, request_output_len=128, do_preprocess=True)
    for idx, r in enumerate(response):
        print(f"Question: {question[idx]}")
        print(f"Answer: {r.text}")
        print()
Thanks for clearing that up.
@RunningLeon may open another PR to add |
OK
Using
The model is |
The current version is slower than MindIE. It is based on eager mode and is not fully optimized. (If you have a Huawei machine with an Intel CPU, you can get 3x performance without any changes.) MindIE is based on graph mode, so it shows better performance. We are working on graph mode and will release the graph mode version for the 910B in lmdeploy by the end of October.
docker/Dockerfile_aarch64_910B
Outdated
pip3 install pathlib2 protobuf attrs attr scipy && \
pip3 install requests psutil absl-py && \
pip3 install torch==2.1.1 torchvision==0.16.1 --index-url=https://download.pytorch.org/whl/cpu && \
pip3 install transformers==4.38.0 && \
Do we have to specify the version of transformers?
Do we have to specify the version of transformers?
We have only tested specific versions. It may be possible to relax the pin to 4.38.0-4.41.2, but we should test that first.
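A minimal sketch of how the proposed range could be checked at install or import time. The bounds come from the range floated in the reply above and are untested assumptions; only 4.38.0 is actually pinned in the Dockerfile. The helper names are hypothetical, and the parser only handles plain `X.Y.Z` release strings (no `.dev` or `rc` suffixes).

```python
def parse_version(text: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


# Assumption: candidate bounds from the review discussion, not a verified support matrix.
LOWEST = parse_version("4.38.0")
HIGHEST = parse_version("4.41.2")


def in_candidate_range(version: str) -> bool:
    """Return True when the version lies inside the inclusive candidate range."""
    return LOWEST <= parse_version(version) <= HIGHEST


if __name__ == "__main__":
    for candidate in ("4.38.0", "4.40.1", "4.41.2", "4.42.0"):
        print(candidate, in_candidate_range(candidate))
```

In a real check, the installed version string would typically come from `transformers.__version__`; each version inside the range would still need to be exercised on the 910B before the pin is relaxed.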
RUN echo -e "diff --git a/impl/ascend_npu/CMakeLists.txt b/impl/ascend_npu/CMakeLists.txt\n\
index e684c59..f1cd8d4 100755\n\
--- a/impl/ascend_npu/CMakeLists.txt\n\
+++ b/impl/ascend_npu/CMakeLists.txt\n\
What's this for?
It is a patch for build warnings. As you mentioned before, users can mistake such warnings for errors.
index e684c59..f1cd8d4 100755\n\
--- a/impl/ascend_npu/CMakeLists.txt\n\
+++ b/impl/ascend_npu/CMakeLists.txt\n\
@@ -14,6 +14,11 @@ FetchContent_Declare(op_plugin\n\
Why not put this in a cmakelist file?
This patch can't be merged into the main branch, so we apply it this way as a workaround.
sed -i 's@http://mirrors.tuna.tsinghua.edu.cn@https://mirrors.tuna.tsinghua.edu.cn@g' /etc/apt/sources.list && \
apt clean && rm -rf /var/lib/apt/lists/*

RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 7 \
Why install both gcc-7 and gcc-9?
Is gcov necessary?
Why install both gcc-7 and gcc-9?
The OS ships with gcc 9.4.0, but because of some strange compiler errors we must use gcc 7.5.0 for deeplink.framework. This update-alternatives command is a proper way to make gcc 7.5.0 the default while keeping gcc 9.4.0 available to the OS.
Is gcov necessary?
I don't think gcov itself is necessary, but this piece of code keeps the whole gcc toolchain at the same version. update-alternatives only switches between different versions of the same program. I referred to the update-alternatives manual here.
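To illustrate the point about keeping the toolchain at one version, here is a rough sketch that assembles an `update-alternatives --install` call with companion tools attached as `--slave` links, so they switch together with gcc. The full command in the Dockerfile is not shown in this thread, so the slave list and the `/usr/bin/<tool>-<major>` paths are assumptions based on the usual Ubuntu naming; `alternatives_command` is a hypothetical helper.

```python
import shlex


def alternatives_command(major: int, priority: int, slaves=("g++", "gcov")) -> list:
    """Build an update-alternatives call that registers gcc-<major> as the master
    link and ties each companion tool of the same version to it as a slave link."""
    argv = [
        "update-alternatives", "--install",
        "/usr/bin/gcc", "gcc", f"/usr/bin/gcc-{major}", str(priority),
    ]
    for tool in slaves:
        # Assumed layout: versioned binaries live at /usr/bin/<tool>-<major>.
        argv += ["--slave", f"/usr/bin/{tool}", tool, f"/usr/bin/{tool}-{major}"]
    return argv


if __name__ == "__main__":
    # Print the command instead of running it; it needs root inside the image.
    print(shlex.join(alternatives_command(7, 7)))
```

Because slave links follow the master, selecting gcc 7 also points `g++` and `gcov` at their 7.x counterparts, which is why gcov appears in the command even if nothing uses it directly.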
Is there an image on Docker Hub?
Will LMDeploy become a competitor to MindIE? As a user of Ascend 910B, which inference and serving engine should I choose?
No, please use the Dockerfile (for compliance reasons).
Yes, we have graph mode, and we capture the graph via torch.dynamo.
I tested the performance of graph mode. For a single request, graph mode is much closer to MindIE's speed than eager mode is. For batched requests, however, graph mode is still far slower than MindIE, and in the prefill stage with batched requests it is even slower than eager mode.
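For anyone who wants to reproduce this kind of comparison, a generic timing sketch follows. It is not the benchmark used above: `tokens_per_second` and `fake_generate` are hypothetical names, and the generation callable is a stand-in that must be replaced with a real call into the engine under test. It measures end-to-end throughput only and does not separate prefill from decode.

```python
import time
from typing import Callable, List, Sequence


def tokens_per_second(generate: Callable[[Sequence[str]], List[int]],
                      prompts: Sequence[str]) -> float:
    """Run one generation call and return generated tokens per wall-clock second.

    `generate` must return the number of generated tokens for each prompt.
    """
    start = time.perf_counter()
    token_counts = generate(prompts)
    # Guard against a zero interval when the callable returns instantly.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return sum(token_counts) / elapsed


def fake_generate(prompts: Sequence[str]) -> List[int]:
    """Stand-in for a real engine call; pretends every prompt yields 128 tokens."""
    return [128 for _ in prompts]


if __name__ == "__main__":
    single = tokens_per_second(fake_generate, ["hello"])
    batched = tokens_per_second(fake_generate, ["hello"] * 16)
    print(f"single request: {single:.1f} tok/s, batch of 16: {batched:.1f} tok/s")
```

Running the same prompts through eager mode and graph mode with identical batch sizes, after a warm-up call, keeps the comparison fair.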
Thanks for your testing.
Motivation
Provide a Dockerfile for running the Ascend backend with the PyTorch engine.
Currently, only a Dockerfile for the aarch64 platform is prepared.
Modification
Add Dockerfile for ascend aarch64 910B