[LLM Runtime] Enable interactive mode of python api #548
Conversation
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
intel_extension_for_transformers/llm/runtime/graph/scripts/chat_with_llama.py
intel_extension_for_transformers/llm/runtime/graph/scripts/run_llm_chatglm2.py
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
@lvliang-intel For multi-round conversation mode, please refer to the example in the README.
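The README example is authoritative; as a rough illustration of what multi-round mode means on the caller's side, one common approach is to carry the conversation history forward into each new prompt. The sketch below uses Llama-2-style `[INST]` tags as an assumed template; the tag strings and function name are illustrative, not taken from this PR.

```python
# Sketch: building a multi-round prompt by concatenating prior turns.
# The [INST]/[/INST] tags follow the Llama-2 chat convention; whether the
# runtime expects exactly this format is an assumption, not from this PR.

B_INST, E_INST = "[INST]", "[/INST]"

def build_prompt(history, user_msg):
    """history: list of (user, assistant) turns already completed."""
    parts = []
    for user, assistant in history:
        parts.append(f"{B_INST} {user} {E_INST} {assistant}")
    parts.append(f"{B_INST} {user_msg} {E_INST}")
    return " ".join(parts)

history = [("Hello", "Hi, how can I help?")]
prompt = build_prompt(history, "Tell me a joke")
# The model sees both rounds, so earlier context influences the new answer.
```

With an interactive mode in the C++ runtime, the KV cache can be reused across rounds instead of re-encoding this growing prompt from scratch each time.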
LGTM. BTW, do we have any plan to support multi-batch prompt generation (static batching with padding, like transformers) in the Python API? The ChatGLM series models need it. @Zhenzhong1

For example:

```python
inputs = tokenizer(batch_text, padding=True, return_tensors="pt")
inputs = inputs.reshape(1, -1)
outputs = model.generate(inputs, batch_size=n, ...)
outputs = outputs.reshape(n, max_new_tokens)
```

This would save the time of developing padding in C++, and the same path could later be reused for top-p/top-k sampling and beam search. However, it may require a padding attention mask for MHA and extra next-token handling. Just a thought after discussing it with @Zhenzhong1; not related to this PR.
Yes, we use the transformers tokenizer, which includes a padding function. The Python API for multi-batch needs to wait for the C++ API to be done first.
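As background on the static-batching idea discussed above, here is a minimal pure-Python sketch of right-padding a batch of token id sequences and deriving the padding attention mask that MHA would need. The pad id and helper name are illustrative assumptions, not part of this PR; a transformers tokenizer with `padding=True` produces the same two tensors.

```python
# Sketch of static batching: right-pad token id sequences to a common
# length and build the padding attention mask (1 = real token, 0 = pad).
# PAD_ID and the helper name are illustrative choices, not from this PR.

PAD_ID = 0

def pad_batch(sequences, pad_id=PAD_ID):
    max_len = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    # Mask lets attention ignore pad positions for the shorter sequences.
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 6, 7], [8, 9]])
# ids  == [[5, 6, 7], [8, 9, 0]]
# mask == [[1, 1, 1], [1, 1, 0]]
```

The C++ side would consume the mask during attention; the next-token handling concern raised above is that each sequence in the batch finishes generating at a different step.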
intel_extension_for_transformers/llm/runtime/graph/application/main_pybind.cpp
LGTM
Type of Change
feature
Description
Expected Behavior & Potential Risk
the expected behavior triggered by this PR
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed