[LLM Runtime] Enable interactive mode of python api #548
Conversation
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
intel_extension_for_transformers/llm/runtime/graph/scripts/chat_with_llama.py
intel_extension_for_transformers/llm/runtime/graph/scripts/run_llm_chatglm2.py
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
@lvliang-intel For multi-round conversation mode, please refer to the example in the README.
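The README example is authoritative; as a rough illustration of what multi-round mode means on the caller's side, one common approach is to carry the conversation history forward into each new prompt. The sketch below uses Llama-2-style `[INST]` tags as an assumed template; the tag strings and function name are illustrative, not taken from this PR.

```python
# Sketch: building a multi-round prompt by concatenating prior turns.
# The [INST]/[/INST] tags follow the Llama-2 chat convention; whether the
# runtime expects exactly this format is an assumption, not from this PR.

B_INST, E_INST = "[INST]", "[/INST]"

def build_prompt(history, user_msg):
    """history: list of (user, assistant) turns already completed."""
    parts = []
    for user, assistant in history:
        parts.append(f"{B_INST} {user} {E_INST} {assistant}")
    parts.append(f"{B_INST} {user_msg} {E_INST}")
    return " ".join(parts)

history = [("Hello", "Hi, how can I help?")]
prompt = build_prompt(history, "Tell me a joke")
# The model sees both rounds, so earlier context influences the new answer.
```

With an interactive mode in the C++ runtime, the KV cache can be reused across rounds instead of re-encoding this growing prompt from scratch each time.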
LGTM. BTW, do we have any plan to support multi-batch prompt generation (static batching with padding, like transformers) in the Python API? The ChatGLM series models need it. @Zhenzhong1

For example:

```python
inputs = tokenizer(batch_text, padding=True, return_tensors="pt")
inputs = inputs.reshape(1, -1)
outputs = model.generate(inputs, batch_size=n, ...)
outputs = outputs.reshape(n, max_new_tokens)
```

This would save the time of developing padding in C++, and the same path could later be reused for top-p/top-k sampling and beam search. However, it may require a padding attention mask for MHA and extra next-token handling. Just a thought after discussing it with @Zhenzhong1; not related to this PR.
Yes, we use the transformers tokenizer, which includes a padding function. The Python API for multi-batch needs to wait for the C++ API to be done first.
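As background on the static-batching idea discussed above, here is a minimal pure-Python sketch of right-padding a batch of token id sequences and deriving the padding attention mask that MHA would need. The pad id and helper name are illustrative assumptions, not part of this PR; a transformers tokenizer with `padding=True` produces the same two tensors.

```python
# Sketch of static batching: right-pad token id sequences to a common
# length and build the padding attention mask (1 = real token, 0 = pad).
# PAD_ID and the helper name are illustrative choices, not from this PR.

PAD_ID = 0

def pad_batch(sequences, pad_id=PAD_ID):
    max_len = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    # Mask lets attention ignore pad positions for the shorter sequences.
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return input_ids, attention_mask

ids, mask = pad_batch([[5, 6, 7], [8, 9]])
# ids  == [[5, 6, 7], [8, 9, 0]]
# mask == [[1, 1, 1], [1, 1, 0]]
```

The C++ side would consume the mask during attention; the next-token handling concern raised above is that each sequence in the batch finishes generating at a different step.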
intel_extension_for_transformers/llm/runtime/graph/application/main_pybind.cpp
LGTM
Type of Change
feature
Description
Expected Behavior & Potential Risk
the expected behavior triggered by this PR
How has this PR been tested?
how to reproduce the test (including hardware information)
Dependency Change?
any library dependency introduced or removed