This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

[LLM Runtime] Enable interactive mode of python api #548

Merged: 15 commits, Oct 27, 2023

Conversation


@zhenwei-intel zhenwei-intel commented Oct 25, 2023

Type of Change

feature

  • Add a demo of chatting with Llama 2
  • Apply max_new_tokens to each prompt rather than to the whole conversation history
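
A minimal sketch of the per-prompt budget described above, not the runtime's actual implementation. `ChatSession`, `ask`, and the `next_token` callable are hypothetical names used only for illustration; the toy model stands in for the real LLM Runtime.

```python
from typing import Callable, List

# Hypothetical signature: takes the full token history, returns the next token id.
NextTokenFn = Callable[[List[int]], int]


class ChatSession:
    """Keeps the token history across turns and budgets new tokens per prompt."""

    def __init__(self, next_token: NextTokenFn, eos_id: int, max_new_tokens: int):
        self.next_token = next_token
        self.eos_id = eos_id
        self.max_new_tokens = max_new_tokens
        self.history: List[int] = []

    def ask(self, prompt_ids: List[int]) -> List[int]:
        """Append one prompt to the history and generate one reply."""
        self.history.extend(prompt_ids)
        reply: List[int] = []
        # The reply counter starts at zero for every prompt, so a long
        # history does not eat into the budget of later turns.
        while len(reply) < self.max_new_tokens:
            token = self.next_token(self.history)
            self.history.append(token)
            reply.append(token)
            if token == self.eos_id:
                break
        return reply


def toy_next_token(history: List[int]) -> int:
    """Toy stand-in for a model: always emits token 7 and never emits EOS."""
    return 7


session = ChatSession(toy_next_token, eos_id=2, max_new_tokens=4)
first = session.ask([1, 10, 11])   # turn 1: 3 prompt tokens
second = session.ask([12, 13])     # turn 2: 2 prompt tokens, history already holds 7
print(len(first), len(second), len(session.history))
```

Both turns receive the full budget of 4 new tokens even though the history keeps growing between them.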

Description

...................................................................................................
model_init_from_file: support_jblas_kv = 0
model_init_from_file: kv self size =  512.00 MB
> You are a zoologist, you will answer my questions about animals.
 Of course! I'd be happy to help answer any questions you have about animals. 🐠 What would you like to know?</s>
> who are you?
 Hello! As a zoologist, I am an expert in the field of animal biology and behavior. I have extensive knowledge of various species, their habitats, and their social behaviors. I am here to help answer any questions you may have about animals, so feel free to ask me anything! 🐝</s>
> What is an elephant?
Ah, a great question! Elephants are large, intelligent mammals that belong to the genus Elephantidae. There are two main species of elephant: the African elephant and the Asian elephant.
African elephants (Loxodonta africana) are found in the savannas and grasslands of sub-Saharan Africa, while Asian elephants (Elephas maximus) are found in Southeast Asia, including India, China, and Indonesia.
Elephants are known for their large size, with African males reaching heights of up to 4 meters (13 feet) and weighing up to 6 tons (5,400 kg). Asian elephants are generally smaller, with males reaching heights of up to 2.5 meters (8 feet) and weighing up to 3 tons (2,700 kg).
Elephants have several distinctive physical features, including their large ears, which help them regulate their body temperature in hot climates. They also have a long, flexible trunk that they use for breathing, eating, and drinking. Their trunks are made up of muscles and can be moved in various ways to perform different functions.
Elephants are highly intelligent and social animals, known for their complex communication systems and strong family bonds. They have a highly developed sense of empathy and can recognize and respond to the emotions of other elephants.
In addition to their intelligence and social behavior, elephants are also known for their remarkable memory and ability to learn from experience. They have been observed exhibiting cultural behaviors, such as the use of tools and the transmission of knowledge between generations.
Elephants play a crucial role in many ecosystems, particularly in Africa, where they help maintain the balance of the savannah ecosystem by controlling the population of other herbivores through their feeding habits. They are also important seed dispersers and can have a significant impact on the distribution of plant species.
Overall, elephants are fascinating creatures that continue to capture the imagination and interest of scientists and animal lovers around the world. 🐘</s>
> How much does it weight?
 The weight of an elephant can vary depending on the individual and its age, size, and species. Here are some approximate weights for African and Asian elephants:
African Elephants:
* Males: 2-4 tons (3,600-7,200 kg)
* Females: 1.5-2.5 tons (2,300-3,600 kg)
Asian Elephants:
* Males: 1.5-2.5 tons (2,300-3,600 kg)
* Females: 1-2 tons (900-1,800 kg)
It's worth noting that these weights are approximate and can vary depending on various factors such as the individual elephant's size, age, and species.</s>
> thank you  
 You're welcome! It was a pleasure to help you. If you have any other questions or need further assistance, feel free to ask. Have a great day!</s>
> 

Expected Behavior & Potential Risk

the expected behavior that triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
@zhenwei-intel zhenwei-intel marked this pull request as draft October 25, 2023 08:11
@zhenwei-intel zhenwei-intel changed the title [LLM Runtime] Add demo for chatting with llama [LLM Runtime] Enable interactive mode of python api Oct 27, 2023
@zhenwei-intel zhenwei-intel marked this pull request as ready for review October 27, 2023 02:27
@zhenwei-intel

@lvliang-intel, for multi-round conversation mode, please refer to the example in the README.

@zhentaoyu zhentaoyu left a comment

LGTM. BTW, do we have any plan to support multi-batch prompt generation (static batching with padding, as in transformers) in the Python API? The ChatGLM series models need it. @Zhenzhong1
For example:

inputs = tokenizer(batch_text, padding=True, return_tensors="pt")
inputs = inputs.reshape(1, -1)
outputs = model.generate(inputs, batch_size=n...)
outputs = outputs.reshape(n, max_new_tokens)

This would save the time of developing padding in C++, and it could later be used for top-p/top-k sampling and beam search. However, it may require a padding attention mask for MHA and extra next-token handling. Just a thought after discussing it with @Zhenzhong1. Not related to this PR.
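
A rough sketch of the static-batching idea in plain Python, with no claim about how the runtime would implement it. The helper names `pad_batch`, `flatten`, and `unflatten` are hypothetical, and left padding is an assumption made here so that every prompt ends at the same position.

```python
from typing import List, Tuple


def pad_batch(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad every sequence to the longest length and build the attention mask."""
    width = max(len(seq) for seq in batch)
    padded = [[pad_id] * (width - len(seq)) + seq for seq in batch]
    # Mask is derived from the original lengths: 0 marks padding, 1 marks a real token.
    mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in batch]
    return padded, mask


def flatten(rows: List[List[int]]) -> List[int]:
    """Lay n rows end to end, the plain-list analogue of reshape(1, -1)."""
    return [tok for row in rows for tok in row]


def unflatten(flat: List[int], n: int) -> List[List[int]]:
    """Split a flat buffer back into n equal rows, the analogue of reshape(n, -1)."""
    if n <= 0 or len(flat) % n != 0:
        raise ValueError("buffer length must be a positive multiple of the batch size")
    width = len(flat) // n
    return [flat[i * width:(i + 1) * width] for i in range(n)]


batch = [[5, 6, 7], [8]]                 # two tokenized prompts of different length
padded, mask = pad_batch(batch, pad_id=0)
flat = flatten(padded)                   # single row handed to a batch-aware generate
restored = unflatten(flat, n=len(batch))
print(padded, mask, flat)
```

The mask is what the attention kernel would need in order to ignore padded positions, which is the extra handling the comment above anticipates.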

@zhenwei-intel

Yes, we use the transformers tokenizer, which includes a padding function. The multi-batch Python API has to wait until the C++ API is done.

@a32543254 a32543254 left a comment

LGTM

@VincyZhang VincyZhang merged commit 6e32ca6 into main Oct 27, 2023
11 checks passed
@VincyZhang VincyZhang deleted the lzw/talk_history branch October 27, 2023 05:16