
Add IPEX-XPU support for Llama2 model inference (greedy search) #701

Closed
wants to merge 7 commits

Conversation

faaany (Contributor) commented May 7, 2024

What does this PR do?

This PR enables Intel GPU (XPU) support for Llama2 model inference in optimum-intel; it covers greedy search generation only. Below is an example:

import torch
from transformers import AutoTokenizer, pipeline
from optimum.intel import IPEXModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True runs the IPEX export path; device_map="xpu" places the model on the Intel GPU
model = IPEXModelForCausalLM.from_pretrained(model_id, device_map="xpu", torch_dtype=torch.float16, export=True)
# do_sample=False with num_beams=1 selects greedy search, the decoding mode this PR supports
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, do_sample=False, num_beams=1, use_cache=True)
results = pipe("He's a dreadful magician and")
print(results)
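
For reference, the same greedy-search generation can also be run without the pipeline helper by calling generate() directly. This is a minimal sketch assuming the model and tokenizer objects from the example above; greedy search corresponds to do_sample=False with num_beams=1, and max_new_tokens=32 is an arbitrary illustrative value:

# Tokenize the prompt and move the input tensors to the Intel GPU
inputs = tokenizer("He's a dreadful magician and", return_tensors="pt").to("xpu")
# Greedy search: do_sample=False and num_beams=1 pick the argmax token at each step
output_ids = model.generate(**inputs, do_sample=False, num_beams=1, use_cache=True, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))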

faaany (Contributor, Author) commented May 8, 2024

Closing this PR due to the messy commit history; please go to #703 for more info.

faaany closed this May 8, 2024