This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

[Optimization] Text-generation support qwen #513

Merged · 43 commits · Oct 23, 2023
Commits (43)
f04d0fd
[CPP Graph] Opt qbits dequant (#465)
zhewang1-intc Oct 19, 2023
4adacf1
use INC 2.3.1
VincyZhang Oct 19, 2023
d962f58
use INC 2.3.1 (#500)
VincyZhang Oct 19, 2023
66238a5
[RUNTIME] Enabling streaming llm for Runtime (#501)
zhenwei-intel Oct 19, 2023
ea112e7
Merge branch 'main' of https://github.com/intel/intel-extension-for-t…
VincyZhang Oct 19, 2023
51485c6
Reduce the UT evaluation time (#498)
changwangss Oct 19, 2023
ff4abb8
Merge branch 'main' of https://github.com/intel/intel-extension-for-t…
VincyZhang Oct 19, 2023
9bdc764
Minor fix (#507)
VincyZhang Oct 19, 2023
6bd2b60
support qwen
changwangss Oct 19, 2023
ea720c2
Fix ChatGLM2 model loading issue (#510)
lvliang-intel Oct 19, 2023
02523e9
Update README.md
hshen14 Oct 19, 2023
0cff05a
Remove OneDNN env setting for BF16 inference (#509)
lvliang-intel Oct 20, 2023
1bee379
remove invalid code
changwangss Oct 20, 2023
ea69f9a
support Avx2 (#493)
yuchengliu1 Oct 20, 2023
f7d0d97
add neuralchat ut for audio util (#466)
Liangyx2 Oct 20, 2023
b9155ef
reduce ut time consumption (#499)
xin3he Oct 20, 2023
5f4175a
update python api readme (#504)
zhenwei-intel Oct 20, 2023
a8873ea
Add docker setup session for neuralchat finetuning sample (#496)
louie-tsai Oct 20, 2023
22fe7ad
Update README.md
hshen14 Oct 20, 2023
53b1b61
Update run_generation.py
changwangss Oct 20, 2023
b38241d
Update README.md
hshen14 Oct 20, 2023
1d91245
Update README.md
hshen14 Oct 20, 2023
18d9c57
Update README.md
hshen14 Oct 20, 2023
f98d72a
Update README.md
hshen14 Oct 20, 2023
0f6aee6
Update README.md
hshen14 Oct 20, 2023
a8db98f
Update README.md for fast token issue (#515)
louie-tsai Oct 21, 2023
52717e4
Fix typo in README.md (#516)
eltociear Oct 21, 2023
3cf68ee
Update README.md
hshen14 Oct 21, 2023
7fb944a
Update README.md
hshen14 Oct 21, 2023
7fed478
Update README.md
hshen14 Oct 21, 2023
dc81e4c
Update README.md
hshen14 Oct 21, 2023
dcfbcfd
improve Avx2 (#511)
yuchengliu1 Oct 21, 2023
a615905
Merge branch 'main' of https://github.com/intel/intel-extension-for-t…
VincyZhang Oct 21, 2023
61993cc
Revert "update python api readme (#504)"
VincyZhang Oct 21, 2023
4144197
Merge branch 'main' into wangchang/qwen
VincyZhang Oct 21, 2023
5b01e95
Update README.md
hshen14 Oct 22, 2023
bfb6a25
Update README.md (#519)
ayushrakesh Oct 22, 2023
0e0a9eb
docs: fix typos in question answering of pytorch (#520)
shresthasurav Oct 22, 2023
ec29f2f
fixed typos (#522)
Smoothieewastaken Oct 23, 2023
1357a02
Updated README.md (#517)
alienishi Oct 23, 2023
b3e4b25
Merge branch 'main' into wangchang/qwen
changwangss Oct 23, 2023
572ecbf
Merge branch 'main' into wangchang/qwen
changwangss Oct 23, 2023
2e77b6b
Merge branch 'main' into wangchang/qwen
changwangss Oct 23, 2023
Files changed
@@ -119,7 +119,7 @@
         excluded_precisions=excluded_precisions,  # default is []
     )
 elif args.woq:
-    quantization_config = WeightOnlyQuantConfig()  # default is A32W4G32
+    quantization_config = WeightOnlyQuantConfig(compute_type="fp32", weight_type="int4_fullrange", group_size=32)  # default is A32W4G32
 # bitsandbytes
 elif args.bitsandbytes:
     # A GPU device is needed for `load_in_4bit` and `load_in_8bit`.
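This hunk only spells out the existing defaults: `A32W4G32` decodes as fp32 compute (A32), 4-bit full-range weights (W4), and a quantization group size of 32 (G32). A minimal sketch of an equivalent standalone weight-only load, assuming the `intel_extension_for_transformers` API at this commit (the import path and model id are assumptions, not taken from the diff):

```python
# Sketch only: the config arguments mirror the diff; everything else is
# an assumption about this version of the library.
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    WeightOnlyQuantConfig,
)

woq_config = WeightOnlyQuantConfig(
    compute_type="fp32",           # A32: compute/accumulate in fp32
    weight_type="int4_fullrange",  # W4: 4-bit weights over the full int4 range
    group_size=32,                 # G32: one scale per group of 32 weights
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",           # illustrative model id; any causal LM works
    quantization_config=woq_config,
    trust_remote_code=True,        # Qwen checkpoints ship custom modeling code
    use_llm_runtime=False,
)
```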
@@ -133,6 +133,8 @@
 if quantization_config is not None:
     user_model = AutoModelForCausalLM.from_pretrained(args.model,
                                                       quantization_config=quantization_config,
+                                                      trust_remote_code=args.trust_remote_code,
+                                                      torchscript=True if args.sq else False,
                                                       use_llm_runtime=False
                                                       )
     if args.sq:
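The two inserted keyword arguments are the heart of the Qwen enablement in this file: Qwen checkpoints define their model class in the checkpoint repository itself, so `trust_remote_code` must be forwarded to `from_pretrained`, and the SmoothQuant path (`args.sq`) traces the model, so it needs `torchscript=True`. A hedged sketch of the resulting call, with one simplification (`args.sq` is already a boolean, so the ternary can be dropped):

```python
# Sketch of the quantized-load call after this hunk; the args.* flags mirror
# the diff and are assumed to be defined by the script's argument parser.
user_model = AutoModelForCausalLM.from_pretrained(
    args.model,
    quantization_config=quantization_config,
    trust_remote_code=args.trust_remote_code,  # needed for Qwen's remote code
    torchscript=args.sq,  # SmoothQuant traces the model via TorchScript
    use_llm_runtime=False,
)
```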
@@ -145,8 +147,8 @@
                                                        load_in_8bit=args.load_in_8bit,
                                                        use_llm_runtime=False
                                                        )
-elif not args.int8 or not args.int8_bf16_mixed:
-    user_model = AutoModelForCausalLM.from_pretrained(args.model, config=config, use_llm_runtime=False)
+elif not args.int8 and not args.int8_bf16_mixed:
+    user_model = AutoModelForCausalLM.from_pretrained(args.model, config=config, trust_remote_code=args.trust_remote_code, use_llm_runtime=False)
 # peft
 if args.peft_model_id is not None:
     from peft import PeftModel
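The last hunk in this file fixes a genuine logic bug rather than adding Qwen support: `not args.int8 or not args.int8_bf16_mixed` is true whenever at least one of the two flags is unset, so the plain FP32 loader also ran for int8-only and mixed-only runs. By De Morgan's law, "neither flag is set" is `not args.int8 and not args.int8_bf16_mixed`. A quick self-contained check:

```python
# Truth-table check of the guard fix (plain Python, no external assumptions).
for int8 in (False, True):
    for int8_bf16_mixed in (False, True):
        buggy = (not int8) or (not int8_bf16_mixed)   # old: False only when BOTH are set
        fixed = (not int8) and (not int8_bf16_mixed)  # new: True only when NEITHER is set
        print(int8, int8_bf16_mixed, "buggy:", buggy, "fixed:", fixed)
# Only (False, False) should fall through to the plain FP32 load path.
```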
@@ -97,6 +97,10 @@ def generate_dummy_past_key_values(input_bs, model):
             else:
                 new_shape = [input_bs * num_attention_heads, 1, d_k]
                 pkv = pkv + (torch.ones(size=new_shape),)
+    elif model.config.model_type == "qwen":
+        new_shape = [input_bs, 1, num_attention_heads, d_k]
+        dummy_tensor = torch.ones(size=new_shape)
+        pkv = tuple(dummy_tensor for _ in range(nb_pkv))
     else:
         new_shape = [input_bs, num_attention_heads, 1, d_k]
         dummy_tensor = torch.ones(size=new_shape)
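The new `elif` exists because Qwen's attention keeps its KV cache as `(batch, seq_len, num_heads, head_dim)`, with the sequence axis second, while the default branch assumes `(batch, num_heads, seq_len, head_dim)`. A small sketch of the dummy cache this branch builds, with illustrative Qwen-7B-like dimensions (the concrete sizes here are assumptions; the real code derives them from `model.config`):

```python
import torch

# Illustrative dimensions in the spirit of Qwen-7B (assumptions).
input_bs = 1
num_attention_heads = 32
d_k = 128            # head dimension
num_layers = 32
nb_pkv = 2           # one key tensor and one value tensor per layer

# Qwen layout: sequence axis second, heads third.
new_shape = [input_bs, 1, num_attention_heads, d_k]
dummy_tensor = torch.ones(size=new_shape)
pkv = tuple(dummy_tensor for _ in range(nb_pkv))          # (key, value) for one layer
past_key_values = tuple(pkv for _ in range(num_layers))   # one pair per layer

print(past_key_values[0][0].shape)  # torch.Size([1, 1, 32, 128])
```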