
[RUNTIME] Enabling streaming llm for Runtime #501

Merged: 4 commits into main from lzw/add_n_discard on Oct 19, 2023

Conversation

@zhenwei-intel (Contributor) commented Oct 19, 2023

Type of Change

feature

Description

Support continuous text generation when the context length exceeds ctx_size.

  • n_keep: the number of tokens to keep from the initial prompt
  • n_discard: the number of tokens to discard once the context is full; the remaining tokens are shifted to the start of the current context (see the sketch below)
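
As an illustration, here is a minimal sketch of this eviction policy applied to a plain token list. The evict helper is hypothetical and only mirrors the behavior described above, not the runtime's actual KV-cache code:

```python
def evict(tokens: list, ctx_size: int, n_keep: int, n_discard: int) -> list:
    # Hypothetical illustration of the n_keep/n_discard policy.
    # Once the context is full, keep the first n_keep tokens, drop the
    # next n_discard tokens, and shift the remaining tokens forward.
    if len(tokens) < ctx_size:
        return tokens
    return tokens[:n_keep] + tokens[n_keep + n_discard:]

# evict(list(range(10)), ctx_size=10, n_keep=4, n_discard=1)
# -> [0, 1, 2, 3, 5, 6, 7, 8, 9]
```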

Expected Behavior & Potential Risk

```python
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300, ctx_size=100, n_keep=4, n_discard=1)
```
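
For context, a hedged end-to-end sketch of driving this call. The model id, import paths, and loading details are assumptions about the Python API of the time, not taken from this PR:

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
streamer = TextStreamer(tokenizer)  # prints tokens as they are generated
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids
# ctx_size caps the KV cache; when it fills up, n_keep initial-prompt tokens
# are preserved and n_discard tokens are evicted so generation can continue.
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300,
                         ctx_size=100, n_keep=4, n_discard=1)
```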

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

N/A

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
@a32543254 changed the title from "add n_keep and n_discard for streaming llm" to "[RUNTIME] Enabling streaming llm for Runtime" on Oct 19, 2023
@a32543254 (Contributor) left a comment:

LGTM

@a32543254 (Contributor) commented:

For streaming LLM, based on the paper, we recommend n_keep = 4 and n_discard = -1 to keep relatively good accuracy and performance for effectively infinite inference; see the snippet below.
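
A hedged illustration of that recommendation. The ctx_size and max_new_tokens values are placeholders, and n_discard = -1 is read here as delegating the discard size to the runtime (commonly half of the evictable tokens), which is an assumption rather than something stated in this PR:

```python
# Recommended StreamingLLM settings from the comment above; other
# arguments are illustrative placeholders.
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=2000,
                         ctx_size=512, n_keep=4, n_discard=-1)
```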

@airMeng (Contributor) commented Oct 19, 2023:

I wonder whether this will impact continuous batching, since both n_keep and n_discard determine the size of the KV cache.

@airMeng requested a review from zhentaoyu on October 19, 2023 09:07
@hshen14 (Contributor) commented Oct 19, 2023:

> I wonder whether this will impact continuous batching, since both n_keep and n_discard determine the size of the KV cache.

I asked Zhentao the same question. The answer is no.

@hshen14 merged commit 66238a5 into main on Oct 19, 2023
11 checks passed
@hshen14 deleted the lzw/add_n_discard branch on October 19, 2023 09:17
zhenwei-intel added a commit that referenced this pull request Oct 23, 2023
* Support StreamingLLM on CPU

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
VincyZhang added a commit that referenced this pull request Oct 23, 2023
* [CPP Graph] Opt qbits dequant (#465)

* use INC 2.3.1

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* use INC 2.3.1 (#500)

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>

* [RUNTIME] Enabling streaming llm for Runtime (#501)

* Support StreamingLLM on CPU

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* Reduce the UT evaluation time (#498)

Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>

* Minor fix (#507)

* Fix ChatGLM2 model loading issue (#510)

* Fix ChatGLM2 model loading issue

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Remove OneDNN env setting for BF16 inference (#509)

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: VincyZhang <wenxin.zhang@intel.com>

* support Avx2 (#493)

* support Memcpy2D

* support gelu fusion

---------

Co-authored-by: luoyu-intel <yu.luo@intel.com>

* add neuralchat ut for audio util (#466)

* reduce ut time consumption (#499)

Signed-off-by: Xin He <xin3.he@intel.com>

* update python api readme (#504)

* Add docker setup session for neuralchat finetuning sample (#496)

* Update README.md to new added docker setup session

Signed-off-by: Louie Tsai <louie.tsai@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md for fast token issue (#515)

Signed-off-by: Louie Tsai <louie.tsai@intel.com>

* Fix typo in README.md (#516)

convertion -> conversion

Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* improve Avx2  (#511)

* Revert "update python api readme (#504)"

This reverts commit 5f4175a.

* Update README.md

Signed-off-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md (#519)

Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>

* docs: fix typos in question answering of pytorch (#520)

Signed-off-by: Surav Shrestha <suravshresth@gmail.com>

* fixed typos (#522)

* Updated README.md (#517)

Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>

* update python api readme

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* fix readme

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Update README.md

Signed-off-by: Dong, Bo <bo1.dong@intel.com>

* Add Data type description
Align Doc and help info

Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

* align

Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>

* fix eos token id

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>

---------

Signed-off-by: Wenxin Zhang <wenxin.zhang@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Wang, Chang <chang1.wang@intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: Haihao Shen <haihao.shen@intel.com>
Signed-off-by: Xin He <xin3.he@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Signed-off-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Signed-off-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Signed-off-by: Surav Shrestha <suravshresth@gmail.com>
Signed-off-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Signed-off-by: Dong, Bo <bo1.dong@intel.com>
Signed-off-by: Hengyu Meng <hengyu.meng@intel.com>
Co-authored-by: Wang, Zhe <zhe1.wang@intel.com>
Co-authored-by: Wenxin Zhang <wenxin.zhang@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Haihao Shen <haihao.shen@intel.com>
Co-authored-by: yuchengliu1 <yucheng.liu@intel.com>
Co-authored-by: luoyu-intel <yu.luo@intel.com>
Co-authored-by: Liangyx2 <106130696+Liangyx2@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: ayushrakesh <115995339+ayushrakesh@users.noreply.github.com>
Co-authored-by: Surav Shrestha <148626286+shresthasurav@users.noreply.github.com>
Co-authored-by: Smoothieewastaken <86610201+Smoothieewastaken@users.noreply.github.com>
Co-authored-by: Aditya Aryaman Das <128703909+alienishi@users.noreply.github.com>
Co-authored-by: Dong, Bo <bo1.dong@intel.com>
Co-authored-by: Hengyu Meng <hengyu.meng@intel.com>
VincyZhang pushed a commit that referenced this pull request Oct 23, 2023
* Support StreamingLLM on CPU

Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>