Add simple CPU offloading support. #2081

janimo · 2024-11-18T17:46:28Z

Based on the vLLM implementation, applied to llama.py, gemma2.py and qwen2.py

Tested with unquantized Llama 3.1 8B, Gemma 2 9B and Qwen 2.5 7B on a 6GB RTX 4060.
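For readers unfamiliar with the technique, here is a minimal sketch of budget-based weight offloading. It is not the code from this PR or from vLLM. It assumes, for illustration only, that offloading is driven by a CPU byte budget and that parameters are moved greedily in order until the budget is used up. `Param`, `offload_until_budget` and the 1 GiB layer sizes are hypothetical names and values, and devices are plain strings rather than real tensors.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a model weight; no real tensor is held."""
    name: str
    num_bytes: int
    device: str = "gpu"


def offload_until_budget(params, cpu_budget_bytes):
    """Greedily mark parameters as CPU-resident until the budget is used up.

    Returns the number of bytes that were marked as offloaded.
    """
    offloaded = 0
    for p in params:
        if offloaded >= cpu_budget_bytes:
            break  # budget exhausted: remaining weights stay on the GPU
        p.device = "cpu"
        offloaded += p.num_bytes
    return offloaded


GIB = 1024 ** 3
# Assumed toy model: four layers of 1 GiB each.
layers = [Param(f"layers.{i}.weight", GIB) for i in range(4)]
moved = offload_until_budget(layers, cpu_budget_bytes=2 * GIB)
print(moved // GIB, [p.device for p in layers])
```

In a real implementation the CPU-resident weights would have to be copied to the GPU each time their layer runs, so a larger budget saves GPU memory at the cost of transfer time on every forward pass.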

janimo · 2024-11-18T20:18:51Z

This PR is based on the code in vllm-project/vllm#6496.

test/srt/test_srt_engine.py

zhyncs · 2024-11-23T06:38:53Z

@janimo @merrymercy Unit test 3 failed

janimo · 2024-11-23T07:37:08Z

> @janimo @merrymercy Unit test 3 failed

@zhyncs it fails in a different test, one that runs before the test included in this PR. It seems to be an unrelated OOM that is present in other CI runs as well.

janimo requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners November 18, 2024 17:46

janimo force-pushed the cpu-offload branch from 6f5e4a8 to 1b5ccaa Compare November 18, 2024 21:28

janimo added 4 commits November 21, 2024 23:02

Add simple CPU offloading support.

3a39a8d

Qwen2 CPU offload support

f6211c3

Add CPU offload test case

901ad3d

Support OLMo and OLMoE

969e672

janimo force-pushed the cpu-offload branch from 1b5ccaa to 969e672 Compare November 21, 2024 21:02

merrymercy reviewed Nov 23, 2024

test/srt/test_srt_engine.py (outdated, resolved)

merrymercy reviewed Nov 23, 2024

test/srt/test_srt_engine.py (outdated, resolved)

merrymercy added 3 commits November 22, 2024 21:59

Update test/srt/test_srt_engine.py

d5c8839

Update test/srt/test_srt_engine.py

d57b974

Merge branch 'main' into cpu-offload

b610689

merrymercy approved these changes Nov 23, 2024

merrymercy enabled auto-merge (squash) November 23, 2024 06:00

merrymercy merged commit d98fa1e into sgl-project:main Nov 23, 2024
12 of 13 checks passed

zhyncs mentioned this pull request Nov 23, 2024

Fix grid size in Triton decoding kernel #2134

Merged

janimo deleted the cpu-offload branch November 25, 2024 10:26

This was referenced Nov 29, 2024

Revert "Add simple CPU offloading support" #2252

Merged

Revert "Revert "Add simple CPU offloading support"" #2253

Merged
