Openvino model executor #1

Closed
wants to merge 107 commits into from


ilya-lavrenov (Owner) commented Mar 12, 2024

Install:

# Note: install OpenVINO with pytorch_module_extension first!
cd vllm
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu"
VLLM_OPENVINO=1 pip install -e .
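The install steps above can be wrapped in a small POSIX shell sketch that stops on the first failure and can preview the commands before running them. It assumes it is run from the directory that contains the `vllm` checkout and that OpenVINO with pytorch_module_extension is already installed; it does not perform that prerequisite step. The names `run`, `install_vllm_openvino`, and the `DRY_RUN` switch are hypothetical additions for illustration, not part of the PR.

```shell
#!/bin/sh
# Sketch only: wraps the install commands from the PR description.
# run, install_vllm_openvino and DRY_RUN are hypothetical names, not part of the PR.
set -eu

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_vllm_openvino() {
  # Assumes OpenVINO with pytorch_module_extension is already installed.
  # The subshell keeps the directory change from leaking into the caller.
  (
    run cd vllm
    # PIP_EXTRA_INDEX_URL points pip at the CPU-only PyTorch wheel index;
    # VLLM_OPENVINO=1 is set for the editable install, as in the PR.
    run env PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cpu" \
      VLLM_OPENVINO=1 pip install -e .
  )
}
```

Calling `DRY_RUN=1 install_vllm_openvino` prints the two commands without executing them; passing the variables through `env` scopes them to the `pip` invocation instead of exporting them into the calling shell.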

Run the sample with vLLM modeling:

python3 examples/offline_inference.py

Run the sample with Optimum modeling:

VLLM_OPENVINO_OPTIMUM=1 python3 examples/offline_inference.py
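The two run commands differ only in whether `VLLM_OPENVINO_OPTIMUM=1` is set. A minimal POSIX shell sketch of a dispatcher between them, assuming it is run from the `vllm` checkout; the function `run_offline_sample`, its `vllm`/`optimum` mode values, and the `DRY_RUN` switch are hypothetical names for illustration, not part of the PR.

```shell
#!/bin/sh
# Sketch: choose between the two modeling paths from the PR description.
# run_offline_sample, its mode values and DRY_RUN are hypothetical.
set -eu

run_offline_sample() {
  mode="${1:-vllm}"
  case "$mode" in
    vllm)
      # vLLM modeling: no extra environment variable.
      set -- python3 examples/offline_inference.py
      ;;
    optimum)
      # Optimum modeling: VLLM_OPENVINO_OPTIMUM=1, as in the PR.
      set -- env VLLM_OPENVINO_OPTIMUM=1 python3 examples/offline_inference.py
      ;;
    *)
      echo "unknown mode: $mode (expected vllm or optimum)" >&2
      return 2
      ;;
  esac
  # Print the command when DRY_RUN=1, otherwise run it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 run_offline_sample optimum` prints the Optimum-path command without running it. Rebuilding the positional parameters with `set --` keeps each command as a proper argument list, so no string needs to be re-parsed with `eval`.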

ilya-lavrenov force-pushed the openvino-model-executor branch from 735fcf4 to 228cbf7 on March 12, 2024 20:49
ilya-lavrenov and others added 28 commits April 4, 2024 13:16
Use PagedAttentionExtension from OV without contrib dependency
Disable NPU merged to OV master recently
Add bitsandbytes to requirements and use fixed vllm version in the client
Disable weight compression on optimum-intel conversion path
Produce artifacts for bare metal installation in Dockerfile.openvino
[CPU] PagedAttention support u8 kvcache
Revert "Produce artifacts for bare metal installation in Dockerfile.openvino"
[CPU] Add comment for u8 kvcache layout
ilya-lavrenov deleted the openvino-model-executor branch on October 9, 2024 06:13
5 participants