This repository has been archived by the owner on Aug 10, 2024. It is now read-only.

CI scripts designed to build a Pascal-compatible version of vLLM.

sasha0552/vllm-ci

vllm-ci

CI scripts designed to build a Pascal-compatible version of vLLM and Triton.

Installation

Note: this repository holds "nightly" builds of vLLM, so consecutive releases in this repository may carry the same vLLM version number while being built from different source code. Despite being "nightly", the builds are generally stable.

Note: kernels for all GPUs except Pascal have been excluded to reduce build time and wheel size. You can still use newer GPUs through tensor parallelism with Ray, running two instances of vLLM, one of which uses upstream vLLM. Open an issue if this disrupts your workflow.
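
Since the wheels only contain Pascal kernels, it can help to confirm which GPUs in a machine are Pascal (compute capability 6.x) before installing. The following is a minimal sketch, not part of this repository: `is_pascal_cap` is a hypothetical helper, and the `compute_cap` query field is assumed to be supported by the installed NVIDIA driver (older drivers may not provide it).

```shell
# Hypothetical helper (not part of this repository): succeeds when a
# compute capability string such as "6.1" belongs to Pascal (6.x).
is_pascal_cap() {
  case "$1" in
    6.*) return 0 ;;
    *)   return 1 ;;
  esac
}

# List each GPU with its compute capability, if nvidia-smi is available.
# Assumption: the driver supports the compute_cap query field.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader 2>/dev/null |
    while IFS=, read -r name cap; do
      cap="${cap# }"  # drop the space that follows the comma
      if is_pascal_cap "$cap"; then
        echo "$name ($cap): Pascal, covered by these wheels"
      else
        echo "$name ($cap): not Pascal, use upstream vLLM"
      fi
    done
fi
```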

To install the patched vLLM (the patched triton will be installed automatically):

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install vLLM
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ vllm

# Launch vLLM
vllm serve --help

To update the patched vLLM between builds of the same vLLM release version (e.g. 0.5.0 (commit 000000) -> 0.5.0 (commit ffffff)):

# Activate virtual environment
source venv/bin/activate

# Update vLLM
pip3 install --force-reinstall --extra-index-url https://sasha0552.github.io/vllm-ci/ --no-cache-dir --no-deps --upgrade vllm

To install aphrodite-engine with the patched triton:

# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate

# Install aphrodite-engine
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ --extra-index-url https://downloads.pygmalion.chat/whl aphrodite-engine

# Launch aphrodite-engine
aphrodite --help

In other words, add --extra-index-url https://sasha0552.github.io/vllm-ci/ to the original installation command.
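
As an alternative to typing the flag each time, pip also reads the same option from the `PIP_EXTRA_INDEX_URL` environment variable. This is a sketch of that approach for a single shell session; it is not something this README prescribes.

```shell
# Make pip consult the patched index for every install in this shell session.
export PIP_EXTRA_INDEX_URL="https://sasha0552.github.io/vllm-ci/"

# The original installation command can then be run unchanged, for example:
#   pip3 install vllm
echo "pip extra index: $PIP_EXTRA_INDEX_URL"
```

The variable only lasts for the current shell, so other projects are not affected once the session ends.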

To install the patched triton separately, for use in other applications (for example, Stable Diffusion WebUIs):

Note that this will install triton==2.3.0 (for torch==2.3.0)! If you need other versions of triton, check out my other repo, triton-ci. I plan to publish it on PyPI as soon as the file size limit increase request is approved.

To install an application that is published on PyPI and depends on triton:

# Install the application; triton is pulled in as a dependency
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ <PACKAGE NAME>

To install triton before installing the application:

# Install triton
pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ triton

If the application is already installed:

# Install triton
pip3 install --index-url https://sasha0552.github.io/vllm-ci/ --force-reinstall --no-deps triton

Don't forget to activate the virtual environment (if necessary) before performing actions!
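
After installing, one way to see which triton version ended up in the active environment is to read the `Version:` field from `pip3 show`. This is a minimal sketch; `parse_pip_version` is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper: read `pip3 show` output on stdin and print the version.
parse_pip_version() {
  awk '/^Version:/ { print $2 }'
}

# Report the installed triton version, if pip3 is available.
if command -v pip3 >/dev/null 2>&1; then
  version="$(pip3 show triton 2>/dev/null | parse_pip_version)" || true
  echo "installed triton: ${version:-not found}"
fi
```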
