Call for Help: Proper Build System (CMake, Bazel, etc). #2654

Closed
simon-mo opened this issue Jan 29, 2024 · 5 comments · Fixed by #2830
Labels: help wanted (Extra attention is needed)

Comments

@simon-mo
Collaborator

simon-mo commented Jan 29, 2024

Currently, vLLM's build uses PyTorch's extension builders, which call Ninja under the hood. This works okay but has the following issues:

  • Only supports NVIDIA and AMD GPUs.
  • Slow sequential builds. This is amplified by adding quantization kernels and LoRA kernels.
  • No caching or incremental builds.
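
To make "incremental" concrete, here is a minimal sketch of the idea: record a content hash per source file and recompile only the files whose hash changed. The function names and the JSON manifest layout are hypothetical and not part of vLLM or PyTorch; real build systems also track compiler flags and header dependencies, which this sketch ignores.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_sources(sources, manifest_path: Path):
    """Return the sources whose contents differ from the recorded manifest.

    A missing manifest means nothing has been built yet, so every source
    is reported as stale.
    """
    try:
        recorded = json.loads(manifest_path.read_text())
    except FileNotFoundError:
        recorded = {}
    return [s for s in sources if recorded.get(str(s)) != file_digest(s)]


def record_sources(sources, manifest_path: Path) -> None:
    """Write the current digests so the next run can skip unchanged files."""
    digests = {str(s): file_digest(s) for s in sources}
    manifest_path.write_text(json.dumps(digests, indent=2))
```

A build driver would compile only what `stale_sources` returns and then call `record_sources` once the compile succeeds.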

We would like to ask for the community's help in recommending a technology, prototyping it, and implementing it. Ideally something like CMake or Bazel could work, but it requires some careful thinking.

The requirements:

  • Must support multiple hardware architectures (NVIDIA, AMD, Intel, etc).
  • Must support incremental builds, which also implies caching.
  • Must support parallel builds.
  • Good to have editor support (by generating a compilation database).
  • Ideally, it would not run out of memory (OOM) like the current setup. Because of the current rigid structure, we have to set MAX_JOBS and NVCC_THREADS carefully to keep the compiler from running out of memory. I think this is because nvcc spawns threads for each SM architecture we are compiling for.
  • Vaguely, "future proof".
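
As a rough illustration of the MAX_JOBS / NVCC_THREADS balancing act, the sketch below picks a job count so that the total number of nvcc threads fits both the CPU count and a memory budget. The helper names and the default of 4 GiB per nvcc thread are assumptions made up for this example, not measured values; the real memory footprint depends on the kernels and architectures being compiled.

```python
import os


def plan_max_jobs(cpu_count: int, free_mem_gib: float, nvcc_threads: int = 1,
                  mem_per_nvcc_thread_gib: float = 4.0) -> int:
    """Pick a MAX_JOBS value that respects both CPU and memory limits.

    Each job may run `nvcc_threads` compiler threads, so the worst case is
    jobs * nvcc_threads concurrent threads. The per-thread memory figure is
    an assumed placeholder.
    """
    if nvcc_threads < 1:
        raise ValueError("nvcc_threads must be at least 1")
    by_cpu = cpu_count // nvcc_threads
    by_mem = int(free_mem_gib // (mem_per_nvcc_thread_gib * nvcc_threads))
    # Always allow at least one job, even on a very small machine.
    return max(1, min(by_cpu, by_mem))


def build_env(max_jobs: int, nvcc_threads: int) -> dict:
    """Return a copy of the environment with the two build knobs set."""
    env = dict(os.environ)
    env["MAX_JOBS"] = str(max_jobs)
    env["NVCC_THREADS"] = str(nvcc_threads)
    return env
```

For example, under these assumptions a machine with 16 cores and 32 GiB free, using 2 nvcc threads per job, is limited by memory rather than by CPU. The resulting environment could be passed to the build via `subprocess.run(..., env=build_env(jobs, 2))`.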

Currently, the entire "build system" lives in setup.py: https://github.com/vllm-project/vllm/blob/main/setup.py

@WoosukKwon added the "help wanted" label Jan 29, 2024
@lroberts7

@rgommers would meson-python support this?

It checks most of the boxes, but I'm not sure about support for multiple accelerator hardware targets. If it doesn't, I'm hoping you have some experience and opinions you'd be willing to share.

@rgommers

The question here isn't very clear to me; I'm missing context, I guess. Reading all the requirements, it sounds like you need a regular build system (CMake or Meson are the most commonly used and best general-purpose options). However, if you're already using the PyTorch extension builder, it sounds like that is something you do on the fly (maybe exposed to end users?), which is a very different use case.

@simon-mo changed the title from "Call for Help: Compilation Build Tool" to "Call for Help: Proper Build System (CMake, Bazel, etc)." on Jan 30, 2024
@simon-mo
Collaborator Author

Ah, good to clarify here. We are really just looking for a regular build system to replace our current use of the Torch extension builders.
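
For illustration only, this is roughly how a Python driver such as setup.py might hand the work to CMake while covering the parallel-build and compilation-database requirements. It is a sketch under stated assumptions, not the approach any particular PR takes: the function name, the directory names, and the `EXAMPLE_TARGET_DEVICE` define are hypothetical, while the CMake options themselves (`-S`, `-B`, `-G Ninja`, `CMAKE_EXPORT_COMPILE_COMMANDS`, `--build`, `--parallel`) are standard.

```python
def cmake_commands(source_dir, build_dir, jobs, build_type="Release",
                   extra_defines=None):
    """Return the (configure, build) argument lists for a CMake build.

    CMAKE_EXPORT_COMPILE_COMMANDS=ON asks CMake to write
    compile_commands.json into the build directory for editor tooling.
    Reusing the same build directory lets the underlying generator
    rebuild only what changed.
    """
    defines = {
        "CMAKE_BUILD_TYPE": build_type,
        "CMAKE_EXPORT_COMPILE_COMMANDS": "ON",
    }
    # Caller-supplied defines (e.g. a hypothetical hardware selector)
    # override the defaults above.
    defines.update(extra_defines or {})

    configure = ["cmake", "-S", str(source_dir), "-B", str(build_dir),
                 "-G", "Ninja"]
    configure += [f"-D{key}={value}" for key, value in defines.items()]
    build = ["cmake", "--build", str(build_dir), "--parallel", str(jobs)]
    return configure, build
```

A caller would run both lists in order with `subprocess.run(cmd, check=True)`. A hardware target could be selected by passing something like `extra_defines={"EXAMPLE_TARGET_DEVICE": "cuda"}`, provided the project's CMakeLists.txt defines and reads such a variable.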

@robertgshaw2-redhat
Collaborator

@simon-mo, the team from Neural Magic is going to work on this

cc @tlrmchlsmth @bnellnm

@bnellnm
Contributor

bnellnm commented Feb 19, 2024

Hi all, I wanted to give an update on this project. So far, I've got a CUDA build working (see PR #2830). The PR has a detailed description of the CMake build system. I'm still working on the AMD/ROCm build, which is a little trickier because of the "hipify" preprocessor that PyTorch uses on the CUDA sources.


6 participants