Configurable flags for the backend compiler #1617
Extra: a configurable path to ptxas would also help.
Update: I also notice that pytorch seems to have tried vendoring ptxas at one point, and ceased doing so once triton became their dependency. This is probably good: it is better if pytorch just asks some function from triton rather than guessing on its own.

The ptxas discussion can be moved to #1618.
FWIW, I ran into a related issue when using triton (2.2.0) in a conda environment. The CUDA toolkit is installed in the conda env (rather than system-wide), so the compiler can't find the CUDA libraries on its default link path.
I need to pass the CUDA library path to the compiler; the fix is to add it to the link flags built in triton/python/triton/common/build.py, line 89 (commit c9ab448). For me the path is the conda stubs directory.
The path can be automatically found with:

```python
import os

def conda_cuda_dir():
    # CUDA stubs directory shipped with the conda-installed toolkit.
    conda_path = os.environ['CONDA_PREFIX']
    return os.path.join(conda_path, "lib", "stubs")
```

This specific issue is fixed on the main branch, where an environment variable can be used to point the build at the CUDA library path.
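For illustration, here is a minimal sketch (not triton's actual code) of how such a library-directory lookup could fold the conda stubs path in alongside the usual system defaults; the helper name and the candidate system paths below are assumptions:

```python
import os

def candidate_cuda_lib_dirs():
    # Hypothetical helper: system defaults plus the conda stubs directory,
    # mirroring the conda_cuda_dir() idea above. The system paths listed
    # here are illustrative, not exhaustive.
    dirs = [d for d in ("/usr/lib", "/usr/lib/x86_64-linux-gnu", "/usr/lib64")
            if os.path.isdir(d)]
    conda_prefix = os.environ.get("CONDA_PREFIX")  # set inside an active conda env
    if conda_prefix:
        stubs = os.path.join(conda_prefix, "lib", "stubs")
        if os.path.isdir(stubs):
            dirs.append(stubs)
    return dirs

# The resulting directories would end up as -L flags on the compiler command line, e.g.:
# cc_cmd += [f"-L{d}" for d in candidate_cuda_lib_dirs()]
```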
Hi! I see that openai/triton requires a working toolchain at run-time, including CUDA Toolkit and libpython installations for the host platform. Currently, triton attempts to guess the correct compiler flags on its own: https://github.com/openai/triton/blob/deb2c71fb4f912a5298003fa3fc789885b726607/python/triton/common/build.py#L77-L82

This includes inferring the library locations: https://github.com/openai/triton/blob/deb2c71fb4f912a5298003fa3fc789885b726607/python/triton/common/build.py#L19-L22
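To make this concrete, the sketch below shows roughly what such run-time inference looks like; it is illustrative rather than a verbatim copy of the linked lines, and it assumes `whereis` is available on the host:

```python
import os
import subprocess
import sysconfig

def guess_libcuda_dirs():
    # Ask the system where libcuda.so lives; the first token of the output
    # is the "libcuda.so:" label, the rest are candidate paths.
    out = subprocess.check_output(["whereis", "libcuda.so"]).decode()
    return [os.path.dirname(p) for p in out.split()[1:]]

def guess_python_include_dir():
    # Header directory needed to satisfy "#include <Python.h>".
    return sysconfig.get_paths()["include"]
```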
What this means, in practice, is that openai/triton is taking on a job that is usually performed by tools like CMake, and that certain care has to be taken when deploying openai/triton. The current flag-inference logic is platform-specific and, of course, it isn't expected to be universal either. But we probably should work out a solution for how to make it configurable, so that e.g. distributions can set up their environments to meet triton's expectations.

Some concrete examples of issues that arise:
- On NixOS, the `libcuda.so` user-space driver is deployed in a special location, `/run/opengl-driver/lib`, and `whereis` wouldn't produce any reasonable output because `/lib` and `/usr/lib` do not exist. In python3Packages.torch: 1.13.1 -> 2.0.0 (NixOS/nixpkgs#222273) we end up patching `triton/compiler.py` to pass the correct `-L` flag to the compiler: https://github.com/NixOS/nixpkgs/blob/e4474334415ac41efb5fda33d4cc8f312397ef05/pkgs/development/python-modules/openai-triton/default.nix#L128-L147 (a sketch of this kind of override follows the list). We also have to work around triton trying to vendor a copy of ptxas.
- In pytorch/pytorch there are a number of confused issues about broken `-lcuda` and `#include <Python.h>`.
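As a sketch of the kind of configurability being asked for (an assumed interface, not something triton currently provides under this name): an environment variable that short-circuits the guessing entirely and otherwise falls back to the existing heuristics.

```python
import os

def libcuda_dirs():
    # Hypothetical override: the variable name is made up for illustration.
    explicit = os.environ.get("LIBCUDA_DIR_OVERRIDE")
    if explicit:
        return [explicit]
    # Otherwise fall back to platform-specific guessing. On NixOS the correct
    # answer would be /run/opengl-driver/lib, which no generic heuristic finds.
    return [d for d in ("/usr/lib", "/usr/lib/x86_64-linux-gnu") if os.path.isdir(d)]
```

A distribution could then export one variable (or patch a single default) instead of patching flag-construction code in several places.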
An off-the-shelf way of making the libpython and cuda flags configurable would be `pkg-config`, although I'd feel weird and conflicted about setting up pkg-config at run-time side by side with pytorch. I also note that this situation is somewhat similar to that of `torch.utils.cpp_extension`, which also attempts to guess build flags at run-time.
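For completeness, the pkg-config route would look roughly like the sketch below; the package names queried ("python3", "cuda") are assumptions, since the `.pc` files actually shipped vary across distributions and CUDA packagings.

```python
import subprocess

def pkg_config(package, *flags):
    # Query pkg-config for the requested flags of `package`, e.g. --cflags or --libs.
    out = subprocess.check_output(["pkg-config", *flags, package]).decode()
    return out.split()

# Hypothetical usage when assembling the compiler command line:
# cc_cmd += pkg_config("python3", "--cflags")   # e.g. -I.../include/python3.x
# cc_cmd += pkg_config("cuda", "--libs")        # e.g. -L... -lcuda, if a cuda.pc exists
```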