python3.pkgs.{torch,triton}: two copies of llvm in the closure #256571

SomeoneSerge · 2023-09-21T21:08:38Z

Issue description

...one brought in by mlir, and one seemingly brought it directly. E.g. on my current system (paths included only to show they're different):

python3-3.10.12-env → python3.10-triton-2.0.0 → rocm-llvm-mlir-5.4.4 → hip-amd-5.4.4 → rocm-opencl-runtime-5.4.4 → rocm-runtime-5.4.3 → rocm-llvm-clang-5.4.4 → rocm-llvm-llvm-5.4.4
- Path: /nix/store/hg2nby9s72yc2lx2by4fn9smffnv0iyd-rocm-llvm-llvm-5.4.4
- Added Size: 1.43 GiB
python3-3.10.12-env → python3.10-triton-2.0.0 → rocm-llvm-llvm-5.4.4:
- Path: /nix/store/j0vfk0ybphyv9398svhxz303x11aanpa-rocm-llvm-llvm-5.4.4
- Added Size: 1.47 GiB

@NixOS/cuda-maintainers

The text was updated successfully, but these errors were encountered:

samuela · 2023-09-21T21:39:51Z

IIRC triton requires its own patched version of llvm, but weird that we'd have two different patched versions somehow 🤔

ConnorBaker · 2023-09-21T21:54:18Z

I thought they had moved largely towards building LLVM HEAD (or at least trying to track it more closely). I do remember the python setup script downloading a pre-built copy of LLVM with MLIR (MLIR might not be present in the default version of LLVM most distributions make available?), but I think that was more for convenience than anything else. https://github.com/openai/triton/blob/c4bc3fd92f79a1a470f0a3a086c0fa525124bc6d/python/setup.py#L69

It looks like it supports building with arbitrary LLVM (https://github.com/openai/triton#building-with-a-custom-llvm), just not arbitrary versions due to breaking changes and whatnot.

EDIT: This issue is probably also still relevant: triton-lang/triton#1617

SomeoneSerge · 2023-09-22T13:16:35Z

I thought they had moved largely towards building LLVM HEAD

Well since we're bringing this up, there's actually even more divergence: upstream Pytorch wheels are built against the HEAD of the fork of openai/triton, which in turn is built against the HEAD of LLVM. We're rebuilding LLVM for Pytorch anyway, maybe we should use both LLVM and Triton from ROCmSoftwarePlatform/

As for the original issue, we probably just need to override llvmPackages.mlir:

nixpkgs/pkgs/development/python-modules/openai-triton/default.nix

Line 150 in 378c0c6

llvmPackages.mlir

...to use this llvm:

nixpkgs/pkgs/development/python-modules/openai-triton/default.nix

Lines 31 to 38 in 378c0c6

    
           llvm = (llvmPackages.llvm.override { 
        
             llvmTargetsToBuild = [ "NATIVE" "NVPTX" ]; 
        
             # Upstream CI sets these too: 
        
             # targetProjects = [ "mlir" ]; 
        
             extraCMakeFlags = [ 
        
               "-DLLVM_INSTALL_UTILS=ON" 
        
             ]; 
        
           });

EDIT: But also, idk what exactly the llvm package includes, or how much of that openai/triton actually uses...

Madouura · 2023-10-10T23:14:08Z

You may want to see the openai-triton changes in #258328.

Madouura · 2023-11-11T05:16:31Z

Shouldn't this be fixed now?
Since openai-triton has it's own custom LLVM now?
Excluding torchWithRocm, which will probably always have rocm-llvm.

SomeoneSerge · 2023-11-11T13:38:39Z

Yes, this might have been fixed. We need to inspect nix-tree (if not add an automated check...) before we close the issue

Madouura · 2023-11-12T12:00:30Z

Inspected with nix-tree.
openai-triton only has one copy of LLVM.
torch has no copies of LLVM.
torchWithRocm only has an indirect copy of LLVM (rocm-llvm).

SomeoneSerge · 2023-11-12T16:09:26Z

I'm afraid we may still be suffering from this issue: there's only rocm-llvm-llvm in the closure of python3Packages.torchWithRocm, but there's both a rocm-llvm-llvm and an openai-triton-llvm in the closure of python3.withPackages (ps: [ ps.torchWithRocm ]) which includes the propagatedBuildInputs:

Each is about 1.5GiB of added size judging by nix-tree (these are negligible though, compared to the total 17GiB of rocm-merged)

@Madouura could you please confirm my interpretation and reopen the issue?

EDIT: the cudaSupport = true variant seems fine at this point, it only contains the openai-triton-llvm

Madouura · 2023-11-12T16:26:53Z

Ah. I didn't realize we had to use python3.withPackages for this.
Regular torch and torchWithCuda should be fine.
torchWithRocm unfortunately will have to stay with two copies, because we can't build openai-triton on rocm-llvm reliably. Thus, I don't think the issue needs to be reopened.

SomeoneSerge added the 6.topic: python label Sep 21, 2023

FliegendeWurst added the 6.topic: closure size The final size of a derivation, including its dependencies label Nov 11, 2023

Madouura closed this as completed Nov 12, 2023

rrbutani added 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related and removed 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related labels May 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

python3.pkgs.{torch,triton}: two copies of llvm in the closure #256571

python3.pkgs.{torch,triton}: two copies of llvm in the closure #256571

SomeoneSerge commented Sep 21, 2023

samuela commented Sep 21, 2023

ConnorBaker commented Sep 21, 2023 •

edited

Loading

SomeoneSerge commented Sep 22, 2023 •

edited

Loading

Madouura commented Oct 10, 2023

Madouura commented Nov 11, 2023

SomeoneSerge commented Nov 11, 2023

Madouura commented Nov 12, 2023

SomeoneSerge commented Nov 12, 2023 •

edited

Loading

Madouura commented Nov 12, 2023

python3.pkgs.{torch,triton}: two copies of llvm in the closure #256571

python3.pkgs.{torch,triton}: two copies of llvm in the closure #256571

Comments

SomeoneSerge commented Sep 21, 2023

Issue description

samuela commented Sep 21, 2023

ConnorBaker commented Sep 21, 2023 • edited Loading

SomeoneSerge commented Sep 22, 2023 • edited Loading

Madouura commented Oct 10, 2023

Madouura commented Nov 11, 2023

SomeoneSerge commented Nov 11, 2023

Madouura commented Nov 12, 2023

SomeoneSerge commented Nov 12, 2023 • edited Loading

Madouura commented Nov 12, 2023

ConnorBaker commented Sep 21, 2023 •

edited

Loading

SomeoneSerge commented Sep 22, 2023 •

edited

Loading

SomeoneSerge commented Nov 12, 2023 •

edited

Loading