Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rocm and hip: 4.3.1 → 4.5.2 #150767

Merged
merged 10 commits into from
Dec 22, 2021
Merged

rocm and hip: 4.3.1 → 4.5.2 #150767

merged 10 commits into from
Dec 22, 2021

Conversation

Flakebi
Copy link
Member

@Flakebi Flakebi commented Dec 14, 2021

Motivation for this change

Update all rocm packages to 4.5.2.
I tested rocm-smi, some OpenCL examples and a few hip examples from https://github.com/ROCm-Developer-Tools/HIP-Examples and they ran through fine. More testing of hip would be nice, I’m not that familiar with all its features and the binaries that are shipped.

There were a few changes in the rocm build landscape:

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@acowley
Copy link
Contributor

acowley commented Dec 15, 2021

I’ve not yet had a chance to build or run this, but many of the changes look very positive! Thank you! Something missing here as compared to what I did in the overlay is to move rocm-opencl-runtime from Python 2 to 3. I don’t think that change caused any trouble, and it will help address the nixpkgs migration from 2 to 3.

Fix the build with rocclr distributed by source.
rocclr cannot be built alone and needs to be distributed by source now.
Fix compiler-rt build and use ninja for faster builds.
@Flakebi
Copy link
Member Author

Flakebi commented Dec 15, 2021

Thanks for the comment!
I updated the patches to include more of the overlay changes (diff).

@acowley
Copy link
Contributor

acowley commented Dec 15, 2021

One issue I'm having now is that when I try to build something with the hipcc from this PR, I get,

/nix/store/lbxfixyw1yk099pjyaiy3xj5dl7kxm1g-binutils-2.35.2/bin/ld: cannot find -lamdhip64
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)

So it's missing something like a -L $out/lib. Can you see where that might be missing?

@acowley
Copy link
Contributor

acowley commented Dec 15, 2021

I think it might be the way we patch HIP_LIB_PATH in hipcc. You do this nice thing where you build a hip derivation that the final hipamd derivation points to, while I took a single source view approach by copying the hip source into the hipamd build tree. One difference between those approaches is the meaning of $out in different places.

hip was split into multiple repositories.
This builds the version for AMD GPUs.
@Flakebi
Copy link
Member Author

Flakebi commented Dec 22, 2021

Thanks for testing and digging into this, the HIP_LIB_PATH is indeed wrong!

I previously tested hip with having hip in nativeBuildInputs in a shell.nix and that also added the hip/lib path to NIX_LDFLAGS, so I didn’t encounter this error in my testing.
I now tried again without hip in the build inputs and it works after removing the HIP_LIB_PATH replacement (which makes HIP_LIB_PATH default to $HIP_ROCCLR_HOME/lib, which is the correct path).

@acowley
Copy link
Contributor

acowley commented Dec 22, 2021

@Flakebi Do you by any chance have cachix setup with CI builds for your fork? Building llvm and clang to test is a real drag on cycle time.

@Flakebi
Copy link
Member Author

Flakebi commented Dec 22, 2021

I usually use my beefy work machine to compile anything that contains llvm, that alleviates the pain 🙂 I guess I could push the builds somewhere, but it looks like cachix costs something?

@acowley
Copy link
Contributor

acowley commented Dec 22, 2021

cachix has a free tier for open source projects, and it can be hooked into GitHub CI quite nicely. This is sloppy, but you can see how it's used in the nixos-rocm overlay so that users don't have to build anything.

I've not set it up myself for a nixpkgs fork, so that could be an issue. But there is also the option to manually push store paths which would work for this case.

No matter: my small tests of hip and OpenCL ran properly, so I'm happy with this. If you're satisfied with the PR now, @Flakebi , and neither @matthewbauer nor @lovesegfault chime in, I'll merge the PR later tonight.

@Flakebi
Copy link
Member Author

Flakebi commented Dec 22, 2021

That’s fine with me, thanks for testing again.
Next time, I’ll try pushing the builds to cachix.

@lovesegfault
Copy link
Member

Looks AOK for me :)

@lovesegfault lovesegfault merged commit f403d1f into NixOS:master Dec 22, 2021
@Flakebi Flakebi deleted the rocm branch December 22, 2021 23:46
@acowley
Copy link
Contributor

acowley commented Dec 22, 2021

Thank you for taking this on and seeing it through, @Flakebi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants