{lib}[gcccuda/2020b] NCCL v2.8.4-1 #12183

Closed


branfosj
Member

@branfosj branfosj commented Feb 17, 2021

(created using eb --new-pr)

easyblock in easybuilders/easybuild-easyblocks#2337 and minor fix in easybuilders/easybuild-easyblocks#2460

Fixes #12180

@branfosj branfosj marked this pull request as draft February 17, 2021 10:35
@Micket
Contributor

Micket commented Feb 17, 2021

Should we limit it to the CUDA compute capabilities that we have?

By default, NCCL is compiled for all supported architectures. To accelerate the compilation and reduce the binary size, consider redefining NVCC_GENCODE (defined in makefiles/common.mk) to only include the architecture of the target platform:

$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_70,code=sm_70"
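
As a rough illustration of the idea above, the sketch below turns a list of compute capabilities into an NVCC_GENCODE value in the form used by the quoted make command. The helper name `nvcc_gencode` and the input format (strings such as `'7.0'`) are assumptions for illustration, not part of NCCL or EasyBuild.

```python
def nvcc_gencode(compute_capabilities):
    """Build an NVCC_GENCODE value from capabilities such as '7.0' or '8.0'.

    Hypothetical helper: the name and input format are assumptions.
    """
    flags = []
    for cc in compute_capabilities:
        # '7.0' -> '70', matching the compute_70/sm_70 naming used by nvcc
        arch = cc.replace('.', '')
        flags.append('-gencode=arch=compute_%s,code=sm_%s' % (arch, arch))
    return ' '.join(flags)


if __name__ == '__main__':
    # Prints the same value as in the make command quoted above
    print(nvcc_gencode(['7.0']))
```

Passing several capabilities yields one `-gencode` flag per architecture, separated by spaces.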

@Flamefire
Contributor

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml3 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), Python 2.7.5
See https://gist.github.com/ce8638c4a91f592497eef0bb085cb915 for a full test report.

@Flamefire
Contributor

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusi8019 - Linux centos linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), Python 2.7.5
See https://gist.github.com/711ee113b01079ec87b6c5aff0ebfa6f for a full test report.

@boegel
Member

boegel commented Feb 17, 2021

Should we limit it to the CUDA compute capabilities that we have?

Yes, but ideally that's done in a custom easyblock; it may be difficult to do cleanly in an easyconfig (e.g. handling the case when --cuda-compute-capabilities is not set).

@branfosj
Member Author

Should we limit it to the CUDA compute capabilities that we have?

Yes, but ideally that's done in a custom easyblock; it may be difficult to do cleanly in an easyconfig (e.g. handling the case when --cuda-compute-capabilities is not set).

I think we can just fall back to passing nothing to the NCCL build and let NCCL build the fat binary.
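
The fallback described here could look roughly like the following sketch. The function name `nccl_make_args` and its argument are hypothetical; the actual easyblock in easybuilders/easybuild-easyblocks#2337 may be structured differently.

```python
def nccl_make_args(cuda_compute_capabilities=None):
    """Return extra arguments for NCCL's make command (hypothetical helper)."""
    if not cuda_compute_capabilities:
        # Nothing configured: pass nothing and let NCCL build for all
        # supported architectures (the fat binary).
        return ''
    gencode = ' '.join(
        '-gencode=arch=compute_{0},code=sm_{0}'.format(cc.replace('.', ''))
        for cc in cuda_compute_capabilities
    )
    return 'NVCC_GENCODE="%s"' % gencode
```

With no capabilities the helper returns an empty string, so the make invocation is left unchanged; with `['7.0']` it returns the NVCC_GENCODE setting shown in the NCCL build instructions quoted earlier in this thread.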

@branfosj branfosj marked this pull request as ready for review February 18, 2021 17:07
@branfosj branfosj added this to the 4.x milestone Feb 19, 2021
@verdurin
Member

Test report by @verdurin
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2337
SUCCESS
Build succeeded for 4 out of 4 (1 easyconfigs in total)
easybuild-c7.novalocal - Linux centos linux 7.9.2009, x86_64, Intel Xeon Processor (Skylake, IBRS), Python 3.6.8
See https://gist.github.com/c685c3d2b31a200a6c4275273a4c4fbf for a full test report.

@branfosj
Member Author

branfosj commented Jun 7, 2021

Test report by @branfosj
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2337
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0212u15b.bear.cluster - Linux RHEL 8.3, x86_64, Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (broadwell), Python 3.6.8
See https://gist.github.com/3417e645d85399498e9a6bad3bab4974 for a full test report.

@branfosj
Member Author

branfosj commented Jun 7, 2021

Test report by @branfosj
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2337
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0212u15b.bear.cluster - Linux RHEL 8.3, x86_64, Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (broadwell), Python 3.6.8
See https://gist.github.com/78ba58131ace34827f40bfd84ed77838 for a full test report.

@branfosj
Member Author

branfosj commented Jun 7, 2021

Test report by @branfosj
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2337
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0212u15b.bear.cluster - Linux RHEL 8.3, x86_64, Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz (broadwell), Python 3.6.8
See https://gist.github.com/7e7d884dc2fa2453291de987dcf031e5 for a full test report.

description = """The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node collective
communication primitives that are performance optimized for NVIDIA GPUs."""

toolchain = {'name': 'gcccuda', 'version': '2020b'}
Contributor

I'm thinking about the toolchain for this. Previously we had SYSTEM-CUDA; now it is the (full) gcccuda toolchain, which means that it can't be used with Intel toolchains, can it?
So maybe GCCcore with a CUDA dependency and a version suffix?
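
One way to read this suggestion is sketched below as an easyconfig fragment. The specific GCCcore and CUDA versions, the `-CUDA-%(cudaver)s` suffix, and resolving CUDA with the system toolchain are all assumptions for illustration; the easyconfig actually used in #13071 may differ.

```python
# Hypothetical easyconfig fragment; versions are assumed for illustration.
toolchain = {'name': 'GCCcore', 'version': '10.2.0'}

# %(cudaver)s is an EasyBuild template filled in from the CUDA dependency
versionsuffix = '-CUDA-%(cudaver)s'

# Fourth element True: resolve CUDA with the system toolchain (assumed here)
dependencies = [('CUDA', '11.1.1', '', True)]
```

Because GCCcore sits below both the GCC-based and Intel-based toolchains, a build at this level could in principle be shared between them, which is the concern raised in the comment.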

@easybuilders easybuilders deleted a comment from boegelbot Jun 8, 2021
@easybuilders easybuilders deleted a comment from boegelbot Jun 8, 2021
@branfosj
Member Author

branfosj commented Jun 9, 2021

Closing this. Using #13071 instead.

@branfosj branfosj closed this Jun 9, 2021
@branfosj branfosj deleted the 20210217103451_new_pr_NCCL2841 branch August 19, 2021 18:02
Development

Successfully merging this pull request may close these issues.

Switch to building NCCL
5 participants