Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve RDC #167

Merged
merged 5 commits into from
Oct 6, 2023
Merged

Improve RDC #167

merged 5 commits into from
Oct 6, 2023

Conversation

cloudhan
Copy link
Collaborator

@cloudhan cloudhan commented Sep 17, 2023

  1. cuda_objects now returns CcInfo with linking context, thus, the cc_library deps can correctly propagate to cuda_library
  2. correctly handle transitive objects for archiving and device linking.

This fixed static linking issue introduced by #125, and partially address some concerns mentioned in #164

@cloudhan
Copy link
Collaborator Author

/test

@cloudhan
Copy link
Collaborator Author

cloudhan commented Sep 17, 2023

@rnburn I think you would like to try this out, I will wait for your feedback for a while.

@github-actions
Copy link

Integration Build: succeeded ✅
https://github.com/bazel-contrib/rules_cuda/actions/runs/6213974554

@rnburn
Copy link

rnburn commented Sep 18, 2023

Thank you @cloudhan - I'll try testing the transitive linking dependencies.

@cloudhan
Copy link
Collaborator Author

cloudhan commented Sep 19, 2023

Intermediate *_dlink.o completely destory the idea of transitive device link. The only acceptable way is do dlink once and only once in the whole dependency graph.

https://forums.developer.nvidia.com/t/linking-multiple-static-cuda-libs/148964

@cloudhan
Copy link
Collaborator Author

@jsharpe Any suggestion?

@rnburn If there is no problem, I'd like to keep current impl.

But it is still flawed because nvlink limitation. In the future, we will switch to explicit dlink, where attr rdc only indicate the srcs are compiled with -dc

@cloudhan
Copy link
Collaborator Author

cloudhan commented Oct 6, 2023

Backlog with an example for this. examples_deep_rdc.tar.gz

    intermediate1.a
     ↗         ↘                     cc_import and
base.a           lib_cu.a --> deep.so -----------------> main
     ↘         ↗                     link with deep.so
    intermediate2.a

@cloudhan cloudhan merged commit 894603f into main Oct 6, 2023
13 checks passed
@cloudhan cloudhan deleted the cloudhan/better-rdc branch October 6, 2023 16:47
@cloudhan
Copy link
Collaborator Author

cloudhan commented Oct 6, 2023

Backlog for already known issue:

mark base.a, intermediate1.a, intermediate2.a and lib_cu.a as alwayslink = 1

and change deep.so as the following

cc_binary(
    name = "deep",
    linkshared = 1,
    linkstatic = 1,
    deps = [
        ":base",
        ":intermediate1",
        ":intermediate2",
        ":lib_cu",
    ],
)

That is we want to include all symbols and produce a usable .so file for other user or downstream library, and this is a common valid use case, however, will trigger errors:

/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/intermediate1/intermediate1_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_3822aeeb_16_intermediate1_cu_10ef6297'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/intermediate1/intermediate1_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_2ecb4e32_7_base_cu_ca1617bc'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/intermediate2/intermediate2_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_2a970105_16_intermediate2_cu_955613c6'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/intermediate2/intermediate2_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_2ecb4e32_7_base_cu_ca1617bc'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/base/base_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_2ecb4e32_7_base_cu_ca1617bc'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
collect2: error: ld returned 1 exit status
Target //deep_rdc:main failed to build
Use --verbose_failures to see the command lines of failed build steps.

This is cause by the nvlink limitation:

  1. It does not consume a previous dlink produced _dlink.o, if you feed the obj file to nvlink, nvlink error : Undefined reference to ...
  2. Without feeding _dlink.o, in latter stage dlink, nvlink will produce duplicate symbols which have been produced in former dlinks.

examples_deep_rdc_not_good.tar.gz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants