Improve RDC #167

cloudhan · 2023-09-17T13:52:02Z

cuda_objects now returns CcInfo with a linking context, so cc_library deps correctly propagate to cuda_library.
Transitive objects are now handled correctly for archiving and device linking.
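A minimal sketch of the dependency shape this is meant to support. The target and file names (`host_utils`, `kernels`, `gpu_lib`) are hypothetical; only the rule names and the `rdc` attribute come from rules_cuda.

```starlark
load("@rules_cuda//cuda:defs.bzl", "cuda_library", "cuda_objects")

# Plain host-side C++ library (hypothetical name and sources).
cc_library(
    name = "host_utils",
    srcs = ["host_utils.cc"],
    hdrs = ["host_utils.h"],
)

# Relocatable device code that depends on the cc_library.
# With CcInfo carrying a linking context, the dependency on
# :host_utils should no longer be lost at this hop.
cuda_objects(
    name = "kernels",
    srcs = ["kernels.cu"],
    deps = [":host_utils"],
)

# Archives and device-links the objects; consumers of :gpu_lib
# are expected to pick up :host_utils transitively as well.
cuda_library(
    name = "gpu_lib",
    rdc = 1,
    deps = [":kernels"],
)
```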

This fixes the static linking issue introduced by #125 and partially addresses some concerns mentioned in #164.

The rdc output is removed from cuda_library: it is incorrect in the case of whole-archive linking and prevents us from creating a shared library.

cloudhan · 2023-09-17T13:52:10Z

/test

cloudhan · 2023-09-17T13:53:27Z

@rnburn I think you would like to try this out. I will wait a while for your feedback.

github-actions · 2023-09-17T14:11:51Z

Integration Build: succeeded ✅
https://github.com/bazel-contrib/rules_cuda/actions/runs/6213974554

rnburn · 2023-09-18T17:07:43Z

Thank you @cloudhan - I'll try testing the transitive linking dependencies.

cloudhan · 2023-09-19T02:16:55Z

Intermediate *_dlink.o files completely destroy the idea of transitive device linking. The only acceptable way is to do dlink once and only once in the whole dependency graph.

https://forums.developer.nvidia.com/t/linking-multiple-static-cuda-libs/148964
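One layout that keeps to a single device link with the existing rules, sketched under the assumption that cuda_objects only compiles relocatable device code and leaves the device link to the consuming cuda_library. All target and file names here are hypothetical.

```starlark
load("@rules_cuda//cuda:defs.bzl", "cuda_library", "cuda_objects")

# Lower layers are cuda_objects, so no *_dlink.o is produced for them.
cuda_objects(
    name = "b_objects",
    srcs = ["b.cu"],
    hdrs = ["b.cuh"],
)

cuda_objects(
    name = "a_objects",
    srcs = ["a.cu"],
    hdrs = ["a.cuh"],
    deps = [":b_objects"],
)

# The only target in this graph that performs a device link.
cuda_library(
    name = "gpu_top",
    rdc = 1,
    deps = [
        ":a_objects",
        ":b_objects",
    ],
)
```

Because no intermediate target device-links on its own, this graph contains no second dlink step.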

cloudhan · 2023-09-25T08:51:03Z

@jsharpe Any suggestions?

@rnburn If there is no problem, I'd like to keep the current implementation.

But it is still flawed because of an nvlink limitation. In the future, we will switch to explicit dlink, where the rdc attribute only indicates that the srcs are compiled with -dc.

cloudhan · 2023-10-06T16:45:00Z

Backlog with an example for this. examples_deep_rdc.tar.gz

    intermediate1.a
     ↗         ↘                     cc_import and
base.a           lib_cu.a --> deep.so -----------------> main
     ↘         ↗                     link with deep.so
    intermediate2.a
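The static-library part of the diagram above could be declared roughly as follows. This is a sketch, not the contents of the attached archive: the source names `base.cu`, `intermediate1.cu` and `intermediate2.cu` are inferred from the symbol names in the linker errors further down, `lib_cu.cu` is hypothetical, and the `deep.so`, cc_import and `main` steps are left out.

```starlark
load("@rules_cuda//cuda:defs.bzl", "cuda_library")

cuda_library(
    name = "base",
    srcs = ["base.cu"],
    rdc = 1,
)

cuda_library(
    name = "intermediate1",
    srcs = ["intermediate1.cu"],
    rdc = 1,
    deps = [":base"],
)

cuda_library(
    name = "intermediate2",
    srcs = ["intermediate2.cu"],
    rdc = 1,
    deps = [":base"],
)

# Joins both branches of the diamond (source file name is hypothetical).
cuda_library(
    name = "lib_cu",
    srcs = ["lib_cu.cu"],
    rdc = 1,
    deps = [
        ":intermediate1",
        ":intermediate2",
    ],
)
```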

cloudhan · 2023-10-06T17:11:01Z

Backlog for an already known issue:

Mark base.a, intermediate1.a, intermediate2.a and lib_cu.a as alwayslink = 1,

and change deep.so as follows:

cc_binary(
    name = "deep",
    linkshared = 1,
    linkstatic = 1,
    deps = [
        ":base",
        ":intermediate1",
        ":intermediate2",
        ":lib_cu",
    ],
)

That is, we want to include all symbols and produce a usable .so file for other users or downstream libraries. This is a common, valid use case, but it triggers errors:

/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/intermediate1/intermediate1_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_3822aeeb_16_intermediate1_cu_10ef6297'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/intermediate1/intermediate1_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_2ecb4e32_7_base_cu_ca1617bc'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/intermediate2/intermediate2_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_2a970105_16_intermediate2_cu_955613c6'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/intermediate2/intermediate2_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_2ecb4e32_7_base_cu_ca1617bc'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
/usr/bin/ld.gold: error: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/base/base_dlink.rdc.pic.o: multiple definition of '__cudaRegisterLinkedBinary_2ecb4e32_7_base_cu_ca1617bc'
/usr/bin/ld.gold: bazel-out/k8-fastbuild/bin/deep_rdc/_objs/lib_cu/lib_cu_dlink.rdc.pic.o: previous definition here
collect2: error: ld returned 1 exit status
Target //deep_rdc:main failed to build
Use --verbose_failures to see the command lines of failed build steps.

This is caused by an nvlink limitation:

It does not consume a _dlink.o produced by a previous dlink; if you feed that object file to nvlink, it fails with nvlink error : Undefined reference to ...
Without feeding _dlink.o, a later-stage dlink makes nvlink produce duplicate symbols that were already produced by earlier dlinks.

examples_deep_rdc_not_good.tar.gz

cloudhan added 4 commits September 17, 2023 14:45

Remove rdc output from cuda_library

641deb6

It is incorrect in the case of whole-archive linking and prevents us from creating a shared library.

Better handling of rdc objects archiving logic

f384591

Rename transitive*objects as archive*objects

8721b3d

Make transitive rdc cuda_library correct

4133ba6

cloudhan requested review from jsharpe and ryanleary as code owners September 17, 2023 13:52

Minor update comment

351da50

cloudhan merged commit 894603f into main Oct 6, 2023
13 checks passed

cloudhan deleted the cloudhan/better-rdc branch October 6, 2023 16:47

cloudhan mentioned this pull request Oct 6, 2023

Can't link a shared library using clang #162

Closed
