Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incompatible_do_not_split_linking_cmdline: Order of C++ link flags affected #7687

Closed
oquenchil opened this issue Mar 11, 2019 · 12 comments
Closed
Assignees
Labels
incompatible-change Incompatible/breaking change P1 I'll work on this now. (Assignee required) team-Rules-CPP Issues for C++ rules

Comments

@oquenchil
Copy link
Contributor

oquenchil commented Mar 11, 2019

Flag: --incompatible_do_not_split_linking_cmdline
Available since: 0.25 (March 2019 release)
Will be flipped in: 0.26 (April 2019 release)
Tracking issue: #7670

This changes how flags are passed to the linker. If you don't have your own custom CROSSTOOL, then you don't need to migrate anything, but you should still test with this flag to see that nothing breaks. If it does, please let us know.

If you have your own custom CROSSTOOL:
If the CROSSTOOL doesn't support linker parameter files, then you don't have to migrate anything. If it supports linker parameter file but you do not specify the feature 'linker_param_file' in your CROSSTOOL, then you don't have to migrate anything.

If you specify the feature 'linker_param_file' and the linker is using gcc-style linker flags, then you should modify the feature so that instead of having a parameter file specified like this:
flag: '-Wl,@%{linker_param_file}'
it is like this:
flag: '@%{linker_param_file}'

In other words, the parameter file should not be passed directly to the linker with -Wl,. In addition to this you should specify the feature 'do_not_split_linking_cmdline' in your CROSSTOOL, whether it's enabled or disabled it does not matter, we only check its presence. This will tell Bazel to use the new behavior with this toolchain. After the flip of incompatible_disable_legacy_cc_provider this feature will do nothing, it can safely be deleted.

This change will put every flag inside the parameter file whereas before only flags starting with -l were put there, i.e.: all libraries. This might cause the order of link flags to change with respect to what you originally had in your CROSSTOOL, if it does you will have to re-order the corresponding flags. If you have to make any changes to the CROSSTOOL, make sure that you add the feature 'do_not_split_linking_cmdline' too. Example of change:

Old:

feature {
  name: 'linker_param_file',
  flag_set {
         action: 'c++-link-executable'
         action: 'c++-link-dynamic-library'
         action: 'c++-link-nodeps-dynamic-library'
         flag_group {
             expand_if_all_available: 'linker_param_file'
             flag: '-Wl,@%{linker_param_file}'
         }
}

New:

feature {
  name: 'do_not_split_linking_cmdline'
}
feature {
  name: 'linker_param_file',
  flag_set {
         action: 'c++-link-executable'
         action: 'c++-link-dynamic-library'
         action: 'c++-link-nodeps-dynamic-library'
         flag_group {
             expand_if_all_available: 'linker_param_file'
             flag: '@%{linker_param_file}'
         }
}
@oquenchil oquenchil added P1 I'll work on this now. (Assignee required) team-Rules-CPP Issues for C++ rules incompatible-change Incompatible/breaking change bazel 1.0 labels Mar 11, 2019
@oquenchil oquenchil self-assigned this Mar 11, 2019
@dkelmer
Copy link
Contributor

dkelmer commented Apr 3, 2019

@oquenchil, there don't seem to be associated PRs submitted. Does the flag exist already? If yes, by when do you want people to migrate? If the flag exists, it will be available in the currently being cut release, 0.25. The earliest you can require people to migrate by is the 0.26 release. Is that the plan?

@dkelmer
Copy link
Contributor

dkelmer commented Apr 3, 2019

@oquenchil Seems like the flag does exist and that it was flipped (http://google3/third_party/bazel/src/main/java/com/google/devtools/build/lib/rules/cpp/CppOptions.java?l=836&rcl=241059021). I think that the flag flip needs to be rolled back because there wasn't a migration window.

@hlopko
Copy link
Member

hlopko commented Apr 5, 2019

You're looking into the wrong flag :) This is the change that introduced this flag: 3010e57. I'm changing the labels accordingly.

@dkelmer
Copy link
Contributor

dkelmer commented Apr 5, 2019

Oh, the title of the issue is inconsistent with the beginning of the content of the issue which is why I got confused :)
Thanks for updating the labels!

@hlopko
Copy link
Member

hlopko commented Apr 8, 2019

Updated, sorry for the confusion.

bazel-io pushed a commit that referenced this issue May 14, 2019
Tested with Bazel CI downstream projects. These fail for unrelated reasons. Both together show that everything is green with my change:
https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/971
https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/969

This change makes it unnecessary to patch most CppActionConfig features only in Linux for now.

The build_interface_libraries feature is still patched. Once Bazel is released with this change: 4eb7683
we can stop patching it.

Also a separate change should do the same for Windows and Mac.

This also unblocks flipping --incompatible_do_not_split_linking_cmdline

#8303, #6926, #7687

RELNOTES:none
PiperOrigin-RevId: 248117783
bazel-io pushed a commit that referenced this issue May 28, 2019
GH issue: #7687

The flag is now true by default in Bazel.

Tested:
https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/996

Envoy is showing up as broken. Udp breakage is due to a different reason.

The Remote Execution is also fine to break with this change according to buchgr@. That project is using an old version of the default crosstool which still patches features from Bazel.

For Envoy I filed envoyproxy/envoy#6968, suggested a fix and they will fix it soonish. If they don't, please disable Envoy downstream or I might miss the 0.27 cut.

RELNOTES:
PiperOrigin-RevId: 250252235
irengrig pushed a commit to irengrig/bazel that referenced this issue Jun 18, 2019
Tested with Bazel CI downstream projects. These fail for unrelated reasons. Both together show that everything is green with my change:
https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/971
https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/969

This change makes it unnecessary to patch most CppActionConfig features only in Linux for now.

The build_interface_libraries feature is still patched. Once Bazel is released with this change: bazelbuild@4eb7683
we can stop patching it.

Also a separate change should do the same for Windows and Mac.

This also unblocks flipping --incompatible_do_not_split_linking_cmdline

bazelbuild#8303, bazelbuild#6926, bazelbuild#7687

RELNOTES:none
PiperOrigin-RevId: 248117783
irengrig pushed a commit to irengrig/bazel that referenced this issue Jun 18, 2019
GH issue: bazelbuild#7687

The flag is now true by default in Bazel.

Tested:
https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/996

Envoy is showing up as broken. Udp breakage is due to a different reason.

The Remote Execution is also fine to break with this change according to buchgr@. That project is using an old version of the default crosstool which still patches features from Bazel.

For Envoy I filed envoyproxy/envoy#6968, suggested a fix and they will fix it soonish. If they don't, please disable Envoy downstream or I might miss the 0.27 cut.

RELNOTES:
PiperOrigin-RevId: 250252235
@adzenith
Copy link
Contributor

adzenith commented Jul 9, 2019

Enabling incompatible_do_not_split_linking_cmdline appears to put all of our linker flags after all libraries have been listed. This means that flags like --no-as-needed don't work, because they need to come before the libs. I can see how I can reorder the flags among themselves in our cc_config, but I'm not sure how to move them before or after the libs. Is there an easy way to do that?

Old order (desired):   Order with incompatible_do_not_split_linking_cmdline:
-rpath                 -rpath
-L                     -L
cc_config flags        -l
-l                     cc_config flags

irengrig pushed a commit to irengrig/bazel that referenced this issue Jul 15, 2019
GH issue: bazelbuild#7687

The flag is now true by default in Bazel.

Tested:
https://buildkite.com/bazel/bazel-at-head-plus-downstream/builds/996

Envoy is showing up as broken. Udp breakage is due to a different reason.

The Remote Execution is also fine to break with this change according to buchgr@. That project is using an old version of the default crosstool which still patches features from Bazel.

For Envoy I filed envoyproxy/envoy#6968, suggested a fix and they will fix it soonish. If they don't, please disable Envoy downstream or I might miss the 0.27 cut.

RELNOTES:
PiperOrigin-RevId: 250252235
@oquenchil
Copy link
Contributor Author

Hey Nikolaus, there is no way to control the order except in cc_config.

@adzenith
Copy link
Contributor

@oquenchil Is there a way in cc_config to do what I tried to show in my previous question? That is, can I put any of my flags before the list of libraries? If not, this is a breaking change for us that I don't know how we'll work around. We have this flag disabled for now. Should I open a new ticket?

@oquenchil
Copy link
Contributor Author

Yes, you can control in cc_config the order of flags. But for custom flags that are coming in from individual rules with the attribute linkopts, they will all either before the list of libraries or they will all be behind. Unfortunately, there is no way to say these linkopts before libraries and these other linkopts after.

By cc_config I have assumed the whole time that you have your custom file like this:

There you can change the order. For example there you can put user_link_flags_feature (linkopts attributes of rules) before libraries_to_link_feature.

@adzenith
Copy link
Contributor

Yeah, I have a file similar to that, but I don't have any section that looks like this one

                                flags = ["%{libraries_to_link.object_files}"],
                                iterate_over = "libraries_to_link.object_files",

That is to say, I don't appear to have a feature anywhere that's actually adding the library paths to the linker command line. I assumed that something inside of Bazel was adding them automatically. I'm not sure how they're being added.

@oquenchil
Copy link
Contributor Author

Some features get patched in by Bazel by default when they are not present in the config file. See:

if (!existingFeatureNames.contains("libraries_to_link")) {

So all you have to do is copy over everything you see in CppActionConfigs.java between the ifLinux() brackets into your own config file. Or just copy it from the unix_cc_toolchain_config.bzl. Then once you have the libraries_to_link feature in the config file explicitly, just place it before or after any other feature you want.

@adzenith
Copy link
Contributor

adzenith commented Aug 8, 2019

Awesome! Thanks so much for the help. I'll add it in.

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this issue Nov 19, 2019
Bazel's change to legacy_whole_archive behavior is not the cause for TF's linking issues with protobuf. Protobuf's implementation and runtime are correctly being linked into TF here: https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/tensorflow/core/platform/default/build_config.bzl#L239 and https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/third_party/protobuf/protobuf.patch#L18, and I've confirmed that protobuf symbols are still present in libtensorflow_framework.so via nm.

After examining the linker flags that bazel passes to gcc, https://gist.github.com/bmzhao/f51bbdef50e9db9b24acd5b5acc95080, I discovered that the order of the linker flags was what was causing the undefined reference.

See https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking/ and https://stackoverflow.com/a/12272890. Basically linkers discard the objects they've been asked to link if those objects do not export any symbols that the linker currently has kept track as "undefined".

To prove this was the issue, I was able to successfully link after moving the linking shared object flag (-l:libtensorflow_framework.so.2) to the bottom of the flag order, and manually invoking g++.

This change uses cc_import to to link against a .so in the "deps" of tf_cc_binary, rather than as the "srcs" of tf_cc_binary. This technique was inspired by the comment here: https://github.com/bazelbuild/bazel/blob/387c610d09b99536f7f5b8ecb883d14ee6063fdd/examples/windows/dll/windows_dll_library.bzl#L47-L48

Successfully built on vanilla Ubuntu 18.04 VM:
bmzhao@bmzhao-tf-build-failure-reproing:~/tf-fix/tf$ bazel build -c opt --config=cuda --config=v2 --host_force_python=PY3 //tensorflow/tools/pip_package:build_pip_package
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 2067.380s, Critical Path: 828.19s
INFO: 12942 processes: 51 remote cache hit, 12891 local.
INFO: Build completed successfully, 14877 total actions

The root cause might instead be bazelbuild/bazel#7687, which is pending further investigation.

PiperOrigin-RevId: 281341817
Change-Id: Ia240eb050d9514ed5ac95b7b5fb7e0e98b7d1e83
mihaimaruseac pushed a commit to tensorflow/tensorflow that referenced this issue Dec 4, 2019
Bazel's change to legacy_whole_archive behavior is not the cause for TF's linking issues with protobuf. Protobuf's implementation and runtime are correctly being linked into TF here: https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/tensorflow/core/platform/default/build_config.bzl#L239 and https://github.com/tensorflow/tensorflow/blob/da5765ebad2e1d3c25d11ee45aceef0b60da499f/third_party/protobuf/protobuf.patch#L18, and I've confirmed that protobuf symbols are still present in libtensorflow_framework.so via nm.

After examining the linker flags that bazel passes to gcc, https://gist.github.com/bmzhao/f51bbdef50e9db9b24acd5b5acc95080, I discovered that the order of the linker flags was what was causing the undefined reference.

See https://eli.thegreenplace.net/2013/07/09/library-order-in-static-linking/ and https://stackoverflow.com/a/12272890. Basically linkers discard the objects they've been asked to link if those objects do not export any symbols that the linker currently has kept track as "undefined".

To prove this was the issue, I was able to successfully link after moving the linking shared object flag (-l:libtensorflow_framework.so.2) to the bottom of the flag order, and manually invoking g++.

This change uses cc_import to to link against a .so in the "deps" of tf_cc_binary, rather than as the "srcs" of tf_cc_binary. This technique was inspired by the comment here: https://github.com/bazelbuild/bazel/blob/387c610d09b99536f7f5b8ecb883d14ee6063fdd/examples/windows/dll/windows_dll_library.bzl#L47-L48

Successfully built on vanilla Ubuntu 18.04 VM:
bmzhao@bmzhao-tf-build-failure-reproing:~/tf-fix/tf$ bazel build -c opt --config=cuda --config=v2 --host_force_python=PY3 //tensorflow/tools/pip_package:build_pip_package
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 2067.380s, Critical Path: 828.19s
INFO: 12942 processes: 51 remote cache hit, 12891 local.
INFO: Build completed successfully, 14877 total actions

The root cause might instead be bazelbuild/bazel#7687, which is pending further investigation.

PiperOrigin-RevId: 281341817
Change-Id: Ia240eb050d9514ed5ac95b7b5fb7e0e98b7d1e83
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
incompatible-change Incompatible/breaking change P1 I'll work on this now. (Assignee required) team-Rules-CPP Issues for C++ rules
Projects
None yet
Development

No branches or pull requests

6 participants