-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-33701: [C++] Add support for LTO (link time optimization) build #33847
Conversation
Some base type classes don't have hidden method implementations. It may define these classes in each translation unit. It may cause one-definition-rule violation with `-flto`. How to reproduce: ```bash CFLAGS="-flto" CXXFLAGS="-flot cmake ... cmake --build ```
@github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest |
Revision: f84418b Submitted crossbow builds: ursacomputing/crossbow @ actions-a60255052f
|
@paleolimbot This is another approach. |
@@ -288,31 +288,46 @@ std::shared_ptr<DataType> GetPhysicalType(const std::shared_ptr<DataType>& type) | |||
class ARROW_EXPORT FixedWidthType : public DataType { | |||
public: | |||
using DataType::DataType; | |||
// This is only for preventing defining this class in each | |||
// translation unit to avoid one-definition-rule violation. | |||
~FixedWidthType() override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why it causes ODR violation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I don't know much about this but...
I tried to create a small example to reproduce this case but couldn't create it... So I explain what I observed:
If we don't have this, .o
of a C++ code that has std::make_shared<arrow::XXXType>
such as std::make_shared<arrow::ListType>
has the following nm
result:
$ nm --demangle src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o | grep 'vtable for arrow::FloatingPointType'
00000000 W vtable for arrow::FloatingPointType
W
is a global weak symbol.
https://linux.die.net/man/1/nm
If lowercase, the symbol is local; if uppercase, the symbol is global (external).
"W"
"w"The symbol is a weak symbol that has not been specifically tagged as a weak object symbol.
If we have this, the symbol becomes a undefined symbol:
$ nm --demangle src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o | grep 'vtable for arrow::FloatingPointType'
U vtable for arrow::FloatingPointType
https://linux.die.net/man/1/nm
"U"
The symbol is undefined.
If there are multiple W vtable for arrow::FloatingPointType
in our .o
. linking libarrow.so
with -flto
is failed:
[650/1111] Linking CXX shared library debug/libarrow.so.1100.0.0
FAILED: debug/libarrow.so.1100.0.0
: && /bin/c++ -fPIC -Wno-noexcept-type -flto -fdiagnostics-color=always -Wall -Wno-conversion -Wno-sign-conversion -Wunused-result -fno-semantic-interposition -msse4.2 -g -Werror -O0 -ggdb -shared -Wl,-soname,libarrow.so.1100 -o debug/libarrow.so.1100.0.0 ...(**/*.o)... && :
cpp/src/arrow/type.h:313: error: virtual table of type ‘struct FloatingPointType’ violates one definition rule [-Werror=odr]
313 | class ARROW_EXPORT FloatingPointType : public NumberType {
|
cpp/src/arrow/type.h:313:20: note: the conflicting type defined in another translation unit
313 | class ARROW_EXPORT FloatingPointType : public NumberType {
| ^
<built-in>: note: virtual method ‘__cxa_pure_virtual’
...
It seems that W
and -flto
is a bad combination. If we change W
to U
by defining a not-inlined deconstructor, we don't get the ODR error.
Do you notice anything from this observation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why it happens. Clang doesn't complain ODR issue.
cc @pitrou, @westonpace, @lidavidm for comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why adding that destructor changes W to U. You are defining a non-inline key function. Therefore the vtable is defined in only one place.
However, I don't understand why it is an ODR error. For example, in the clang docs:
When a class has no non-pure, non-inline, virtual functions, there is no key function, and the compiler is forced to emit the vtable in every translation unit that references the class. In this case, it is emitted in a COMDAT section, which allows the linker to eliminate all duplicate copies. This is still wasteful in terms of object file size and link time, so it’s always advisable to ensure there is at least one eligible function that can serve as the key function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this is only happen with GCC (GNU ld?).
I couldn't reproduce this with CC=clang CXX=clang++ CFLAGS="-flto" CXXFLAGS="-flto" cmake ...
. (W
symbol is still generated but there is no link error.)
Is it acceptable that we add this change only for GCC and LTO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change is okay. And I wonder how you find this solution, it's not obvious from the error log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a try and error...
There are some abstract classes for arrow::DataType
family. Most of them are in error messages but arrow::DataType
isn't shown. I noticed that arrow::DataType
is U
and others are W
in nm --demangle
result. I found that arrow::DataType::~DataType()
is defined in type.cc
but other abstract classes don't define their destructors. So I tried to define them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a class has no non-pure, non-inline, virtual functions, there is no key function, and the compiler is forced to emit the vtable in every translation unit that references the class. In this case, it is emitted in a COMDAT section, which allows the linker to eliminate all duplicate copies. This is still wasteful in terms of object file size and link time, so it’s always advisable to ensure there is at least one eligible function that can serve as the key function.
Does it mean if a class has virtual functions, it's recommended that at least one of them should be in .cc
?
I find it hard to understand a class has no non-pure, non-inline, virtual functions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is what it means. I think "wasteful" is too strong of a word though. I think the consequence should generally be minor enough that it isn't really a concern.
Thank you @kou! I don't know enough about LTO or how/why this is a fix to say that this should or should not be defensively merged. Not having warnings during LTO linking makes my life easier (I don't have to figure out how to silence that nightly job or to attempt justifying anything to CRAN; however, I don't know enough about the details to make a recommendation otherwise. |
@@ -288,31 +288,46 @@ std::shared_ptr<DataType> GetPhysicalType(const std::shared_ptr<DataType>& type) | |||
class ARROW_EXPORT FixedWidthType : public DataType { | |||
public: | |||
using DataType::DataType; | |||
// This is only for preventing defining this class in each | |||
// translation unit to avoid one-definition-rule violation. | |||
~FixedWidthType() override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably don't need to define the destructor separately?
~FixedWidthType() override; | |
~FixedWidthType() override = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried = default
. If we add = default
, generated symbol is still W
not U
. It seems that we need to define the destructor separately for U
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Benchmark runs are scheduled for baseline = 108a028 and contender = 24da3be. 24da3be is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…ild (apache#33847) ### Rationale for this change Some base type classes don't have hidden method implementations. It may define these classes in each translation unit. It may cause one-definition-rule violation with `-flto`. ### What changes are included in this PR? Define at least one hidden method to prevent defining these base type classes in each translation unit. ### Are these changes tested? How to reproduce: ```bash CFLAGS="-flto" CXXFLAGS="-flto" cmake ... cmake --build ... ``` This reports link errors without this change. ### Are there any user-facing changes? No. * Closes: apache#33701 Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Yibo Cai <yibo.cai@arm.com>
…ild (apache#33847) ### Rationale for this change Some base type classes don't have hidden method implementations. It may define these classes in each translation unit. It may cause one-definition-rule violation with `-flto`. ### What changes are included in this PR? Define at least one hidden method to prevent defining these base type classes in each translation unit. ### Are these changes tested? How to reproduce: ```bash CFLAGS="-flto" CXXFLAGS="-flto" cmake ... cmake --build ... ``` This reports link errors without this change. ### Are there any user-facing changes? No. * Closes: apache#33701 Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Yibo Cai <yibo.cai@arm.com>
…ild (apache#33847) ### Rationale for this change Some base type classes don't have hidden method implementations. It may define these classes in each translation unit. It may cause one-definition-rule violation with `-flto`. ### What changes are included in this PR? Define at least one hidden method to prevent defining these base type classes in each translation unit. ### Are these changes tested? How to reproduce: ```bash CFLAGS="-flto" CXXFLAGS="-flto" cmake ... cmake --build ... ``` This reports link errors without this change. ### Are there any user-facing changes? No. * Closes: apache#33701 Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Rationale for this change
Some base type classes don't have hidden method implementations. It may define these classes in each translation unit. It may cause one-definition-rule violation with
-flto
.What changes are included in this PR?
Define at least one hidden method to prevent defining these base type classes in each translation unit.
Are these changes tested?
How to reproduce:
This reports link errors without this change.
Are there any user-facing changes?
No.