-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducer and fix for issue encountered in smart_holder update. #4228
Conversation
…ermination to detail/common.h So that it actually is defined in pybind11.h
…ate. This commit tested in isolation on top of current master + first version of reproducer (62311eb). Succeeds with Debian Clang 14.0.6 C++17 (and probably all other compilers). Fails for CUDA 11.7: ``` cd /build/tests && /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -Dpybind11_tests_EXPORTS -I/mounted_pybind11/include -isystem=/usr/include/python3.10 -g --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler=-fPIC -Xcompiler=-fvisibility=hidden -Werror all-warnings -std=c++17 -MD -MT tests/CMakeFiles/pybind11_tests.dir/test_class.cpp.o -MF CMakeFiles/pybind11_tests.dir/test_class.cpp.o.d -x cu -c /mounted_pybind11/tests/test_class.cpp -o CMakeFiles/pybind11_tests.dir/test_class.cpp.o /mounted_pybind11/tests/test_class.cpp(53): error: more than one instance of overloaded function "pybind11::class_<type_, options...>::def [with type_=test_class::pr4220_tripped_over_this::Empty0, options=<>]" matches the argument list: function template "pybind11::class_<test_class::pr4220_tripped_over_this::Empty0> &pybind11::class_<type_, options...>::def(const char *, Func &&, const Extra &...) [with type_=test_class::pr4220_tripped_over_this::Empty0, options=<>]" /mounted_pybind11/include/pybind11/pybind11.h(1557): here function template "pybind11::class_<test_class::pr4220_tripped_over_this::Empty0> &pybind11::class_<type_, options...>::def(const T &, const Extra &...) [with type_=test_class::pr4220_tripped_over_this::Empty0, options=<>]" /mounted_pybind11/include/pybind11/pybind11.h(1586): here argument types are: (const char [8], <unknown-type>) object type is: pybind11::class_<test_class::pr4220_tripped_over_this::Empty0> 1 error detected in the compilation of "/mounted_pybind11/tests/test_class.cpp". ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that better to narrow the work-around.
Can we make the smart holder branch part of CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to consider: sfinae is slow. Potentially keep the ifdefs for the work-around?
Also the comments are gone to describe why we do the construct.
Do you have measurements or references? I believe this change will have an unmeasurable impact on the compilation time, but it would be interesting to see data, if someone wants to spend the time collecting them. However, I believe (very strongly) that any such micro-level optimizations have a negative value (timer better spent somewhere else), at least from my perspective (I regularly get asked what I did with my time and what the impact was).
It's not a workaround anymore IMO, but a clearly better alternative: it keeps implementation details of operators.h (
Because it's a universally better alternative. I believe it's wise to not touch that code again unless we have some kind of breakage again. I'm hoping to deploy the smart_holder branch update (#4227) Google-internally tomorrow morning. I jumped in here only to not set you back with a rollback because of my timeline. However, if you feel this needs more work, could you please merge #4230 and work on it in your time? It's pretty quick for me to redo #4227 after the rollback.
The smart_holder branch doubles the CI load, because it needs to be tested with 1. smart_holder as add-on/option, 2. smart_holder as the default holder. I believe it would be too burdensome at this time to double the CI load that on master. (There are a few other reasons to keep it separate that are complicated to explain.) That's why I reduced the test that fails to build on the smart_holder branch and added it to test_class on master. |
Thanks for the details. Reading on my phone. I am ok to simplify this following your arguments. Thanks a lot for this update to the patch, def. a simpler interface now.
Ref (for curiosity) on SFINAE: generally the worst for compile time
https://thephd.dev/sol3-compile-times-binary-sizes
But instantiating a function template is not far off.
With your evaluation comment: Just generally, our time as lib developers for a header only lib is not equal 11k+ downstream developers productivity on each of their TUs. But I think we all know that :)
For serious regression testing of compile time we would need automation.
|
Thanks Axel! I can't read the article tonight but will look tomorrow. |
@ax3l I haven't looked at the paper yet, but I made a sheet with Compile times before & after merging PR #4228 ("cpp only sorted" subsheet). Everything needed to reproduce it (minus my hardware & exact OS) is archived here: My conclusion:
Please let me know if you see a flaw in how I obtained the data. |
I just took a super quick look at the paper. Nice that they put this right at the top:
After seeing those numbers I zoned out straightaway. Savings, yes, but that's barely enough to get a cup of coffee, in a 2 hour wait. My take home: Just buy another core and code on. Human time is so much more expensive. |
Thank you for doing the detailed comparison, that is super interesting. Great that is stays the same / is potentially a bit faster now! My bet would be that the function template instantiations we used before are of roughly the same cost, similar as in the big rule of Chiel figure in that post? (Purely academic interest remains.) We luckily also use them here at the entry point for the user, probably different if we had them deep down a often instantiated & recursively used template class. Anyway, all of that is luckily not relevant here 🎉 For curiosity, which compiler did you use?
Generally yes. Only for header-only libs that go into a single TU, you cannot parallelize your critical path (your longest TU). Now, if someone added OpenMP support to LLVM's compiler passes... ;-) Thinking about it, |
I learned the hard way to focus on the "What do actually I want to do?" more than worrying about the details, although they are interesting. Then profile, and optimize the bits that provably matter.
(It's in the README.md but buried in the middle.)
Yes, it's slightly on the cumbersome side, but you can basically break it down to any level of granularity. |
Description
Reproducer and fix for compilation failure related to PR #4220.
Suggested changelog entry: