Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducer and fix for issue encountered in smart_holder update. #4228

Merged
merged 6 commits into from
Oct 10, 2022

Conversation

rwgk
Copy link
Collaborator

@rwgk rwgk commented Oct 7, 2022

Description

Reproducer and fix for compilation failure related to PR #4220.

Suggested changelog entry:

Ralf W. Grosse-Kunstleve added 3 commits October 8, 2022 01:22
…ermination to detail/common.h

So that it actually is defined in pybind11.h
…ate.

This commit tested in isolation on top of current master + first version of reproducer (62311eb).

Succeeds with Debian Clang 14.0.6 C++17 (and probably all other compilers).

Fails for CUDA 11.7:

```
cd /build/tests && /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -Dpybind11_tests_EXPORTS -I/mounted_pybind11/include -isystem=/usr/include/python3.10 -g --generate-code=arch=compute_52,code=[compute_52,sm_52] -Xcompiler=-fPIC -Xcompiler=-fvisibility=hidden -Werror all-warnings -std=c++17 -MD -MT tests/CMakeFiles/pybind11_tests.dir/test_class.cpp.o -MF CMakeFiles/pybind11_tests.dir/test_class.cpp.o.d -x cu -c /mounted_pybind11/tests/test_class.cpp -o CMakeFiles/pybind11_tests.dir/test_class.cpp.o
/mounted_pybind11/tests/test_class.cpp(53): error: more than one instance of overloaded function "pybind11::class_<type_, options...>::def [with type_=test_class::pr4220_tripped_over_this::Empty0, options=<>]" matches the argument list:
            function template "pybind11::class_<test_class::pr4220_tripped_over_this::Empty0> &pybind11::class_<type_, options...>::def(const char *, Func &&, const Extra &...) [with type_=test_class::pr4220_tripped_over_this::Empty0, options=<>]"
/mounted_pybind11/include/pybind11/pybind11.h(1557): here
            function template "pybind11::class_<test_class::pr4220_tripped_over_this::Empty0> &pybind11::class_<type_, options...>::def(const T &, const Extra &...) [with type_=test_class::pr4220_tripped_over_this::Empty0, options=<>]"
/mounted_pybind11/include/pybind11/pybind11.h(1586): here
            argument types are: (const char [8], <unknown-type>)
            object type is: pybind11::class_<test_class::pr4220_tripped_over_this::Empty0>

1 error detected in the compilation of "/mounted_pybind11/tests/test_class.cpp".
```
@rwgk rwgk marked this pull request as ready for review October 8, 2022 18:14
@rwgk rwgk changed the title Reproducer for issue encountered in smart_holder update. Reproducer and fix for issue encountered in smart_holder update. Oct 8, 2022
Copy link
Collaborator

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that better to narrow the work-around.

Can we make the smart holder branch part of CI?

Copy link
Collaborator

@ax3l ax3l left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to consider: sfinae is slow. Potentially keep the ifdefs for the work-around?

Also the comments are gone to describe why we do the construct.

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 9, 2022

One thing to consider: sfinae is slow.

Do you have measurements or references?

I believe this change will have an unmeasurable impact on the compilation time, but it would be interesting to see data, if someone wants to spend the time collecting them. However, I believe (very strongly) that any such micro-level optimizations have a negative value (timer better spent somewhere else), at least from my perspective (I regularly get asked what I did with my time and what the impact was).

Potentially keep the ifdefs for the work-around?

It's not a workaround anymore IMO, but a clearly better alternative: it keeps implementation details of operators.h (<op_id id, op_type ot, typename L, typename R>) out of pybind11.h. It just so happened that we saw that potential only as a side-effect of looking for a workaround. Of course, it's not a generally applicable approach, because it's intrusive on the type we're targeting with the enable_if_t, but the target type here is in our detail namespace, therefore it would be silly IMO to not take advantage of the opportunity, now that we know we have it (works with all compilers on GitHub and passed global testing with the Google toolchain).

Also the comments are gone to describe why we do the construct.

Because it's a universally better alternative. I believe it's wise to not touch that code again unless we have some kind of breakage again.

I'm hoping to deploy the smart_holder branch update (#4227) Google-internally tomorrow morning. I jumped in here only to not set you back with a rollback because of my timeline. However, if you feel this needs more work, could you please merge #4230 and work on it in your time? It's pretty quick for me to redo #4227 after the rollback.

Can we make the smart holder branch part of CI?

The smart_holder branch doubles the CI load, because it needs to be tested with 1. smart_holder as add-on/option, 2. smart_holder as the default holder. I believe it would be too burdensome at this time to double the CI load that on master. (There are a few other reasons to keep it separate that are complicated to explain.) That's why I reduced the test that fails to build on the smart_holder branch and added it to test_class on master.

@ax3l
Copy link
Collaborator

ax3l commented Oct 9, 2022 via email

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 10, 2022

Thanks Axel! I can't read the article tonight but will look tomorrow.
Merging here, to then redo the smart_holder update (for simpler merge history; I don't squash), so it's ready for tomorrow morning.

@rwgk rwgk merged commit da104a9 into pybind:master Oct 10, 2022
@rwgk rwgk deleted the test_class_pr4220_issue branch October 10, 2022 04:50
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Oct 10, 2022
@rwgk
Copy link
Collaborator Author

rwgk commented Oct 12, 2022

@ax3l I haven't looked at the paper yet, but I made a sheet with Compile times before & after merging PR #4228 ("cpp only sorted" subsheet).

Everything needed to reproduce it (minus my hardware & exact OS) is archived here:

https://github.com/rwgk/pybind11_scons/tree/b168c1e10b2fbf525a8a3856c0c372670dc5f960/compile_times_before_and_after_merging_pr4228

My conclusion:

  • If anything, this PR made the compile times a little bit shorter.
  • But I don't really believe much happened: note that column before #3 in the sheet has systematically longer times. See also the plot in the sheet, the yellow spikes. Why? I have no clue.
  • From those data I think it's definitely fair to say that the impact this PR had on the compile times is in the noise.

Please let me know if you see a flaw in how I obtained the data.

@rwgk rwgk removed the needs changelog Possibly needs a changelog entry label Oct 12, 2022
@rwgk
Copy link
Collaborator Author

rwgk commented Oct 12, 2022

I just took a super quick look at the paper. Nice that they put this right at the top:

Version	sol2	sol3 (patron-only alpha)	sol3 (now)
Build Time	2 hours, 3 minutes	2 hours, 11 minutes	1 hour, 53 minutes
Binary Size	119.0 MB	138.0 MB	103.0 MB

After seeing those numbers I zoned out straightaway. Savings, yes, but that's barely enough to get a cup of coffee, in a 2 hour wait.

My take home: Just buy another core and code on.

Human time is so much more expensive.

@ax3l
Copy link
Collaborator

ax3l commented Oct 13, 2022

Thank you for doing the detailed comparison, that is super interesting. Great that is stays the same / is potentially a bit faster now! My bet would be that the function template instantiations we used before are of roughly the same cost, similar as in the big rule of Chiel figure in that post? (Purely academic interest remains.)

We luckily also use them here at the entry point for the user, probably different if we had them deep down a often instantiated & recursively used template class. Anyway, all of that is luckily not relevant here 🎉

For curiosity, which compiler did you use?

My take home: Just buy another core and code on.

Generally yes. Only for header-only libs that go into a single TU, you cannot parallelize your critical path (your longest TU). Now, if someone added OpenMP support to LLVM's compiler passes... ;-)

Thinking about it, .def() in pybind11 is not such a location, since we can indeed parallelize that by splitting in user code 👍

@rwgk
Copy link
Collaborator Author

rwgk commented Oct 13, 2022

similar as in the big rule of Chiel figure in that post?

I learned the hard way to focus on the "What do actually I want to do?" more than worrying about the details, although they are interesting. Then profile, and optimize the bits that provably matter.

For curiosity, which compiler did you use?

$ clang++ --version
Debian clang version 14.0.6-2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

(It's in the README.md but buried in the middle.)

Thinking about it, .def() in pybind11 is not such a location, since we can indeed parallelize that by splitting in user code 👍

Yes, it's slightly on the cumbersome side, but you can basically break it down to any level of granularity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants