Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[smart_holder] fix smart_holder regressions with multiple inheritance #3635

Merged
merged 1 commit into from
Jan 27, 2022

Conversation

virtuald
Copy link
Contributor

Description

The smart_holder does not handle multiple inheritance correctly, and this test demonstrates that. The tests work fine with normal holders.

Investigation so far

I believe I've identified the source of the problem, but I don't have any good ideas for a fix at the present moment (@rwgk if you have ideas but not time to implement them, let me know and I can give it a shot).

My gdb investigation was done using this python code:

from pybind11_tests import multiple_inheritance as m
o = m.MVF()
assert o.b == 1

Lots of stuff happens, and at some point modified_type_caster_generic_load_impl::load_impl gets called (smart_holder_type_casters.h:162). This eventually calls try_implicit_casts, which recursively tries to convert to the correct type.

If you compare to the original type caster, the modified smart holder caster does basically the same logic, with a key difference:

  • I believe the original type caster recursives until it finds the source type, then converts each value until it gets to the target type (MVF to MVE, MVE to MVD0, MVD0 to MVC, MVC to MVB) and so the final value is an MVB, which is stored in value and is the correct type
  • The smart_holder type caster only stores the implicit_cast, but does not convert the value

This is bad because at some point smart_holder_type_caster_load::convert_type on line 525 calls the implicit_cast
function from the modified_type_caster_generic_load_impl.

The implicit_cast function is a lambda from pybind11.h:1489

return static_cast<Base *>(reinterpret_cast<type *>(src));

If I'm interpreting my GDB correctly, Base is MVB and type is MVC when it's called in this case.

Which is not what it should be doing. I think (right? since src isn't a MVC, it's an MVF?).

What it probably should be doing is recursively calling the implicit cast function on the value, converting MVF to MVE, MVE to MVD0 .. etc, just like the original caster did. Or maybe it should also be storing the converted value? It's not 100% clear to me why it's different here, but I'm guessing it has something to do with ownership.

@virtuald virtuald force-pushed the smart-holder-mvi branch 2 times, most recently from 2f5ac26 to 3c287cf Compare January 21, 2022 09:02
@rwgk
Copy link
Collaborator

rwgk commented Jan 21, 2022

Thanks a lot Dustin! It looks complicated, I'll need to carve out some time to dig into this. Do you see a way to reduce the new test, based on your analysis? That would make it easier for me to get my head around the problem. But I'll look into this regardless, asap.

@virtuald
Copy link
Contributor Author

Looking at the rest results, it seems that you only need to keep MVB, MVC, and MVD0 to reproduce the problem -- so you could delete MVD1, MVE, MVF.

When the issue is finally resolved, I think it would be good to keep the whole test.

@virtuald
Copy link
Contributor Author

I believe I have a fix for the issue, using my original intuition stated above. This creates a std::vector of implicit casters that is appended to during the recursive type caster walk, and once it's time to do the actual cast it calls them in order.

I don't really like adding the std::vector to the type caster, but it's simplest? However, it's getting allocated every time we create a type caster, so it's probably a bad idea.

@virtuald virtuald changed the title [smart_holder] Add tests demonstrating smart_holder issues with multiple inheritance [smart_holder] fix smart_holder regressions with multiple inheritance Jan 22, 2022
@rwgk
Copy link
Collaborator

rwgk commented Jan 22, 2022

Awesome, thanks! I can only quickly look right now (late night here). It looks good at first glance. I don't think the overhead matters much (we could run ubench/holder_comparison.cpp to get basic timings), and maybe we can find tricks or compromises to minimize the overhead. I'll play with this as soon as I get a chance; it's at the top of the pile.

@virtuald virtuald force-pushed the smart-holder-mvi branch 2 times, most recently from 67b55bc to a4755b7 Compare January 22, 2022 07:34
@@ -251,7 +253,7 @@ class modified_type_caster_generic_load_impl {
const std::type_info *cpptype = nullptr;
void *unowned_void_ptr_from_direct_conversion = nullptr;
const std::type_info *loaded_v_h_cpptype = nullptr;
void *(*implicit_cast)(void *) = nullptr;
std::vector<void *(*)(void *)> implicit_casts;
value_and_holder loaded_v_h;
bool reinterpret_cast_deemed_ok = false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the safety guard value change due to this change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the safety guard value change due to this change?

I'm thinking not. The safety guard was meant to be a very targeted sanity check, not a version check.

I've ~neglected versioning of the smart_holder internals because Google-internally we're basically building everything from sources, for each test or executable independently & hermetically. But for more safety externally, It's probably a good idea to give smart_holder internal versioning a little more thought; in a separate PR.

I'll rerun the ubench timings asap. Not sure if I get a large-enough time window over the weekend, but Monday almost certainly.

@rwgk
Copy link
Collaborator

rwgk commented Jan 25, 2022

Hi Dustin, some observations and questions:

  • I imported your PR into the Google dev environment and tested with the ASAN address sanitizer (standard procedure for all new code) and it's healthy as expected. — Caveat: I didn't get to run the global testing yet. I just initiated an overnight run. (I don't expect issues.)

  • I blindly ran the ubench timings and at worst I saw a 2% slowdown for what it exercises, more or less in the noise. (I hadn't done this for many months and wanted to get fresh numbers with and without your change.)

  • Then I commented out all implicit_cast code and the ubench timings ran anyway (I'm not surprised), so the only overhead could be from the extra std::vector member, but I doubt it. I'm tempted to take a shortcut and simply claim that the overhead is entirely insignificant. At worst I'm slightly wrong. Clearly, correct behavior is far more important. We can optimize later if we ever get the feeling it's worth it.

  • Could you please create a PR against master, with only the added tests? — My opinion is: clearly whatever we do in the future with casters, we want those test. — Small suggestion about naming: a name different than another_diamond would be nice, something like implicit_cast_diamond maybe, or pr3635_diamond, to help future readers of the code to easily connect back to useful information.

  • After the test is merged on master, I'll merge master into smart_holder, then we can rebase here, then merge this PR.

  • The // TODO: should this be a copy or move? is completely fine with me, the only suggestion I have is to use SMART_HOLDER_WIP: instead of TODO.

.def_readwrite("e", &MVE::e);
// TODO: py::multiple_inheritance is required here, but pybind11 should
// be able to detect this by looking at MVE which is already bound...
py::class_<MVF, MVE>(m, "MVF", py::multiple_inheritance())
Copy link
Collaborator

@Skylion007 Skylion007 Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking about this. Is this indicating a more subtle bug with the smart holder? Why does the type_record only register a single base in rec.bases?

if (rec.bases.size() > 1 || rec.multiple_inheritance) {
or does it not realize MVE is non-simple:
else if (rec.bases.size() == 1) {

Copy link
Contributor Author

@virtuald virtuald Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It records that it has a single base because MVF only has a single immediate base. To get the correct behavior it would need to examine all of its bases in turn for the multiple inheritance flag.

Actually, if it only examined its immediate bases for the flag, that would fix it, because the flag would be propagated through all derived classes.

Copy link
Contributor Author

@virtuald virtuald Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does fix it, I'll open another PR (#3650) to have a discussion about it.

@virtuald
Copy link
Contributor Author

Thanks Ralf, I'll split the PR as suggested later today/tonight.

@rwgk
Copy link
Collaborator

rwgk commented Jan 25, 2022

Thanks Ralf, I'll split the PR as suggested later today/tonight.

Perfect.

The global testing came back unusually noisy, but after scrolling up and down through the failures for a while I'm thinking this PR didn't cause any failures. (I'll probably retry after we're through the splitting steps, hoping for cleaner results.)

@rwgk
Copy link
Collaborator

rwgk commented Jan 27, 2022

I'm not sure how to best handle the situation that the tests are on master already. There must be a way, but how?

My usual procedure is:

But if I do that now the smart_holder branch will have many broken tests.

@virtuald would it work if you did this?

  • git rebase smart_holder (only if necessary)
  • git merge master
  • Verify that everything looks good.
  • git push (possibly with force)
  • Wait for tests.

Then me doing this?

  • git checkout smart_holder; git pull . smart-holder-mvi; git push

This wouldn't squash your commits, but I think that's better than us having to coordinate.
(I think it nice/better to not squash your commits here, if we want to look back later.)
Unless there is a better way?

@rwgk
Copy link
Collaborator

rwgk commented Jan 27, 2022

Oh ... face palm.

  • Could you simply remove the tests from this PR?
  • Then I'll merge by clicking the button here. Nothing broken.
  • Then I go through my standard procedure to git merge master.

The original pybind11 holder supported multiple inheritance by recursively creating
type casters until it finds one for the source type, then converting each
value in turn to the next type via typeinfo->implicit_cast

The smart_holder only stored the last implicit_cast, which was incorrect.

This commit changes it to create a list of implicit_cast functions that are
appended to during the recursive type caster creation, and when the time comes
to cast to the destination type, it calls all of them in the correct order.
@virtuald
Copy link
Contributor Author

Just saw your note. I removed the tests.

@rwgk
Copy link
Collaborator

rwgk commented Jan 27, 2022

As soon as I see that the CI is happy I'll merge.

The one failure at this moment is a known flake, good to ignore.

@rwgk
Copy link
Collaborator

rwgk commented Jan 27, 2022

Thanks a lot Dustin! Merging now.

@rwgk rwgk merged commit fd8265e into pybind:smart_holder Jan 27, 2022
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jan 27, 2022
@rwgk rwgk removed the needs changelog Possibly needs a changelog entry label Jan 27, 2022
@virtuald virtuald deleted the smart-holder-mvi branch January 27, 2022 17:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants