Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update PyTryFrom for PyMapping and PySequence to more accurately check types #2477

Merged
merged 15 commits into from
Aug 10, 2022

Conversation

aganders3
Copy link
Contributor

@aganders3 aganders3 commented Jun 24, 2022

This is intended to fix #2072 - see further discussion in that thread.

The implementation is such that for both PyMapping and PyTryFrom, they will call into python to check isinstance(<obj>, collections.abc.Mapping) before successfully downcasting. Please note this is a breaking change so it is important the documentation and migration guide are clear.

Also note the implementation has changed from the initial draft of this PR. See discussion below and in #2072 for more information.

TODO

  • an entry in CHANGELOG.md
  • docs to all new functions and / or detail in the guide
  • tests for all new or changed functions

src/pyclass.rs Outdated Show resolved Hide resolved
src/types/mapping.rs Outdated Show resolved Hide resolved
tests/test_mapping.rs Outdated Show resolved Hide resolved
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Got a few thoughts to get us started...

src/pyclass.rs Outdated Show resolved Hide resolved
src/types/mapping.rs Outdated Show resolved Hide resolved
tests/test_mapping.rs Outdated Show resolved Hide resolved
@aganders3
Copy link
Contributor Author

aganders3 commented Jun 27, 2022

As a sanity check, I wonder if there is any evidence we can collect (or tests we can write) that this check is functionally equivalent to the slow path? If there's any lingering concern that they may be different, we probably need to always use the slow path.

I did a little research and testing on this. In normal usage I think they will give the same result, but in the end they are not identical because ultimately the collections.abc.Mapping __instancecheck__ and __subclasscheck__ don't look at tp_flags.

With custom classes there's the chance that (for example) a pyo3 user could do something like this - this feels like a bit of a contrived example but I did test it:

#[pyclass(mapping)]
struct Mapping {
    index: HashMap<String, usize>,
}

#[pymethods]
impl Mapping {
    #[new]
    fn new(elements: Option<&PyList>) -> PyResult<Self> {
    // ...
    // truncated implementation of this mapping pyclass - basically a wrapper around a HashMap
}

// force the Mapping class to have Py_TPFLAGS_MAPPING set
#[pyfunction]
fn set_tpflags_mapping(py: Python<'_>) {
    unsafe {
        let ty = Mapping::type_object_raw(py);
        (*ty).tp_flags |= ffi::Py_TPFLAGS_MAPPING;
    }
}

Then testing in Python:

import collections.abc

import pymapping

pymapping.set_tpflags_mapping()

for obj in (dict(), list(), pymapping.Mapping()):
    print(f"type: {type(obj)}")
    print(f"\tmapping_by_tpflags: {pymapping.check_tpflags(obj)}")
    print(f"\tmapping_by_isinstance: {pymapping.check_isinstance(obj)}")

Here's the output - note the discrepancy between mapping_by_tpflags and mapping_by_isinstance for the custom Mapping class:

~/src/rust/pymapping via 🐍 v3.10.2 (.venv) via 🦀 v1.61.0
❯ python test.py
type: <class 'dict'>
	mapping_by_tpflags: True
	mapping_by_isinstance: True
type: <class 'list'>
	mapping_by_tpflags: False
	mapping_by_isinstance: False
type: <class 'builtins.Mapping'>
	mapping_by_tpflags: True
	mapping_by_isinstance: False

It also gets a little hairy because in CPython there are actually two implementations of the ABCs (in C and Python). Only the C-implementation sets Py_TPFLAGS_MAPPING for Foo when you call collections.abc.Mapping.register(Foo), but this does not happen in the Python implementation. This distinction may apply to directly subclassing as well. I'm not sure under which circumstances the Python implementation is used.

@davidhewitt
Copy link
Member

Thanks for the really thorough investigation. So it seems a little unfortunate but I'm now wondering if the best thing is for us to always do the isinstance check? In combination with Mapping.register that's probably the only way to be reliable in all combinations of abi / C implementations?

We could at least cache collections.abc.Mapping in a GILOnceCell to reduce overhead.

@aganders3
Copy link
Contributor Author

Yeah it's a bit sad but it would be more consistent behavior, even if the discrepancy would be rare. Still I'll defer to your (and others') expertise and experience on this. I don't have a good feel for how big this performance hit will be and/or how common the operation is in real use.

For performance-critical situations could a user still do their own check and use try_from_unchecked? It might be sufficient to just document this in the PyMapping docs with an explanation that there is no fast perfect check for mapping types at this time.

@davidhewitt davidhewitt mentioned this pull request Jul 3, 2022
7 tasks
@davidhewitt
Copy link
Member

Sorry for the slight delay - had a backlog of things to get through.

I think consistent behaviour is most important, so I think we should go for the isinstance check always. As you say, users can use unsafe APIs to check the flags / use try_from_unchecked if they really need performance and are willing to accept the differences.

@aganders3 aganders3 force-pushed the pymapping-2072 branch 4 times, most recently from e007464 to 65061fe Compare July 6, 2022 04:34
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is shaping up well. Posted some implementation suggestions.

I'm beginning to wonder if we should do equivalent treatment for PySequence? Seems like it would be good to have both for consistency.

Also I keep going back and forth on whether we should have #[pyclass(mapping)] automatically register with the abc. I think it's still the right choice to make users opt-in, however if they don't when upgrading to 0.17 their casts to PyMapping will break so we should make a very clear entry about this in the migration guide.

src/types/mapping.rs Outdated Show resolved Hide resolved
src/types/mapping.rs Outdated Show resolved Hide resolved
src/types/mapping.rs Outdated Show resolved Hide resolved
src/types/mapping.rs Outdated Show resolved Hide resolved
@davidhewitt
Copy link
Member

Well @aganders3 nobody has raised concerns about also changing sequences, so I guess let's move ahead with that. Do you want to add to this PR, or shall I push separately?

@aganders3
Copy link
Contributor Author

Well @aganders3 nobody has raised concerns about also changing sequences, so I guess let's move ahead with that. Do you want to add to this PR, or shall I push separately?

I think it makes sense to add it here as there will be overlap at least in the docs/changelog for it. I'm just getting back from vacation but should be able to do that some time this week.

@davidhewitt
Copy link
Member

That'd be great, thanks! There's still a few items to go in #2493 before we're ready to go, so there's some time to get this in. I'm crossing things off that list when I can! :)

@aganders3 aganders3 marked this pull request as ready for review July 29, 2022 03:14
@aganders3 aganders3 changed the title Update PyTryFrom for PyMapping to more accurately check for mapping types Update PyTryFrom for PyMapping and PySequence to more accurately check types Jul 29, 2022
@aganders3 aganders3 requested a review from davidhewitt August 2, 2022 22:38
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks really solid. Just a couple of final things to decide on before we merge this in. I've commented on the sequence implementation, the equivalent ideas apply to the mapping implementation.

guide/src/migration.md Outdated Show resolved Hide resolved
src/types/sequence.rs Outdated Show resolved Hide resolved
src/types/sequence.rs Outdated Show resolved Hide resolved
@aganders3
Copy link
Contributor Author

Thanks again for the review. It was trickier than I thought to change the static to hold a PyResult, but I'm getting a better feel for result and error types now and hopefully what I did makes sense. Does the clone_ref negate the benefit of the GILOnceCell? Without that the return was Result<&PyType, &PyErr> which was more awkward to use.

I rebased on main again so hopefully it's still green. Please take another look when you have a chance!

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking brilliant! Just two final tiny suggestions and then let's merge this. Thanks again for a really solid PR here 👍

src/types/mapping.rs Outdated Show resolved Hide resolved
guide/src/migration.md Outdated Show resolved Hide resolved
aganders3 and others added 3 commits August 10, 2022 09:05
@aganders3
Copy link
Contributor Author

Thanks, this looks really solid. Just a couple of final things to decide on before we merge this in. I've commented on the sequence implementation, the equivalent ideas apply to the mapping implementation.

Thanks for all the feedback and review!

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks very much again!

@davidhewitt davidhewitt merged commit 5d88e1d into PyO3:main Aug 10, 2022
@Gobot1234
Copy link
Contributor

@aganders3 thank you so much for fixing this!

aganders3 added a commit to aganders3/pyo3 that referenced this pull request Aug 10, 2022
I'm not sure how I messed this up merging the changelog in PyO3#2477. Sorry!

I guess chalk it up as more evidence that PyO3#2337 would be a welcome improvement.
@aganders3 aganders3 mentioned this pull request Aug 10, 2022
@aganders3 aganders3 deleted the pymapping-2072 branch August 10, 2022 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PyMapping is inconsistent with typing.Mapping
3 participants