Rewrite _create_code() with Structural Pattern Matching (limited to tuples) #496
Conversation
This is a nice cleanup of the logic, and makes it more maintainable. Have you checked what kind of impact it has on speed? Also, as noted in the comments, I'm not convinced that `_shims` is the right place for the `match` class. Please address the questions in the comments above.
I'm glad you asked about performance, because my first version of the function using pattern matching was performing really poorly: the first 10+ arguments almost always matched (they are common to almost all patterns and, contrary to the built-in implementation, mine doesn't do anything clever to optimize these cases). I stepped back a bit on the degree of abstraction; will push in a couple of minutes.

Edit: removed unused features from the `match` class.

Here is a summary of a microbenchmark of unpickling: Structural Pattern Matching time divided by procedural time (the current implementation). What I don't get is why the overhead seems to depend on the function size in CPython.

[Table: ratio of time overheads.]
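A ratio like the one reported above can be reproduced with `timeit`; this is only a sketch with stand-in callables (`old_impl` and `new_impl` are hypothetical placeholders, not dill's real functions):

```python
import timeit

# Stand-ins for the two implementations being compared; the real benchmark
# times dill's unpickling path, not these toy functions.
def old_impl(args):
    return len(args)

def new_impl(args):
    return sum(1 for _ in args)

args = tuple(range(16))
t_old = timeit.timeit(lambda: old_impl(args), number=50_000)
t_new = timeit.timeit(lambda: new_impl(args), number=50_000)
print(f"overhead ratio: {t_new / t_old:.2f}")
```

Dividing the two timings gives the "new over current" ratio used in the tables in this thread.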
(force-pushed: fc84eb7 to a8ad4e4)
Can you profile to see if the speed hit is localized to something, and potentially try to mitigate it a bit? I'm a bit focused on speed here because we aren't gaining any new features... only making maintenance easier. I like this PR, and hopefully we can squeeze a bit more speed out of it.
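Profiling of the kind being asked for can be done with `cProfile`; the workload below is a hypothetical stand-in, not dill's actual unpickling path:

```python
import cProfile
import io
import pstats

# Stand-in workload; in the real investigation this would be a loop of
# dill.loads() calls over pickled functions.
def workload():
    data = [tuple(range(16)) for _ in range(1000)]
    return [len(t) for t in data]

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Print the five functions with the highest cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

Sorting by cumulative time makes it easy to see whether the overhead is localized in one helper or spread across the whole function.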
I actually did some experiments trying to optimize it, with few gains, but the code was heading more and more towards the old implementation, with lots of special-cased `if`s.

Edit: the overhead is not concentrated in the later dict operations, see comment below. Also, the final version I reached is a compromise between clarity of the logic and the really minor efficiency that can still be squeezed out of this.
I'm playing with …
New benchmark! I did achieve some extra performance squeezing. Changes: …

And here are the time results (with an empty code object):

[Table: time ratio over the current implementation.]

So this is a mean overhead of 53% over the supported CPython versions (against the previous 124%), which is diluted down to 29% for a small function (78% previously). For better maintainability and a smaller risk of bugs (Python 3.12a was just released on Friday 🙄), I think it's a good compromise.
To me, this is still too much of a speed hit for something that will be called very regularly and is purely for maintainability. If there's no way to squeeze out more performance, I'd suggest we close this. I really appreciate the effort here, and I'd rather see you get the speed down further than close it... I'd expect that if you used the native `match` statement in 3.10+, then you might see some additional performance for 3.10+. Let's say that we don't want to significantly impact the speed of 3.8+; so if that means things like rearranging some `if` statements for incremental gains, then do it.
I understand. I'll see if I can get any more microseconds out of this. The middle column in my last comment is a variation using the …
Ah... I meant using the Python `match` statement.
Edit: put Python 3.11a as the last case, as it's the least likely.

Look, if we don't check the code members' types, the average overhead among supported CPython versions goes down to a minimum of 10% for an empty code object (against 53% with type checks) and 6% for a small function (was 29% with type checks). It's also apparently faster than the current implementation in PyPy.

[Table: ratio between new implementation time and current implementation time.]

The only difference in behavior would be if the code object changes again to a format with the same number of members as a previous version but a different composition. I think this version is faster in general than the native pattern matching (didn't test this time), as there are no variable/keyword bindings in the commonest case. Anyway, if in the future it is necessary to distinguish between code objects of the same size but different member types, the framework to do it is there.
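The type-check-free variant can be sketched as a dispatch on member count alone; the sizes below are illustrative stand-ins, not the exact `CodeType` constructor signatures:

```python
# Dispatch purely on the number of members, which is what makes this
# variant cheap: no isinstance() calls per member. A tuple with the right
# size but wrong member types would only fail later, when the real code
# object is constructed.
def dispatch_by_size(members):
    n = len(members)
    if n == 16:    # hypothetical "3.8-style" layout size
        return "layout-16"
    if n == 18:    # hypothetical "3.11-style" layout size
        return "layout-18"
    raise ValueError(f"unsupported code object with {n} members")

print(dispatch_by_size(tuple(range(16))))  # layout-16
```

The trade-off is exactly the one described above: two layouts of equal size but different composition would be indistinguishable at this stage.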
This is a proposal to simplify the `_create_code()` logic using Structural Pattern Matching. Sadly, it's a Python 3.10+ only feature, but we need just a limited subset of its functionality, which can be implemented with a simple class.

It seems to be working across Python versions: I get the same failures for the same version combinations as with the current version. However, I'm not sure the logic is 100% equivalent (mainly for `lnotab`).

Note: `match` and `case` are "soft keywords" and can be used freely outside a `match` statement.

Related to #488 and #495.
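As a rough illustration of that "limited subset implemented with a simple class" idea (hypothetical, not the PR's actual code), a tuple can be matched by length and member types; since `match` is a soft keyword, it is even a legal class name:

```python
# Minimal tuple-only pattern matcher: a pattern is a tuple of types, and it
# matches when the subject is a tuple of the same length whose members are
# instances of the corresponding types.
class match:
    """Match a tuple subject against patterns given as tuples of types."""

    def __init__(self, subject):
        self.subject = subject

    def case(self, *types):
        return (
            isinstance(self.subject, tuple)
            and len(self.subject) == len(types)
            and all(isinstance(v, t) for v, t in zip(self.subject, types))
        )

# Usage: distinguish hypothetical code-member layouts by shape.
m = match((1, 0, 2, b"code"))
print(m.case(int, int, int, bytes))       # True
print(m.case(int, int, int, int, bytes))  # False
```

Unlike the native `match` statement, this works on any supported Python version, at the cost of explicit `isinstance` checks per member.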