Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callattr secondary slowpath #860

Open
wants to merge 3,051 commits into
base: master
Choose a base branch
from

Conversation

tjhance
Copy link
Contributor

@tjhance tjhance commented Aug 24, 2015

Been extremely busy lately, but I finally found time to attempt to finish this up. (I think. We'll see how the build goes.)

Adds the ability to give a "secondary slowpath" to the rewriter, after which point it will call that slowpath when the guard fails.

  • First, it kills the concept of live_outs within the rewriter proper, and it always pushes them and pops them later.
  • Rather than rearrange args before jmping when a guard fails, the guard jmp goes to a short section which rearranges the args and then jumps to the slowpath. Doing the guard rearrangement outside the main path was kind of tricky because the (very stateful) rewriter doesn't have any support for branching. I basically just dump the locations and then write a function which uses that info to rearrange the args. I imagine we'll get a better story for branching later, so I don't see this function being around long-term.

When I measured it, the above seemed to have very little perf effect.

Then for dealing with the callattr slowpath, I basically add an interface to the rewriter that adds a new function and new args, and then does the same guard flow as before (except it has to call the slowpath rather than jmping to the end).

On the whole this is pretty hacky :\

tjhance and others added 30 commits July 14, 2015 01:57
If both a metaclass and an instance of it are defined in C extensions,
the metaclass doesn't have attrs_offset != 0 or tp_dictoffset != 0
which causes the call to PyType_Ready on the instance to fail.
None are enabled in this commit, but add some of the
helpers for handling exceptions if they were to be thrown.
Eventually we'll want to have a more sophisticated way of
determining when to switch to capi exceptions; they do add
noticeable overhead when no exception gets thrown.
- Temporarily disable code for ctypes that requires class descriptors.
- Skip the attribute lookup for __new__ for the base type class and
  directly call a special function assigned to tp_new.
- Add a #define from CPython's ./configure
- Add test.
The test was passing before by using the fallback behavior, but
importing ctypes successfully makes it try to use it and fails.
have the llvm jit start emitting some capi-style calls
It wouldn't rerun cmake if cmakelists.txt got updated.
Add tip on how to debug infinite recursive bugs from tp_ slots.
Small bjit improvements + fixes the build
kmod and others added 22 commits August 20, 2015 21:42
Actually, for now in all cases unless __init__ is a Python function.
more small exceptions optimizations
Forgot to do this when I switched away from an StlCompatAllocator,
so we were leaking memory pretty badly.
- Check for the allocation of empty tuples and just return the singleton
- Try to avoid creating the kwargs dict since it might end up being empty
- Let unicode-creation special case apply to all argument types
- Fix type(obj) to be fast again (got superceded by a different special case)
- Do fewer allocations in int()
This is a temporary fix for the fact that "-1" is currently getting
parsed as "-(1)", which will cause us to call '(1).__neg__()' with
the associated overhead and allocation.

It should be useful even after that gets fixed though.
Reduces boxing during the import process
For some reason CPython allocates an extra "item" for generic
variable-sized objects, but it looks like it doesn't do that for
tuples.  We had been doing that, so let's try not doing that
and saving 8 bytes per tuple.
Use adjustment-post division to avoid int division overflow
Somehow most of them did not make it in the second merge.
ie a simple bump-pointer allocator that will release all
of its memory when the rewrite itself gets deallocated.

Not a huge help itself, since most of the mallocs have been removed
from the rewriter, except for the std::functions which don't take
a custom allocator (at least not in libstdc++ 4.8 [and 4.9 I think]).
Maybe a better overall approach is to not convert it out of its
original flat format; we keep that original memory around anyway
and I don't think it's that much faster to scan than our parsed
version.

But for now, optimize the current approach:
- convert vector->SmallVector
- call ensure since we usually know how many elements the vectors will have
- use a StringMap instead of an unordered_map<std::string>
We were storing and passing them as std::unordered_map (and sometimes
switching to std::vector).  But the set can only contain the integers
0 through 15, so just represent it as a bitset.
This commit works by adding a SmallFunction class that behaves
like std::function, but allocates its data inline rather than through
a separate allocation.  It probably could have also worked by taking
a custom allocator and using the new RegionAllocator.

It adds a bit more restrictions than std::function does (the types
caught by the closure have to be more "trivial" than std::function
supports), so some of this change is marking some types as trivial,
or copying data into a trivial format from things that aren't (ex SmallVector).
Don't malloc-allocate rewriter lambdas
Reduce some codegen allocations
@tjhance
Copy link
Contributor Author

tjhance commented Aug 24, 2015

damn, I have to fix the merge conflicts to get travis-ci to run? :(

@tjhance tjhance force-pushed the callattr-secondary-slowpath branch 3 times, most recently from 41b50dd to 206fd3c Compare August 25, 2015 07:02
@kmod
Copy link
Collaborator

kmod commented Aug 25, 2015

You can set up travis-ci on your personal fork :) but the cache will be cold so it will take a long time.

I'm still a bit worried about how applicable this particular feature ("set a new slowpath function") will be. For example, in the callattr case (when we've gotten a descriptor we didn't special case), we're not going to know what our caller will want to do after calling callattrInternal -- sometimes the caller will be the top-level callattr(), but sometimes it will be something like binop(). I feel like it'd be nice to keep it more self-contained, ie saying that we are doing a new sub-rewrite. That way the outer rewrite can see the sub-rewrite as a single unit, and the inner rewrite could potentially just operate independently.

I think in theory we could almost do this right now. We could just create a sub-rewriter, tell it to write starting from the current offset, and that all the currently-used variables are live_outs. Then the outer rewrite just sees the subrewrite as a (large) function call, and the inner rewrite doesn't really have to know what's going on. And by adding nesting, we don't have to add any more branching support than the rewriter already has: the subrewrite just does guarding, and the outer rewrite doesn't need any control flow (it's all contained in the inner rewrite).

Anyway, sorry that's wordy. I'm not opposed to the approach here of managing the slowpath; I guess I just want to make sure we will end up serving the current crop of use cases.

@kmod kmod added the wip label Aug 29, 2015
@kmod kmod force-pushed the master branch 2 times, most recently from 352fd89 to 6488a3e Compare October 28, 2020 21:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants