-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Callattr secondary slowpath #860
base: master
Are you sure you want to change the base?
Conversation
If both a metaclass and an instance of it are defined in C extensions, the metaclass doesn't have attrs_offset != 0 or tp_dictoffset != 0 which causes the call to PyType_Ready on the instance to fail.
None are enabled in this commit, but add some of the helpers for handling exceptions if they were to be thrown.
Eventually we'll want to have a more sophisticated way of determining when to switch to capi exceptions; they do add noticeable overhead when no exception gets thrown.
- Temporarily disable code for ctypes that requires class descriptors. - Skip the attribute lookup for __new__ for the base type class and directly call a special function assigned to tp_new. - Add a #define from CPython's ./configure - Add test.
The test was passing before by using the fallback behavior, but importing ctypes successfully makes it try to use it and fails.
have the llvm jit start emitting some capi-style calls
Add tip on how to debug infinite recursive bugs from tp_ slots.
Small bjit improvements + fixes the build
Import the ctypes module.
Actually, for now in all cases unless __init__ is a Python function.
more small exceptions optimizations
Forgot to do this when I switched away from an StlCompatAllocator, so we were leaking memory pretty badly.
- Check for the allocation of empty tuples and just return the singleton - Try to avoid creating the kwargs dict since it might end up being empty - Let unicode-creation special case apply to all argument types - Fix type(obj) to be fast again (got superceded by a different special case) - Do fewer allocations in int()
This is a temporary fix for the fact that "-1" is currently getting parsed as "-(1)", which will cause us to call '(1).__neg__()' with the associated overhead and allocation. It should be useful even after that gets fixed though.
Reduces boxing during the import process
For some reason CPython allocates an extra "item" for generic variable-sized objects, but it looks like it doesn't do that for tuples. We had been doing that, so let's try not doing that and saving 8 bytes per tuple.
Use adjustment-post division to avoid int division overflow
Reduce allocations
Undo testsuite split
Somehow most of them did not make it in the second merge.
Fix submodules
ie a simple bump-pointer allocator that will release all of its memory when the rewrite itself gets deallocated. Not a huge help itself, since most of the mallocs have been removed from the rewriter, except for the std::functions which don't take a custom allocator (at least not in libstdc++ 4.8 [and 4.9 I think]).
Maybe a better overall approach is to not convert it out of its original flat format; we keep that original memory around anyway and I don't think it's that much faster to scan than our parsed version. But for now, optimize the current approach: - convert vector->SmallVector - call ensure since we usually know how many elements the vectors will have - use a StringMap instead of an unordered_map<std::string>
We were storing and passing them as std::unordered_map (and sometimes switching to std::vector). But the set can only contain the integers 0 through 15, so just represent it as a bitset.
This commit works by adding a SmallFunction class that behaves like std::function, but allocates its data inline rather than through a separate allocation. It probably could have also worked by taking a custom allocator and using the new RegionAllocator. It adds a bit more restrictions than std::function does (the types caught by the closure have to be more "trivial" than std::function supports), so some of this change is marking some types as trivial, or copying data into a trivial format from things that aren't (ex SmallVector).
Don't malloc-allocate rewriter lambdas
Reduce some codegen allocations
damn, I have to fix the merge conflicts to get travis-ci to run? :( |
41b50dd
to
206fd3c
Compare
You can set up travis-ci on your personal fork :) but the cache will be cold so it will take a long time. I'm still a bit worried about how applicable this particular feature ("set a new slowpath function") will be. For example, in the callattr case (when we've gotten a descriptor we didn't special case), we're not going to know what our caller will want to do after calling callattrInternal -- sometimes the caller will be the top-level callattr(), but sometimes it will be something like binop(). I feel like it'd be nice to keep it more self-contained, ie saying that we are doing a new sub-rewrite. That way the outer rewrite can see the sub-rewrite as a single unit, and the inner rewrite could potentially just operate independently. I think in theory we could almost do this right now. We could just create a sub-rewriter, tell it to write starting from the current offset, and that all the currently-used variables are live_outs. Then the outer rewrite just sees the subrewrite as a (large) function call, and the inner rewrite doesn't really have to know what's going on. And by adding nesting, we don't have to add any more branching support than the rewriter already has: the subrewrite just does guarding, and the outer rewrite doesn't need any control flow (it's all contained in the inner rewrite). Anyway, sorry that's wordy. I'm not opposed to the approach here of managing the slowpath; I guess I just want to make sure we will end up serving the current crop of use cases. |
352fd89
to
6488a3e
Compare
Been extremely busy lately, but I finally found time to attempt to finish this up. (I think. We'll see how the build goes.)
Adds the ability to give a "secondary slowpath" to the rewriter, after which point it will call that slowpath when the guard fails.
When I measured it, the above seemed to have very little perf effect.
Then for dealing with the callattr slowpath, I basically add an interface to the rewriter that adds a new function and new args, and then does the same guard flow as before (except it has to call the slowpath rather than jmping to the end).
On the whole this is pretty hacky :\