Callattr secondary slowpath #860

tjhance · 2015-08-24T03:56:43Z

Been extremely busy lately, but I finally found time to attempt to finish this up. (I think. We'll see how the build goes.)

Adds the ability to give a "secondary slowpath" to the rewriter, after which point it will call that slowpath when the guard fails.

First, it kills the concept of live_outs within the rewriter proper, and it always pushes them and pops them later.
Rather than rearrange args before jmping when a guard fails, the guard jmp goes to a short section which rearranges the args and then jumps to the slowpath. Doing the guard rearrangement outside the main path was kind of tricky because the (very stateful) rewriter doesn't have any support for branching. I basically just dump the locations and then write a function which uses that info to rearrange the args. I imagine we'll get a better story for branching later, so I don't see this function being around long-term.

When I measured it, the above seemed to have very little perf effect.

Then for dealing with the callattr slowpath, I basically add an interface to the rewriter that adds a new function and new args, and then does the same guard flow as before (except it has to call the slowpath rather than jmping to the end).

On the whole this is pretty hacky :\

If both a metaclass and an instance of it are defined in C extensions, the metaclass doesn't have attrs_offset != 0 or tp_dictoffset != 0 which causes the call to PyType_Ready on the instance to fail.

None are enabled in this commit, but add some of the helpers for handling exceptions if they were to be thrown.

Eventually we'll want to have a more sophisticated way of determining when to switch to capi exceptions; they do add noticeable overhead when no exception gets thrown.

- Temporarily disable code for ctypes that requires class descriptors. - Skip the attribute lookup for __new__ for the base type class and directly call a special function assigned to tp_new. - Add a #define from CPython's ./configure - Add test.

The test was passing before by using the fallback behavior, but importing ctypes successfully makes it try to use it and fails.

have the llvm jit start emitting some capi-style calls

It wouldn't rerun cmake if cmakelists.txt got updated.

Add tip on how to debug infinite recursive bugs from tp_ slots.

Small bjit improvements + fixes the build

Import the ctypes module.

Actually, for now in all cases unless __init__ is a Python function.

more small exceptions optimizations

Forgot to do this when I switched away from an StlCompatAllocator, so we were leaking memory pretty badly.

- Check for the allocation of empty tuples and just return the singleton - Try to avoid creating the kwargs dict since it might end up being empty - Let unicode-creation special case apply to all argument types - Fix type(obj) to be fast again (got superceded by a different special case) - Do fewer allocations in int()

This is a temporary fix for the fact that "-1" is currently getting parsed as "-(1)", which will cause us to call '(1).__neg__()' with the associated overhead and allocation. It should be useful even after that gets fixed though.

Reduces boxing during the import process

For some reason CPython allocates an extra "item" for generic variable-sized objects, but it looks like it doesn't do that for tuples. We had been doing that, so let's try not doing that and saving 8 bytes per tuple.

Use adjustment-post division to avoid int division overflow

Reduce allocations

Undo testsuite split

Somehow most of them did not make it in the second merge.

Fix submodules

ie a simple bump-pointer allocator that will release all of its memory when the rewrite itself gets deallocated. Not a huge help itself, since most of the mallocs have been removed from the rewriter, except for the std::functions which don't take a custom allocator (at least not in libstdc++ 4.8 [and 4.9 I think]).

Maybe a better overall approach is to not convert it out of its original flat format; we keep that original memory around anyway and I don't think it's that much faster to scan than our parsed version. But for now, optimize the current approach: - convert vector->SmallVector - call ensure since we usually know how many elements the vectors will have - use a StringMap instead of an unordered_map<std::string>

We were storing and passing them as std::unordered_map (and sometimes switching to std::vector). But the set can only contain the integers 0 through 15, so just represent it as a bitset.

This commit works by adding a SmallFunction class that behaves like std::function, but allocates its data inline rather than through a separate allocation. It probably could have also worked by taking a custom allocator and using the new RegionAllocator. It adds a bit more restrictions than std::function does (the types caught by the closure have to be more "trivial" than std::function supports), so some of this change is marking some types as trivial, or copying data into a trivial format from things that aren't (ex SmallVector).

Don't malloc-allocate rewriter lambdas

Reduce some codegen allocations

tjhance · 2015-08-24T03:57:12Z

damn, I have to fix the merge conflicts to get travis-ci to run? :(

kmod · 2015-08-25T07:32:41Z

You can set up travis-ci on your personal fork :) but the cache will be cold so it will take a long time.

I'm still a bit worried about how applicable this particular feature ("set a new slowpath function") will be. For example, in the callattr case (when we've gotten a descriptor we didn't special case), we're not going to know what our caller will want to do after calling callattrInternal -- sometimes the caller will be the top-level callattr(), but sometimes it will be something like binop(). I feel like it'd be nice to keep it more self-contained, ie saying that we are doing a new sub-rewrite. That way the outer rewrite can see the sub-rewrite as a single unit, and the inner rewrite could potentially just operate independently.

I think in theory we could almost do this right now. We could just create a sub-rewriter, tell it to write starting from the current offset, and that all the currently-used variables are live_outs. Then the outer rewrite just sees the subrewrite as a (large) function call, and the inner rewrite doesn't really have to know what's going on. And by adding nesting, we don't have to add any more branching support than the rewriter already has: the subrewrite just does guarding, and the outer rewrite doesn't need any control flow (it's all contained in the inner rewrite).

Anyway, sorry that's wordy. I'm not opposed to the approach here of managing the slowpath; I guess I just want to make sure we will end up serving the current crop of use cases.

tjhance and others added 30 commits July 14, 2015 01:57

rearrange args after the jump

b4e870b

check if all args are in place

c18b215

clean up the way uses for args at guards work

10734f5

add a secondary slowpath for callattr

f4b9906

run make format

206fd3c

I guess at some point this changed

7405db0

Get the _ctypes module to compile.

39b3238

Add dummy function implementations to be able to link _ctypes module.

bf585e2

Handle edge case with C extension metaclasses.

d751470

If both a metaclass and an instance of it are defined in C extensions, the metaclass doesn't have attrs_offset != 0 or tp_dictoffset != 0 which causes the call to PyType_Ready on the instance to fail.

Temporarily remove assertions to be able to import _ctypes.

f05bdc2

Refactor: remove extra ExceptionStyle namespace

69880e2

Have the llvm jit support receiving capi exceptions

65f9d4b

None are enabled in this commit, but add some of the helpers for handling exceptions if they were to be thrown.

Make some getattrs (in try blocks) use capi exceptions

d8bf75e

Eventually we'll want to have a more sophisticated way of determining when to switch to capi exceptions; they do add noticeable overhead when no exception gets thrown.

Add rewrites for capi-style getattrs

2b4fd9c

Make import ctypes work.

719d4e9

- Temporarily disable code for ctypes that requires class descriptors. - Skip the attribute lookup for __new__ for the base type class and directly call a special function assigned to tp_new. - Add a #define from CPython's ./configure - Add test.

Make test fallback to not using CTypes even though we can import it.

ce091ae

The test was passing before by using the fallback behavior, but importing ctypes successfully makes it try to use it and fails.

Merge pull request pyston#786 from kmod/exceptions3

7ece575

have the llvm jit start emitting some capi-style calls

PGO fix

988e16e

It wouldn't rerun cmake if cmakelists.txt got updated.

some changing about exception handling and remove blank line

4fc2444

Inherit Py_TPFLAGS_CHECKTYPES.

15655a6

More test diagnostics if pyston crashes early

81825c8

remove the the methods that not belong instance and add __index__

8a87d9d

adjust old style class test, add __index__ test

a4515f9

fix build and support interpreter frames inside getCurrentPythonLine()

407a627

bjit: directly inline recordType() function

ceb3e44

Cache SourceInfo::getScopeInfo

dd7aa0d

Add tip on how to debug infinite recursive bugs from tp_ slots.

9000217

Merge pull request pyston#789 from rudi-c/docs_slots

bdf0764

Add tip on how to debug infinite recursive bugs from tp_ slots.

Merge pull request pyston#788 from undingen/bjit_opt2

2f60cce

Small bjit improvements + fixes the build

Merge pull request pyston#767 from rudi-c/ctypes

b04977a

Import the ctypes module.

kmod and others added 22 commits August 20, 2015 21:42

Call tp_init in typeCall for extension classes

3a8884e

Actually, for now in all cases unless __init__ is a Python function.

Add some more asserts for threading_local.py

dbada80

Merge pull request pyston#852 from kmod/perf4

bd2ee53

more small exceptions optimizations

Fix: free dict+set internal memory

978974b

Forgot to do this when I switched away from an StlCompatAllocator, so we were leaking memory pretty badly.

Make a couple things not gc-allocated

d7934a4

Add unaryop to our codegen type system

2e409bd

This is a temporary fix for the fact that "-1" is currently getting parsed as "-(1)", which will cause us to call '(1).__neg__()' with the associated overhead and allocation. It should be useful even after that gets fixed though.

Convert parts of the import system to use BoxedStrings

d50f760

Reduces boxing during the import process

Don't allocate an extra tuple element

2b9fd97

For some reason CPython allocates an extra "item" for generic variable-sized objects, but it looks like it doesn't do that for tuples. We had been doing that, so let's try not doing that and saving 8 bytes per tuple.

Merge pull request pyston#853 from Daetalus/test_int

0e5f167

Use adjustment-post division to avoid int division overflow

Merge pull request pyston#855 from kmod/perf5

432fcb5

Reduce allocations

Merge pull request pyston#848 from kmod/undo-testsuite

10dac49

Undo testsuite split

Fix submodules

6b12ddf

Somehow most of them did not make it in the second merge.

Merge pull request pyston#858 from kmod/submodules

08fe381

Fix submodules

Store live_outs as a bitset

9407f64

We were storing and passing them as std::unordered_map (and sometimes switching to std::vector). But the set can only contain the integers 0 through 15, so just represent it as a bitset.

Add even more threading asserts

ceed353

Merge pull request pyston#856 from kmod/perf4

d3fb236

Don't malloc-allocate rewriter lambdas

Merge pull request pyston#857 from kmod/perf5

0fc53b7

Reduce some codegen allocations

remove all live_outs

3f60b02

tjhance force-pushed the callattr-secondary-slowpath branch 3 times, most recently from 41b50dd to 206fd3c Compare August 25, 2015 07:02

kmod added the wip label Aug 29, 2015

kmod force-pushed the master branch 2 times, most recently from 352fd89 to 6488a3e Compare October 28, 2020 21:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Callattr secondary slowpath #860

Callattr secondary slowpath #860

tjhance commented Aug 24, 2015

tjhance commented Aug 24, 2015

kmod commented Aug 25, 2015

Callattr secondary slowpath #860

Are you sure you want to change the base?

Callattr secondary slowpath #860

Conversation

tjhance commented Aug 24, 2015

tjhance commented Aug 24, 2015

kmod commented Aug 25, 2015