RFC: Try to generate reusable jlcall wrapper #11439

yuyichao · 2015-05-26T12:22:04Z

Update2:

The goal of this PR is to lower the overhead of creating jlcall wrappers by generating a generic version of the wrapper that looks up the function pointer in the function object passed in, and caching these wrappers in a global table. A wrapper can later be reused for other functions with the same signature.
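A minimal C++ sketch of the idea, under simplifying assumptions: `Value`, `FunctionObject`, `AnyFn`, `generic_wrapper_dd_d` and the `spec_*` functions are hypothetical stand-ins, not Julia's real `jl_value_t`/function-object layout or generated code. It illustrates that the wrapper does not hard-code a call to one specialized function. Instead it loads the specialized pointer from the function object it receives, so a single wrapper serves every function sharing the `double(double, double)` signature.

```cpp
#include <cassert>
#include <cstdint>

using AnyFn = void (*)();  // type-erased function pointer

// Hypothetical boxed value (stands in for jl_value_t).
struct Value {
    double d;
};

// Hypothetical function object that carries its specialized code pointer.
struct FunctionObject {
    AnyFn specFunctionPtr;
};

// Uniform jlcall-style convention: (function object, boxed args, nargs).
using JlCallFn = Value *(*)(FunctionObject *, Value **, uint32_t);

// Specialized (unboxed) signature shared by several functions.
using SpecDD_D = double (*)(double, double);

// Generic wrapper for double(double, double): the target is read from F,
// so the same wrapper works for any function with this signature.
Value *generic_wrapper_dd_d(FunctionObject *F, Value **args, uint32_t nargs) {
    assert(nargs == 2);
    auto spec = reinterpret_cast<SpecDD_D>(F->specFunctionPtr);
    double r = spec(args[0]->d, args[1]->d);  // unbox the arguments and call
    return new Value{r};  // box the result (leaked here; GC-managed in a real runtime)
}

// Two specialized functions with the same signature.
double spec_add(double a, double b) { return a + b; }
double spec_mul(double a, double b) { return a * b; }
```

For example, with `FunctionObject fadd{reinterpret_cast<AnyFn>(&spec_add)}` and a matching `fmul`, calling `generic_wrapper_dd_d` on the arguments 2.0 and 3.0 yields 5.0 and 6.0 respectively from the same wrapper. The price is the extra load through `F`, which is the slowdown anticipated below.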

This PR should be "feature complete" now. A few things that I'm not so sure about are listed here:

The table that caches the functions is a std::map.
I assume the return type (jl_value_t) is cached and never freed; is that always true? (And if not, how do I tell whether it is cached, so that I only cache the wrapper function in that case?)
Reloading the functions from the cache works. However, I don't know how to teach LLVM about the global function that I want to inject. Currently I'm using a wrapper function that falls back to looking up a map I set up, just to test whether it works. I suppose there must be better ways to do it.
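The caching in the first item can be sketched with a `std::map` keyed by the specialized signature. Here the signature is modeled as a hypothetical `std::string` such as `"Float64(Float64,Float64)"`, and `generate_wrapper` is a placeholder counter standing in for the LLVM code generation step; neither is Julia's real API. The map only guarantees that each distinct signature triggers one generation, which is where the drop in the number of wrappers comes from.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical handle to a generated wrapper (a real implementation might
// store an llvm::Function* or a native function pointer instead).
struct WrapperHandle {
    int id;
};

class WrapperCache {
public:
    // Return the wrapper for `signature`, generating it only on first use.
    WrapperHandle get(const std::string &signature) {
        auto it = cache_.find(signature);
        if (it != cache_.end())
            return it->second;  // cache hit: reuse, no codegen
        WrapperHandle w = generate_wrapper(signature);
        assert(cache_.count(signature) == 0);  // each signature generated once
        cache_.emplace(signature, w);
        return w;
    }

    int generated() const { return generated_; }

private:
    // Placeholder for emitting a generic wrapper for this signature.
    WrapperHandle generate_wrapper(const std::string &) {
        return WrapperHandle{generated_++};
    }

    std::map<std::string, WrapperHandle> cache_;
    int generated_ = 0;
};
```

Keying by a string rather than by a `jl_value_t*` type pointer sidesteps the lifetime question in the second item, at the cost of building the key. Whether the real type objects are permanently cached is exactly what that item asks, so this sketch does not assume it.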

As for performance:

The number of wrappers goes down from 2400+ to 500+. sys.so size goes from 3.4M to 3.3M.
This should also allow more aggressive C specialization, as in Generate c signature when possible #11306, without the overhead of generating the wrapper.
The wrapper will probably be slower. I hope that doesn't matter, and I don't know how to benchmark it because, AFAIK, the jlcall wrapper is only used in the slow path anyway.
There are probably other optimizations that can be done (e.g. using the original jl_lambda_info_t instead of a new one), but I'd like to get feedback on the overall design first.

For the original message and other updates, see the first comment below.

yuyichao · 2015-05-26T12:22:27Z

Update1 (out-of-date):

The cache is per-session and is not restored from the sysimg. I know that I can write a global variable in the sysimg to store a list of pointers but I'm not sure how to store the type info as well.

Original message (with some useless stuff (backtrace, old update) deleted):

~~Don't merge! The sysimg doesn't initialize correctly (yet).~~

Following the discussion yesterday #11306 and a 3-year-old TODO from @JeffBezanson, this is my attempt toward reducing the overhead of generating jlcall wrappers.

I think a good way to go here might be to allow signatures with 0, 1, 2, or 3 jl_value_t* arguments as part of the jlcall convention. In other words, we would need a tag along with every jl_fptr_t to tell us what arguments it takes. Then we wouldn't need so many jlcall wrappers.

IMHO, the reusable jlcall wrapper can serve as a tag and it can even work for arbitrary number of arguments.
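A sketch of the tag scheme from the quoted TODO, in C++: each function pointer carries a hypothetical `CallConv` tag saying whether it takes 0 to 3 `Value*` arguments directly or uses the array-based convention, and the caller switches on the tag instead of going through a per-function wrapper. All names are illustrative, and the function-object argument of the real jlcall convention is omitted for brevity. (A later comment suggests `jl_lambda_info_t->jlcall_api` as a place where such a tag could live.)

```cpp
#include <cassert>
#include <cstdint>

struct Value {
    long v;
};

using AnyFn = void (*)();  // type-erased function pointer
using Fn0 = Value *(*)();
using Fn1 = Value *(*)(Value *);
using Fn2 = Value *(*)(Value *, Value *);
using Fn3 = Value *(*)(Value *, Value *, Value *);
using FnArgs = Value *(*)(Value **, uint32_t);  // array-based fallback

// Hypothetical tag describing how `fptr` expects its arguments.
enum class CallConv { Args0, Args1, Args2, Args3, ArgArray };

struct TaggedFn {
    CallConv conv;
    AnyFn fptr;
};

// Dispatch on the tag; no per-function wrapper is needed.
Value *invoke(const TaggedFn &f, Value **args, uint32_t nargs) {
    switch (f.conv) {
    case CallConv::Args0:
        assert(nargs == 0);
        return reinterpret_cast<Fn0>(f.fptr)();
    case CallConv::Args1:
        assert(nargs == 1);
        return reinterpret_cast<Fn1>(f.fptr)(args[0]);
    case CallConv::Args2:
        assert(nargs == 2);
        return reinterpret_cast<Fn2>(f.fptr)(args[0], args[1]);
    case CallConv::Args3:
        assert(nargs == 3);
        return reinterpret_cast<Fn3>(f.fptr)(args[0], args[1], args[2]);
    case CallConv::ArgArray:
        return reinterpret_cast<FnArgs>(f.fptr)(args, nargs);
    }
    return nullptr;  // not reached for valid tags
}

// Example callees: one direct 2-argument function, one array-based function.
Value *sum2(Value *a, Value *b) { return new Value{a->v + b->v}; }
Value *sumN(Value **args, uint32_t n) {
    long s = 0;
    for (uint32_t i = 0; i < n; i++)
        s += args[i]->v;
    return new Value{s};
}
```

The trade-off against the reusable-wrapper approach is that the tag set is fixed: signatures outside the enumerated ones still need the array fallback (or a wrapper).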

In its current state, I'm just trying to make the wrapper function pull the specialized function from its currently unused first argument (literally implementing the TODO), so each function still gets its own wrapper and there's no saving yet.

This seems to work for sysimg generation. However, when I try to run the final binary, it fails because the new specFunctionPtr is not initialized after sysimg is loaded as clearly shown by the backtrace below.

What I specifically want to ask is (although other comments are welcome too):

Is this the right way to do it? IMHO, something like this should still be useful even after a redesign of the current system (Towards typed lambdas #10269); maybe not exactly how the jlcall wrappers are generated, but a globally cached generic wrapper.
How do I make sure the new field is initialized when the sysimg is loaded?
How do I implement caching of the wrapper? Doing this at runtime should be easy, so I'm more interested in how to cache it in the sysimg (or what existing mechanism I can use). This is probably related to (or replaces) the previous point.

yuyichao · 2015-05-26T13:59:18Z

As far as I understand, the normal fptr works by assigning a function that does codegen and replaces the pointer after that? Doesn't this mean that the function will be regenerated even if it is already in the sysimg? Should I implement something similar?
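The pattern described in this question can be illustrated with a self-patching slot: every slot initially points at a trampoline; the first call runs a placeholder `compile` step, overwrites the slot, and forwards the call; later calls go straight to the compiled body. This is a generic lazy-initialization sketch with hypothetical names (`Lambda`, `trampoline`, `compile`), not a transcription of `jl_trampoline`.

```cpp
#include <cassert>

struct Lambda;
using Fptr = int (*)(Lambda *, int);

struct Lambda {
    Fptr fptr;          // what callers invoke
    int compile_count;  // how many times "codegen" ran
};

// The body that "codegen" would produce.
int compiled_body(Lambda *, int x) { return x * 2; }

// Placeholder for code generation: returns the real entry point.
Fptr compile(Lambda *L) {
    ++L->compile_count;
    return &compiled_body;
}

// Initial value of every slot: generate code once, patch the slot, forward.
int trampoline(Lambda *L, int x) {
    L->fptr = compile(L);
    assert(L->fptr != &trampoline);  // must not loop back into itself
    return L->fptr(L, x);
}

// Callers always go through the slot.
int call(Lambda *L, int x) { return L->fptr(L, x); }
```

If loading the sysimg stored the already-compiled pointer into the slot, the trampoline would never run. Whether that happens for the new field is exactly what the question asks, and this sketch does not settle it.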

~~Edit: I guess for this case it is more complicated since the signature is not known....~~

yuyichao · 2015-05-27T02:48:41Z

Not sure about the CI (i.e. older LLVM versions) yet, but it passes the tests for me locally with LLVM 3.6.1.

What I don't really understand is why it is fine to leave specFunctionPtr as NULL in emit_function. Is sth like jl_generate_fptr guaranteed to be called before a wrapper can be called? (Edit: because of jl_trampoline?)

vtjnash · 2015-05-27T13:44:22Z

What I don't really understand is why it is fine to leave specFunctionPtr as NULL in emit_function. Is sth like jl_generate_fptr guaranteed to be called before a wrapper can be called? (Edit: because of jl_trampoline?)

no, it isn't guaranteed. on llvm3.6, i think it happens almost always however. that part of the code definitely needs some cleanup since it is now an odd mix of a large number of llvm versions.

Update: I think I've figured out how to restore the function pointer now (although it is still not very clear for me what exactly is jl_fptr_to_llvm doing)

it takes a raw function pointer and creates an llvm::Function object around it for linking into the JIT

yuyichao · 2015-05-27T14:40:20Z

@vtjnash

no, it isn't guaranteed. on llvm3.6, i think it happens almost always however.

I guess the version on the CI is also not so unhappy about it either....

In this case, what is the earliest point at which I can get that function pointer? I thought the function should be in memory after verifyFunction, so I tried to get its address at the end of to_function (which is also the latest point before the code path blows up), but I still get NULL.

Or is it better to define a getter function that extracts the function pointer from lambda_info and generates it if necessary? This would introduce an overhead in each jlcall wrapper, although maybe the fast path will be fast enough.
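The getter alternative can be sketched as a null check on the cached pointer: the fast path is one load and a compare, and only the rarely taken slow path calls into code generation (which, per the later comment about rooting, is also the only place where the function object would need to be kept alive). `LambdaInfo`, `get_spec_fptr` and `generate_spec` are hypothetical names, and `generate_spec` merely stores a precompiled function instead of running codegen.

```cpp
#include <cassert>

using SpecFn = double (*)(double);

struct LambdaInfo {
    SpecFn specFunctionPtr = nullptr;  // may still be NULL when a wrapper runs
    int generations = 0;               // how many times the slow path ran
};

double spec_square(double x) { return x * x; }

// Slow-path placeholder: stands in for running codegen and publishing the pointer.
SpecFn generate_spec(LambdaInfo *li) {
    assert(li->specFunctionPtr == nullptr);  // only reached before generation
    ++li->generations;
    li->specFunctionPtr = &spec_square;
    return li->specFunctionPtr;
}

// What a generic wrapper would call instead of loading the field directly.
inline SpecFn get_spec_fptr(LambdaInfo *li) {
    SpecFn f = li->specFunctionPtr;
    if (f != nullptr)
        return f;              // fast path: a single load and compare
    return generate_spec(li);  // slow path: generate on demand
}
```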

that part of the code definitely needs some cleanup since it is now an odd mix of a large number of llvm versions.

OT: Is there an upgrade guide for LLVM (or any other document on the differences between versions)?

it takes a raw function pointer and creates an llvm::Function object around it for linking into the JIT

What I'm confused about is why it is not necessary for imaging_mode. Are functionObject and specFunctionObject never used when building a sysimg?

yuyichao · 2015-05-28T14:21:28Z

PR text updated. (see above)

yuyichao · 2015-05-28T14:40:59Z

Another impact of this is that the function object passed in now has to be kept alive.
@carnaval I remember you had some comments about that. Is there anything that needs to be changed? (e.g. rooting mfunc in jl_apply_generic/jl_gf_invoke before passing it to jl_apply?)

carnaval · 2015-05-28T15:49:22Z

@JeffBezanson will correct me if I'm wrong, but since the argument is unused in almost all cases, and jl_apply_generic is a very hot function, I think it makes sense to break the rooting convention here and root inside the callee at the very beginning if you actually need this function. For example, I think it's safe to move the jl_gc_pop of jl_apply_generic before the return (I think I introduced this one...) to hopefully get a tail call here.

However, this is only performance bikeshedding without a benchmark, so just root it for testing and we'll figure something out later ;)

yuyichao · 2015-05-28T16:31:08Z

@carnaval Actually, you reminded me that I only need to root for the slow path (which never triggers). (The fast path is only a pointer dereference.)

Generate generic wrappers

yuyichao · 2015-05-28T21:07:27Z

Should be "feature complete" now. A few questions remain:

The table that caches the functions is a std::map.
I assume the return type (jl_value_t) is cached and never freed; is that always true? (And if not, how do I tell whether it is cached, so that I only cache the wrapper function in that case?)
Reloading the function pointers and type infos from the cache works. However, I don't know how to teach LLVM about the global function that I want to inject. Currently I'm using a wrapper function that falls back to looking up a map I set up, just to test whether it works. I suppose there must be better ways to do it.

vtjnash · 2016-02-09T02:09:54Z

FWIW, we may still want something like this general-purpose code, but the limited case (jlcall with 0, 1, or 2 jl_value_t* args) can be implemented now by extending the jl_lambda_info_t->jlcall_api field to enumerate additional calling conventions.

tkelman · 2016-12-22T17:56:59Z

"this should not be on the milestone" - @yuyichao

yuyichao force-pushed the jlcall-wrapper branch 2 times, most recently from 6659c5f to f2bbe99 on May 27, 2015 02:46

yuyichao force-pushed the jlcall-wrapper branch 3 times, most recently from 80f3bbe to fd39aa5 on May 27, 2015 12:16

yuyichao referenced this pull request May 27, 2015

Generational behavior for the garbage collector.

7c8acce

yuyichao force-pushed the jlcall-wrapper branch 4 times, most recently from c6df553 to 93776b0 on May 28, 2015 14:00

yuyichao force-pushed the jlcall-wrapper branch 2 times, most recently from 7ee97ff to 6283739 on May 28, 2015 20:45

Try to generate reusable jlcall wrapper

26f6d09

Generate generic wrappers

yuyichao force-pushed the jlcall-wrapper branch from 6283739 to 26f6d09 on May 28, 2015 20:59

yuyichao mentioned this pull request May 30, 2015

Inline callsite of invoke when possible (invoke improvement No. 1) #10964

Closed

yuyichao added the compiler:codegen Generation of LLVM IR and native code label Jul 2, 2015

StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016

tkelman removed this from the 0.6.0 milestone Dec 22, 2016
