RFC: Try to generate reusable jlcall wrapper #11439

Open
wants to merge 1 commit into
base: master

yuyichao
Contributor

Update2:

The goal of this PR is to lower the overhead of creating jlcall wrappers. It generates a generic version of the wrapper that looks up the function pointer in the function object passed in, and caches these wrappers in a global table. A wrapper can later be reused for other functions with the same signature.

This PR should be "feature complete" now. A few things that I'm not so sure about are listed here:

  1. The table that caches the functions is a std::map.
  2. I assume the return type (jl_value_t) is cached and never freed. Is that always true? (And if not, how do I tell whether it is cached, so that I only cache the wrapper function in that case?)
  3. Reloading the functions from the cache works. However, I don't know how to teach llvm about the global function that I want to inject. Currently I'm using a wrapper function that falls back on a lookup in a map I set up, in order to test whether it works. I suppose there must be better ways to do it.
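To make the idea concrete, here is a minimal sketch of a signature-keyed wrapper cache, not the code in this PR. `Value`, `FunctionObject`, `AnyFn`, `get_wrapper` and the signature string format are all hypothetical stand-ins, and the "generation" step is just a lookup of a hand-written wrapper where the real code would emit LLVM IR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the real Julia runtime types.
struct Value { int64_t payload; };       // plays the role of a boxed jl_value_t*
using AnyFn = void (*)();                // type-erased specialized entry point
struct FunctionObject { AnyFn spec_fptr; };

// jlcall-like convention: function object, boxed arguments, argument count.
using Wrapper = Value (*)(FunctionObject *, Value *, uint32_t);

// One wrapper per signature. It reads the callee from F at call time
// instead of having a particular callee baked in.
Value wrap_i64_i64(FunctionObject *F, Value *args, uint32_t nargs) {
    assert(nargs == 2);
    auto fn = reinterpret_cast<int64_t (*)(int64_t, int64_t)>(F->spec_fptr);
    return Value{fn(args[0].payload, args[1].payload)};
}

int wrappers_generated = 0;              // counts cache misses

// Global table keyed by a signature string (an assumed key format).
Wrapper get_wrapper(const std::string &sig) {
    static std::map<std::string, Wrapper> cache;
    auto it = cache.find(sig);
    if (it != cache.end())
        return it->second;               // reuse the existing wrapper
    ++wrappers_generated;                // real code would emit LLVM IR here
    Wrapper w = (sig == "(i64,i64)->i64") ? wrap_i64_i64 : nullptr;
    cache.emplace(sig, w);
    return w;
}

int64_t add(int64_t a, int64_t b) { return a + b; }
int64_t mul(int64_t a, int64_t b) { return a * b; }

// Call a specialized function through the shared wrapper.
int64_t call_boxed(int64_t (*f)(int64_t, int64_t), int64_t a, int64_t b) {
    FunctionObject F{reinterpret_cast<AnyFn>(f)};
    Value args[2] = {{a}, {b}};
    return get_wrapper("(i64,i64)->i64")(&F, args, 2).payload;
}
```

Calling `call_boxed(add, 2, 3)` and then `call_boxed(mul, 2, 3)` goes through the same wrapper, so the cache is only missed once for the two functions.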

As for performance.

  1. The number of wrappers goes down from 2400+ to 500+. sys.so size goes from 3.4M to 3.3M.
  2. This should also allow more aggressive C specialization, as in "Generate c signature when possible" (#11306), without the overhead of generating the wrapper.
  3. The wrapper will probably be slower. I hope that doesn't matter, and I don't know how to benchmark it because AFAIK the jlcall wrapper is only used in the slow path anyway.
  4. There are probably other optimizations that can be done (e.g. using the original jl_lambda_info_t instead of a new one), but I would like to get feedback on the overall design first.

For the original message and other updates, see the first comment below.

@yuyichao
Contributor Author

Update1 (out-of-date):

  1. The cache is per-session and is not restored from the sysimg. I know that I can write a global variable in the sysimg to store a list of pointers but I'm not sure how to store the type info as well.

Original message (with some useless stuff (backtrace, old update) deleted):

Don't merge! The sysimg doesn't initialize correctly (yet).

Following yesterday's discussion in #11306 and a 3-year-old TODO from @JeffBezanson, this is my attempt at reducing the overhead of generating jlcall wrappers.

I think a good way to go here might be to allow signatures with 0, 1, 2, or 3 jl_value_t* arguments as part of the jlcall convention. In other words, we would need a tag along with every jl_fptr_t to tell us what arguments it takes. Then we wouldn't need so many jlcall wrappers.

IMHO, the reusable jlcall wrapper can serve as a tag, and it can even work for an arbitrary number of arguments.

In its current state, I'm just trying to make the wrapper function pull the specialized function from its currently unused first argument (literally implementing the TODO), so each function still gets its own wrapper and there's no saving yet.

This seems to work for sysimg generation. However, when I try to run the final binary, it fails because the new specFunctionPtr is not initialized after sysimg is loaded as clearly shown by the backtrace below.

Specifically, I want to ask the following (other comments are welcome too):

  1. Is this the right way to do it? IMHO, something like this should still be useful even after a redesign of the current system, as in "Towards typed lambdas" (#10269). Maybe not exactly how the jlcall wrappers are generated, but a globally cached generic wrapper.
  2. How do I make the new field initialized when the sysimg is loaded?
  3. How do I implement caching of the wrapper? Doing this at runtime should be easy, so I'm more interested in how to cache it in the sysimg (or what existing mechanism I can use). This is probably related to (or replaces) the previous point.

@yuyichao
Contributor Author

As far as I understand, the normal fptr works by assigning a function that does codegen and replaces the pointer after that? Doesn't this mean that the function will be regenerated even if it is already in the sysimg? Should I implement something similar?
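The pattern being asked about, an entry point that compiles on first call and then replaces itself, can be sketched as follows. This is only an illustration of the general trampoline technique: `Lambda`, `trampoline`, `make_lazy` and `make_preloaded` are hypothetical names, and the "preloaded" variant is an assumption about how a precompiled image could avoid regeneration, not a statement of what the runtime actually does.

```cpp
#include <cassert>

struct Lambda;                                   // hypothetical stand-in for jl_lambda_info_t
using Entry = int (*)(Lambda *, int);

struct Lambda {
    Entry fptr;                                  // current entry point
    int compile_count;                           // how many times "codegen" ran
};

int compiled_body(Lambda *, int x) { return x * 2; }   // stands in for generated code

// Initial entry point: "compile", patch the pointer, then forward the call.
int trampoline(Lambda *li, int x) {
    ++li->compile_count;                         // real code would run codegen here
    li->fptr = compiled_body;
    return li->fptr(li, x);
}

// Lazily compiled function: starts out pointing at the trampoline.
Lambda make_lazy() { return Lambda{trampoline, 0}; }

// Assumed alternative: the loader fills in the compiled pointer directly,
// so the trampoline never runs for this function.
Lambda make_preloaded() { return Lambda{compiled_body, 0}; }
```

With `make_lazy()`, the first call through `fptr` runs the trampoline once and later calls go straight to the compiled body. The difficulty raised in the edit below is that a generic wrapper cannot be patched this simply when the signature is not known up front.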

Edit: I guess for this case it is more complicated since the signature is not known....

@yuyichao force-pushed the jlcall-wrapper branch 2 times, most recently from 6659c5f to f2bbe99 (May 27, 2015 02:46)
@yuyichao
Contributor Author

Not sure about the CI (i.e. older llvm version) yet but it passes the test for me locally with llvm 3.6.1.

What I don't really understand is why it is fine to leave specFunctionPtr as NULL in emit_function. Is sth like jl_generate_fptr guaranteed to be called before a wrapper can be called? (Edit: because of jl_trampoline?)

@yuyichao force-pushed the jlcall-wrapper branch 3 times, most recently from 80f3bbe to fd39aa5 (May 27, 2015 12:16)
@vtjnash
Sponsor Member

vtjnash commented May 27, 2015

What I don't really understand is why it is fine to leave specFunctionPtr as NULL in emit_function. Is sth like jl_generate_fptr guaranteed to be called before a wrapper can be called? (Edit: because of jl_trampoline?)

no, it isn't guaranteed. on llvm3.6, i think it happens almost always however. that part of the code definitely needs some cleanup since it is now an odd mix of a large number of llvm versions.

Update: I think I've figured out how to restore the function pointer now (although it is still not very clear to me what exactly jl_fptr_to_llvm is doing)

it takes a raw function pointer and creates an llvm::Function object around it for linking into the JIT

@yuyichao
Contributor Author

@vtjnash

no, it isn't guaranteed. on llvm3.6, i think it happens almost always however.

I guess the version on the CI is not so unhappy about it either....

In this case, what is the earliest point at which I can get that function pointer? I thought the function should be in memory after verifyFunction, so I tried to get its address at the end of to_function (which is also the latest point before the code path blows up), but I still get NULL.

Or is it better to define a getter function that extracts the function pointer from lambda_info and generates it if necessary? This would introduce an overhead in each jlcall wrapper, although maybe the fast path will be fast enough.
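A getter of that kind could look roughly like the sketch below. `LambdaInfo`, `SpecFn`, `generate_spec` and `get_spec_fptr` are hypothetical names, and the slow path simply assigns a fixed function where the real code would run codegen. The point is only that the fast path costs one load and one branch.

```cpp
#include <cassert>

using SpecFn = int (*)(int);

// Hypothetical stand-in for jl_lambda_info_t.
struct LambdaInfo {
    SpecFn spec_fptr;        // nullptr until the specialized code exists
};

int slow_path_hits = 0;      // counts how often generation was needed

int specialized(int x) { return x + 1; }   // stands in for generated code

// Slow path: stands in for generating the specialized function on demand.
SpecFn generate_spec(LambdaInfo *li) {
    ++slow_path_hits;
    li->spec_fptr = specialized;
    return li->spec_fptr;
}

// Getter used by the wrapper: one load and one branch on the fast path.
inline SpecFn get_spec_fptr(LambdaInfo *li) {
    SpecFn f = li->spec_fptr;
    return f ? f : generate_spec(li);
}

// The wrapper never assumes the pointer was filled in ahead of time.
int wrapper(LambdaInfo *li, int x) { return get_spec_fptr(li)(x); }
```

Starting from `LambdaInfo li{nullptr}`, the first `wrapper(&li, 1)` takes the slow path and every later call takes the fast one.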

that part of the code definitely needs some cleanup since it is now an odd mix of a large number of llvm versions.

OT: Is there an upgrade guide for llvm (or any other document on the differences between versions)?

it takes a raw function pointer and creates an llvm::Function object around it for linking into the JIT

What I'm confused about is why it is not necessary for imaging_mode. Are functionObject and specFunctionObject never used when building a sysimg?

@yuyichao
Contributor Author

PR text updated. (see above)

@yuyichao
Contributor Author

Another impact of this is that the function object passed in has to be kept alive now.
@carnaval I remember you had some comment about that. Is there anything that needs to be changed? (e.g. root mfunc in jl_apply_generic/jl_gf_invoke before passing to jl_apply?)

@carnaval
Contributor

@JeffBezanson will correct me if I'm wrong, but since the argument is unused in almost all cases and jl_apply_generic is a very hot function, I think it makes sense to break the rooting convention here and root inside the callee at the very beginning if you do actually need this function. For example, I think it's safe to move the jl_gc_pop of jl_apply_generic before the return (I introduced this one I think...) to hopefully get a tailcall here.

However, this is only performance bikeshedding without a benchmark, so just root it for testing and we'll figure out something later ;)

@yuyichao
Contributor Author

@carnaval Actually, you reminded me that I only need to root for the slow path (which never triggers). (The fast path only has a pointer dereference.)
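As a rough illustration of rooting only on the slow path, the sketch below uses a toy root stack and an RAII guard. `Object`, `root_stack` and `RootGuard` are hypothetical and do not correspond to the real GC frame macros; they only show where the rooting cost would land.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object { int (*fptr)(int); };             // hypothetical function object

std::vector<Object *> root_stack;                // toy stand-in for the GC root stack
std::size_t max_roots_seen = 0;                  // deepest the toy stack ever got

// RAII guard: push on entry, pop on exit.
struct RootGuard {
    explicit RootGuard(Object *o) {
        root_stack.push_back(o);
        if (root_stack.size() > max_roots_seen)
            max_roots_seen = root_stack.size();
    }
    ~RootGuard() { root_stack.pop_back(); }
};

int body(int x) { return x * x; }                // stands in for generated code

int slow_path(Object *f, int x) {
    RootGuard guard(f);      // f must survive whatever allocation happens below
    f->fptr = body;          // stands in for codegen, which may allocate
    return f->fptr(x);
}

int call(Object *f, int x) {
    if (f->fptr)             // fast path: plain pointer dereference, no rooting
        return f->fptr(x);
    return slow_path(f, x);
}
```

Only the first call on an object with a null `fptr` touches the root stack; later calls return from the fast path without pushing anything.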

@yuyichao force-pushed the jlcall-wrapper branch 2 times, most recently from 7ee97ff to 6283739 (May 28, 2015 20:45)
Generate generic wrappers
@yuyichao
Contributor Author

Should be "feature complete" now. A few questions remain.

  1. The table that caches the functions is a std::map.
  2. I assume the return type (jl_value_t) is cached and never freed. Is that always true? (And if not, how do I tell whether it is cached, so that I only cache the wrapper function in that case?)
  3. Reloading the function pointers and type info from the cache works. However, I don't know how to teach llvm about the global function that I want to inject. Currently I'm using a wrapper function that falls back on a lookup in a map I set up, in order to test whether it works. I suppose there must be better ways to do it.

@yuyichao yuyichao added the compiler:codegen Generation of LLVM IR and native code label Jul 2, 2015
@vtjnash
Sponsor Member

vtjnash commented Feb 9, 2016

fwiw, we may still want something like this general-purpose code, but the limited case (jlcall with 0,1,2 jl_value_t args) can be implemented now by extending the jl_lambda_info_t->jlcall_api field to enumerate additional calling conventions
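A tag-based dispatch along those lines might look like the following sketch. The `CallApi` enumerators, `Callable` and `apply` are hypothetical and the real jlcall_api values are not reproduced here. The idea shown is only that a small tag stored next to the pointer lets the caller pick the right call shape without a per-function wrapper.

```cpp
#include <cassert>
#include <cstdint>

struct Value { int64_t payload; };               // hypothetical boxed value

// Hypothetical tags; not the real jlcall_api encoding.
enum class CallApi : uint8_t { Generic, Args0, Args1, Args2 };

using AnyFn = void (*)();                        // type-erased function pointer
struct Callable {
    CallApi api;                                 // tells the caller how to invoke fptr
    AnyFn fptr;
};

template <typename F>
Callable make_callable(CallApi api, F f) {
    return Callable{api, reinterpret_cast<AnyFn>(f)};
}

// Dispatch on the tag and cast back to the matching signature.
Value *apply(const Callable &c, Value **args, uint32_t nargs) {
    switch (c.api) {
    case CallApi::Args0:
        return reinterpret_cast<Value *(*)()>(c.fptr)();
    case CallApi::Args1:
        return reinterpret_cast<Value *(*)(Value *)>(c.fptr)(args[0]);
    case CallApi::Args2:
        return reinterpret_cast<Value *(*)(Value *, Value *)>(c.fptr)(args[0], args[1]);
    case CallApi::Generic:
        return reinterpret_cast<Value *(*)(Value **, uint32_t)>(c.fptr)(args, nargs);
    }
    return nullptr;
}

// Example callees, one per convention.
Value zero_value{0};
Value *ret_zero() { return &zero_value; }
Value *ret_first(Value *a) { return a; }
Value *ret_second(Value *, Value *b) { return b; }
Value *ret_last(Value **args, uint32_t n) { return args[n - 1]; }
```

This covers only the fixed-arity boxed cases, which is why the general-purpose wrapper in this PR may still be wanted for other signatures.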

@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@tkelman tkelman removed this from the 0.6.0 milestone Dec 22, 2016
@tkelman
Contributor

tkelman commented Dec 22, 2016

"this should not be on the milestone" - @yuyichao
