-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC][flang] Trampolines for internal procedures. #66157
Conversation
I would like to start a discussion about the ways for modifying the current trampolines approach for Fortran internal procedures used as actual arguments or pointer targets. As Peter Klausler noted before the current approach implies security risks due to writeable and executable stack requirement. We may need to agree on a new scheme that does not have this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good write-up. One suggestion: trampolines could be allocated in a ring buffer, and instead of being immediately freed for reuse at the end of a host procedure, they could be redirected to point to a runtime library error routine that complains about a dead internal procedure being called.
Good point! I will add it to an enhancements section later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together, I like the flexibility of this plan.
I'm not opposed to either of those solutions. For AArch64, currently LLVM Trampolines won't work, they are unimplemented, see bug referenced below. Clang uses some other method to make Lambda's and similar nested function calls work. I don't think it's a huge amount of work to fix the trampoilines for AArch64 - it just needs a good enough use-case and someone sufficiently motivated to actually implement it. :) |
Thanks @vzakhari for proposing this solution for pointers to internal procedures. Form a quick read,
|
I'm looking into how clang does this, but the example as given produces some really complicated indirection (to do with how |
Lambdas are function objects; their |
Hello @madvenka786 and @atgreen, I wonder if you have something to share on this topic. |
Here's a good description of libffi's static trampolines:
https://blog.lazym.io/2021/07/29/Cast-a-Closure-to-a-Function-Pointer-How-libffi-closure-works/
I'm still open to extracting some of that code into a separate library
provided by libffi if that would be useful. (The libstatictramp idea)
…On Tue, Sept 19, 2023, 5:20 p.m. Slava Zakharin ***@***.***> wrote:
Hello @madvenka786 <https://github.com/madvenka786> and @atgreen
<https://github.com/atgreen>, I wonder if you have something to share on
this topic.
—
Reply to this email directly, view it on GitHub
<#66157 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAV7CNE4COBK5PKFF2LQHDX3IEA5ANCNFSM6AAAAAA4VRXCGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Hello, I'll comment about the implementation of Fujitsu compiler, which is referred in the last technical call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The libffi proposal looks good to me. I guess we can workout subsequently on how to integrate this into Flang.
Let us know if you need any help. Approving to signal there are not concerns from my side. Please feel free to submit or continue to enhance the proposal here.
end subroutine host | ||
``` | ||
|
||
Procedure code generated for subprogram `inernal()` must have access to the scope of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: inernal -> internal
### Option #2: LLVM/compiler-rt support | ||
|
||
It may be beneficial for projects besides Flang to use the alternative trampolines | ||
implementation, so does it sound reasonable to actually put the support | ||
into LLVM/compiler-rt? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds reasonable to me. You might want to sample interest in discourse. If speed of this feature being available is important you can consider implementing in Flang Runtime and then moving it to compiler-rt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a language that needs uplevel references and has the requirement that procedure pointers/arguments be simple code addresses would benefit from a general facility.
used via host association inside `internal()`. We will call this extra argument | ||
a static chain link. | ||
|
||
Fortran standard 2008 allowed using internal procedures as actual arguments or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: or -> for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Fortran's "static chains" only ever have one link in them.
|
||
```fortran | ||
subroutine host() | ||
integer :: local = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether this is the best example, since local
becomes a global since it is saved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch! Will change the code.
|
||
Procedure code generated for subprogram `inernal()` must have access to the scope of | ||
its host procedure, e.g. to access `local` variable. Flang achieves this by passing | ||
an extra argument to `internal()` that is a tuple of references to all variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the tuple include procedure pointers as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. In particular, it is possible inside an internal procedure to use a procedure pointer via a host association, where the procedure pointer resolves to an address of another (or the same) internal procedure. For example:
module other
abstract interface
function callback()
integer :: callback
end function callback
end interface
contains
subroutine foo(fptr)
procedure(callback), pointer :: fptr
! `fptr` is pointing to `callee1`, which needs the static chain link.
print *, fptr()
end subroutine foo
end module other
subroutine host(local)
use other
integer :: local
procedure(callback), pointer :: fptr1
procedure(callback), pointer :: fptr2
fptr1 => callee1
fptr2 => callee2
call foo(fptr1)
return
contains
function callee1()
integer :: callee1
callee1 = fptr2()
end function callee1
function callee2()
integer :: callee2
callee2 = local
end function callee2
end subroutine host
program main
call host(10)
end program main
* The trampoline area initialization implies writing target specific binary code | ||
for the trampolines. Are there utils that the runtime implementation | ||
can reuse? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is the advantage of the llvm intrinsic
approach. Where the lowering of the intrinsic can be handled by the backend for the target. The runtime approach will need this to be handled for all targets in the runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this is the main advantage of reusing existing LLVM support for the trampolines.
At the same time, it is not uncommon to have custom trampolines implementations for different purposes. For example, there are Orc
trampolines in https://github.com/llvm/llvm-project/blob/main/llvm/lib/ExecutionEngine/Orc/OrcABISupport.cpp that (I think) have a requirement to preserve the call context completely, so they save/restore all the registers.
The libffi
implementation also goes an extra half-mile to save original values of the scratch registers used in the trampoline code (see Scratch register
section in https://sourceware.org/pipermail/libffi-discuss/2021/002579.html).
I think if we add a reasonable requirement that the trampoline code used for Fortran pointers to internal procedures must be as fast as possible, then we may be better off sticking to our own implementation of the trampolines. I think we should always be able to find scratch registers to clobber in the trampoline code without saving/restoring them. It does imply that we will have to be able to emit the trampoline code for all targets that Flang needs to support, and reproduce correct handling of all the caveats for different targets (e.g. Intel CET endbranch at the beginning of the trampoline code, PowerPC __clear_cache
for the trampoline area, etc.). It should not be a big deal, though.
So, currently, I am more inclined to having Fortran runtime implementation for trampolines because it gives us predictable performance behavior regardless of the requirements of external components like libffi
.
|
||
* Each trampoline invocation implies two indirect accesses with this approach. | ||
|
||
### Option #1: Fortran runtime support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a small description here about this option. i.e this a runtime based implementation for supporting trampolines and will not use llvm trampoline intrinsics (atleast in the default flow).
Could you add the equivalent for the following?
In MLIR LLVM dialect the replacement looks like this:
llvm.call @llvm.init.trampoline(%8, %9, %7) : (!llvm.ptr<i8>, !llvm.ptr<i8>, !llvm.ptr<i8>) -> ()
%10 = llvm.call @llvm.adjust.trampoline(%8) : (!llvm.ptr<i8>) -> !llvm.ptr<i8>
%11 = llvm.bitcast %10 : !llvm.ptr<i8> to !llvm.ptr<func<void ()>>
llvm.call @_QMotherPfoo(%11) {fastmathFlags = #llvm.fastmath<fast>} : (!llvm.ptr<func<void ()>>) -> ()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
* Added IR example. * Added a note about performance overhead of other implementations.
I started taking a look at this for AArch64. I may have a patch soon implementing the trampolines. |
One word of caution, I think the calling convention on MacOS is using the register that you could use in Linux to implement it on Aarch64. |
Yes, macOS uses x18. I'll have that in mind, thanks! |
FWIW, you may look up the ABI rule used by LLVM for the static chain handling in
|
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Updates llvm#27174 Updates llvm#66157
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157 (cherry picked from commit c4b66bf)
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157 (cherry picked from commit c4b66bf)
Summary: Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes #65573 Fixes #76927 Fixes #83555 Updates #66157 Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250624
Add support for llvm.init.trampoline and llvm.adjust.trampoline intrinsics for AArch64. Fixes llvm#65573 Fixes llvm#76927 Fixes llvm#83555 Updates llvm#66157 (cherry picked from commit c4b66bf)
I would like to start a discussion about the ways for modifying
the current trampolines approach for Fortran internal procedures
used as actual arguments or pointer targets.
As Peter Klausler noted before the current approach implies security
risks due to writeable and executable stack requirement. We may need
to agree on a new scheme that does not have this issue.