cranelift: add support for the Mac aarch64 calling convention #2742
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -171,6 +171,21 @@ impl ABIMachineSpec for AArch64MachineDeps { | |
let has_baldrdash_tls = call_conv == isa::CallConv::Baldrdash2020; | ||
|
||
// See AArch64 ABI (https://c9x.me/compile/bib/abi-arm64.pdf), section 5.4. | ||
// | ||
// MacOS aarch64 is slightly different, see also | ||
// https://developer.apple.com/documentation/xcode/writing_arm64_code_for_apple_platforms. | ||
// We are diverging from the MacOS aarch64 implementation in the | ||
// following ways: | ||
// - sign- and zero-extensions of data types narrower than 32 bits are | ||
// not implemented yet. | ||
// - i128 argument passing isn't implemented yet in the standard (non | ||
// MacOS) aarch64 ABI. | ||
// - we align the argument stack space to a 16-byte boundary, while | ||
// MacOS allows aligning only on 8 bytes. In practice it means we're | ||
// slightly overallocating when calling, which is fine, and doesn't | ||
// break our other invariant that the stack is always allocated in | ||
// 16-byte chunks. | ||
|
||
let mut next_xreg = 0; | ||
let mut next_vreg = 0; | ||
let mut next_stack: u64 = 0; | ||
|
@@ -264,13 +279,24 @@ impl ABIMachineSpec for AArch64MachineDeps { | |
*next_reg += 1; | ||
remaining_reg_vals -= 1; | ||
} else { | ||
// Compute size. Every arg takes a minimum slot of 8 bytes. (16-byte | ||
// stack alignment happens separately after all args.) | ||
// Compute the stack slot's size. | ||
let size = (ty_bits(param.value_type) / 8) as u64; | ||
let size = std::cmp::max(size, 8); | ||
// Align. | ||
|
||
let size = if call_conv != isa::CallConv::AppleAarch64 { | ||
When depending on new features of a dependency could you also update the dependency requirement in |
||
// Every arg takes a minimum slot of 8 bytes. (16-byte stack | ||
// alignment happens separately after all args.) | ||
std::cmp::max(size, 8) | ||
} else { | ||
// MacOS aarch64 allows stack slots with sizes less than 8 | ||
// bytes. They still need to be properly aligned on their | ||
// natural data alignment, though. | ||
size | ||
Could the fast and cold call conv use this too? They are unstable anyway and this saves stack usage.

Maybe? It's a bit out of scope for this PR, so I'll try not to accidentally introduce new changes there. Plus, I am not sure if these conventions are used; I seem to recall that fast implies the default calling convention in some cases, and if we're not being careful that might mean subtly breaking other calling conventions. We should probably audit the "fast"/"cold" calling conventions at some point, and re-design them from the ground up.

+1 -- I actually think we should do something about the "fast" and "cold" conventions, but we should be a little more explicit in designing them, and probably give them better names. I'm not a huge fan of generic terms like "fast ABI" because the ambiguity is confusing -- both on the user/embedder side ("what guarantees does this have? how fast is fast? when can I use it?") and on the implementer side (do we just choose an array of features that lead to better speed, and add more as we think of them? IMHO an evolving ABI is a recipe for subtle bugs as we mutate invariants over time). So I'd rather design a "fast internal Cranelift ABI v1", implement it, and then keep that as a first-class, well-defined ABI alongside the others, and retire names like "fast" and "cold", just for clarity's sake. But, that's a deeper discussion for another day, I think!
||
}; | ||
|
||
// Align the stack slot. | ||
debug_assert!(size.is_power_of_two()); | ||
next_stack = align_to(next_stack, size); | ||
|
||
ret.push(ABIArg::stack( | ||
next_stack as i64, | ||
param.value_type, | ||
|
One thing I noticed here: "If the total number of bytes for stack-based arguments is not a multiple of 8 bytes, insert padding on the stack to maintain the 8-byte alignment requirements." However below we align the final stack-arg area size up to a 16-byte alignment.
(Related to this, my understanding is that the trap-on-not-16-aligned-SP behavior of aarch64 is configurable with a mode bit as well; maybe this means Apple runs with only an 8-aligned stack?)
I think this is OK as it should be fine to reserve extra space at a callsite, but we should document that we diverge and why it's OK (and verify that it is!).
Yes, and it maintains our invariant that the stack is always allocated in 16-byte chunks, and thus always aligned, which is nice. (Otherwise it would require more changes when generating prologues and epilogues.)
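
To make the divergence discussed above concrete, here is a minimal standalone sketch of the stack-slot arithmetic. It is not the Cranelift code: `layout_stack_args` and its `apple` flag are hypothetical names introduced for illustration, `align_to` is re-implemented locally on the assumption that it rounds up to a power-of-two boundary, and the `next_stack += size` step is assumed since it falls outside the hunk shown in the diff.

```rust
/// Round `x` up to the next multiple of `align` (assumed to be a power of two).
fn align_to(x: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (x + align - 1) & !(align - 1)
}

/// Hypothetical helper: lay out stack arguments given their sizes in bytes.
/// Returns (offset of each argument, bytes used before final padding,
/// size of the argument area after rounding up to 16 bytes).
fn layout_stack_args(arg_sizes: &[u64], apple: bool) -> (Vec<u64>, u64, u64) {
    let mut next_stack = 0u64;
    let mut offsets = Vec::with_capacity(arg_sizes.len());
    for &bytes in arg_sizes {
        // Standard AArch64: every argument takes at least an 8-byte slot.
        // Apple AArch64: the slot can be as small as the value itself.
        let size = if apple { bytes } else { bytes.max(8) };
        // Either way, the slot is aligned to its own (power-of-two) size.
        next_stack = align_to(next_stack, size);
        offsets.push(next_stack);
        next_stack += size; // assumption: not visible in the diff hunk
    }
    (offsets, next_stack, align_to(next_stack, 16))
}

fn main() {
    // Four stack arguments: i8, i32, i64, i16.
    let args = [1, 4, 8, 2];
    let (std_offsets, std_used, std_total) = layout_stack_args(&args, false);
    let (apple_offsets, apple_used, apple_total) = layout_stack_args(&args, true);
    println!("standard: {:?}, used {}, reserved {}", std_offsets, std_used, std_total);
    println!("apple:    {:?}, used {}, reserved {}", apple_offsets, apple_used, apple_total);
    // What Apple's rule (pad the total to 8 bytes) would reserve instead.
    println!("apple with 8-byte padding: {}", align_to(apple_used, 8));
}
```

In this example the Apple layout uses 18 bytes, which Apple's 8-byte padding rule would round up to 24, while rounding to 16 bytes reserves 32. The extra 8 bytes are the over-allocation at the callsite that the comments above consider acceptable.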