x86 ABI
This documents the ABI (Application Binary Interface) for MLWorks running on x86 processors. This includes the calling convention, the stack frame layout, allocation sequences, etc.
When one ML function calls another, the register usage is as follows:
Register | Preserved | Description |
---|---|---|
EDI | Yes | Callee Closure - the closure of the called function. |
EBP | Yes | Caller Closure - the closure of the calling function. |
EBX | No | Argument - the function argument. |
ESP | Yes | Stack Pointer |
ESI | Yes | Implicit Vector (Thread) |
EAX | Yes | general callee-save |
EDX | Yes | general callee-save |
ECX | No | (scratch register) |
On function return, the result is in EBX. The other registers apart from ECX are all unchanged (i.e. callee-save).
ESI points to the 'implicit vector', which is a per-thread structure managed by the runtime which ML machine code uses to access any non-closed-over state (for instance, the allocation pointer). See below.
Frames on the ML stack are linked: every frame has a 'frame pointer' to the next frame. Each frame looks like this (offsets are from the frame pointer):
Offset | Description |
---|---|
0 | Frame pointer |
4 | Closure |
... | GC stack slots (saves and spills) |
... | non-GC stack slots (saves and spills) |
fp-4N-4 | Return address |
... | |
fp-8 | stack argument 1 |
fp-4 | stack argument 0 (pushed by caller) |
fp | next frame |
So at the point of function entry, the stack looks like this:
Offset | Description |
---|---|
0 | Return address |
4 | stack argument 0 |
8 | stack argument 1 |
... | |
4N+4 | stack argument N |
4N+8 | caller's stack frame |
Here `N` is function-dependent and stored in the ancillary word for
the function, as is the size of the non-GC area of the stack frame
(they are the `CCODEARGS` and `CCODENONGC` slots in the ancillary
word, respectively; see the Object Format page).
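The entry layout above is plain word arithmetic, which can be sketched in C as follows. This is only an illustration: the function names `stack_arg_offset` and `caller_frame_offset` are hypothetical and do not come from the MLWorks sources, and `n` stands for the `N` described above.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one stack slot on x86 (32-bit words). */
enum { WORD_SIZE = 4 };

/* Offset from the stack pointer, at function entry, of stack argument i.
   Offset 0 holds the return address, so the arguments start at offset 4.
   (Hypothetical helper, for illustration only.) */
uint32_t stack_arg_offset(uint32_t i, uint32_t n)
{
    assert(i <= n);                     /* arguments are numbered 0..N */
    return WORD_SIZE + WORD_SIZE * i;
}

/* Offset from the stack pointer, at function entry, of the caller's
   stack frame: the word just past stack argument N, i.e. 4N+8. */
uint32_t caller_frame_offset(uint32_t n)
{
    return WORD_SIZE * n + 2 * WORD_SIZE;
}
```

For example, with `N = 2` the stack arguments sit at offsets 4, 8 and 12, and the caller's frame starts at offset 16.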
The ESI register points to a structure managed by the MLWorks runtime,
which contains a large number of values used by ML code at runtime.
Most of these values are pointers to code in the runtime; others are
pointers to key data structures or memory areas. Within the runtime,
this is a `struct thread_state` (see `threads.h`), which begins
with a `struct implicit_vector` (see `implicit.h`).
The runtime build system makes `rts/gen/__implicit.sml` from
`rts/src/implicit.h` using `rts/awk/__implicit.awk`. The ML file
defines an SML structure (`structure ImplicitVector_`) containing the
offsets (as ML integers) of each slot in the implicit vector. This
system depends on a couple of facts:
- The layout of `implicit.h` is very consistent;
- Every slot in the implicit vector is one word (32 bits) in size.
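Because every slot is exactly one word, a slot's offset follows directly from its position in the declaration order. The C sketch below illustrates that relationship only; it is not the awk script, and the names `slot_names` and `slot_offset` are hypothetical. The slot names listed are the first few from the layout table on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The first few implicit vector slots, in declaration order. */
const char *const slot_names[] = {
    "ref_chain", "gc", "gc_leaf", "external", "extend", "raise_code",
};
enum { SLOT_COUNT = sizeof slot_names / sizeof slot_names[0] };

/* Byte offset of the named slot, or -1 if the name is not in the list.
   Each slot is one 32-bit word, so slot k lives at offset 4*k.
   (Hypothetical helper, for illustration only.) */
int slot_offset(const char *name)
{
    for (int k = 0; k < SLOT_COUNT; k++) {
        if (strcmp(slot_names[k], name) == 0)
            return 4 * k;
    }
    return -1;
}
```

If a slot were ever wider than one word, this position-based calculation would no longer hold, which is why the generation step relies on the second fact above.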
The implicit vector is the same for every platform, and changes very rarely. On register-rich architectures, several slots shadow registers, which contain the 'live' values: the values on the implicit vector may be out of date. On x86, there are no such slots: all the values in the implicit vector are 'live'.
It is laid out as follows:
Offset (hex) | Name | Description |
---|---|---|
0 | ref_chain | List of arrays modified by ML code since the last GC |
4 | gc | code to enter GC |
8 | gc_leaf | code to enter GC for a leaf function |
c | external | code to look up an environment function |
10 | extend | code to handle stack overflow |
14 | raise_code | code to raise an exception |
18 | leaf_raise_code | code to raise an exception in a leaf function |
1c | replace | code for replacing a function |
20 | replace_leaf | code for replacing a leaf function |
24 | intercept | code for intercepting a function |
28 | intercept_leaf | code for intercepting a leaf function |
2c | interrupt | flag indicating that we are in an interrupt handler |
30 | event_check | code to handle an asynchronous event |
34 | event_check_leaf | code to handle an asynchronous event in a leaf function |
38 | profile_alloc | code for allocation during space profiling |
3c | profile_alloc_2 | (see below) |
40 | profile_alloc_3 | (see below) |
44 | profile_alloc_leaf | code for allocation in a leaf function during space profiling |
48 | profile_alloc_leaf_2 | (see below) |
4c | profile_alloc_leaf_3 | (see below) |
50 | gc_base | the current allocation point |
54 | gc_limit | the limit of the allocation area (except when space profiling on x86) |
58 | real_gc_limit | the actual allocation area limit |
5c | handler | linked list of handler frames (see below) |
60 | stack_limit | the true ML stack limit |
64 | register_stack_limit | the stack limit, or -1 if there is a pending interrupt |
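The table can be mirrored as a C struct to check the offsets mechanically. This is a sketch, not the definition in `implicit.h`: the struct name `implicit_vector_sketch` and the typedef `ml_word` are hypothetical, and every slot is declared as a 32-bit integer (rather than a pointer) so that the offsets match the table even when compiled on a 64-bit host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ml_word;   /* one 32-bit slot (hypothetical name) */

struct implicit_vector_sketch {
    ml_word ref_chain;             /* 0x00 */
    ml_word gc;                    /* 0x04 */
    ml_word gc_leaf;               /* 0x08 */
    ml_word external;              /* 0x0c */
    ml_word extend;                /* 0x10 */
    ml_word raise_code;            /* 0x14 */
    ml_word leaf_raise_code;       /* 0x18 */
    ml_word replace;               /* 0x1c */
    ml_word replace_leaf;          /* 0x20 */
    ml_word intercept;             /* 0x24 */
    ml_word intercept_leaf;        /* 0x28 */
    ml_word interrupt;             /* 0x2c */
    ml_word event_check;           /* 0x30 */
    ml_word event_check_leaf;      /* 0x34 */
    ml_word profile_alloc;         /* 0x38 */
    ml_word profile_alloc_2;       /* 0x3c */
    ml_word profile_alloc_3;       /* 0x40 */
    ml_word profile_alloc_leaf;    /* 0x44 */
    ml_word profile_alloc_leaf_2;  /* 0x48 */
    ml_word profile_alloc_leaf_3;  /* 0x4c */
    ml_word gc_base;               /* 0x50 */
    ml_word gc_limit;              /* 0x54 */
    ml_word real_gc_limit;         /* 0x58 */
    ml_word handler;               /* 0x5c */
    ml_word stack_limit;           /* 0x60 */
    ml_word register_stack_limit;  /* 0x64 */
};

/* Compile-time checks of a few offsets against the table (C11). */
_Static_assert(offsetof(struct implicit_vector_sketch, gc_base) == 0x50,
               "gc_base offset");
_Static_assert(offsetof(struct implicit_vector_sketch, register_stack_limit) == 0x64,
               "register_stack_limit offset");
```

The struct has 26 one-word slots, so it occupies 0x68 bytes in total.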
On SPARC platforms, the code for allocation during space profiling is
actually stored on the implicit vector, which is why three words are
used. On other platforms, the `profile_alloc_2` and `profile_alloc_3`
slots may be used for temporaries during the space-profiling
allocation.
When space profiling on x86 platforms, the `gc_limit` slot is set to the base of the allocation area, so that every allocation enters the runtime (on other platforms, allocation code sequences are modified). The actual limit of the allocation area is in the `real_gc_limit` slot.
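The effect of moving `gc_limit` can be modelled in a few lines of C. This is a sketch under a stated assumption: the real allocation sequence is not documented on this page, so the check below simply assumes that an allocation enters the runtime when the new allocation point would pass `gc_limit`. The names `alloc_slots`, `allocation_enters_runtime` and `start_space_profiling` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three allocation-related slots of the implicit vector. */
struct alloc_slots {
    uint32_t gc_base;        /* the current allocation point */
    uint32_t gc_limit;       /* limit tested by allocation code */
    uint32_t real_gc_limit;  /* the actual allocation area limit */
};

/* ASSUMPTION: allocation enters the runtime when the requested bytes
   would take the allocation point past gc_limit. */
bool allocation_enters_runtime(const struct alloc_slots *s, uint32_t bytes)
{
    return s->gc_base + bytes > s->gc_limit;
}

/* Model of turning on space profiling on x86: gc_limit is moved down to
   the base of the allocation area; real_gc_limit keeps the true limit. */
void start_space_profiling(struct alloc_slots *s, uint32_t area_base)
{
    s->gc_limit = area_base;
}
```

Under that assumption, once `gc_limit` equals the base of the area, any non-empty allocation fails the check and is routed through the runtime, while `real_gc_limit` still records how much room is actually left.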
Exception handling is done via a linked list of "handler frames" which
are 4-tuples allocated on the stack. The head of the list is on the
implicit vector (`implicit->handler`). To create a handler a new
handler frame is allocated in the current stack frame, filled in, and
pushed on the head of this list. When control flow passes out of the
handler's scope, the handler is popped off the list.
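The push and pop discipline described above can be sketched in C as an ordinary singly linked list. This is a simplified model, not runtime code: the names `implicit_model`, `push_handler` and `pop_handler` are hypothetical, ordinary host pointers are used in place of tagged ML values, and the field names follow the handler frame table on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a handler frame (a 4-tuple). */
struct handler_frame {
    struct handler_frame *previous;  /* previous handler in the list */
    uintptr_t sp;                    /* stack pointer of creator */
    uintptr_t closure;               /* handler function closure */
    uintptr_t continuation;          /* continuation offset within creator */
};

/* Only the slot of the implicit vector that matters here. */
struct implicit_model {
    struct handler_frame *handler;   /* head of the handler list */
};

/* Entering a handler's scope: link the new frame in at the head. */
void push_handler(struct implicit_model *iv, struct handler_frame *frame)
{
    frame->previous = iv->handler;
    iv->handler = frame;
}

/* Leaving a handler's scope: restore the previous head. */
void pop_handler(struct implicit_model *iv)
{
    assert(iv->handler != NULL);
    iv->handler = iv->handler->previous;
}
```

Since frames are pushed and popped in strict nesting order, the head of the list is always the innermost handler in scope.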
Each handler frame has the following contents:
Offset (hex) | Name | Description |
---|---|---|
-1 | previous | Previous handler |
3 | sp | Stack pointer of creator |
7 | closure | Handler function closure |
b | continuation | Offset within creator of continuation code |
The handler frame pointer is offset by 1 from the first field, so that it is tagged as a pointer, and its contents can be accessed by code as if it were a tuple.
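The offsets in the table follow from that one-byte tag, as the C sketch below shows. The helper names `handler_frame_pointer` and `handler_field_offset` are hypothetical, and the sketch assumes the first field of the frame is word-aligned.

```c
#include <assert.h>
#include <stdint.h>

enum { HANDLER_TAG = 1, WORD_SIZE = 4 };

/* Tagged handler frame pointer for a frame whose first field is at
   first_field_addr. ASSUMPTION: the frame is word-aligned. */
uint32_t handler_frame_pointer(uint32_t first_field_addr)
{
    assert(first_field_addr % WORD_SIZE == 0);
    return first_field_addr + HANDLER_TAG;
}

/* Offset of field i, relative to the tagged pointer, where
   0 = previous, 1 = sp, 2 = closure, 3 = continuation. */
int32_t handler_field_offset(int i)
{
    assert(i >= 0 && i < 4);
    return WORD_SIZE * i - HANDLER_TAG;
}
```

For fields 0 to 3 this yields -1, 3, 7 and 0xb, matching the table.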
Code in the runtime (`ml_raise` in `interface.S`) raises an
exception by building a fake stack frame and calling the handler. If
the handler returns (and therefore the exception has been successfully
handled), `ml_raise` then unwinds the stack to the creator's frame
and jumps to the continuation. If the handler doesn't handle the
exception, or raises a different one, it calls `ml_raise` again.
- More on the calling convention
- Something on larger-scale stack organisation
- Something on allocation