Nick Barnes edited this page May 18, 2013 · 2 revisions

x86 ABI

This documents the ABI (Application Binary Interface) for MLWorks running on x86 processors. This includes the calling convention, the stack frame layout, allocation sequences, etc.

Calling Convention

Registers

When one ML function calls another, the register usage is as follows:

| Register | Preserved | Description |
|----------|-----------|-------------|
| EDI | Yes | Callee Closure - the closure of the called function. |
| EBP | Yes | Caller Closure - the closure of the calling function. |
| EBX | No | Argument - the function argument. |
| ESP | Yes | Stack Pointer |
| ESI | Yes | Implicit Vector (Thread) |
| EAX | Yes | general callee-save |
| EDX | Yes | general callee-save |
| ECX | No | (scratch register) |

On function return, the result is in EBX. The other registers apart from ECX are all unchanged (i.e. callee-save).
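As a rough illustration of the convention above, the following Python sketch encodes the register table as data and reports which callee-save registers changed across a call. The helper name `clobbered` and the dictionary-based register snapshots are assumptions made for this example; they are not part of MLWorks.

```python
# Register roles from the table above; True means the callee must preserve it.
PRESERVED = {
    "EDI": True,   # callee closure
    "EBP": True,   # caller closure
    "EBX": False,  # argument on entry, result on return
    "ESP": True,   # stack pointer
    "ESI": True,   # implicit vector (thread)
    "EAX": True,   # general callee-save
    "EDX": True,   # general callee-save
    "ECX": False,  # scratch
}


def clobbered(before, after):
    """Return the callee-save registers whose values differ across a call.

    `before` and `after` are hypothetical snapshots mapping register
    names to integer values.
    """
    return sorted(
        reg for reg, keep in PRESERVED.items()
        if keep and before[reg] != after[reg]
    )
```

Under this model a call may change EBX (it carries the result) and ECX freely, while a change to any other register shows up in the returned list.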

ESI points to the 'implicit vector', a per-thread structure managed by the runtime, which ML machine code uses to access any non-closed-over state (for instance, the allocation pointer). See the Implicit Vector section below.

Stack

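Frames on the ML stack are linked: every frame begins with a 'frame pointer' slot that points to the next frame. Each frame looks like this (plain offsets are from the start of the frame; entries written as fp-k are relative to fp, the address of the next frame):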

| Offset | Description |
|--------|-------------|
| 0 | Frame pointer |
| 4 | Closure |
| ... | GC stack slots (saves and spills) |
| ... | non-GC stack slots (saves and spills) |
| fp-4N-4 | Return address |
| ... | |
| fp-8 | stack argument 1 |
| fp-4 | stack argument 0 (pushed by caller) |
| fp | next frame |

So at the point of function entry, the stack looks like this:

| Offset | Description |
|--------|-------------|
| 0 | Return address |
| 4 | stack argument 0 |
| 8 | stack argument 1 |
| ... | |
| 4N+4 | stack argument N |
| 4N+8 | caller's stack frame |

Here N is function-dependent and is stored in the ancillary word for the function, as is the size of the non-GC area of the stack frame (the CCODEARGS and CCODENONGC slots of the ancillary word, respectively; see the Object Format page).
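The function-entry layout can be written out as a small calculation. This Python sketch follows the function-entry table exactly as written, with stack arguments numbered 0 to N. The function name `entry_layout` is hypothetical, and treating the offsets as relative to ESP at entry is an assumption based on ordinary x86 call behaviour rather than something stated on this page.

```python
WORD = 4  # bytes per stack slot on 32-bit x86


def entry_layout(n):
    """Offsets at function entry for stack arguments 0..n.

    Mirrors the function-entry table: return address first, then the
    stack arguments in increasing order, then the caller's stack frame.
    Offsets are assumed to be relative to ESP at entry.
    """
    layout = {"return address": 0}
    for i in range(n + 1):
        layout[f"stack argument {i}"] = WORD * i + WORD
    layout["caller's stack frame"] = WORD * n + 2 * WORD
    return layout
```

For example, `entry_layout(1)` places the return address at 0, the two stack arguments at 4 and 8, and the caller's stack frame at 12.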

Implicit Vector

The ESI register points to a structure managed by the MLWorks runtime, which contains a large number of values used by ML code at runtime. Most of these values are pointers to code in the runtime; others are pointers to key data structures or memory areas. Within the runtime, this is a struct thread_state (see threads.h), which begins with a struct implicit_vector (see implicit.h).

The runtime build system generates rts/gen/__implicit.sml from rts/src/implicit.h using rts/awk/__implicit.awk. The ML file defines an SML structure, ImplicitVector_, containing the offsets (as ML integers) of each slot in the implicit vector. This system depends on a couple of facts:

  • The layout of implicit.h is very consistent;
  • Every slot in the implicit vector is one word (32 bits) in size.

The implicit vector is the same for every platform, and changes very rarely. On register-rich architectures, several slots shadow registers, which contain the 'live' values: the values on the implicit vector may be out of date. On x86, there are no such slots: all the values in the implicit vector are 'live'.

It is laid out as follows:

| Offset (hex) | Name | Description |
|--------------|------|-------------|
| 0 | ref_chain | List of arrays modified by ML code since the last GC |
| 4 | gc | code to enter GC |
| 8 | gc_leaf | code to enter GC for a leaf function |
| c | external | code to look up an environment function |
| 10 | extend | code to handle stack overflow |
| 14 | raise_code | code to raise an exception |
| 18 | leaf_raise_code | code to raise an exception in a leaf function |
| 1c | replace | code for replacing a function |
| 20 | replace_leaf | code for replacing a leaf function |
| 24 | intercept | code for intercepting a function |
| 28 | intercept_leaf | code for intercepting a leaf function |
| 2c | interrupt | flag indicating that we are in an interrupt handler |
| 30 | event_check | code to handle an asynchronous event |
| 34 | event_check_leaf | code to handle an asynchronous event in a leaf function |
| 38 | profile_alloc | code for allocation during space profiling |
| 3c | profile_alloc_2 | (see below) |
| 40 | profile_alloc_3 | (see below) |
| 44 | profile_alloc_leaf | code for allocation in a leaf function during space profiling |
| 48 | profile_alloc_leaf_2 | (see below) |
| 4c | profile_alloc_leaf_3 | (see below) |
| 50 | gc_base | the current allocation point |
| 54 | gc_limit | the limit of the allocation area (except when space profiling on x86) |
| 58 | real_gc_limit | the actual allocation area limit |
| 5c | handler | linked list of handler frames (see below) |
| 60 | stack_limit | the true ML stack limit |
| 64 | register_stack_limit | the stack limit, or -1 if there is a pending interrupt |
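Because every slot is one 32-bit word, each offset in the layout is simply four times the slot's position. The Python sketch below recomputes the offsets from the slot names in order. It illustrates the arithmetic only and is not the awk script the build system uses; the names `SLOTS` and `OFFSETS` are chosen for this example.

```python
WORD = 4  # every implicit vector slot is one 32-bit word

# Slot names in layout order, as listed in this section.
SLOTS = [
    "ref_chain", "gc", "gc_leaf", "external", "extend",
    "raise_code", "leaf_raise_code", "replace", "replace_leaf",
    "intercept", "intercept_leaf", "interrupt",
    "event_check", "event_check_leaf",
    "profile_alloc", "profile_alloc_2", "profile_alloc_3",
    "profile_alloc_leaf", "profile_alloc_leaf_2", "profile_alloc_leaf_3",
    "gc_base", "gc_limit", "real_gc_limit",
    "handler", "stack_limit", "register_stack_limit",
]

# Byte offset of each slot from the address held in ESI.
OFFSETS = {name: index * WORD for index, name in enumerate(SLOTS)}
```

For instance, `hex(OFFSETS["gc_base"])` gives `'0x50'`, matching the documented offset.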

On SPARC platforms, the code for allocation during space profiling is actually stored on the implicit vector, which is why three words are used. On other platforms, the profile_alloc_2 and profile_alloc_3 slots may be used for temporaries during the space-profiling allocation.

When space profiling on x86 platforms, the gc_limit slot is set to the base of the allocation area, so that every allocation enters the runtime (on other platforms, allocation code sequences are modified). The actual limit of the allocation area is in the real_gc_limit slot.
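The effect of moving gc_limit can be seen in a toy model. The Python sketch below assumes a bump-pointer allocation check that enters the runtime whenever the new allocation point would pass gc_limit. The real x86 allocation sequence is not documented on this page, so the comparison, the class, and the function names here are all assumptions made for illustration.

```python
class ImplicitVectorModel:
    """Toy model holding only the three allocation-related slots."""

    def __init__(self, area_base, area_limit):
        self.gc_base = area_base         # current allocation point
        self.gc_limit = area_limit       # limit checked by ML code
        self.real_gc_limit = area_limit  # actual limit of the area


def enable_space_profiling(iv, area_base):
    # Set gc_limit to the base of the allocation area;
    # real_gc_limit keeps the actual limit.
    iv.gc_limit = area_base


def allocate(iv, size):
    """Assumed check: enter the runtime if the bump would pass gc_limit."""
    new_base = iv.gc_base + size
    if new_base > iv.gc_limit:
        return "runtime"
    iv.gc_base = new_base
    return "inline"
```

In this model, once `enable_space_profiling` has run, any allocation of a positive size takes the "runtime" path, because the allocation point is never below the base of the area.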

Exception Handling

Exception handling is done via a linked list of "handler frames", which are 4-tuples allocated on the stack. The head of the list is on the implicit vector (implicit->handler). To create a handler, a new handler frame is allocated in the current stack frame, filled in, and pushed onto the head of this list. When control flow passes out of the handler's scope, the handler is popped off the list.
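The push and pop discipline can be modelled in a few lines. This Python sketch uses the four field names from the handler frame layout in this section. The `ThreadState` class and the `push_handler` and `pop_handler` helpers are hypothetical stand-ins for what compiled ML code does inline, and ordinary Python objects replace stack-allocated tuples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HandlerFrame:
    previous: Optional["HandlerFrame"]  # previous handler in the list
    sp: int                             # stack pointer of creator
    closure: object                     # handler function closure
    continuation: int                   # offset of continuation code


class ThreadState:
    """Stand-in for the implicit vector; only the handler slot is modelled."""

    def __init__(self):
        self.handler = None


def push_handler(thread, sp, closure, continuation):
    # Fill in a new frame and make it the head of the list.
    frame = HandlerFrame(thread.handler, sp, closure, continuation)
    thread.handler = frame
    return frame


def pop_handler(thread):
    # Leaving the handler's scope restores the previous head.
    thread.handler = thread.handler.previous
```

Nested handlers therefore form a chain from innermost to outermost, and popping them in reverse order of creation leaves the list as it was.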

Each handler frame has the following contents:

| Offset (hex) | Name | Description |
|--------------|------|-------------|
| -1 | previous | Previous handler |
| 3 | sp | Stack pointer of creator |
| 7 | closure | Handler function closure |
| b | continuation | Offset within creator of continuation code |

The handler frame pointer is offset by 1 from the first field, so that it is tagged as a pointer, and its contents can be accessed by code as if it were a tuple.
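The offset arithmetic can be checked with a short calculation. The Python sketch below takes the field offsets from the table above and shows that a handler frame pointer formed by adding 1 to the address of the first field reaches each field at consecutive word addresses. The helper names are hypothetical, and the first field is assumed to be word-aligned.

```python
TAG = 1  # the handler frame pointer is the first field's address plus 1

# Field offsets relative to the tagged handler frame pointer.
FIELD_OFFSETS = {
    "previous": -1,
    "sp": 3,
    "closure": 7,
    "continuation": 0xb,
}


def tagged_pointer(first_field_address):
    """Handler frame pointer for a frame whose first field is at this address."""
    return first_field_address + TAG


def field_address(handler_pointer, name):
    """Address of the named field, given the tagged handler frame pointer."""
    return handler_pointer + FIELD_OFFSETS[name]
```

With the first field at 0x1000, the tagged pointer is 0x1001 and the four fields resolve to 0x1000, 0x1004, 0x1008 and 0x100c.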

Code in the runtime (ml_raise in interface.S) raises an exception by building a fake stack frame and calling the handler. If the handler returns (and therefore the exception has been successfully handled), ml_raise then unwinds the stack to the creator's frame and jumps to the continuation. If the handler doesn't handle the exception, or raises a different one, it calls ml_raise again.

TODO

  • More on the calling convention
  • Something on larger-scale stack organisation
  • Something on allocation