Skip to content

Stack Tracing in Encore

TobiasWrigstad edited this page Apr 22, 2015 · 4 revisions

Encore Stack Map Creation

There are multiple optimisations possible with the idea described below. The goal with this text is not to describe those, but to describe the general idea which can be optimised further at a later stage.

The idea behind Encore stack maps is to superimpose a shadow stack on the proper Encore stack. This may well turn out to be a bad idea, but a prototype will be needed to figure this out.

The idea is the following. For each method (closure, etc.) in Encore, create a C struct which will be used to holds its local pointer variables. For example, imagine the following Encore program:

passive class Bar
  ...

class Foo
  f1:int
  f2:Bar
  def init(x:Bar, y:Baz, i:int) : void
    let
      z = y
    in
      ...
  def frob(z:Baz, j:int) : void
   z.barcode()

For this class we will create two C structs like so (the actual names are not important, but I preserve names to keep the mapping clear.):

struct stackmap_Foo_init 
{
  struct stackmap_base;
  void *x;
  void *y;
  void *z;
}; 

and

struct stackmap_Foo_frob 
{
  struct stackmap_base;
  void *z;
}; 

Where struct stackmap_base is globally defined like this:

struct stackmap_base
{
  void parent;          // _ prefix omitted
  void (*trace)(void*); // _ prefix omitted
};

Furthermore, two functions will be created:

void Foo_init_trace(stackmap_Foo_init *frame) 
{
  pony_traceobject(frame->x, Bar_trace);
  pony_traceactor(frame->y);
  pony_traceactor(frame->z);
  if (frame->parent)
    {
      frame->next->trace(frame->parent);
    }
}

and

void Foo_init_frob(stackmap_Foo_frob *frame) 
{
  pony_traceactor(frame->z);
  if (frame->parent)
    {
      frame->next->trace(frame->parent);
    }
}

The code that gets generated from Foo's init should create a local variable "struct stackmap_Foo_init frame" that lives on the stack frame. The frame variable should be updated so that frame->var holds the contents of the variable var. One possible way to achieve this is to remove all local pointer variables and just manipulate the frame struct instead. Another possible way to achieve this is to synchronise local variables with the struct before every method (closure, etc.) call. Furthermore, every method should take one additional argument, a void pointer, called parent. Calling a synchronous method should pass &frame in as the argument to parent, otherwise NULL or nothing if the parameter can be avoided.

For example, imagine that this is the code created by compiling init today:

void init(Foo *this, Bar *x, Baz *y, int64_t i)
{
  Baz *z = y;
  Bar_frob(this, z, i);
  x = this.f2;
  i = i * 2;
  this.f1 = i;
  Bar_frob(this, z, i);
}

The first case would be this:

void init(void *parent, Foo *this, Bar *x, Baz *y, int64_t i)
{
  struct stackmap_Foo_init frame = { .x = x, .y = y, .z = z, 
                                     .parent = parent,
				 .trace = Foo_init_trace };
  frame->z = y;
  Bar_frob(&frame, this, frame->z, i);
  frame->x = this.f2;
  i = i * 2;
  this.f1 = i;
  Bar_frob(&frame, this, frame->z, i);
}

The second case would be this:

void init(void *parent, Foo *this, Bar *x, Baz *y, int64_t i)
{
  struct stackmap_Foo_init frame = { .parent = parent,
  	     		       	       	 .trace = Foo_init_trace };
  z = y;			       
      // Sync before call
      frame.x = x; 
      frame.y = y; 
      frame.z = z; 
  Bar_frob(&frame, this, z, i);
  x = this.f2;
  i = i * 2;
  this.f1 = i;
      // Sync before call
      frame.x = x; 
      frame.y = y; 
      frame.z = z; 
  Bar_frob(&frame, this, z, i);
}

Which one is best is not clear to me at this point although I have a feeling that the latter may be more efficient with respect to register optimisation, but also creates a 2x overhead for pointer values. Experimentation with optimisation will be key.

Note how this compilation scheme introduces a "linked list" of stackmaps on the stack, where each one points to the previous stack frame. If top is a pointer to a stackmap_base *, then we should be able to do the following:

top->trace(top); 

This should now call the proper trace functions for all the values on the stack.

Whenever await or suspend is called, we should add the current top to a list of active stacks in the object. The trace function for the corresponding class should go through that list and perform the equivalent of top->trace(top); for each stack pointer in the list.

Caveats and thoughts:

  • UPDATE Seems that the GC already avoids this, see the implementation of gc_markobject(). END OF UPDATE We should not trace pointers that point to objects in other heaps. Some additional magic is needed to solve this. For example, find out the address range of the current local heap and chain this through the trace functions above as two void pointers "upper" and "lower", and prefic each call to pony_trace with "if (lover <= frame->var && frame->var < upper)".

    I am not certain that all memory in a local heap is consecutive, for example large blocks that don't fit in buckets may not be. This might require extra work to check where a pointer is pointing.

    Another possibility is to use existing Pony actor_ref tools to see whether a pointer is local or remote.

  • This design does not yet deal with polymorphism! A possible solution uses an alternative to hard coding the type-specific trace functions in the frame-specific trace function but requires storing the type of each object in the object itself, by which it could get access to its trace function (this is probably not a bad thing for other reasons). Something like:

    my_encore_ptr->type->trace_function

    The lines for tracing some variable var whose type is not statically known would then obtain the trace function using this method.

  • If all trace function pointers are retrieved like the case just above, frame-specific trace functions are not needed. stackmap_base can in this case be defined thus:

    struct stackmap_base 
    {
      int size;
      void *next;
    };

And there is just one global trace function needed:

    void stackmap_trace(const struct stackmap_base *frame)
    {
      void **ptrs = ((void *)frame)+sizeof(stackmap_base);
      for (int i = 0; i < frame->size; ++i)
        {	
          pony_traceobject(ptrs[i], ptrs[i]->type->trace_function);
        }
      if (frame->parent)
        {
          stackmap_trace(frame->parent); 
        }
    }

This may be broken for active objects but I don't think so. (It is a simple fix in that case, f.ex. let each stackmap have two "arrays", one of passive objects and one of active objects.)

  • We could optimise the sizes of the stackmaps by calculating if two variables cannot be live at the same time and then use the same slot in the stackmap for them. This requires that the "polymorphic way" to get a trace function is used. (Think register allocation.)

  • Perhaps needless to point out, the current transformation into SSA complicates this design. SSA introduces lots of variables which are possibly/hopefully mostly removed depending on how C compilers work (removing local variables is non-trivial due to the possibility of stack pointers etc.).

A Possibly Flawed Stack Scanning Implementation

In the dispatch method of every active object, add something like this:

void *bottom = &bottom; 
this->stack_bottom = bottom; 

which requires that you add a void *stack_bottom field to every Encore object. Also add a void *stack_top and a pointer to the type.

Then, when you suspend or await, don't block the GC, and do, right before you call the suspend or await functions, something like this:

void *top = &top; 
this->stack_top = top; 

Then, add to the trace function for each class the following:

if (stack_top)
   {
     for(void *cursor = this->stack_bottom; cursor < this->stack_top - sizeof(void*); ++cursor)
       if (cursor->type->passive)
         pony_traceobject(cursor, cursor->type->trace);
       else
         pony_traceactor(cursor);
   }

which also requires that we add a boolean flag to each type struct to track the active/passive status. Quite possibly I am mixing up stack bottom/top.

In summary:

  • Extend dispatch to save stack bottom
  • Extend suspend/await to save stack top before calling the corresponding function
  • Extend types to track active/passive
  • Extend objects with pointers to their type
  • Extend the trace function of classes to scan the stack as outlined

Notably, this design only works for ONE stack. It needs to be extended to deal with multiple simultaneous suspends and awaits. Further, stack_bottom/stack_top information must be removed when dispatch methods exit.