change liveslots to use WeakRefs to track objects and report their disuse #2660

warner · 2021-03-16T21:39:27Z

What is the Problem Being Solved?

The next phase of #2615 is to enhance liveslots to be able to sense imports becoming unused. This requires changes to slotToVal and valToSlot, to use WeakRefs and FinalizationRegistries. We need to not retain a strong reference to imports. And we must notice (and report) dropped imports at the right time.

Description of the Design

There are two parts to this. The first part is to change the tables to use a WeakRef, which requires the creation of an exports table to keep a strong reference to exports now that slotToVal is no longer doing so. The notes in #1872 (comment) (and the subsequent comments about Promises) apply, as well as the task list in #1872 (comment), however I think we no longer need a counter. This first part only needs to continue to behave correctly: it does not need to actually create a FinalizationRegistry, or at least it does not need to pay attention to the results.
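The first part can be pictured with a minimal sketch. The table names `slotToVal` and `valToSlot` come from the text above; the `exported` set and the helper functions `registerValue` and `getValForSlot` are hypothetical names used only for illustration, not the real liveslots code.

```javascript
// Hypothetical sketch of the weak tables; not the actual liveslots code.
const slotToVal = new Map(); // vref -> WeakRef(object), no strong reference
const valToSlot = new WeakMap(); // object -> vref, does not retain the object
const exported = new Set(); // strong references that keep exports alive

function registerValue(vref, val, isExport) {
  slotToVal.set(vref, new WeakRef(val));
  valToSlot.set(val, vref);
  if (isExport) {
    // slotToVal no longer retains the object, so exports need their own table
    exported.add(val);
  }
}

function getValForSlot(vref) {
  const wr = slotToVal.get(vref);
  // deref() returns undefined once the object has been collected
  return wr ? wr.deref() : undefined;
}
```

With this shape, an import stays alive only as long as userspace holds it, while an export stays alive because of the `exported` set.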

The second part is to use the FR to report a syscall.dropImports at the right time. This will require changes to the way liveslots is run. First, we need to provide a gc() function to liveslots, so it can provoke engine-level GC towards the end of the crank. We can live without this (e.g. in a non-consensus solo machine, under Node.js, we might not give it gc(), and merely rely upon organic GC calls), at the expense of non-deterministic dropImports calls, and also that vats won't be able to drop their imports until the next time they're called after GC happens (which might take hours or days). Liveslots must tolerate gc() being empty, and we'll need higher-level flags to indicate whether we're supposed to be in a deterministic consensus mode (related to #2519) or not.

Previously, liveslots was not given setImmediate or waitForQuiescent, and the vat supervisor gets control as soon as liveslots+userspace becomes idle. In the new approach, liveslots needs to get control when userspace becomes idle, so it can call gc() and then deliberately allow the IO queue to be serviced so that finalizers get a chance to run. This means we'll need to pass waitForQuiescent into liveslots. I think this means the calls into liveslots (dispatch.deliver/etc) should return a Promise, and the supervisor should simply wait for that Promise to fire. But we need to look carefully at the supervisor used by xsnap and make sure whatever it is waiting for is compatible with this, as I think it waits directly for the underlying engine to become idle, rather than approximating that check with setImmediate.
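A rough sketch of that control flow, assuming `setImmediate` callbacks only fire after the promise queue has drained (as on Node.js). The names `makeDispatch`, `deliverToUserspace`, `gcTools`, and `runCrank` are hypothetical and only illustrate the idea; the real supervisor and liveslots interfaces may differ.

```javascript
// Resolves once the promise queue is empty (assumes Node.js-style setImmediate).
function waitForQuiescent() {
  return new Promise(resolve => setImmediate(resolve));
}

// Hypothetical liveslots-side wrapper; not the real liveslots API.
function makeDispatch(deliverToUserspace, gcTools) {
  return async function dispatch(delivery) {
    // Start userspace in its own turn. Its result is not awaited, because
    // userspace may return a promise that never settles.
    Promise.resolve()
      .then(() => deliverToUserspace(delivery))
      .catch(() => {});
    // Wait until userspace has nothing left on the promise queue.
    await gcTools.waitForQuiescent();
    // Liveslots must tolerate a missing gc().
    if (typeof gcTools.gc === 'function') {
      gcTools.gc();
      // Let the IO queue be serviced so finalizers get a chance to run.
      await gcTools.waitForQuiescent();
    }
    // A later step would report dropped imports here.
  };
}

// Supervisor side: simply wait for the returned promise.
async function runCrank(dispatch, delivery) {
  await dispatch(delivery);
  return 'crank complete';
}
```

The supervisor no longer needs its own idle check in this sketch, which is why the xsnap supervisor's behavior needs to be compared against it.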

Security Considerations

Test Plan

Testing the execution of gc() is thorny. Under XS it will probably be deterministic (and we're certainly depending upon it to be so), but under Node.js it's not so clear.

I have a test from an earlier branch that seemed to work correctly about 50% of the time, and silently failed to do anything useful the other 50%, which is 100% better than nothing. I'll start with that.


Liveslots now uses WeakRefs and a FinalizationRegistry to track the state of each import: UNKNOWN -> REACHABLE -> UNREACHABLE -> COLLECTED -> FINALIZED -> UNKNOWN. Reintroduction can move it from UNREACHABLE/COLLECTED/FINALIZED back to REACHABLE at any time. Liveslots maintains a local `deadSet` that contains all the vrefs which are in the FINALIZED state. They will remain in that state (and in `deadSet`) until a later change which uses `syscall.dropImports` to inform the kernel, and remove them from `deadSet`. Promises are retained until resolved+retired, even if userspace somehow drops all references to them. We might do better in the future, but the story is a lot more complicated than it is for Presences. Exported Remotables are still retained indefinitely. A later change (#2664) will wire `dropExports()` up to drop them. refs #2660

Modify xsnap.c to add a `gc()` function to the globals of the initial ("start") Compartment. This function should trigger an immediate, synchronous, full GC sweep. As a non-standard global, the `gc()` function will be filtered out of the globals in all child Compartments by SES as usual. Note that this changes the snapshot format: heap snapshots written before this change cannot be read by code after this change. This happens because `gc()` (which is implemented in C) is a new "callback" (a C function made available to JS code), which is an "exit" from the reference graph. It must be recognized during serialization, and re-attached during reload, and xsnap cannot handle loading snapshots with a different set of exits, even purely additive changes. closes #2682 refs #2660 refs #2615

warner · 2021-03-19T18:52:05Z

Ok so the second part of this is going to look rather different to work with XS.

In Node, finalizer callbacks are either pushed onto the promise/micro-task queue, or they're pushed onto some other queue that's higher priority than whatever setImmediate uses (I don't know how to tell the difference). So regular code, with access to gc() and setImmediate(), can provoke GC and stall itself long enough for the finalizers to get a chance to run. In that world, liveslots (given gc+setImmediate) can emit dropImports at the end of the crank, before giving control back to the supervisor.

In XS, finalizer callbacks aren't queued. Instead, they're called inline from a function named cleanupFinalizationRegistries. When the supervisor calls into XS, it's a C function (with no XS code on the C stack) that does something like:

xsBeginHost(machine);
xsDoSomethingThatCallsAJSFunction();
xsEndHost(machine);

where xsBeginHost and xsEndHost are macros that expand into a preamble/postamble which does various XS setup and cleanup tasks. Part of the xsEndHost macro is to call fxEndHost() (there are other times that fxBeginHost/fxEndHost are called, I think any time the host calls into XS, which might be reentrant). When fxEndHost() is called for the "last" time (meaning the JS stack is empty), it calls fxEndJob, and fxEndJob() calls cleanupFinalizationRegistries (in xsMapSet.c), which calls all the finalizers.

So XS finalizers obey the JS rules that they run in their own turn, but it doesn't schedule those turns in a place where JS code can wait for them: they only run as the engine is returning control back to the host application (after the outermost invocation has finished: in fact as it is finishing).

So, that kind of scuttles the plan to let liveslots drive things. I can think of a couple of approaches:

Find out if it'd be safe to expose a C callback that calls the finalizers while JS is on the stack waiting for it. I'm dubious, but if it works, it'd be the tidiest approach, and would line up with the Node.js implementation better than anything else.
Enhance our supervisor (xsnap.c) to have a special command type which makes two calls into JS, not just one. We'd use the first to deliver our usual VatDeliveryObject to liveslots (to deliver a message or resolve some promises). That delivery would need to call gc(), so that the registries would be primed, then their finalizers would be run on the way back out of that invocation. The second call would be to a special part of liveslots that reacts to the finalizer consequences and emits dropImports for the deadSet.
Leave xsnap.c alone, but have manager-subprocess-xsnap send two messages. The first is the usual message/notify dispatch, the second is a bringOutYourDead dispatch which queries the deadSet and emits dropImports.

The downside of the last approach is latency: we're sending an extra (although small) message over the pipe for each crank. The downside of the last two approaches is the lack of parallelism with Node.js, and it might be tempting to use bringOutYourDead with Node.js as well (which would be synthesized in manager-local.js just after each normal delivery).
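The two-message approach might look roughly like the following sketch. The `bringOutYourDead` dispatch name, `deadSet`, and `syscall.dropImports` come from this thread; `makeVat`, `deliverWithGC`, the delivery tuple shape, and the argument format of `dropImports` are assumptions made for illustration.

```javascript
// Hypothetical vat wrapper; the delivery format is an assumption.
function makeVat(syscall) {
  const deadSet = new Set(); // vrefs whose finalizers have run

  function dispatch(delivery) {
    const [type] = delivery;
    if (type === 'bringOutYourDead') {
      // Sort so the syscall does not depend on finalizer ordering.
      const vrefs = Array.from(deadSet).sort();
      deadSet.clear();
      if (vrefs.length > 0) {
        syscall.dropImports(vrefs);
      }
      return;
    }
    // Normal message/notify deliveries would be handled here.
  }

  return { dispatch, deadSet };
}

// Manager side: the usual delivery, followed by the extra message.
function deliverWithGC(vat, delivery) {
  vat.dispatch(delivery);
  vat.dispatch(['bringOutYourDead']);
}
```

Under XS, finalizers would run as control returns to the host after the first call, so the second call sees an up-to-date `deadSet`.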

dtribble · 2021-03-19T20:23:05Z

Is it an option to add a downcall from JS to say "send me your finalization notes now"?

erights · 2021-03-19T21:23:59Z

... calls the finalizers while JS is on the stack waiting for it. ... line up with the Node.js [implementation] better than anything else.

The spec says that the finalizers are called in separate turns. There was a part of the proposed API to enable some finalization during a turn, but IIRC that was dropped from the proposal before it advanced in tc39. I just checked my Node v14 and it does not implement that extra API.

erights · 2021-03-19T21:39:36Z

Is it an option to add a downcall from JS to say "send me your finalization notes now"?

That's the part of our original WeakRef proposal that did not survive tc39.

warner · 2021-03-19T22:42:57Z

I've been chatting with Peter, he suggested a simple two-line change that would run the finalizers every time the promise queue is drained, which would let us stick with the simpler approach.

Each time the kernel process sends over a message on the pipe, we execute that message (evaluate its string in the right context), then call a function named fxRunLoop:

agoric-sdk/packages/xsnap/src/xsnap.c

Lines 1047 to 1090 in b5bda04

    
void fxRunLoop(txMachine* the)
{
	c_timeval tv;
	txNumber when;
	txJob* job;
	txJob** address;
	for (;;) {
		while (the->promiseJobs) {
			the->promiseJobs = 0;
			fxRunPromiseJobs(the);
		}
		c_gettimeofday(&tv, NULL);
		when = ((txNumber)(tv.tv_sec) * 1000.0) + ((txNumber)(tv.tv_usec) / 1000.0);
		address = (txJob**)&(the->timerJobs);
		if (!*address)
			break;
		while ((job = *address)) {
			if (job->the) {
				if (job->when <= when) {
					(*job->callback)(job);
					if (job->the) {
						if (job->interval) {
							job->when += job->interval;
						}
						else {
							xsBeginHost(job->the);
							xsResult = xsAccess(job->self);
							xsForget(job->self);
							xsSetHostData(xsResult, NULL);
							xsEndHost(job->the);
							job->the = NULL;
						}
					}
					break; // to run promise jobs queued by the timer in the same "tick"
				}
				address = &(job->next);
			}
			else {
				*address = job->next;
				c_free(job);
			}
		}
	}
}

It basically does:

while True:
  while not promiseQueueIsEmpty:
    processPromiseQueue()
  if not someTimerJobsAreReady:
    break
  while someTimerJobsAreReady:
    processReadyTimers()

What we're looking at is to let all finalizers run at the beginning of the loop, before we examine the promise queue. (His original suggestion was to run them after processing the promise queue but before the timers.. I figure earlier is better, in case a finalizer callback pushes something onto the promise queue):

while True:
  processAllFinalizationRegistries()  # new
  while not promiseQueueIsEmpty:
    processPromiseQueue()
  if not someTimerJobsAreReady:
    break
  while someTimerJobsAreReady:
    processReadyTimers()

That will run finalizers before running promises, which is aggressive and .. different, but I think still within spec, and I don't think it would cause us any problems. And it would allow a simple gc(); await new Promise(setImmediate); to ensure that 1: the promise queue is empty (so user code has lost agency), 2: a full GC pass has happened, 3: all finalizers have had a chance to run. Which is exactly what we need.

The specific change to xsnap.c would be to introduce an empty:

  xsBeginHost(the);
  xsEndHost(the);

pair at the top of the for (;;) loop (usually you'd put other code between those two calls, but in this case we leave it empty). All FinalizationRegistries are processed during the xsEndHost call, and we have several of those elsewhere, but by having an explicit begin/end pair at the start of the loop, all the finalizers will be run before the promise queue is serviced.

This wouldn't trigger GC by itself: we still have to make an explicit gc() call to make sure that happens. This is a change to our xsnap.c program, not anything upstream in the Moddable repository.

I don't know what sort of impact this will have on performance. The code I'm looking at is:

https://github.com/Moddable-OpenSource/moddable/blob/7abc931d799f70bda545bfb7808c8dc786d8cfcb/xs/sources/xsMapSet.c#L1540

and it looks like it walks through every registered object every time it is polled, testing each finalizationCell to see if it's empty or not, and then running the callback if empty. That sounds O(N) in the number of objects, which might be annoying.

Liveslots now uses WeakRefs and a FinalizationRegistry to track the state of each import: UNKNOWN -> REACHABLE -> UNREACHABLE -> COLLECTED -> FINALIZED -> UNKNOWN. Reintroduction can move it from UNREACHABLE/COLLECTED/FINALIZED back to REACHABLE at any time. Liveslots maintains a local `deadSet` that contains all the vrefs which are in the FINALIZED state. They will remain in that state (and in `deadSet`) until a later change which uses `syscall.dropImports` to inform the kernel, and remove them from `deadSet`. We remove imported objects from the deadSet if/when they are re-introduced. Promises are retained until resolved+retired, even if userspace somehow drops all references to them. We might do better in the future, but the story is a lot more complicated than it is for Presences. Exported Remotables are still retained indefinitely. A later change (#2664) will wire `dropExports()` up to drop them. We only register finalizers for imported objects: not imported promises, and not exports of any flavor. Liveslots is not yet calling syscall.dropImports, but by mocking WeakRef and FinalizationRegistry, we can test to make sure it updates the deadSet correctly. refs #2660
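The deadSet bookkeeping described in this commit message can be sketched as below. This is a simplified illustration, not the liveslots source: `makeImportTracker`, `introduce`, and `makePresence` are hypothetical names, and `WeakRef`/`FinalizationRegistry` are passed in so that a test can substitute mocks, as the message suggests.

```javascript
// Hypothetical tracker; the constructors are injected so tests can mock them.
function makeImportTracker({ WeakRef, FinalizationRegistry }) {
  const slotToVal = new Map(); // vref -> WeakRef(Presence)
  const deadSet = new Set(); // vrefs in the FINALIZED state

  function finalized(vref) {
    const wr = slotToVal.get(vref);
    // Ignore stale callbacks: if the vref was re-introduced after being
    // collected, slotToVal now holds a WeakRef to a live replacement.
    if (wr && wr.deref() === undefined) {
      slotToVal.delete(vref);
      deadSet.add(vref);
    }
  }
  const registry = new FinalizationRegistry(finalized);

  function introduce(vref, makePresence) {
    const wr = slotToVal.get(vref);
    let val = wr && wr.deref();
    if (val === undefined) {
      // UNKNOWN, COLLECTED, or FINALIZED: build a fresh Presence
      val = makePresence(vref);
      slotToVal.set(vref, new WeakRef(val));
      registry.register(val, vref);
    }
    deadSet.delete(vref); // re-introduction moves it back to REACHABLE
    return val;
  }

  return { introduce, deadSet };
}
```

A mock `WeakRef` whose target can be cleared by hand, plus a mock registry that exposes its callback, is enough to drive each state transition deterministically.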

`dispatch()`, the low-level interface to each vat (generally provided by liveslots), is now async. Vats are responsible for not resolving the promise returned by `dispatch()` until the user-level code has finished running and the crank is complete. Vats are given `waitUntilQuiescent` in their `gcTools` argument to facilitate this. This will make it possible for liveslots to run `gc()` and wait long enough to give finalizers a chance to run (and then call `dropImports`) before the crank is considered complete. closes #2671 refs #2660

warner · 2021-05-20T06:05:24Z

Peter suggested a different fix a few weeks later. He found out that fxEndJob() is effectively a simple "run all finalizers" function, and it can safely be called from our run loop. So the code that looks like:

	for (;;) {
		while (the->promiseJobs) {
			the->promiseJobs = 0;
			fxRunPromiseJobs(the);
		}
		c_gettimeofday(&tv, NULL);
        ...

can become:

	for (;;) {
		while (the->promiseJobs) {
			while (the->promiseJobs) {
				the->promiseJobs = 0;
				fxRunPromiseJobs(the);
			}
			fxEndJob(the);
		}
		c_gettimeofday(&tv, NULL);
      ...

I think this will behave like:

while True:
  while not promiseQueueIsEmpty:
    while not promiseQueueIsEmpty:
      processPromiseQueue()
    runFinalizers()
  if not someTimerJobsAreReady:
    break
  processFirstReadyTimer()

Our front-of-crank sequence is to queue up the dispatch invocation on the promise queue, call waitForQuiescent(), and await the result. Since userspace cannot use timers or setImmediate, we'll spin in the inner while not promiseQueueIsEmpty: loop until userspace is done and loses agency. We'll do one runFinalizers() at that point, but unless GC happened to occur during the crank, it won't find any work to do.

Then our mid-crank pause happens. I found a gcAndFinalize() implementation that works on both Node.js and XS:

async function gcAndFinalize() {
  if (typeof gc !== 'function') {
    console.log(`unable to gc(), skipping`);
    return;
  }
  // on Node.js, GC seems to work better if the promise queue is empty first
  await new Promise(setImmediate);
  // on xsnap, we must do it twice for some reason
  await new Promise(setImmediate);
  gc();
  // this gives finalizers a chance to run
  await new Promise(setImmediate);
}

This drains the promise queue twice (without the second pass, xsnap didn't work). The last queue entry calls gc() and then the waitForQuiescent() equivalent. The call to gc() preps the finalizers, then the empty promise queue triggers runFinalizers(). Since our finalizers won't be resolving any promises, the loop then drops through to checking the timers, which fires the setImmediate. That kicks us back to the promise-queue loop and lets the post-gc() code run to process the finalizer results.
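The ordering described here can be traced with a toy JavaScript model of the modified run loop. This is only a simulation under simplifying assumptions: the three arrays stand in for the XS promise queue, the timer list, and the finalizers that gc() has made ready, every timer is treated as already due, and the names `runLoop`, `readyFinalizers`, and `traceGcAndFinalize` are invented for the example.

```javascript
// Toy model of the run loop with the fxEndJob() call added.
function runLoop(machine) {
  const { promiseJobs, timerJobs, readyFinalizers } = machine;
  for (;;) {
    while (promiseJobs.length > 0) {
      while (promiseJobs.length > 0) {
        promiseJobs.shift()();
      }
      // Stand-in for fxEndJob(the): run every finalizer that is ready.
      while (readyFinalizers.length > 0) {
        readyFinalizers.shift()();
      }
    }
    if (timerJobs.length === 0) {
      break;
    }
    // Run one timer, then go back to the promise queue.
    timerJobs.shift()();
  }
}

// Models the tail of gcAndFinalize(): gc(), then await setImmediate.
function traceGcAndFinalize() {
  const log = [];
  const machine = { promiseJobs: [], timerJobs: [], readyFinalizers: [] };
  machine.promiseJobs.push(() => {
    log.push('gc()');
    machine.readyFinalizers.push(() => log.push('finalizer'));
    machine.timerJobs.push(() => {
      log.push('setImmediate fires');
      machine.promiseJobs.push(() => log.push('post-gc code'));
    });
  });
  runLoop(machine);
  return log;
}
```

In this model the finalizer always runs before the `setImmediate` callback, which is the property the post-gc() code relies on.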

Once that is complete, the end-of-crank sequence can proceed, which looks at the finalizer results and makes a bunch of GC syscalls.

This adds a platform-specific `gcAndFinalize()` function, which returns a Promise that resolves when GC has been provoked and FinalizationRegistry callbacks have had a chance to run. On Node.js, the application must be run with --expose-gc. On `xsnap`, a small change was made to the run loop to let finalizers run after the promise queue is empty and before timer events run. refs #2660

This changes the xsnap.c run loop to give finalizers a chance to run just after the promise queue drains. With this change, userspace can do a combination of `gc()` and `setImmediate` that lets it provoke a full GC sweep, and wait until finalizers have run. SwingSet will use this during a crank, after userspace has become idle, and before end-of-crank GC processing takes place. This combination is implemented in a function named `gcAndFinalize()`. We copy this function from its normal home in SwingSet so the xsnap.c behavior it depends upon can be tested locally. refs #2660