Execution context to enable flow-like scheduler #7875
Labels: cosmic-swingset, enhancement, needs-design, performance, SwingSet
What is the Problem Being Solved?
Our current scheduler, implemented by swingset and cosmic-swingset, is simplistic and has run-to-completion semantics for every I/O event: an input event (cosmos action, time update) adds entries to the swingset run queue, and controller.run() executes until no more work can be done (the run queue is empty), modulo block execution limits. This leaves the execution vulnerable to runaway message loops, either direct or timer based. See #7847.
We need to switch to a more advanced scheduler design that is capable of interleaving executions, so that new I/O events are not starved by existing work.
Description of the Design
The high level approach is to associate messages together in an "execution flow", which would be roughly (but not exactly) equivalent to the messages that would have been processed in the "run to completion" scheduler. Unlike run to completion, multiple "execution flows" may be active at the same time. These "execution flows" are managed by the host application (cosmic-swingset), which may use them to instruct Swingset which message should be processed next.
This assumes Swingset has multiple queues of messages that are eligible to be processed next, instead of the single run queue we have now. Most likely these would be the per-vat inbound and outbound queues described in #5025, in order to implement the message ordering guarantees defined in #3465: messages from a given vat to the same presence must be delivered in the order they were sent, even if these messages are associated with separate flows.
At its core, every message in a queue would be associated with a "flow id" or "execution context". Swingset is responsible for replicating this "flow id" / "execution context" during vat execution: every message send, promise resolution or subscription made during a delivery automatically inherits the execution flow of the message which triggered the vat execution.
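The inheritance rule above can be sketched as follows. This is purely illustrative: `FlowId`, `QueuedMessage`, and `inheritFlow` are hypothetical names, not actual SwingSet kernel structures.

```typescript
// Hypothetical types; SwingSet's real kernel message shapes differ.
type FlowId = string;

interface QueuedMessage {
  target: string; // e.g. a kref of the target object or promise
  method: string;
  flowId: FlowId; // the execution context this message belongs to
}

// During a delivery, every message send (and promise resolution or
// subscription) inherits the flow of the message that triggered the
// vat execution.
function inheritFlow(
  triggering: QueuedMessage,
  target: string,
  method: string,
): QueuedMessage {
  return { target, method, flowId: triggering.flowId };
}

const input: QueuedMessage = { target: 'ko42', method: 'wake', flowId: 'flow-7' };
const followUp = inheritFlow(input, 'ko43', 'notify');
// followUp carries flow-7 without the vat ever seeing the flow id
```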
The run policy would be replaced by a mechanism that allows the host application to select which queue to process next, if any. Information such as the queue details (vat inbound or outbound, or promise queue, queue depth, etc.) as well as the details of the topmost message of the queue (message type, "flow id"/"execution context") would be available to the host application.
Even if the host does not use the "execution flow" to decide which queue to process next, this would allow users to gain better visibility into the state of the execution triggered by their action.
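A rough sketch of the host-facing shape this could take; `QueueInfo` and `selectNextQueue` are hypothetical names and the selection policy shown is a trivial placeholder, not a proposed policy.

```typescript
// Hypothetical host-facing view of SwingSet's active queues.
interface QueueInfo {
  queueId: string; // e.g. 'v12-inbound'
  kind: 'vat-inbound' | 'vat-outbound' | 'promise';
  depth: number;
  // Details of the topmost message, when the queue is non-empty.
  head?: { type: string; flowId: string };
}

// Replaces the run policy: instead of "run until empty", the host
// inspects the active queues and picks one to service next.
function selectNextQueue(queues: QueueInfo[]): string | undefined {
  const active = queues.filter(q => q.depth > 0);
  // Placeholder policy for illustration only: deepest queue first.
  active.sort((a, b) => b.depth - a.depth);
  return active[0]?.queueId;
}
```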
Single stream limitation
This "execution flow" is an implicit dynamic context which is not revealed to vats. Because of that, we have two general limitations:
The AsyncContext proposal does not enable us to detect this.
Interaction with timers / devices
In order to properly associate a timer event with an "execution flow", the host implementation of the timer device should be provided with the "flow id" / "execution context" that triggered the queueing. That way, once the host selects a new event from the timer queue, it can restore the correct "execution flow". This assumes #7846.
Similarly, when executing devices (timer wake, bridge inbound), the host must be able to set what the current "flow id" / "execution context" is. This is actually how new "execution flows" are created.
When a vat makes a device call, swingset would provide the related context info to the host, which can then transmit it forward if appropriate. One possible use case is to automatically annotate vstorage writes with the transaction info that originally triggered them (which may be different from the block height / time at which the write happens).
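The vstorage annotation use case might look roughly like this. All names here (`FlowContext`, `annotatedVstorageWrite`) are hypothetical, not cosmic-swingset APIs.

```typescript
// Hypothetical context the host could attach when it creates a flow
// for an inbound cosmos action.
interface FlowContext {
  flowId: string;
  originBlockHeight: number; // block in which the flow was created
  txHash?: string; // originating transaction, if any
}

// Annotate a vstorage write with the flow that originally triggered
// it, which may be older than the block in which the write lands.
function annotatedVstorageWrite(path: string, value: string, ctx: FlowContext) {
  return {
    path,
    value,
    meta: {
      flowId: ctx.flowId,
      originBlock: ctx.originBlockHeight,
      tx: ctx.txHash,
    },
  };
}
```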
Deferral of prioritization decision
Swingset itself would not have any logic to decide which active queue should be serviced next. The active queues simply enforce basic ordering guarantees, and in the future will enable partial parallelization of execution. The scheduling decision is offloaded to the host application by allowing it to select the order in which active queues should be processed.
The mechanism described here does not define how the host should implement its prioritization. It simply adds a dimension to the information available to the host to make scheduling decisions.
One possibility would be for the host to select the next queue / message to process based on the amount of execution a flow has seen to date, prioritizing flows that are in progress but have not yet consumed too much execution (a bell-curve-like weighting), perhaps mixing in a priority assigned to certain flows.
It's also possible that the prioritization of certain flows may be influenced by some available economic data, like paid prioritization.
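The bell-curve idea above could be sketched like this. Everything here is hypothetical: the names, the choice of computrons as the execution metric, and the Gaussian-shaped weighting are illustrative assumptions, not a proposed policy.

```typescript
// Hypothetical per-flow stats the host might track.
interface FlowStats {
  flowId: string;
  computronsUsed: number; // cumulative execution seen by this flow
  priorityBoost: number;  // e.g. from paid prioritization; 0 by default
}

// Bell-curve-like weight: flows peak in priority after some execution
// has happened, then fade as they consume more and more. The peak
// value is an arbitrary illustrative constant.
function flowWeight(f: FlowStats, peak = 1_000_000): number {
  const x = (f.computronsUsed - peak) / peak;
  return Math.exp(-x * x) + f.priorityBoost;
}

// Pick the flow whose queue should be serviced next.
function pickFlow(flows: FlowStats[]): string | undefined {
  let best: FlowStats | undefined;
  for (const f of flows) {
    if (!best || flowWeight(f) > flowWeight(best)) best = f;
  }
  return best?.flowId;
}
```

With this weighting, a brand-new flow and a long-running flow both rank below a flow that is mid-execution, matching the "in progress but not for too long" intuition.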
Security Considerations
None that I can think of right now, as this is only information internal to the host application and never exposed to contract code.
Scaling Considerations
By itself this issue does not impact scaling; however, it enables various scheduling changes which will likely have an impact on perceived performance.
Test Plan
Since this issue is about associating flow information with existing messages in queues, the only testing surface is making sure the flow information is propagated as expected.
A cosmic-swingset scheduler built on top of this information would be the interesting bit to test.
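The propagation test surface mentioned above could look something like this in spirit; the fake delivery here stands in for running a real vat and is purely illustrative.

```typescript
// Stand-in for a kernel delivery: each message the vat sends during
// the delivery must inherit the triggering message's flow id.
type Msg = { method: string; flowId: string };

function fakeDeliver(input: Msg, sendsMade: string[]): Msg[] {
  // A real test would run a vat through the kernel and inspect the
  // queues; here we just model the expected inheritance.
  return sendsMade.map(method => ({ method, flowId: input.flowId }));
}

const out = fakeDeliver({ method: 'poke', flowId: 'f1' }, ['a', 'b']);
const allInherited = out.every(m => m.flowId === 'f1');
// allInherited should hold for every message enqueued by the delivery
```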