Execution context to enable flow-like scheduler #7875

Open
mhofman opened this issue Jun 1, 2023 · 3 comments
Labels: cosmic-swingset (package: cosmic-swingset), enhancement (New feature or request), needs-design, performance (Performance related issues), SwingSet (package: SwingSet)

Comments


mhofman commented Jun 1, 2023

What is the Problem Being Solved?

Our current scheduler, implemented by swingset and cosmic-swingset, is simplistic and has run-to-completion semantics for every I/O event: an input event (cosmos action, time update) adds entries to the swingset run queue, and controller.run() executes until no more work can be done (the run queue is empty), modulo block execution limits.
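
For context, a minimal sketch of that run-to-completion shape. `controller.run(policy)` is the real SwingSet entry point; the enqueue helper and policy factory names are illustrative:

```js
// Sketch of the current per-block loop: input events are enqueued,
// then the kernel runs until the run queue drains or the block policy
// stops it. queueAction and makeBlockPolicy are hypothetical names.
async function runBlock(controller, actions, makeBlockPolicy) {
  for (const action of actions) {
    // Each cosmos action or time update adds entries to the run queue.
    await controller.queueAction(action); // hypothetical enqueue helper
  }
  // Run to completion, modulo the block execution limits the policy enforces.
  await controller.run(makeBlockPolicy());
}
```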

This leaves the execution vulnerable to runaway message loops, either direct or timer-based. See #7847.

We need to switch to a more advanced scheduler design that is capable of interleaving executions, so that new I/O events are not blocked behind existing work.

Description of the Design

The high-level approach is to group related messages into an "execution flow", which would be roughly (but not exactly) equivalent to the set of messages that would have been processed together under the "run to completion" scheduler. Unlike run to completion, multiple "execution flows" may be active at the same time. These "execution flows" are managed by the host application (cosmic-swingset), which may use them to instruct Swingset which message should be processed next.

This assumes Swingset has multiple queues of messages eligible to be processed next, instead of the single run queue we have now. Most likely these would be the per-vat inbound and outbound queues described in #5025, in order to implement the message ordering guarantees defined in #3465: messages from a given vat to the same presence must be delivered in the order they were sent, even if these messages are associated with separate flows.

At its core, every message in a queue would be associated with a "flow id" or "execution context". Swingset is responsible for replicating this "flow id" / "execution context" during vat execution: every message send, promise resolution, or subscription made during a delivery automatically inherits the execution flow of the message that triggered the vat execution.
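
A minimal sketch of what that kernel-side propagation could look like; none of these fields or hooks exist today, so all names are hypothetical:

```js
// Hypothetical propagation: every message emitted while a delivery is
// running inherits the flow id of the message that triggered it.
let currentFlowId = null;

function deliverWithFlow(vat, message) {
  currentFlowId = message.flowId; // restore the triggering flow
  try {
    vat.deliver(message); // syscalls made below observe this context
  } finally {
    currentFlowId = null;
  }
}

function onSyscallSend(queue, target, methargs) {
  // Sends, promise resolutions, and subscriptions are all stamped the
  // same way, so the flow follows the causal chain automatically.
  queue.push({ target, methargs, flowId: currentFlowId });
}
```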

The run policy would be replaced by a mechanism that allows the host application to select which queue to process next, if any. Information such as the queue details (vat inbound or outbound, or promise queue; queue depth; etc.), as well as the details of the message at the head of each queue (message type, "flow id" / "execution context"), would be available to the host application.
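
A sketch of what that host-facing surface might look like; `getRunnableQueues` and `processQueueHead` are hypothetical names for APIs that do not exist yet:

```js
// The host inspects the runnable queues and picks which one the
// kernel services next, instead of handing over a run policy.
async function stepOnce(controller, chooseQueue) {
  const queues = controller.getRunnableQueues(); // hypothetical
  // Each entry could expose: kind ('vat-inbound', 'vat-outbound',
  // 'promise'), depth, and the head message's type and flowId.
  const next = chooseQueue(queues);
  if (next === undefined) {
    return false; // the host decided no more work should run now
  }
  await controller.processQueueHead(next.queueId); // hypothetical
  return true;
}
```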

Even if the host does not use the "execution flow" to decide which queue to process next, this would allow users to gain better visibility into the state of the execution triggered by their action.

Single stream limitation

This "execution flow" is an implicit dynamic context which is not revealed to vats. Because of that, we have 2 general limitations:

  • an execution cannot generate new execution flows. In particular, if a vat were to process a single message bundling multiple messages from a remote swingset, those individual messages would not be able to carry explicit "execution flows". We may be able to add a privileged API for this later.
  • merging multiple flows of execution inside a vat results in execution associated with the last flow. There is currently no way to detect merging of flows in JavaScript, and even the AsyncContext proposal does not enable us to detect this.

Interaction with timers / devices

In order to properly associate a timer event with an "execution flow", the host implementation of the timer device should be provided with the "flow id" / "execution context" that triggered the queueing. That way, once the host selects a new event from the timer queue, it can restore the correct "execution flow". This assumes #7846.

Similarly, when executing devices (timer wake, bridge inbound), the host must be able to set what the current "flow id" / "execution context" is. This is actually how new "execution flows" are created.

When a vat makes a device call, swingset would provide the related context info to the host, which can then transmit it onward if appropriate. One possible use case is to automatically annotate vstorage writes with the transaction info that originally triggered them (which may differ from the block height / time at which the write happens).
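
A sketch of the timer-device side under these assumptions; `getCurrentFlowId` and `controller.withFlow` are illustrative names for the context plumbing described above:

```js
// Hypothetical timer-device shape: the flow id travels with the wakeup
// so the host can restore it when the event fires.
let timerQueue = [];

function setWakeup(when, handler) {
  timerQueue.push({ when, handler, flowId: getCurrentFlowId() });
}

function fireDueWakeups(now, controller) {
  const due = timerQueue.filter(e => e.when <= now);
  timerQueue = timerQueue.filter(e => e.when > now);
  for (const { handler, flowId } of due) {
    // Restoring the originating flow before injecting the device event
    // attributes the resulting vat execution to the correct flow.
    controller.withFlow(flowId, () => handler.wake(now));
  }
}
```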

Deferral of prioritization decision

Swingset itself would not have any logic to decide which active queue should be serviced next. The active queues simply enforce basic ordering guarantees, and in the future will enable partial parallelization of execution. The scheduling decision is offloaded to the host application by allowing it to select the order in which active queues should be processed.

The mechanism described here does not define how the host should implement its prioritization. It simply adds a dimension to the information available to the host to make scheduling decisions.

One possibility would be for the host to select the next queue / message to process based on the amount of execution a flow has seen to date, prioritizing flows that are in progress but have not yet been executing for too long (a bell-curve weighting), and perhaps mixing in a priority boost for certain flows.

It's also possible that the prioritization of certain flows may be influenced by some available economic data, like paid prioritization.
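
To make the bell-curve idea concrete, here is an illustrative scoring heuristic only; the constants and the `q.head.flowId` shape are placeholders, not a proposed implementation:

```js
// Track accumulated computrons per flow as deliveries complete.
const computronsByFlow = new Map();

function recordUsage(flowId, computrons) {
  computronsByFlow.set(flowId, (computronsByFlow.get(flowId) ?? 0) + computrons);
}

// Pick the runnable queue whose head message belongs to the most
// "moderately used" flow: brand-new and long-running flows score lower.
function chooseQueue(queues) {
  let best;
  let bestScore = -Infinity;
  for (const q of queues) {
    const used = computronsByFlow.get(q.head.flowId) ?? 0;
    // Gaussian-shaped score peaking at an arbitrary 50k computrons; a
    // per-flow priority (e.g. paid prioritization) could multiply in here.
    const score = Math.exp(-((used - 50_000) ** 2) / 2e9);
    if (score > bestScore) {
      bestScore = score;
      best = q;
    }
  }
  return best; // undefined when no queue should be serviced
}
```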

Security Considerations

None I can think of right now, as this is only information internal to the host application and never exposed to contract code.

Scaling Considerations

By itself this issue does not impact scaling; however, it enables various scheduling changes which will likely have an impact on perceived performance.

Test Plan

Since this issue is about associating flow information with existing messages in queues, the only testing surface is making sure the flow information is propagated as expected.

A cosmic-swingset scheduler built on top of this information would be the interesting bit to test.

@mhofman mhofman added enhancement New feature or request SwingSet package: SwingSet cosmic-swingset package: cosmic-swingset performance Performance related issues labels Jun 1, 2023
mhofman commented Jun 13, 2023

Updated to add propagation of context info to upstream golang calls by the cosmic-swingset host.

mhofman commented Aug 24, 2023

Propagating a "flow id" from delivery to syscalls could be done today, before we add per-vat queues. The hardest part is propagation through timer wakeups, but that might be mitigated in part through promise subscriptions.

This is also likely related to adding messageIds when performing send/notify to associate them with the resulting delivery, although in the other direction: #6501

mhofman commented Oct 24, 2023

> The mechanism described here does not define how the host should implement its prioritization. It simply adds a dimension to the information available to the host to make scheduling decisions.

Discussing scheduling with @dtribble and @zmanian, one idea that was floated was to allow the block proposer to decide which activity/flow gets executed during the block. A combination of a newer cosmos-sdk/CometBFT and the Skip SDK would allow the host to model pending swingset activities as one of the "lanes" that get included in a block, alongside new activities from txs included in the block. The motivation is that the block proposer may have information regarding which activity is more important to execute.

One approach would be for the block proposer to decide how many computrons get allocated to each activity, possibly with a prioritization between activities (strict or not). Then the scheduler would pick deliveries from queues based on which activity each delivery is associated with, and keep track of the accumulated computron usage of each activity. How the scheduler, cosmic-swingset, and swingset interface together is still TBD under these new requirements, as long as it allows tracking the computron usage of activities. It would likely require modeling new txs as new "queues" with a single delivery representing the starting action, unless there is a strict prioritization between activities.
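
A sketch of what proposer-allocated budgets could look like on the host side; every shape here is hypothetical, and the interface between the pieces is exactly the part that is still TBD:

```js
// Per-activity computron budgets decided by the block proposer. The
// host only services queues whose activity still has budget, and
// charges the metered usage after each delivery.
function makeBudgetedScheduler(allocations /* Map<activityId, computrons> */) {
  const used = new Map();
  return {
    // Restrict the runnable queues to activities with remaining budget.
    eligible(queues) {
      return queues.filter(q => {
        const id = q.head.activityId;
        return (used.get(id) ?? 0) < (allocations.get(id) ?? 0);
      });
    },
    // Called with the computron count metered for a completed delivery.
    charge(activityId, computrons) {
      used.set(activityId, (used.get(activityId) ?? 0) + computrons);
    },
  };
}
```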
