mark some vats as critical, panic the kernel if one dies #4279
Labels: audit-zestival (Vulnerability assessment of ERTP + Zoe), enhancement (New feature or request), SwingSet (package: SwingSet)
Comments
Assigning to @mhofman, I think this might overlap with pausing vats on meter underflow.
Moving this to @FUDCo; the crash-or-not aspect is independent of the ability to pause vats.
What is the Problem Being Solved?
Currently, the only way for userspace code to panic the kernel is if the `bootstrap()` message (to the bootstrap vat) rejects its result promise. The kernel panics if a device throws an error, but devices are not really user code. The kernel also has a bunch of invariant checks that can cause a panic, but those are all internal and should not be provokable by userspace code.

However, in any given deployment, there are probably some userspace vats which are considered so critical that when they malfunction, we'd rather have the overall application crash than have the kernel kill off just the faulty vat. The comms and vat-vattp vats always fall into this category. In the Agoric chain, vat-zoe and all the vats that make up the token economy are chain-critical. It will be easier to recover from a fault by halting the chain than to have the kernel delete vat-zoe and reap all of the objects it has ever exported.
So the requirement is for some vats to be marked as "critical". If the kernel ever finds a reason to kill such a vat (illegal syscall, metering fault, explicit `syscall.terminate`), the kernel should panic instead. This should cause `controller.run()` to reject its return promise instead of fulfilling it, and the host application should react by exiting immediately rather than committing the state changes.
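For illustration, a host application might wrap its run loop roughly like this; `controller` and `hostStorage` follow the usual SwingSet host pattern, but nothing here is meant as a fixed API, just a sketch of "exit instead of commit":

```js
// Hypothetical host-side run loop (names are illustrative, not a fixed API).
async function doCrank(controller, hostStorage) {
  try {
    await controller.run();
  } catch (err) {
    // A kernel panic (e.g. a critical vat died) rejects run(): halt without
    // committing the state changes from this batch of cranks.
    console.error('kernel panicked, exiting without commit:', err);
    process.exit(1);
  }
  // Only reached if run() fulfilled normally: safe to commit.
  await hostStorage.commit();
}
```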
Description of the Design

The `vatAdminService~.createVat()` call should accept a `critical` option. If `true`, the kernel should panic instead of killing this vat. The kernel config object should also accept this flag, for static vats.
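The option might be spelled something like the sketch below, for both dynamic vats (via `createVat`) and static vats in the config object. `vatAdminService` and `bundleCap` are assumed to be in scope, and the exact flag name is up for discussion:

```js
import { E } from '@endo/eventual-send';

// Dynamic vat: sketch of the proposed option.
const { root, adminNode } = await E(vatAdminService).createVat(bundleCap, {
  critical: true, // if this vat dies, panic the kernel instead of killing it
});

// Static vat: the same flag in the kernel config object (hypothetical spelling).
const config = {
  vats: {
    zoe: {
      sourceSpec: './vat-zoe.js',
      creationOptions: { critical: true },
    },
  },
};
```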
This will go into the `managerOptions`. In `kernel.js`, when a vat is about to be killed, it should check the flag, and `panic()` if true (with some details about which vat triggered the panic).
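Roughly, the check in `kernel.js` might look like this; `kernelKeeper.provideVatKeeper()`, `vatKeeper.getOptions()`, and `panic()` are stand-ins for whatever the real internals end up being:

```js
// Sketch of the kill path inside kernel.js (helper names are assumptions).
function terminateVat(vatID, shouldReject, info) {
  const vatKeeper = kernelKeeper.provideVatKeeper(vatID);
  const { critical = false } = vatKeeper.getOptions(); // new managerOptions flag
  if (critical) {
    // Do not delete the vat: take the whole kernel down instead, recording
    // which vat triggered the panic and why.
    panic(`critical vat ${vatID} terminated`, info);
    return;
  }
  // ...existing path: delete vat state, notify vat-admin, reject promises...
}
```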
For now, we'll allow any call to `createVat` to set this flag, meaning that any code which can create vats can also panic the kernel. vat-zoe is the only chain-side vat which gets access to `vatAdminService`, and the power to panic the kernel is comparable to the authority to create unmetered vats, which `vatAdminService` also provides. Later, if we find a use case for it, we can have `vatAdminService` produce less-powerful facets. Or, riffing on an approach @erights suggested for Meters, the service could provide a special `criticality` object, and the only way to mark a vat as critical is to pass a `managerOptions` of `{ critical: criticalityObject }`, in a rights-amplification pattern that needs access to both `vatAdminService` and `criticality`. But for now, only zoe has access to `vatAdminService`, so we don't need to build anything too fancy.
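A hypothetical sketch of that rights-amplification variant (none of these names exist today, including `getCriticality`): vat-admin hands out an opaque `criticality` token separately, and only a caller holding both it and `vatAdminService` can mark a vat critical.

```js
import { E } from '@endo/eventual-send';

// Hypothetical: vatAdminService hands out an opaque "criticality" token.
const criticality = await E(vatAdminService).getCriticality();

// Only a caller holding BOTH the service and the token can pass the check;
// the kernel compares the token's identity before honoring the flag.
const { root } = await E(vatAdminService).createVat(bundleCap, {
  critical: criticality,
});
```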
Security Considerations

Building a deliberately-crashing vat and marking it `critical` is a kernel-killing power, so we should be conscious of how it is disseminated.

We should think carefully about potential bugs in critical vats and decide whether halting the host application really is the best approach. If multiple services can be severed from one another, then allowing some portion of a chain to keep running even though others are dead might be better. We should war-game a bug in a core service somehow.
Test Plan
Unit tests
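For example, a test along these lines (fixture paths and the doomed-vat behavior are illustrative; the `critical` flag is the thing under design):

```js
import test from 'ava';
import { buildVatController } from '@agoric/swingset-vat';

test('terminating a critical vat panics the kernel', async t => {
  const config = {
    bootstrap: 'bootstrap',
    vats: {
      bootstrap: { sourceSpec: './bootstrap-kill-critical.js' },
      doomed: {
        sourceSpec: './vat-doomed.js',
        creationOptions: { critical: true }, // proposed flag
      },
    },
  };
  const controller = await buildVatController(config);
  // bootstrap() asks vat-doomed to terminate itself; run() should reject
  // instead of fulfilling, so the host can exit without committing.
  await t.throwsAsync(() => controller.run());
});
```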