Supporting external PVMs #81

Closed
tomusdrw opened this issue Aug 20, 2024 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@tomusdrw
Contributor

tomusdrw commented Aug 20, 2024

We would love to support other PVM implementations than typeberry.

If you'd like to be listed in the select box on the PVM disassembler page, please add a comment in this issue.

The idea is to compile the PVM to WASM (if possible) and expose a common interface that is yet to be fully defined (we are open for discussion).

The proposed interface for now is:

interface PvmMinimal {
  /**
   * Re-initialize the PVM with given generic PVM program.
   *
   * This function is optional. The support is indicated in the metadata.
   * 
   * Note: memory initialisation is deliberately missing for now. A good format for this is TBD.
   */
  resetGeneric(program: Uint8Array, registers: Uint8Array, gas: i64): void;
  /**
   * Returns the current program counter of PVM
   */
  getProgramCounter(): u32;
  /**
   * Set next program counter that will be applied on the next step.
   * This is needed to control the entry point.
   */
  setNextProgramCounter(pc: u32): void;
  /**
   * Get the current status of PVM
   * 
   * Status {
   *  Ok = 255,
   *  Halt = 0,
   *  Panic = 1,
   *  Fault = 2,
   *  Host = 3,
   *  OutOfGas = 4,
   * }
   */
  getStatus(): u8;
  /**
   * Return an exit argument in case the PVM stopped with one.
   * 1. In case of a page fault this is going to be the address,
   * 2. In case of a host call this will be the host call index.
   */
  getExitArg(): u32;
  /**
   * Return gas left.
   */
  getGasLeft(): i64;
  /**
   * Set how much gas is left (e.g. after returning from a host call).
   */
  setGasLeft(gas: i64): void;
  /**
   * Return registers dump.
   * 
   * We expect 13 values, 4 bytes each, representing the state of all registers as a single byte array.
   */
  getRegisters(): Uint8Array;
  /**
   * Perform a single step of PVM execution.
   *
   * Returns false when the machine cannot make any more progress (i.e. it halted, panicked, or went out of gas). 
   */
  nextStep(): boolean;
  /**
   * Returns a fixed-length page of memory.
   * 
   * It's up to the implementation to decide if this is going to return just a single memory cell,
   * or a page of some specific size.
   * The page sizes should always be the same though (i.e. the UI will assume that if page 0 has size `N`
   * every other page has the same size).
   */
  getPageDump(pageIndex: u32): Uint8Array;
}
/** Each function from this set is optional and may or may not be supported by a PVM. */
interface PvmOptionals {
  /**
   * Re-initialize the PVM with given generic PVM program and memory layout.
   *
   * This function is optional. The support is indicated in the metadata.
   * 
   * PageMap: `sequence(tuple(address: u32 , length: u32))`
   * Chunks: `sequence(tuple(address: u32, length: u32, data: sequence(u8)))`
   */
  resetGenericWithMemory(program: Uint8Array, registers: Uint8Array, pageMap: Uint8Array, memoryChunks: Uint8Array, gas: i64): void;
  /**
   * Re-initialize the PVM with given PVM program for JAM in Standard Program Initialisation container format.
   * 
   * This function is optional. The support is indicated in the metadata.
   */
  resetJAM(program: Uint8Array, gas: i64): void;
  /**
   * Re-initialize the PVM with given PVM program in PolkaVM container.
   * 
   * This function is optional. The support is indicated in the metadata.
   */
  resetPolkaVM(program: Uint8Array, gas: i64): void;
  /**
   * Overwrite all registers.
   * 
   * We expect 13 values, 4 bytes each, representing the state of all registers as a single byte array.
   */
  setRegisters(registers: Uint8Array): void;
  /**
   * Write a bunch of bytes to the memory.
   * 
   * The data should fit on one page - in case it spans multiple pages it's UB.
   */
  setMemory(address: u32, data: Uint8Array): void;
  /**
   * Return all page indices that have values that were explicitly set by the program.
   * Each memory page index should be represented by 4 bytes.
   */
  getDirtyMemoryPages(): Uint8Array;
}
type Metadata = {
  name: string;
  version: string;
  capabilities: {
     resetJAM: boolean;
     resetPolkaVM: boolean;
     resetGeneric: boolean;
     resetGenericWithMemory: boolean;
  },
  wasmBlobUrl: string
}

I imagine that each team will provide a URL for a JSON file with the metadata. The UI will fetch that file at startup, which decouples the deployment process of PVM implementations from the UI.
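As a rough illustration of how the UI side could consume such a metadata file, here is a minimal sketch. The `Metadata` shape is copied from the proposal above; `parseMetadata`, `loadMetadata` and the injected `fetchJson` callback are hypothetical names used only for this example, and the validation rules are an assumption, not something the issue specifies.

```typescript
// Shape of the metadata JSON, copied from the proposal above.
type Metadata = {
  name: string;
  version: string;
  capabilities: {
    resetJAM: boolean;
    resetPolkaVM: boolean;
    resetGeneric: boolean;
    resetGenericWithMemory: boolean;
  };
  wasmBlobUrl: string;
};

const STRING_KEYS = ["name", "version", "wasmBlobUrl"] as const;
const CAPABILITY_KEYS = [
  "resetJAM",
  "resetPolkaVM",
  "resetGeneric",
  "resetGenericWithMemory",
] as const;

// Hypothetical helper: validate untrusted JSON before the UI relies on it.
function parseMetadata(json: unknown): Metadata {
  if (typeof json !== "object" || json === null) {
    throw new Error("metadata must be a JSON object");
  }
  const obj = json as Record<string, unknown>;
  for (const key of STRING_KEYS) {
    if (typeof obj[key] !== "string") {
      throw new Error("metadata." + key + " must be a string");
    }
  }
  const caps = obj.capabilities;
  if (typeof caps !== "object" || caps === null) {
    throw new Error("metadata.capabilities must be an object");
  }
  const capsObj = caps as Record<string, unknown>;
  for (const key of CAPABILITY_KEYS) {
    if (typeof capsObj[key] !== "boolean") {
      throw new Error("metadata.capabilities." + key + " must be a boolean");
    }
  }
  return obj as unknown as Metadata;
}

// Hypothetical helper: the fetching function is injected so the sketch
// does not depend on any particular runtime.
async function loadMetadata(
  url: string,
  fetchJson: (url: string) => Promise<unknown>,
): Promise<Metadata> {
  return parseMetadata(await fetchJson(url));
}
```

In a browser, `fetchJson` could simply be `(u) => fetch(u).then((r) => r.json())`. Validating up front means a malformed file from an external team fails at load time rather than when a missing reset function is first called.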

@tomusdrw tomusdrw added the documentation Improvements or additions to documentation label Aug 20, 2024
@tomusdrw
Contributor Author

An example Rust version of this API can be found here: https://github.com/FluffyLabs/pvm-shell/blob/main/src/lib.rs#L54

I think we might consider changing it to something like:

const pointer = newPvm(program, registers, gas);
const gasLeft = getGasLeft(pointer);

We should also consider returning pointers to WASM memory for getRegisters and especially getPageDump to avoid passing too much data between WASM and the browser.
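To make the pointer idea concrete, here is a small sketch of how the UI could read a register dump directly out of WASM linear memory. Everything here is an assumption for illustration: `viewRegisters` and `decodeRegisters` are hypothetical helpers, `ptr` stands for whatever offset a pointer-returning `getRegisters` would hand back, and little-endian encoding is assumed because the issue does not state the byte order.

```typescript
const REGISTER_COUNT = 13; // from the proposal: 13 registers
const REGISTER_BYTES = 4; // from the proposal: 4 bytes each

// Hypothetical helper: create a view (not a copy) over the register dump.
// `buffer` would be the `buffer` of the instance's exported memory and
// `ptr` the byte offset returned by the PVM.
function viewRegisters(buffer: ArrayBuffer, ptr: number): Uint8Array {
  return new Uint8Array(buffer, ptr, REGISTER_COUNT * REGISTER_BYTES);
}

// Hypothetical helper: decode the dump into 13 unsigned 32-bit values.
// Little-endian byte order is an assumption.
function decodeRegisters(dump: Uint8Array): number[] {
  if (dump.byteLength !== REGISTER_COUNT * REGISTER_BYTES) {
    throw new Error("unexpected register dump length: " + dump.byteLength);
  }
  const view = new DataView(dump.buffer, dump.byteOffset, dump.byteLength);
  const registers: number[] = [];
  for (let i = 0; i < REGISTER_COUNT; i++) {
    registers.push(view.getUint32(i * REGISTER_BYTES, true));
  }
  return registers;
}
```

One caveat with this approach: when a WebAssembly memory grows, its previous `ArrayBuffer` is detached, so views like this should be re-created after any call that might allocate.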

@koute

koute commented Aug 22, 2024

This is pretty cool!

I would be interested in officially adding my PolkaVM to it.

API-wise, maybe you could also take a look at my API for inspiration.

Few notes:

  • If you also want to support running JAM program blobs (the standard program initialization from the GP) I'd suggest making a separate initialization entry point (so maybe have resetJam and resetGeneric, or something like that?). Otherwise there's no clean way to differentiate between a JAM program blob and a raw PVM code blob, since neither has e.g. any magic bytes to easily check which one is which. (You could try to parse both and see whichever parses without error, but that's a little janky.)
  • The JAM-specific entry point should probably be optional. (In case of PolkaVM it's meant to essentially be a general-purpose VM, which means that strictly JAM-specific parts of it will live inside our JAM node, and that isn't currently public. Although I guess I could probably add some shim code here to support it anyway, as it isn't a lot of code.)
  • Are you interested in things like e.g. supporting debug info? JAM-specific program blobs are meant to be as minimal as possible, which means that they inherently can't and won't support things like debug info.
  • Currently there are essentially three different "things" out there which you could conceivably call a "PVM program":
    • raw PVM code blob -- essentially only contains the code of the program; if you're familiar with WASM this is more-or-less equivalent to the "code" section from a .wasm blob without the rest of the .wasm
    • a JAM program blob -- this contains the raw PVM code blob, plus a few extra minimal bits, and it expects to be initialized and called in a certain way; in WASM terms this is like a .wasm blob where the "code" section can be changed and the rest of the sections are mostly hardcoded
    • a PolkaVM program blob -- in WASM terms this is more-or-less equivalent to a .wasm blob, but simplified; it's essentially a superset of a JAM program blob, it's general purpose, and designed to support things like e.g. debug info and other use cases with minimum extra complexity (e.g. our upcoming smart contracts will probably just use this)
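Since the three blob kinds cannot be told apart by inspecting their bytes, one option is for the caller to tag the blob explicitly and dispatch to a dedicated entry point. The sketch below illustrates that idea only; `ProgramBlob`, `ResetEntryPoints` and `resetWith` are hypothetical names, the reset function names follow the interface in the issue description, and `i64` gas is assumed to surface as `bigint`.

```typescript
// Hypothetical tagged union: the caller states which kind of blob it has.
type ProgramBlob =
  | { kind: "generic"; code: Uint8Array; registers: Uint8Array }
  | { kind: "jam"; blob: Uint8Array }
  | { kind: "polkavm"; blob: Uint8Array };

// Each entry point is optional, mirroring the capabilities in the metadata.
interface ResetEntryPoints {
  resetGeneric?(program: Uint8Array, registers: Uint8Array, gas: bigint): void;
  resetJAM?(program: Uint8Array, gas: bigint): void;
  resetPolkaVM?(program: Uint8Array, gas: bigint): void;
}

// Hypothetical dispatcher: pick the entry point from the tag and fail
// loudly if this PVM does not support that blob kind.
function resetWith(pvm: ResetEntryPoints, program: ProgramBlob, gas: bigint): void {
  switch (program.kind) {
    case "generic":
      if (!pvm.resetGeneric) throw new Error("raw PVM code blobs are not supported");
      pvm.resetGeneric(program.code, program.registers, gas);
      return;
    case "jam":
      if (!pvm.resetJAM) throw new Error("JAM program blobs are not supported");
      pvm.resetJAM(program.blob, gas);
      return;
    case "polkavm":
      if (!pvm.resetPolkaVM) throw new Error("PolkaVM program blobs are not supported");
      pvm.resetPolkaVM(program.blob, gas);
      return;
  }
}
```

The explicit tag avoids the "try to parse both" heuristic entirely, at the cost of requiring the user (or the file picker in the UI) to say which format a blob is in.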

@tomusdrw
Contributor Author

Hey @koute! Thanks for the write-up.

It would be great to support PolkaVM and all of the other possible use cases you've mentioned. Let me address some things specifically.

  1. Your API definitely looks like something we would strive for long-term. Initially my goal was to make the API as minimal as possible to make it easier for external teams to get integrated. I think we can even consider multiple levels of integration, where some PVMs would support just the basic API and some others a more complex one (I think we will end up with some metadata file about a particular PVM).
  2. Thank you for pointing out the different possible PVM blobs. I think it would be great to support all of them in the UI, however I agree that we don't necessarily need all PVM interpreters to support all of them (again something that could land in metadata).
  3. Supporting debug info would be amazing, I'd say that if the tool proves to be useful we would be happy to take a shot at implementing that support. We lack a bit of knowledge and experience on this one though, so any pointers would be great.
  4. The vision of being able to debug Solidity code right from your browser is staggering. For this to be actually useful I think we would need to be able to emulate different execution environments coming with their own set of host functions (and for instance having an in-browser storage that can be populated from on-chain data).

I've updated the interface code to encompass the different blob kinds. The API currently is well suited for wasm_bindgen-like output, I think it is going to become a bit more raw (i.e. passing pointers instead of Uint8Array) to make it easier for other implementations. I'm planning to get in touch with teams writing the JAM PVM in Go to figure out what output they can provide.
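For teams wondering how the UI is expected to drive the minimal interface, a stepping loop could look roughly like the sketch below. It is only an illustration: `Steppable`, `runUntilStop`, `statusName` and the `maxSteps` guard are hypothetical, `i64` is assumed to surface as `bigint` (as wasm-bindgen does), and stopping on any non-Ok status is an assumption, since the proposal only says that `nextStep` returns false on halt, panic, or out of gas.

```typescript
// Status codes copied from the getStatus() documentation in the proposal.
const Status = { Ok: 255, Halt: 0, Panic: 1, Fault: 2, Host: 3, OutOfGas: 4 } as const;

// Hypothetical helper: map a raw status byte to a readable name.
function statusName(code: number): string {
  for (const name of Object.keys(Status) as (keyof typeof Status)[]) {
    if (Status[name] === code) return name;
  }
  return "Unknown(" + code + ")";
}

// The subset of the minimal interface that the loop needs.
interface Steppable {
  nextStep(): boolean;
  getStatus(): number;
  getProgramCounter(): number;
  getExitArg(): number;
  getGasLeft(): bigint;
}

type RunResult = {
  status: string;
  pc: number;
  exitArg: number;
  gasLeft: bigint;
  steps: number;
};

// Hypothetical driver: single-step until the PVM reports it cannot make
// progress, its status is no longer Ok (assumption), or maxSteps is hit.
function runUntilStop(pvm: Steppable, maxSteps: number): RunResult {
  let steps = 0;
  while (steps < maxSteps) {
    const canContinue = pvm.nextStep();
    steps++;
    if (!canContinue || pvm.getStatus() !== Status.Ok) break;
  }
  return {
    status: statusName(pvm.getStatus()),
    pc: pvm.getProgramCounter(),
    exitArg: pvm.getExitArg(),
    gasLeft: pvm.getGasLeft(),
    steps,
  };
}
```

On a `Host` status the caller would presumably service the host call identified by `exitArg`, adjust gas with `setGasLeft`, and call `runUntilStop` again; that flow is not spelled out in the issue, so treat it as a guess.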

@koute

koute commented Aug 23, 2024

3. Supporting debug info would be amazing, I'd say that if the tool proves to be useful we would be happy to take a shot at implementing that support. We lack a bit of knowledge and experience on this one though, so any pointers would be great.

In general the easiest thing here would most likely be to piggyback on PolkaVM's crates compiled to WASM (at the very least until I can get the PolkaVM program blob format somewhat standardized like it is for WASM).

The main two types of interest are ProgramBlob and ProgramParts. These are, essentially, mostly equivalent, except a ProgramParts is just a ProgramBlob split into parts.

So, the bare minimum to do to be able to load PolkaVM blobs would be something like this:

#[wasm_bindgen]
pub fn polkavm_to_code_blob(raw_blob: Vec<u8>) -> Vec<u8> {
    let parts = polkavm::ProgramParts::from_bytes(&raw_blob).unwrap();
    return parts.code_and_jump_table.to_vec();
}

This will give you a raw PVM code blob which you can already ingest.

Now, to get debug info working you'd have to use ProgramBlob::parse or ProgramBlob::from_parts to create a ProgramBlob, keep the ProgramBlob around, and then you can use get_debug_line_program. You give the function a program counter/byte offset into the code, and it will return you an iterator which produces FrameInfo structs, which in turn tell you the function name and/or the source path/line of where the given piece of code comes from. (So if you'd display the source code side-by-side you can use this to make a source-level debugger.)

Currently the debug info support is limited to being able to extract the locations of the code in the original sources, but I'm also planning to add support for getting backtraces and also for reading/writing to local variables, etc. (Basically I want to support a full-blown, rich debugging experience.)

@tomusdrw
Contributor Author

tomusdrw commented Sep 3, 2024

It's now possible to load a wasm-bindgen-compatible WASM blob (either via a URL pointing to the metadata JSON or via direct upload of a WASM file) #94.

We've also added PolkaVM to the dropdown list as one of the default choices: #99.
PolkaVM is compiled from https://github.com/tomusdrw/polkavm/blob/master/pvm-shell/src/lib.rs
The original, koute's PolkaVM, is a submodule in that repo and we just plug it into the pvm-shell API.

I've extracted support for debug symbols to a separate issue #100

2 participants