Add initial PVM test vectors #3

koute · 2024-06-24T14:24:56Z

Initial PVM test vectors/test suite.

This is still incomplete; not every instruction is covered yet and only very simple test cases were added. I will be expanding this aggressively.

Since we will still be making some changes (e.g. 64-bit support) I'll be explicitly versioning this, with a detailed changelog so that anyone who uses these tests can easily keep up.

gavofyork · 2024-06-25T05:36:46Z

pvm/README.md

+* `name` -- a unique identifier for the test
+* `initial_regs` -- the initial value of each of the 13 registers; these need to be set *before* the test program is executed
+* `code` -- the code blob of the program to be executed as part of the test
+* `expected_status` -- the way the program is supposed to end; currently it can be one of the following:
+   - `"trap"` -- the execution ended with a trap (the `trap` instruction was executed, the execution went "out of bounds", an invalid jump was made, or an invalid instruction was executed)
+   - `"halt"` -- the execution finished gracefully (a dynamic jump to address `0xffff0000` was made)
+* `expected_regs` -- the expected values of each of the 13 registers *after* the test program is executed


Would be nice if these could be done to the level of the Safrole test vectors - i.e. referencing the relevant symbols and equations in the GP and providing a good ASN definition.

Specifically, it looks like these test vectors are for (207) Psi, but omit the arguments and results for gas, memory and initial instruction counter.

Would be nice if these could be done to the level of the Safrole test vectors - i.e. referencing the relevant symbols and equations in the GP and providing a good ASN definition.

Sure, I can do that.

Specifically, it looks like these test vectors are for (207) Psi, but omit the arguments and results for gas, memory and initial instruction counter.

Yes. The omissions were intentional, since no tests use them right now. This is only temporary of course.

The test writes a zero here, so make sure the register initially contains something *other* than a zero so that we can actually see that it was modified.

sourabhniyogi · 2024-07-08T01:01:23Z

@koute Can you kindly add test cases for the host functions of Appendix B.6/B.7/B.8?

Here are 4 groups, in priority order:

LOOKUP, READ, WRITE, SOLICIT, HISTORICAL_LOOKUP, IMPORT, EXPORT [7]
NEW, MACHINE, PEEK, POKE, INVOKE, TRANSFER [6]
QUIT, INFO, GAS, CHECKPOINT, FORGET [5]
EMPOWER, DESIGNATE, ASSIGN, UPGRADE, EXPUNGE [5]

The first group is DA-centric, the second group is service+VM setup/invocations -- the first 2 groups are valuable to connect to code up connections to state merklization + erasure coding, whereas the latter 2 groups can be done later as they are bookkeeping oriented and are easy to get right once we solve the first 2 groups.

Thank you!

ec2 · 2024-07-08T21:30:17Z

How are these programs supposed to be consumed? The program blob parse expects things like the program to start with BLOB_MAGIC.

koute · 2024-07-09T07:36:09Z

Can you kindly add test cases for the host functions of Appendix B.6/B.7/B.8?

For these initial test vectors the priority is to get basic tests for the instruction set ready. I will also add some host call tests later, but comprehensive test suite for all host calls is probably out-of-scope, at least for pure PVM tests.

How are these programs supposed to be consumed?

Take a look at the schema to see to which parameters of the Ψ equation from the Gray Paper they correspond to, and use them accordingly to test your own PVM implementation.

The program blob parse expects things like the program to start with BLOB_MAGIC.

Yes, my PolkaVM uses its own container format for the program blobs which the GP doesn't use. PolkaVM is not the source of truth for how a PVM should work, the GP is.

ec2 · 2024-07-09T17:49:45Z

@koute Good point on the GP being the source of truth.

Take a look at the schema to see to which parameters of the Ψ equation from the Gray Paper they correspond to, and use them accordingly to test your own PVM implementation.

For context I'm building FFI bindings to PolkaVM. So would you say that these tests in particular are for folks who are implementing the PVM from scratch?

sourabhniyogi · 2024-07-13T05:03:30Z

Since we will still be making some changes (e.g. 64-bit support) I'll be explicitly versioning this, with a detailed changelog so that anyone who uses these tests can easily keep up.

We passed all the test vectors you provided so far. What is the reason for the GP needing to support 32-bit registers while the contracts pallet should definitely aim for 64-bit? After you have 64-bit PVM engineered for contracts pallet shouldn't the GP be adjusted to be 64-bit?

koute · 2024-07-13T08:30:21Z

So would you say that these tests in particular are for folks who are implementing the PVM from scratch?

Yes.

What is the reason for the GP needing to support 32-bit registers while the contracts pallet should definitely aim for 64-bit?

This is only temporary. We will be migrating GP to 64-bit too; we just need to first prototype the design in PolkaVM to make sure it's solid. (Otherwise we might end up with a design that looks good on paper but is bad in practice.) We're working on it right now.

(That said, the changes when migrating to 64-bit won't be huge - the registers will be extended and there will be a couple of new instructions, but that's about it as far as major changes go.)

As far as the instruction set and the core semantics are concerned, we aim to have both PolkaJAM and pallet-contracts in alignment and we're making effort to make sure they don't diverge. (With PolkaJAM having the priority here, but I believe we can support both with the same VM.)

ec2 · 2024-07-17T22:05:23Z

In GP(A.1), the program is defined as follows:

E(|c|) is the SCALE compact integer encoding on the length of c. So in this test case, |c| should be 3 (8, 135, 9), so shouldn't E(|c|) be [12] instead of 3?

Edit: Looks like I misunderstood |k| = |c|. It seems like this is talking about the bit length of the mask being equal to the the byte length of the instructions rounded to 8. Encoding question still stands though.

koute · 2024-07-18T09:12:49Z

@ec2 Where did you read that these are SCALE compact integers? These are not SCALE compact integers. From the GP:

If you look at this equation and crosscheck it with how parity-scale-codec encodes compact integers you can see that they're not the same.

This is a slightly different varint serialization format which:

only supports up to 64-bits,
is more efficient to decode,
encodes numbers which fit within 7-bits as if raw-encoded,
is uniform and more compact when dealing with small numbers (SCALE compact encoding uses one more byte to encode integers between 64..128 and 16384..2097152, but needs one less byte to encode integers between 268435456..1073741824).

ec2 · 2024-07-18T09:20:22Z

@koute GP(Appendix I.3) says that E() is the SCALE encode function. I also did see the screenshot you posted from the GP.

I'm not super familiar with SCALE so I assumed that the screenshot just formally describes how SCALE does variable int encoding.

ec2 · 2024-07-19T21:00:25Z

@koute Sorry to keep hounding you here! I think I found a discrepancy between the testcases and the GP.
The test case inst_move_reg.json tests move_reg (opcode 82).

According to the GP,

In the test, the supplied arg to move_reg is [121]. And so, r_A = min(12, 121%16) = 9 and r_D = min(12, 121/16) = 7. So the mutation will end up being reg[7] = reg[9].

The test case has only initial-regs[7] set to 1 and 0 elsewhere. And so the expected mutation is reg[7] = reg[9] = 0.

TLDR: I think the impl of PVM that made these test cases have the arguments for move_reg flipped.

sourabhniyogi · 2024-07-24T16:08:56Z

I will also add some host call tests later, but comprehensive test suite for all host calls is probably out-of-scope, at least for pure PVM tests.

Alright, we don't want to interrupt your deep work but legend has it you implemented PVM in a day =) so if its not too much to ask ... could you give us the simplest "Jam Service" byte code (for a refine+accumulate) for us to implement many of the basic host functions? Given one good example we can probably fill in the rest and provide a few more back.

My idea of the simplest "Jam Service" byte code is to compute the sum of squares for a set of integer work items, like
Work Items in a Work Package: 5, 7, 9
Refine: squares the work items, exports 25, 49, 81
Accumulate: reads the result of refine ( 25, 49, 81 ) and writes to a service's storage
We can attempt to build the byte code by hand like it is 1964 but maybe you already have something "simple" like this that you can share?

If not, do you have a better recommendation for simplest "Jam Service"? Or, a strategy that is better than hand building byte code?

This sort of baby JAM test case will help teams get baby JAM implementations blood flowing, and set up a low V (like V=6) cluster complete with QUIC, erasure coding, Patricia Merkle Trie, BMT proofs, and so on.

xlc · 2024-07-30T02:13:49Z

I am confused about trap vs halt vs panic in PVM. In GP, the trap instruction will exit with the black square, so does the jump to 2^32-2^16 address. To my understanding, that is exit 0. But in the pvm testvector, trap is panic and the test of the trap instruction will result trap exit status but the inst_ret_halt test results halt. There is some inconsistency.

jamtestvectors/pvm/schema.asn

Lines 53 to 55 in a2b1870

    
           -- (called "panic" in the Graypaper) 
        
           -- the execution ended with a trap (the `trap` instruction was executed, the execution went "out of bounds", an invalid jump was made, or an invalid instruction was executed) 
        
           trap,

Another question. jump_ind is using djump which can be used to exit the program. But how about the one using branch? e.g. jump. What happen to jump into the exit address? panic or halt?

koute · 2024-07-30T07:43:25Z

Sorry to keep hounding you here! I think I found a discrepancy between the testcases and the GP.

@ec2 Yes, indeed, there is. We will fix it soon. Thanks! We highly appreciate anyone who helps crosscheck these.

Alright, we don't want to interrupt your deep work but legend has it you implemented PVM in a day =) so if its not too much to ask ... could you give us the simplest "Jam Service" byte code (for a refine+accumulate) for us to implement many of the basic host functions?

@sourabhniyogi The rumors of my exploits seem to be grossly exaggerated; it was actually two days, not one. :P

Anyway, we will most likely put up some more tests out in the future, but for now if you quickly want something to test with then your best bet would be to build one yourself.

You don't have to build a blob by hand; you could use my work-in-progress PVM assembler. For example:

$ git clone https://github.com/koute/polkavm.git
$ cd polkavm
$ cargo run -p polkatool -- assemble tools/spectool/spec/src/inst_branch_greater_or_equal_signed_ok.txt -o output.polkavm
$ cargo run -p polkatool disassemble --show-raw-bytes output.polkavm

This will output the program in a PolkaVM-specific container (which is not part of the GP), but you can extract the code blob with a simple Rust program - use polkavm_common::program::ProgramParts::from_bytes to load the blob and then the code_and_jump_table field will have the raw program bytes.

I am confused about trap vs halt vs panic in PVM. In GP, the trap instruction will exit with the black square, so does the jump to 2^32-2^16 address.

@xlc

"halt" is meant to be a normal termination (dynamic jump to 0xffff0000)
"panic" (called a "trap" here) is meant to be an abnormal termination

Hm, you're right that the trap instruction in the GP is specified to halt instead of panicking; this should have been a panic instead. I'll see about correcting this; thanks.

clw8998 · 2024-09-03T02:10:07Z

Hello @koute , I recently encountered some issues while using your PVM.

Here’s my code:

pub @main:
    a1 = 0

When I use the following command to compile:

cargo run -p polkatool -- assemble ./test_txt_code/test.txt -o test.pvm

The bytecode content of test.pvm is as follows:

[80, 86, 77, 0, 1, 5, 7, 1, 0, 4, 109, 97, 105, 110, 6, 6, 0, 0, 2, 4, 8, 253, 0]

My question is, how do I extract the pure program portion as defined in GP_0.36(213), because it seems the first part contains some ASCII-encoded section names.

ASCII encoded section name:

[80, 86, 77, 0, 1, 5, 7, 1, 0, 4, 109, 97, 105, 110, 6, 6]
// [80, 86, 77] "PVM" in ASCII
// [109, 97, 105, 110] "main" in ASCII

GP_0.36(213) should be:

[0, 0, 2, 4, 8, 253, 0]

clw8998 · 2024-09-03T02:54:49Z

Also found some weird encoding results.

Missing operend:

txt code:

pub @main:
    a1 = 0

after assembly:
It should put 0 into reg 8, but program counter at 0 missing 0

    Finished dev [unoptimized + debuginfo] target(s) in 0.86s
     Running `target/debug/polkatool disassemble --show-raw-bytes test.pvm`
// RO data = 0/0 bytes
// RW data = 0/0 bytes
// Stack size = 0 bytes

// Instructions = 1
// Code size = 2 bytes

      :                          @0 [export #0: 'main']
     0: 04 08                    a1 = 0x0

Jump to a weird position:

txt code:

pub @main:
    @sub_1:
    a0 = 1
    a1 = 0xFEFE0000
    a2 = 12
    ecalli 16
    a1 = 2
    jump @init if a0 != a1

    @sub_2:
    a1 = u32[0xFEFE0004]
    a2 = u32[0xFEFE0008]
    u32[0xFEFE0004] = a1
    a1 = a1 + a2
    u32[0xFEFE0008] = a1

    a3 = 1
    a0 = u32[0xFEFE0000]
    a0 = a0 + a3

    @sub_3:
    a0 = 0xFEFE0000
    a1 = 12
    ecalli 17
    trap

    @init:
    u32 [0xFEFE0000] = 0x00000001
    jump @sub_2

after assembly:
Program counter at 17 jump into 55(0x37), which is part of an operend.
Program counter at 79 jump into 197(0xc5), which is OOB.

    Finished dev [unoptimized + debuginfo] target(s) in 1.19s
     Running `target/debug/polkatool disassemble --show-raw-bytes test.pvm`
// RO data = 0/0 bytes
// RW data = 0/0 bytes
// Stack size = 0 bytes

// Instructions = 21
// Code size = 81 bytes

      :                          @0 [export #1: 'main'] [export #2: 'sub_1']
     0: 04 07 01                 a0 = 0x1
     3: 04 08 00 00 fe fe        a1 = 0xfefe0000
     9: 04 09 0c                 a2 = 0xc
    12: 4e 10                    ecalli 16 // INVALID
    14: 04 08 02                 a1 = 0x2
    17: 1e 87 37                 jump 72 if a0 != a1
      :                          @1 [export #3: 'sub_2']
    20: 0a 08 04 00 fe fe        a1 = u32 [0xfefe0004]
    26: 0a 09 08 00 fe fe        a2 = u32 [0xfefe0008]
    32: 16 08 04 00 fe fe        u32 [0xfefe0004] = a1
    38: 08 98 08                 a1 = a1 + a2
    41: 16 08 08 00 fe fe        u32 [0xfefe0008] = a1
    47: 04 0a 01                 a3 = 0x1
    50: 0a 07 00 00 fe fe        a0 = u32 [0xfefe0000]
    56: 08 a7 07                 a0 = a0 + a3
    59: 11                       fallthrough
      :                          @2 [export #4: 'sub_3']
    60: 04 07 00 00 fe fe        a0 = 0xfefe0000
    66: 04 08 0c                 a1 = 0xc
    69: 4e 11                    ecalli 17 // INVALID
    71: 00                       trap
      :                          @3 [export #0: 'init']
    72: 26 04 00 00 fe fe 01     u32 [0xfefe0000] = 1
    79: 05 c5                    jump 20

koute · 2024-09-03T05:19:53Z

My question is, how do I extract the pure program portion as defined in GP_0.36(213), because it seems the first part contains some ASCII-encoded section names.

That's a .polkavm container. It's not and won't be part of the GP (see this comment of mine where I explain different types of program blobs). If you want to extract a pure code blob you either need to parse it (the format is trivial; see my parsing code for more details) or use my polkavm crate and call ProgramParts::from_bytes and access it in code_and_jump_table field of that struct.

It should put 0 into reg 8, but program counter at 0 missing 0

No, it's not missing. It's just a zero length varint, and that is expected behavior. To save space a trailing varint in an instruction doesn't have to be encoded if it's zero.

Side note: this instruction can be encoded in multiple ways, for example:

04 08
04 08 00
04 08 00 00
04 08 00 00 00
04 08 00 00 00 00

All of these encodings are valid and encode to the same instruction a1 = 0.

Program counter at 17 jump into 55(0x37), which is part of an operend.
Program counter at 79 jump into 197(0xc5), which is OOB.

You're incorrectly parsing the jump destinations as absolute, but they are encoded relative. (Notice: 17 + 55 = 72)

EclesioMeloJunior · 2024-09-03T20:47:58Z

hey @koute, I've noticed the inst_branch_greater_or_equal_signed_imm_nok test is not passing, then after debugging it looks like the program is correct but the test expects a trap at PC 7 which is not happening bc the branch_ge_s_imm is true, lemme describe here:

Here is the inst_branch_greater_or_equal_signed_imm_nok program:

"program": [0,0,14,4,7,246,45,23,10,5,0,4,7,239,190,173,222,137,193],

where the actual code is:

[4,7,246,45,23,10,5,0,4,7,239,190,173,222]

breaking down the code we have:

// load_imm: reg (7), imm (246)
[4, 7, 246] 

// branch_ge_s_imm: reg (7), imm (10), offset (5)
[45,23,10,5]

// when we compare the value at reg 7 is 246 which is greater than the imm value 10
// then it skips the trap and proceed to
// load_imm: reg (7), imm (3735928559)
[4,7,239,190,173,222]

So, the expected value: 4294967286 is different from the actual 7th register (3735928559) and also the expected pc is 7 (which is the offset where the trap is placed) but the actual PC is 14 (end of the code). So I would like to double check with you'll, thanks!

tomusdrw · 2024-09-03T21:03:50Z

@EclesioMeloJunior You can check out the disassembler/debugger we've put together here https://pvm.fluffylabs.dev/

Recently polkavm support was merged, so it might help you debug the exact problem you're having - obviously don't take it as a source of truth - that should only be the Gray Paper.

I think the issue is with how you interpret the bytes 246 (0xf6), it's a compact signed encoding, and you should end up with 4294967286 (0xfffffff6) in the register.

koute · 2024-09-04T06:27:38Z

@EclesioMeloJunior: @tomusdrw is correct; you're not properly sign extending the value from load_imm, and I've deliberately engineered this test case to catch this.

Remember that the varints are always sign extended to full 32-bits, that is: if the most significant bit of the value (as it is encoded) is 1 then all of the bits "to the left" in the decoded value are also filled with 1s.

Here are some examples:

7a encoded (122 in decimal, 01111010 in binary) would get decoded as 7a (00000000_00000000_00000000_01111010 in binary)
fa encoded (250 in decimal, 11111010 in binary) would get decoded as ffffff7a (4294967226 in decimal, 11111111_11111111_11111111_11111010 in binary)
fa 00 (00000000_11111010 in binary, remember these are serialized as little endian hence the least significant bytes are encoded first) would get decoded as fa (no sign extension because the most significant bit is 0)

emielsebastiaan · 2024-09-12T07:35:09Z

Most testvectors should be fixed (are currently invalid) due to a strict decoding issue of k (fixed length bitsequence).
#13

Polkadot-Forum · 2024-09-24T14:42:54Z

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/contracts-on-assethub-roadmap/9513/25

…ue of reg

charliewinston14 · 2024-11-04T02:23:03Z

Hi. How do the signed values work?

For example in "inst_div_signed"?

rA=8 (value: 2147483664)
rB=7 (value: 7)
rD=9

How is register 9 expected to be 3988183920?

koute · 2024-11-04T05:22:18Z

How do the signed values work?

I recommend reading this article on Wikipedia.

How is register 9 expected to be 3988183920?

It's not. It's expected to be -2147483632 / 7 = -306783376.

Add initial PVM test vectors

e285daa

gavofyork reviewed Jun 25, 2024

View reviewed changes

koute added 12 commits June 25, 2024 17:18

Pre-clobber the output reg in inst_div_unsigned_with_overflow

19d2ad8

The test writes a zero here, so make sure the register initially contains something *other* than a zero so that we can actually see that it was modified.

Replace underscores with hypens, because ASN.1 doesn't like underscores

9ba4b5b

Add JSON Schema and ASN.1 schema

e89f6b7

Add comments to the ASN.1 schema, reference the paper, update README

96025c7

Add the instruction/program counter to the tests

1d43b7d

Correct the expected PC for when the program halts

9b35cf4

Add tests involving memory

f4c9f3f

Remove accidental OPTIONAL from the ASN.1 schema

dbbfbd4

Add gas

3e18a7f

Rename code -> program

e267a75

Fix some typos in the ASN.1 schema

5581280

Align reg3 serialization with the GP and add extra testcases

a2b1870

pantrif mentioned this pull request Aug 1, 2024

Data items serialization eigerco/strawberry#6

Closed

asmie mentioned this pull request Aug 12, 2024

PVM interpreter implementation eigerco/strawberry#39

Closed

13 tasks

koute added 8 commits October 4, 2024 19:42

PVM tests v0.2: add 'invalid' to disassemblies to mark implicit traps

0a4c99b

PVM tests v0.2: bitmask paddings are now filled with zeros

8659541

PVM tests v0.2: inst_rem_signed: make output non-zero

fb4de8a

PVM tests v0.2: inst_rem_signed_with_overflow: non-zero initial val…

fa6cf0a

…ue of reg

PVM tests v0.2: inst_set_*: non-zero initial value of output reg

83596b5

PVM tests v0.2: add more load/store tests

51e134e

PVM tests v0.2: update README

30ba85a

PVM tests v0.3: remove tests which expose gas precharging

a54d7cf

mateuszsikora mentioned this pull request Oct 11, 2024

Adjust the code to the newest PVM version and changes in test vectors FluffyLabs/pvm-debugger#109

Merged

davxy mentioned this pull request Nov 19, 2024

M1 Conformance #21

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add initial PVM test vectors #3

Add initial PVM test vectors #3

koute commented Jun 24, 2024 •

edited

Loading

gavofyork Jun 25, 2024 •

edited

Loading

koute Jun 25, 2024

sourabhniyogi commented Jul 8, 2024

ec2 commented Jul 8, 2024

koute commented Jul 9, 2024

ec2 commented Jul 9, 2024

sourabhniyogi commented Jul 13, 2024

koute commented Jul 13, 2024

ec2 commented Jul 17, 2024 •

edited

Loading

koute commented Jul 18, 2024

ec2 commented Jul 18, 2024

ec2 commented Jul 19, 2024

sourabhniyogi commented Jul 24, 2024

xlc commented Jul 30, 2024

koute commented Jul 30, 2024

clw8998 commented Sep 3, 2024

clw8998 commented Sep 3, 2024

koute commented Sep 3, 2024

EclesioMeloJunior commented Sep 3, 2024

tomusdrw commented Sep 3, 2024

koute commented Sep 4, 2024

emielsebastiaan commented Sep 12, 2024

Polkadot-Forum commented Sep 24, 2024

charliewinston14 commented Nov 4, 2024 •

edited

Loading

koute commented Nov 4, 2024

Add initial PVM test vectors #3

Are you sure you want to change the base?

Add initial PVM test vectors #3

Conversation

koute commented Jun 24, 2024 • edited Loading

gavofyork Jun 25, 2024 • edited Loading

Choose a reason for hiding this comment

koute Jun 25, 2024

Choose a reason for hiding this comment

sourabhniyogi commented Jul 8, 2024

ec2 commented Jul 8, 2024

koute commented Jul 9, 2024

ec2 commented Jul 9, 2024

sourabhniyogi commented Jul 13, 2024

koute commented Jul 13, 2024

ec2 commented Jul 17, 2024 • edited Loading

koute commented Jul 18, 2024

ec2 commented Jul 18, 2024

ec2 commented Jul 19, 2024

sourabhniyogi commented Jul 24, 2024

xlc commented Jul 30, 2024

koute commented Jul 30, 2024

clw8998 commented Sep 3, 2024

clw8998 commented Sep 3, 2024

Missing operend:

Jump to a weird position:

koute commented Sep 3, 2024

EclesioMeloJunior commented Sep 3, 2024

tomusdrw commented Sep 3, 2024

koute commented Sep 4, 2024

emielsebastiaan commented Sep 12, 2024

Polkadot-Forum commented Sep 24, 2024

charliewinston14 commented Nov 4, 2024 • edited Loading

koute commented Nov 4, 2024

koute commented Jun 24, 2024 •

edited

Loading

gavofyork Jun 25, 2024 •

edited

Loading

ec2 commented Jul 17, 2024 •

edited

Loading

charliewinston14 commented Nov 4, 2024 •

edited

Loading