Skip to content
This repository has been archived by the owner on Nov 3, 2021. It is now read-only.

Finalize opcode encodings #29

Closed
lars-t-hansen opened this issue Feb 13, 2019 · 10 comments
Closed

Finalize opcode encodings #29

lars-t-hansen opened this issue Feb 13, 2019 · 10 comments

Comments

@lars-t-hansen
Copy link
Contributor

As part of the effort to drive both this proposal and the bulk memory proposal toward shipping status, let's nail down the opcode encodings. (The bulk memory proposal depends on what we choose for ref.null and ref.func, since those are used to express passive element segments.)

The spec interpreter in this repo has some TODO comments around the opcode encodings, and some opcodes are missing from the interpreter at present, and all proposed opcodes are precious single-byte ones.

From the interpreter in this repo we have:

0x25 == table.get
0x26 == table.set
0xd0 == ref.null
0xd1 == ref.is_null
0xd2 == ref.func

From the bulk memory proposal we have these proposed codes:

0xfc 0x08 == memory.init
0xfc 0x09 == data.drop
0xfc 0x0a == memory.copy
0xfc 0x0b == memory.fill
0xfc 0x0c == table.init
0xfc 0x0d == elem.drop
0xfc 0x0e == table.copy

In addition we need opcodes in this proposal for table.grow, table.size, and (possibly) table.fill.

We are in somewhat short supply of single-byte opcodes so I propose that we (a) change the encoding of ref.func since is not likely to be a very common opcode, and (b) allocate prefixed opcodes also for the three table operations mentioned above, yielding the following table for the present proposal:

0x25 == table.get
0x26 == table.set

0xd0 == ref.null
0xd1 == ref.is_null

0xfc 0x0f == table.grow
0xfc 0x10 == table.size
0xfc 0x11 == table.fill

0xfc 0x20 == ref.func

with the idea that 0xfc 0x20 can be the start of the group for multi-byte gc/reftypes operations, and 0xd0 remains the start of the group for single-byte gc operations.

@rossberg @binji @lukewagner @titzer, opinions?

@rossberg
Copy link
Member

I wouldn't mind moving table.get/set to 0xfc as well, but I'm fine either way.

On the other hand I think ref.func should remain single-byte -- note that one of its uses will be as a constant instruction in generalised element segments, where it could be fairly frequent.

@lars-t-hansen
Copy link
Contributor Author

I agree that making ref.func two-byte will bloat the passive element segments that contain functions, but given how regular such an element segment would look to a level-1 wasm compressor or even a generic compressor, I'd expect that use case not to be all that important for compact encoding (during transmission).

(No objection to moving table.get / table.set.)

@rossberg
Copy link
Member

But why rely on external compression for cases where we can avoid it without notable drawback? In this case there really seems to be no benefit in doing so.

@lars-t-hansen
Copy link
Contributor Author

The benefit would be saving a scarce single-byte opcode (but only that).

@binji
Copy link
Member

binji commented Feb 13, 2019

I believe the MVP leaves us with 68 single-byte opcodes, assuming all opcodes from 0xf0 through 0xff are reserved for prefixes. Of those, a few are already being claimed by proposals: 5 for exception handling, 2 for tail calls, and 5 for sign-extension. That leaves 56 opcodes remaining: [0x14..0x19], [0x1c..0x1f], [0x25..0x27], [0xc5..0xef].

Making ref.func a single byte seems useful to me, and given that we still have this many single-byte opcodes, I'm not too concerned yet about using one more.

Keeping table.get and table.set at 0x25 and 0x26 is nice aesthetically since they're near global.get and global.set. But I'd be OK moving them too.

@rossberg
Copy link
Member

Okay, I'd propose leaving all three as single-byte then.

@alexcrichton
Copy link
Contributor

Would it be worth finalizing the table index encodings as well? It looks like the interpreter in this repo for table.get encodes the index as a varuint32 after the opcode but I think Firefox uses an 0x4 byte followed by varuint32 for nonzero table indices. (also for existing instructions like where to put the table index for call_indirect)

@lars-t-hansen
Copy link
Contributor Author

Firefox is going to be updated shortly to follow the interpreter here. The flag byte is not useful, it's just a holdover from an older regime.

@lars-t-hansen
Copy link
Contributor Author

lars-t-hansen commented Feb 14, 2019

Summarizing the discussion so far, the proposal is:

0x25 == table.get
0x26 == table.set

0xd0 == ref.null
0xd1 == ref.is_null
0xd2 == ref.func      (* modulo bikeshedding of the name *)

0xfc 0x0f == table.grow
0xfc 0x10 == table.size
0xfc 0x11 == table.fill

Additionally, the table arguments to all the table operations are simple varuint32 values that are always present, there are no flags. This includes the second operand to call_indirect.

@lars-t-hansen
Copy link
Contributor Author

Landed and fixed everywhere.

rossberg pushed a commit that referenced this issue Nov 20, 2019
…able.copy`. (#29)

This would make it simpler to extend those instructions to support multiple memories/tables, and copying between different memories/tables.

The current encoding has a single placeholder zero byte for those instructions, which allows extension to multiple memories/tables, but would require a more complicated encoding to add two immediate indices.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants