Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode modules with variable-length integers #2322

Merged
merged 1 commit into from
Oct 26, 2020

Conversation

alexcrichton
Copy link
Member

Update Module::{serialize,deserialize} to use variable-length integers
with bincode to make the output artifacts smaller. Locally this
reduces the size of #2318 from 160 to 110 MB, a 30% decrease in size!
Deserialization performance is slightly slower, but seemingly within the
range of noise locally for me.

Update `Module::{serialize,deserialize}` to use variable-length integers
with `bincode` to make the output artifacts smaller. Locally this
reduces the size of bytecodealliance#2318 from 160 to 110 MB, a 30% decrease in size!
Deserialization performance is slightly slower, but seemingly within the
range of noise locally for me.
@github-actions github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Oct 26, 2020
@github-actions
Copy link

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@alexcrichton alexcrichton merged commit 2723385 into bytecodealliance:main Oct 26, 2020
@alexcrichton alexcrichton deleted the bincode-varint branch October 26, 2020 14:52
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 26, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as bytecodealliance#2321
or bytecodealliance#2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from bytecodealliance#2318
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Oct 27, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as bytecodealliance#2321
or bytecodealliance#2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from bytecodealliance#2318
alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Nov 3, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as bytecodealliance#2321
or bytecodealliance#2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from bytecodealliance#2318
alexcrichton added a commit that referenced this pull request Nov 3, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as #2321
or #2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from #2318
@fitzgen
Copy link
Member

fitzgen commented Nov 4, 2020

To be clear this is just enabling variable length encoding for integers, right? It is not switching to a delta encoding (which would ensure that essentially all addresses are 1-byte encoded) correct?

@alexcrichton
Copy link
Member Author

Ah yes indeed, a delta encoding might make it even more further compact.

I've so far shied away from big changes like that though. I think we may want to do it eventually, but it trades-off in-memory footprint for lookup speed since we can no longer quickly do a random lookup for a particular trapping pc. Not that we necessarily need that to be too too fast.

@fitzgen
Copy link
Member

fitzgen commented Nov 4, 2020

I'm only advocating for delta encoding in the serialized format, not the in-memory representation.

@tschneidereit
Copy link
Member

I'm only advocating for delta encoding in the serialized format, not the in-memory representation.

I think it'd be good to try not to have those two deviate too much: ideally we should at some point be able to mmap the on-disk representation entirely or almost entirely instead of having to read and deserialize it.

There clearly are size/speed tradeoffs to be had here, so that might not be the only thing we support, but I hope it can become a thing we support :)

cfallin pushed a commit that referenced this pull request Nov 30, 2020
Update `Module::{serialize,deserialize}` to use variable-length integers
with `bincode` to make the output artifacts smaller. Locally this
reduces the size of #2318 from 160 to 110 MB, a 30% decrease in size!
Deserialization performance is slightly slower, but seemingly within the
range of noise locally for me.
cfallin pushed a commit that referenced this pull request Nov 30, 2020
This commit reduces the size of `InstructionAddressMap` from 24 bytes to
8 bytes by dropping the `code_len` field and reducing `code_offset` to
`u32` instead of `usize`. The intention is to primarily make the
in-memory version take up less space, and the hunch is that the
`code_len` is largely not necessary since most entries in this map are
always adjacent to one another. The `code_len` field is now implied by
the `code_offset` field of the next entry in the map.

This isn't as big of an improvement to serialized module size as #2321
or #2322, primarily because of the switch to variable-length encoding.
Despite this though it shaves about 10MB off the encoded size of the
module from #2318
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:api Related to the API of the `wasmtime` crate itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants