
wasm64 support #572

Closed
joshtriplett opened this issue Nov 14, 2019 · 40 comments

Comments

@joshtriplett
Member

We should consider supporting wasm64 modules, not just wasm32; people will want to run with large linear address spaces, both to process large amounts of data and to provide address space for shared mappings or file mappings.

Opening this issue to start discussing what that support should look like, and how we can do that with minimal complexity or duplication.

@sunfishcode
Member

The first step is to propose the idea for stage 0 of the CG process. Once that's accepted, we can then create a repo in the WebAssembly organization where we can coordinate and collect documentation, and avoid duplication.

Next is to design a binary encoding. In theory, all we need is a way to label a linear-memory as 64-bit, because all opcodes that operate on linear memories take an index which specifies which memory they talk about, and that can then determine the types of their operands.

(Especially with multiple memories on the horizon, I think we can say that "wasm64" shouldn't be a new language or mode. Instead, we want individual linear memories to be marked as 64-bit, so that a program could in theory have both 32-bit and 64-bit memories. Some tools, like LLVM, may continue to think of "wasm64" as a separate architecture from "wasm32", however that's just a convention.)
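The "label a linear memory as 64-bit" idea above can be sketched as a limits-flag decoder. This is a minimal sketch, assuming a flag byte in which bit 0 means "maximum present" and bit 2 means "64-bit index type" (this matches where the memory64 proposal eventually landed, but the names and layout here are illustrative, not normative):

```rust
// Sketch (assumptions): decoding a memory type where one flag bit marks the
// memory as 64-bit. Bit 0 = "max present", bit 2 = "64-bit index type";
// the exact bit layout is illustrative, not normative.

#[derive(Debug, PartialEq)]
struct MemoryType {
    is_64: bool,
    minimum: u64,
    maximum: Option<u64>,
}

fn decode_limits_flag(flag: u8, minimum: u64, maximum: u64) -> MemoryType {
    let has_max = flag & 0b001 != 0;
    let is_64 = flag & 0b100 != 0;
    MemoryType {
        is_64,
        minimum,
        maximum: if has_max { Some(maximum) } else { None },
    }
}

fn main() {
    // 0x05 = max present + 64-bit, per the assumed bit layout above.
    let mem = decode_limits_flag(0x05, 1, 16);
    assert!(mem.is_64);
    assert_eq!(mem.maximum, Some(16));
    println!("{:?}", mem);
}
```

This keeps "wasm64" out of the language itself: each memory simply carries its own index type.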

Then, prototyping can start, both on the producer side and consumer side. I expect we can do this prototyping upstream, rather than in separate branches, because wasm64 is something that many people want, and most of the code should be straightforward. We just need to be careful to communicate that the binary format won't be stable until it progresses further through the CG process.

On the producer side, the one tool I know of with a start on wasm64 is LLVM, though it's not complete yet, and it needs to be taught about the binary encoding.

On the consumer side, besides just teaching various components how to recognize the flag and allocate memory for it and generate code for it, there's also a question of sandboxing. To start with, we can use bounds checking, though also see here for an interesting possible optimization.
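For the bounds-checking approach, here is a minimal sketch of the check a consumer could emit per memory access (the names are illustrative, not Wasmtime's actual API):

```rust
// Sketch (assumptions): the explicit bounds check a runtime could emit for a
// 64-bit memory access. `index` is the dynamic address operand, `offset` the
// static memarg offset, `access_size` the width of the load/store.

fn check_access(index: u64, offset: u64, access_size: u64, mem_len: u64) -> Result<u64, &'static str> {
    // effective address = index + offset; neither addition may wrap.
    let end = index
        .checked_add(offset)
        .and_then(|ea| ea.checked_add(access_size))
        .ok_or("address overflow: trap")?;
    if end > mem_len {
        return Err("out of bounds: trap");
    }
    Ok(index + offset)
}

fn main() {
    let mem_len = 6 * 1024 * 1024 * 1024u64; // a 6 GiB memory, beyond wasm32's reach
    let base = 5 * 1024 * 1024 * 1024u64;
    assert_eq!(check_access(base, 8, 4, mem_len), Ok(base + 8));
    assert!(check_access(mem_len, 0, 4, mem_len).is_err());
    assert!(check_access(u64::MAX, 1, 4, mem_len).is_err()); // wraps, must trap
    println!("bounds checks behave as expected");
}
```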

@joshtriplett
Member Author

@sunfishcode Ah, for some reason I thought there was already a preliminary specification for the binary format. https://webassembly.org/docs/future-features/#linear-memory-bigger-than-4-gib somewhat implies that ("wasm32 and wasm64 are both just modes of WebAssembly, to be selected by a flag in a module header").

Sorry for the confusion on my part.

I do think it makes sense to support the concept of both 32-bit and 64-bit memories, and for that matter, future linking models could theoretically communicate between and translate between modules written for 32-bit and for 64-bit.

(Though, even if the underlying model supports multiple memories, I wouldn't find it surprising if code targeting wasm64 uses flat 64-bit pointers and runtimes have to encode any concept of multiple address spaces into the pointer, rather than making pointers larger than 64 bits.)
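The pointer-tagging idea in the parenthetical above could look roughly like this; TAG_BITS and the bit split are arbitrary choices for illustration:

```rust
// Sketch (assumptions): a runtime encoding "which memory" into a flat 64-bit
// pointer by reserving a few high bits as a memory-space tag, rather than
// making pointers wider than 64 bits.

const TAG_BITS: u32 = 8;
const ADDR_BITS: u32 = 64 - TAG_BITS;
const ADDR_MASK: u64 = (1u64 << ADDR_BITS) - 1;

fn tag_pointer(memory_index: u8, addr: u64) -> u64 {
    assert!(addr <= ADDR_MASK, "address does not fit below the tag");
    ((memory_index as u64) << ADDR_BITS) | addr
}

fn untag_pointer(ptr: u64) -> (u8, u64) {
    ((ptr >> ADDR_BITS) as u8, ptr & ADDR_MASK)
}

fn main() {
    let p = tag_pointer(3, 0x1000_0000_0000);
    assert_eq!(untag_pointer(p), (3, 0x1000_0000_0000));
    println!("memory {} at offset {:#x}", untag_pointer(p).0, untag_pointer(p).1);
}
```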

@sunfishcode
Member

It appears I wrote that sentence back in 2015; things have evolved somewhat since then :-}. I've now submitted WebAssembly/design#1311 to update that.

@sunfishcode
Member

Some discussion on wasm64 here.

@binji

binji commented Feb 7, 2020

Yeah, I'd like to push this forward sooner rather than later. My biggest concerns at this point are in the consumer -- making sure that we don't have to fall back to bounds checks. If anyone has spare cycles, it would be great to see some experiments with alternate trap-handlers (including effective address calculation).

@joshtriplett
Member Author

Would architecture-specific optimizations to bounds checking be welcome? Obviously we need a correct implementation on all platforms. But if a platform can substantially speed up bounds checking and avoid per-access checks, would that be welcome?

Specifically, I would suggest memory protection keys (PK). We could allocate the memory for a given sandbox using a protection key, and enable only that protection key before jumping into the JITted code. (This would have limitations, notably that a process has a limited number of protection keys, but I think it would work well in many common cases.)
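A std-only toy model of the limited-key constraint mentioned above (real keys come from pkey_alloc(2)/pkey_mprotect(2) and are capped at roughly 16 per process on x86; this model performs no actual memory protection and every name in it is invented for illustration):

```rust
// Sketch (assumptions): hand one protection key to each sandbox, and fail
// over to plain bounds checks when the small per-process pool runs out.

struct PkeyPool {
    free: Vec<u8>, // available key numbers
}

impl PkeyPool {
    fn new() -> Self {
        // Key 0 is conventionally the default key; keep 1..16 allocatable.
        PkeyPool { free: (1..16).rev().collect() }
    }
    fn alloc_for_sandbox(&mut self) -> Option<u8> {
        self.free.pop()
    }
    fn release(&mut self, key: u8) {
        self.free.push(key);
    }
}

fn main() {
    let mut pool = PkeyPool::new();
    let keys: Vec<u8> = (0..15).filter_map(|_| pool.alloc_for_sandbox()).collect();
    assert_eq!(keys.len(), 15);
    assert!(pool.alloc_for_sandbox().is_none()); // 16th sandbox: fall back to bounds checks
    pool.release(keys[0]);
    assert!(pool.alloc_for_sandbox().is_some());
    println!("pkey pool model ok");
}
```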

@binji

binji commented Feb 7, 2020

Yeah, I think that's a great solution for some platforms. But as with SIMD, I think we'll need to make sure that we're OK with the performance on all platforms. I think we all expect to see a performance regression, but the question is how much.

@joshtriplett
Member Author

@binji Of course. Performance would need to be acceptable everywhere, but from what I understand, even with bounds checking, performance is acceptable.

@binji

binji commented Feb 8, 2020

True, @aardappel has made a similar argument. We know we'll need 64-bit memories, even if they end up being slower at first. I'm mostly concerned w/ how we spec it so we can support as many optimizations as possible (including PK).

@sunfishcode
Member

There is now an official spec proposal repo, memory64.

@lygstate

When is this going to happen?

@aardappel

I've implemented wasm64 support in LLVM, LLD, and WABT, and am now working on Binaryen.

@lygstate

Wonderful work! Would support in wasmtime be relatively harder?

@tschneidereit
Member

Indeed, that's excellent progress, @aardappel!

Wonderful work! Would support in wasmtime be relatively harder?

My understanding — which might be wrong — is that it's probably not a huge amount of work, but also non-trivial. @alexcrichton and @sunfishcode, ISTM we talked about this a while ago, do you happen to have an idea of the work involved to make this happen?

@lygstate

Indeed, that's excellent progress, @aardappel!

Wonderful work! Would support in wasmtime be relatively harder?

My understanding — which might be wrong — is that it's probably not a huge amount of work, but also non-trivial. @alexcrichton and @sunfishcode, ISTM we talked about this a while ago, do you happen to have an idea of the work involved to make this happen?

Do wasm64 and wasm32 have different ABIs?

@bjorn3
Contributor

bjorn3 commented Sep 11, 2020

They would need to have different ABIs. Pointer size is part of the ABI, and they use different pointer sizes.

@alexcrichton
Member

I suspect this would be relatively simple to implement nowadays. The memory64 proposal is quite small, basically just changing memory-operating instructions to work with either 32-bit or 64-bit indices. The work in wasmparser to implement the memory64 proposal is already done, so all that needs to be done is to thread it all through cranelift.

At this time this probably won't be as well optimized as memory32 since we can't naively do the exact same guard page trick we do there (reserving a 32-bit region of the address space). That being said cranelift has all the internal machinery to insert manual checks on each load/store, so we'd just need to hook that up. Overall this is likely a simple-ish refactoring of the code translator to conditionally use 64 or 32-bit indices everywhere, depending on what type each memory has.

The wasmtime API itself may not even have to change at all for something like this. We might add a flag to Memory as to whether it's 64-bit or not, but otherwise everything is pretty much the same. Although ABIs are different that's only really an artifact of compilation toolchains, once you get to the runtime it's all basically just a wasm blob.
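A back-of-the-envelope sketch of why the wasm32 guard-region trick mentioned above doesn't carry over to 64-bit indices (the 16-byte widest access and the offset bound are assumptions for illustration):

```rust
// Sketch (assumptions): for wasm32, the maximum effective address is bounded
// by u32::MAX plus the maximum static offset, so reserving that much virtual
// address space (mostly inaccessible guard pages) lets the hardware fault
// replace the bounds check. For a 64-bit index, the same bound is the entire
// address space, so no finite reservation works.

fn wasm32_reservation(max_static_offset: u64) -> u64 {
    // worst case: index = u32::MAX, plus static offset, plus widest access (16 bytes)
    u32::MAX as u64 + max_static_offset + 16
}

fn wasm64_reservation(max_static_offset: u64) -> Option<u64> {
    u64::MAX.checked_add(max_static_offset) // overflows: cannot reserve this
}

fn main() {
    // With a 4 GiB static-offset bound, ~8 GiB of virtual reservation suffices for wasm32.
    assert!(wasm32_reservation(u32::MAX as u64) < 16u64 * (1 << 30));
    // No finite reservation covers a 64-bit index.
    assert_eq!(wasm64_reservation(1), None);
    println!("wasm32 guard region: {} bytes", wasm32_reservation(u32::MAX as u64));
}
```

This is why the per-access checks sketched earlier in the thread become the baseline for 64-bit memories.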

@lastmjs

lastmjs commented Jun 5, 2021

Any update on the progress here? I'm working on building applications for the DFINITY Internet Computer, and having wasm64 working could do wonders for applications scaling on the Internet Computer. Right now applications (canisters) are limited to 4 GB in size without doing some relatively complicated cross-canister scaling.

@aardappel

@lastmjs Memory64 has progressed to a stage 3 proposal, LLVM/WABT/Binaryen support has further matured, spec implementation is available. The biggest things still not finished are Emscripten support (which is in progress) and of course VM support (V8 is in progress, not aware of others that have started).

Wasmtime support would be great! Who's going to take it on? :)

@zeroexcuses

Are there any toy repos where someone has forked wasmtime to add wasm64 support? In my very limited understanding, the start of it is:

  1. change u32 to u64 here: https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wasmparser/src/primitives.rs#L260-L261
  2. modify all the functions regarding memarg at https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/wasm/src/code_translator.rs#L118
  3. remove the 32-bit bounds check (bad for security), or possibly add a 64-bit bounds check (bad for performance); here, I'm running 'trusted' wasm64 code

This is a bit above my current cranelift knowledge. Is there any repo/toolchain where someone forked all this and patched it?

@alexcrichton
Member

FWIW the wasmparser crate, part of wasm-tools, should already support wasm64 in that it implements validation and decoding support. What's not supported is Wasmtime's code_translator.rs or the supporting runtime pieces, since that's all geared towards 32-bit. AFAIK though there's no fork of Wasmtime with this implemented at this time (but I could be wrong!)

@zeroexcuses

The following two beliefs contradict:

  1. @alexcrichton : Because you are one of the top wasmtime committers, I want to believe "FWIW the wasmparser crate, part of wasm-tools, should already support wasm64 in that it implements validation and decoding support." is true

  2. https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wasmparser/src/primitives.rs#L260-L261 literally states:

#[derive(Debug, Copy, Clone)]
pub struct MemoryImmediate {
    /// Alignment, stored as `n` where the actual alignment is `2^n`
    pub align: u8,
    pub offset: u32,
    pub memory: u32,
}

Unless this is some type of weird encoding where actual_address = memory * 2^32 + offset, I don't see how wasmparser could possibly support wasm64.

Am I misunderstanding something fundamental? Does wasm64 not store an address as a single u64? If I am misunderstanding this, can you point me to the documentation on how wasm64 does store addresses?

@alexcrichton
Member

In that structure the align field doesn't need to change since it can already represent all reasonable alignments. The offset field is just a fixed offset encoded in the memarg structure. The memory64 proposal now indicates that this can be up to 64-bits, but I think at the time I implemented memory64 validation that wasn't clarified in the upstream proposal. The memory field is the index of the memory being used, and that does not change in the memory64 proposal.

That MemoryImmediate structure is an AST-level construct, not something used at runtime. It does not represent actual raw addresses, but rather it's the memarg from the spec on each memory-related instruction, describing which memory is being operated on, the alignment of the operation, and the fixed offset from the runtime-calculated address, if any.
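The memarg semantics described above, rendered as a small sketch (the field names mirror the quoted struct, but the logic is illustrative, not wasmparser's implementation):

```rust
// Sketch (assumptions): memarg is an AST-level construct. `align` is stored
// as log2, `offset` is a static addend, and `memory` names which memory the
// instruction touches; none of these are raw runtime addresses.

struct MemArg {
    align: u8,   // actual alignment is 2^align
    offset: u64, // static offset, up to 64-bit under memory64
    memory: u32, // index of the memory being accessed
}

/// Effective address = dynamic index operand + static offset (trapping on wrap).
fn effective_address(memarg: &MemArg, index: u64) -> Option<u64> {
    index.checked_add(memarg.offset)
}

fn main() {
    let m = MemArg { align: 3, offset: 8, memory: 0 };
    assert_eq!(1u64 << m.align, 8); // an 8-byte-alignment hint
    assert_eq!(effective_address(&m, 0x1_0000_0000), Some(0x1_0000_0008));
    assert_eq!(effective_address(&m, u64::MAX), None); // overflow traps
    println!("memarg for memory {} resolves fine", m.memory);
}
```

Note how no single field needs to hold a full 64-bit runtime address: the 64-bit index arrives as an operand on the value stack.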

@zeroexcuses

zeroexcuses commented Jul 21, 2021

Is https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wast/src/ast/expr.rs#L1235-L1246

// from wast crate
#[derive(Debug)]
pub struct MemArg<'a> {
    /// The alignment of this access.
    ///
    /// This is not stored as a log, this is the actual alignment (e.g. 1, 2, 4,
    /// 8, etc).
    pub align: u32,
    /// The offset, in bytes of this access.
    pub offset: u32,
    /// The memory index we're accessing
    pub memory: ast::ItemRef<'a, kw::memory>,
}

the right MemArg?

// =============================

It seems like, either way, I need to "fork the instruction set". Two options are:

  1. fork wast::Instruction https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wast/src/ast/expr.rs#L499
  2. fork wasmparser::Operator https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wasmparser/src/primitives.rs#L350

I was wondering if you had advice on which route might be nicer (the two enums look very similar).

@alexcrichton
Member

Are you looking to implement wasm64? (sorry I'm not sure if you're looking to learn information about the state of things or whether you're looking to help push forward the state of things)

@zeroexcuses

zeroexcuses commented Jul 21, 2021

Are you looking to implement wasm64? (sorry I'm not sure if you're looking to learn information about the state of things or whether you're looking to help push forward the state of things)

I am looking to throw together a toy prototype with the following properties:

  • language looks like wasm, but uses 64-bit instead of 32-bit addresses
  • on x86_64, JITs via cranelift into native code
  • executes 'trusted' code (i.e. no bound checks)

The XY problem is that I am generating wasm32 not via LLVM/Cranelift, but by my own toy code generator. I want to be able to swap in something that allows for > 4GB memory.

EDIT: I acknowledge this is quite different from 'write a wasm64 backend that passes wasmtime's coding standards and plays nicely with the rest of the wasmtime toolchain'

@alexcrichton
Member

If your goal is just to use Cranelift, I don't think any changes are necessary; the rustc backend using Cranelift already targets x86_64 and works reasonably well. Otherwise, if you want to work with wasm, you can probably get away with a few small edits to code_translator.rs. Other than that though you're likely in territory I'm at least personally not able to help too much with.

@zeroexcuses

I don't think I explained this well. I have

  1. a toy language FooLang
  2. a toy compiler FooLang -> wast::{Module, Func, Instruction} -> wasmtime -> x86_64
  3. (2) however, is limited to 4GB memory; I want to eliminate this limitation
  4. I'm going for something like FooLang -> (something that looks like wasm64) -> cranelift_wasm -> x86_64

I am trying to figure out the minimal patch to { wast or wasmparser } + cranelift_wasm to (1) have the "wasm-like AST" be able to hold 64-bit addresses and (2) generate the corresponding instrs.

@alexcrichton
Member

Ah ok, unfortunately though I don't think there's a too-minimal-path. I think that wasm64 just needs to be implemented in cranelift-wasm.

@zeroexcuses

The current 'minimal' changes I see are:

  1. copy https://github.com/bytecodealliance/wasm-tools/tree/main/crates/wast -> wast64, change some u32 to u64 // this gives us an instruction set that supports 64-bit addresses
  2. copy https://github.com/bytecodealliance/wasm-tools/tree/main/crates/wasmparser -> wasmparser64, change some u32 to u64 // this gets us 64-bit addresses while parsing
  3. copy https://github.com/bytecodealliance/wasmtime/tree/main/cranelift/wasm -> cranelift-wasm64, add support for generating instrs for 64-bit addresses
  4. then, look at dependents of these 3 libs in the wasmtime tree, copy them, adding a -64 suffix

Is this approximately the 'minimal' path ?

@bjorn3
Contributor

bjorn3 commented Jul 22, 2021

I think all of them should support both 32-bit and 64-bit memories, especially because the multi-memory proposal allows mixing loads and stores to 32-bit and 64-bit memories within a single function.
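One way a translator could support both in one function, sketched here with invented types and names (this is not cranelift-wasm's API): each memory records its own index type, and 32-bit indices get zero-extended into a shared 64-bit addressing path.

```rust
// Sketch (assumptions): per-memory index types under multi-memory. A wasm32
// memory's index operand is an i32 and is zero-extended; a wasm64 memory's
// operand is used as-is. Both then share one bounds-check path.

#[derive(Clone, Copy)]
enum IndexType {
    I32,
    I64,
}

struct MemoryPlan {
    index_type: IndexType,
    len: u64,
}

fn load_index(plan: &MemoryPlan, raw_operand: u64) -> u64 {
    match plan.index_type {
        IndexType::I32 => raw_operand as u32 as u64, // zero-extend the i32 index
        IndexType::I64 => raw_operand,
    }
}

fn in_bounds(plan: &MemoryPlan, raw_operand: u64, access_size: u64) -> bool {
    let index = load_index(plan, raw_operand);
    index
        .checked_add(access_size)
        .map(|end| end <= plan.len)
        .unwrap_or(false)
}

fn main() {
    let mem32 = MemoryPlan { index_type: IndexType::I32, len: 1 << 16 };
    let mem64 = MemoryPlan { index_type: IndexType::I64, len: 1 << 33 };
    // The same function may touch both memories:
    assert!(in_bounds(&mem32, 0xfff0, 4));
    assert!(in_bounds(&mem64, 1 << 32, 4));
    assert!(!in_bounds(&mem32, 1 << 16, 1));
    println!("mixed 32/64-bit memory accesses handled");
}
```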

@zeroexcuses

I think all of them should support both 32-bit and 64-bit memories, especially because the multi-memory proposal allows mixing loads and stores to 32-bit and 64-bit memories within a single function.

In the general case, I agree that you are right. For my particular case, given:

I'm going for something like FooLang -> (something that looks like wasm64) -> cranelift_wasm -> x86_64

only having 64-bit addresses would be an acceptable first step.

@abrown
Contributor

abrown commented Jul 22, 2021

Why not just implement the proposal as specified, for both memory sizes? It doesn't seem like much more work, and you would probably get more help from the maintainers of all these libraries, since it helps them implement the proposal.

@zeroexcuses

Why not just implement the proposal as specified, for both memory sizes? It doesn't seem like much more work, and you would probably get more help from the maintainers of all these libraries, since it helps them implement the proposal.

  1. I have previously been playing around with wast & cranelift-jit, so I believe the hack I have outlined above is a matter of days, whereas I have no idea how much work full wasm64 is (given no one has done it, sounds like months?)

  2. I don't think it's honest to pretend to care about something I don't care about just to get help from maintainers.

@zeroexcuses

Does https://github.com/bytecodealliance/wasmtime/tree/main/crates/lightbeam work? I am trying to run the examples at:

Now, I can modify https://github.com/bytecodealliance/wasmtime/blob/main/crates/lightbeam/src/module.rs#L572 to map the enum to {}, but then it runs into the problem that nothing sets TranslatedModule::translated_code_section.

Thus, the question: is lightbeam currently in a working state, or is it broken?

@bjorn3
Contributor

bjorn3 commented Jul 26, 2021

Lightbeam is unmaintained, so it is probably broken.

@zeroexcuses

No warranty. Not liable for damages. Do not use this code. Only for educational purposes. Probably dangerous side effects.

I believe I got a basic "ret 42" to execute on lightbeam by copying/pasting module.rs and fixing the generated runtime errors:

const WAT: &str = r#"
(module
  (func $foo (result i32)
    (i32.const 42)))
"#;

fn main() -> anyhow::Result<()> {
    let data = wat::parse_str(WAT)?;

    let mut output = TranslatedModule::default();

    for payload in Parser::new(0).parse_all(&data) {
        println!("payload received: {:?}", payload);
        match payload? {
            Payload::TypeSection(s) => output.ctx.types = translate_sections::type_(s)?,
            Payload::ImportSection(s) => translate_sections::import(s)?,
            Payload::FunctionSection(s) => {
                output.ctx.func_ty_indices = translate_sections::function(s)?;
            }
            Payload::TableSection(s) => {
                translate_sections::table(s)?;
            }
            Payload::MemorySection(s) => {
                let mem = translate_sections::memory(s)?;

                if mem.len() > 1 {
                    Err(Error::Input(
                        "Multiple memory sections not yet implemented".to_string(),
                    ))?;
                }

                if !mem.is_empty() {
                    let mem = mem[0];
                    let limits = match mem {
                        MemoryType::M32 {
                            limits,
                            shared: false,
                        } => limits,
                        _ => Err(Error::Input("unsupported memory".to_string()))?,
                    };
                    if Some(limits.initial) != limits.maximum {
                        Err(Error::Input(
                            "Custom memory limits not supported in lightbeam".to_string(),
                        ))?;
                    }
                    output.memory = Some(limits);
                }
            }
            Payload::GlobalSection(s) => {
                translate_sections::global(s)?;
            }
            Payload::ExportSection(s) => {
                translate_sections::export(s)?;
            }
            Payload::StartSection { func, .. } => {
                translate_sections::start(func)?;
            }
            Payload::ElementSection(s) => {
                translate_sections::element(s)?;
            }
            Payload::DataSection(s) => {
                translate_sections::data(s)?;
            }
            Payload::CodeSectionStart { .. }
            | Payload::CustomSection { .. }
            | Payload::Version { .. } => {}

            Payload::CodeSectionEntry(function_body) => {
                let mut code_gen_session = CodeGenSession::new(1, &output.ctx, microwasm::I64);
                let mut func_idx = 0;

                let mut null_offset_sink = NullOffsetSink;
                let mut unimplemented_reloc_sink = translate_sections::UnimplementedRelocSink;
                let mut null_trap_sink = NullTrapSink {};

                let mut sinks = Sinks {
                    relocs: &mut unimplemented_reloc_sink,
                    traps: &mut null_trap_sink,
                    offsets: &mut null_offset_sink,
                };

                translate_wasm(&mut code_gen_session, sinks, func_idx, function_body);
                func_idx += 1;

                output.translated_code_section =
                    Some(code_gen_session.into_translated_code_section()?);
            }
            Payload::End => {}

            other => unimplemented!("can't translate {:?}", other),
        }
    }

    let translated = output.instantiate();

    let module = &translated.module;
    let func_idx = 0;
    if func_idx as usize >= module.ctx.func_ty_indices.len() {
        Err(ExecutionError::FuncIndexOutOfBounds)?;
    }
    let type_ = module.ctx.func_type(func_idx);
    let args = ();

    if (&type_.params[..], &type_.returns[..])
        != (<() as TypeList>::TYPE_LIST, <u32 as TypeList>::TYPE_LIST)
    {
        Err(ExecutionError::TypeMismatch)?;
    }

    println!("func_idx: {:?}", func_idx);
    let code_section = translated
        .module
        .translated_code_section
        .as_ref()
        .expect("no code section");
    let start_buf = code_section.func_start(func_idx as usize);

    let result: u32 = unsafe {
        args.call(
            <() as FunctionArgs<u32>>::into_func(start_buf),
            translated
                .context
                .as_ref()
                .map(|ctx| (&**ctx) as *const VmCtx as *const u8)
                .unwrap_or(std::ptr::null()),
        )
    };

    // let result: u32 = translated.execute_func(0, (5u32, 3u32))?;

    println!("f(5, 3) = {}", result);

    assert_eq!(result, 42);

    Ok(())
}

#[test]
fn test_00() {
    main().unwrap();
}

(adding 5 + 3, even with the right calling convention, unfortunately does not return 8 yet).

@zeroexcuses

I think I just got passing arguments and adding working. The issue appears to be an out-of-sync comment plus an off-by-one in argument passing on SysV. In particular, https://github.com/bytecodealliance/wasmtime/blob/main/crates/lightbeam/src/backend.rs#L587 needs the rdi register prepended to it.

I am now interested in throwing the entire wasm test suite at refactored-lightbeam and seeing what breaks. Is there a standard way of running the entire wasm test suite against it?

@alexcrichton
Member

I have an initial PR for implementing this in cranelift and wasmtime at #3153

@alexcrichton
Member

Added in #3153
