
wasm64 support #572

Closed
joshtriplett opened this issue Nov 14, 2019 · 40 comments

Comments

@joshtriplett
Member

We should consider supporting wasm64 modules, not just wasm32; people will want to run with large linear address spaces, both to process large amounts of data and to provide address space for shared mappings or file mappings.

Opening this issue to start discussing what that support should look like, and how we can do that with minimal complexity or duplication.

@sunfishcode
Member

The first step is to propose the idea for stage 0 of the CG process. Once that's accepted, we can then create a repo in the WebAssembly organization where we can coordinate and collect documentation, and avoid duplication.

Next is to design a binary encoding. In theory, all we need is a way to label a linear-memory as 64-bit, because all opcodes that operate on linear memories take an index which specifies which memory they talk about, and that can then determine the types of their operands.

(Especially with multiple memories on the horizon, I think we can say that "wasm64" shouldn't be a new language or mode. Instead, we want individual linear memories to be marked as 64-bit, so that a program could in theory have both 32-bit and 64-bit memories. Some tools, like LLVM, may continue to think of "wasm64" as a separate architecture from "wasm32", however that's just a convention.)
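The "label a linear memory as 64-bit" idea above can be sketched as a limits-flag decoder. This is a minimal sketch, assuming a flag byte in which bit 0 means "maximum present" and bit 2 means "64-bit index type" (this matches where the memory64 proposal eventually landed, but the names and layout here are illustrative, not normative):

```rust
// Sketch (assumptions): decoding a memory type where one flag bit marks the
// memory as 64-bit. Bit 0 = "max present", bit 2 = "64-bit index type";
// the exact bit layout is illustrative, not normative.

#[derive(Debug, PartialEq)]
struct MemoryType {
    is_64: bool,
    minimum: u64,
    maximum: Option<u64>,
}

fn decode_limits_flag(flag: u8, minimum: u64, maximum: u64) -> MemoryType {
    let has_max = flag & 0b001 != 0;
    let is_64 = flag & 0b100 != 0;
    MemoryType {
        is_64,
        minimum,
        maximum: if has_max { Some(maximum) } else { None },
    }
}

fn main() {
    // 0x05 = max present + 64-bit, per the assumed bit layout above.
    let mem = decode_limits_flag(0x05, 1, 16);
    assert!(mem.is_64);
    assert_eq!(mem.maximum, Some(16));
    println!("{:?}", mem);
}
```

This keeps "wasm64" out of the language itself: each memory simply carries its own index type.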

Then, prototyping can start, both on the producer side and consumer side. I expect we can do this prototyping upstream, rather than in separate branches, because wasm64 is something that many people want, and most of the code should be straightforward. We just need to be careful to communicate that the binary format won't be stable until it progresses further through the CG process.

On the producer side, the one tool I know of with a start on wasm64 is LLVM, though it's not complete yet, and it needs to be taught about the binary encoding.

On the consumer side, besides just teaching various components how to recognize the flag and allocate memory for it and generate code for it, there's also a question of sandboxing. To start with, we can use bounds checking, though also see here for an interesting possible optimization.
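For the bounds-checking approach, here is a minimal sketch of the check a consumer could emit per memory access (the names are illustrative, not Wasmtime's actual API):

```rust
// Sketch (assumptions): the explicit bounds check a runtime could emit for a
// 64-bit memory access. `index` is the dynamic address operand, `offset` the
// static memarg offset, `access_size` the width of the load/store.

fn check_access(index: u64, offset: u64, access_size: u64, mem_len: u64) -> Result<u64, &'static str> {
    // effective address = index + offset; neither addition may wrap.
    let end = index
        .checked_add(offset)
        .and_then(|ea| ea.checked_add(access_size))
        .ok_or("address overflow: trap")?;
    if end > mem_len {
        return Err("out of bounds: trap");
    }
    Ok(index + offset)
}

fn main() {
    let mem_len = 6 * 1024 * 1024 * 1024u64; // a 6 GiB memory, beyond wasm32's reach
    let base = 5 * 1024 * 1024 * 1024u64;
    assert_eq!(check_access(base, 8, 4, mem_len), Ok(base + 8));
    assert!(check_access(mem_len, 0, 4, mem_len).is_err());
    assert!(check_access(u64::MAX, 1, 4, mem_len).is_err()); // wraps, must trap
    println!("bounds checks behave as expected");
}
```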

@joshtriplett
Member Author

@sunfishcode Ah, for some reason I thought there was already a preliminary specification for the binary format. https://webassembly.org/docs/future-features/#linear-memory-bigger-than-4-gib somewhat implies that ("wasm32 and wasm64 are both just modes of WebAssembly, to be selected by a flag in a module header").

Sorry for the confusion on my part.

I do think it makes sense to support the concept of both 32-bit and 64-bit memories, and for that matter, future linking models could theoretically communicate between and translate between modules written for 32-bit and for 64-bit.

(Though, even if the underlying model supports multiple memories, I wouldn't find it surprising if code targeting wasm64 uses flat 64-bit pointers and runtimes have to encode any concept of multiple address spaces into the pointer, rather than making pointers larger than 64 bits.)
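The pointer-tagging idea in the parenthetical above could look roughly like this; TAG_BITS and the bit split are arbitrary choices for illustration:

```rust
// Sketch (assumptions): a runtime encoding "which memory" into a flat 64-bit
// pointer by reserving a few high bits as a memory-space tag, rather than
// making pointers wider than 64 bits.

const TAG_BITS: u32 = 8;
const ADDR_BITS: u32 = 64 - TAG_BITS;
const ADDR_MASK: u64 = (1u64 << ADDR_BITS) - 1;

fn tag_pointer(memory_index: u8, addr: u64) -> u64 {
    assert!(addr <= ADDR_MASK, "address does not fit below the tag");
    ((memory_index as u64) << ADDR_BITS) | addr
}

fn untag_pointer(ptr: u64) -> (u8, u64) {
    ((ptr >> ADDR_BITS) as u8, ptr & ADDR_MASK)
}

fn main() {
    let p = tag_pointer(3, 0x1000_0000_0000);
    assert_eq!(untag_pointer(p), (3, 0x1000_0000_0000));
    println!("memory {} at offset {:#x}", untag_pointer(p).0, untag_pointer(p).1);
}
```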

@sunfishcode
Member

It appears I wrote that sentence back in 2015; things have evolved somewhat since then :-}. I've now submitted WebAssembly/design#1311 to update that.

@sunfishcode
Member

Some discussion on wasm64 here.

@binji

binji commented Feb 7, 2020

Yeah, I'd like to push this forward sooner rather than later. My biggest concerns at this point are in the consumer -- making sure that we don't have to fall back to bounds checks. If anyone has spare cycles, it would be great to see some experiments with alternate trap-handlers (including effective address calculation).

@joshtriplett
Member Author

Would architecture-specific optimizations to bounds checking be welcome? Obviously we need a correct implementation on all platforms. But if a platform can substantially speed up bounds checking and avoid per-access checks, would that be welcome?

Specifically, I would suggest memory protection keys (PK). We could allocate the memory for a given sandbox using a protection key, and enable only that protection key before jumping into the JITted code. (This would have limitations, notably that a process has a limited number of protection keys, but I think it would work well in many common cases.)
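A std-only toy model of the limited-key constraint mentioned above (real keys come from pkey_alloc(2)/pkey_mprotect(2) and are capped at roughly 16 per process on x86; this model performs no actual memory protection and every name in it is invented for illustration):

```rust
// Sketch (assumptions): hand one protection key to each sandbox, and fail
// over to plain bounds checks when the small per-process pool runs out.

struct PkeyPool {
    free: Vec<u8>, // available key numbers
}

impl PkeyPool {
    fn new() -> Self {
        // Key 0 is conventionally the default key; keep 1..16 allocatable.
        PkeyPool { free: (1..16).rev().collect() }
    }
    fn alloc_for_sandbox(&mut self) -> Option<u8> {
        self.free.pop()
    }
    fn release(&mut self, key: u8) {
        self.free.push(key);
    }
}

fn main() {
    let mut pool = PkeyPool::new();
    let keys: Vec<u8> = (0..15).filter_map(|_| pool.alloc_for_sandbox()).collect();
    assert_eq!(keys.len(), 15);
    assert!(pool.alloc_for_sandbox().is_none()); // 16th sandbox: fall back to bounds checks
    pool.release(keys[0]);
    assert!(pool.alloc_for_sandbox().is_some());
    println!("pkey pool model ok");
}
```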

@binji

binji commented Feb 7, 2020

Yeah, I think that's a great solution for some platforms. But as with SIMD, I think we'll need to make sure that we're OK with the performance on all platforms. I think we all expect to see a performance regression, but the question is how much.

@joshtriplett
Member Author

@binji Of course. Performance would need to be acceptable everywhere, but from what I understand, even with bounds checking, performance is acceptable.

@binji

binji commented Feb 8, 2020

True, @aardappel has made a similar argument. We know we'll need 64-bit memories, even if they end up being slower at first. I'm mostly concerned w/ how we spec it so we can support as many optimizations as possible (including PK).

@sunfishcode
Member

There is now an official spec proposal repo, memory64.

@lygstate

When is this going to happen?

@aardappel

I've implemented wasm64 support in LLVM, LLD, and WABT, and am now working on Binaryen.

@lygstate

Wonderful work! Would support in wasmtime be relatively harder?

@tschneidereit
Member

Indeed, that's excellent progress, @aardappel!

Wonderful work! Would support in wasmtime be relatively harder?

My understanding — which might be wrong — is that it's probably not a huge amount of work, but also non-trivial. @alexcrichton and @sunfishcode, ISTM we talked about this a while ago, do you happen to have an idea of the work involved to make this happen?

@lygstate

Indeed, that's excellent progress, @aardappel!

Wonderful work! Would support in wasmtime be relatively harder?

My understanding — which might be wrong — is that it's probably not a huge amount of work, but also non-trivial. @alexcrichton and @sunfishcode, ISTM we talked about this a while ago, do you happen to have an idea of the work involved to make this happen?

Do wasm64 and wasm32 have different ABIs?

@bjorn3
Contributor

bjorn3 commented Sep 11, 2020

They would need to have different ABIs. Pointer size is part of the ABI, and they use different pointer sizes.

@alexcrichton
Member

I suspect this would be relatively simple to implement nowadays. The memory64 proposal is quite small, basically just changing memory-operating instructions to work with either 32-bit or 64-bit indices. The work in wasmparser to implement the memory64 proposal is already done, so all that needs to be done is to thread it all through cranelift.

At this time this probably won't be as well optimized as memory32 since we can't naively do the exact same guard page trick we do there (reserving a 32-bit region of the address space). That being said cranelift has all the internal machinery to insert manual checks on each load/store, so we'd just need to hook that up. Overall this is likely a simple-ish refactoring of the code translator to conditionally use 64 or 32-bit indices everywhere, depending on what type each memory has.

The wasmtime API itself may not even have to change at all for something like this. We might add a flag to Memory as to whether it's 64-bit or not, but otherwise everything is pretty much the same. Although ABIs are different that's only really an artifact of compilation toolchains, once you get to the runtime it's all basically just a wasm blob.
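A back-of-the-envelope sketch of why the wasm32 guard-region trick mentioned above doesn't carry over to 64-bit indices (the 16-byte widest access and the offset bound are assumptions for illustration):

```rust
// Sketch (assumptions): for wasm32, the maximum effective address is bounded
// by u32::MAX plus the maximum static offset, so reserving that much virtual
// address space (mostly inaccessible guard pages) lets the hardware fault
// replace the bounds check. For a 64-bit index, the same bound is the entire
// address space, so no finite reservation works.

fn wasm32_reservation(max_static_offset: u64) -> u64 {
    // worst case: index = u32::MAX, plus static offset, plus widest access (16 bytes)
    u32::MAX as u64 + max_static_offset + 16
}

fn wasm64_reservation(max_static_offset: u64) -> Option<u64> {
    u64::MAX.checked_add(max_static_offset) // overflows: cannot reserve this
}

fn main() {
    // With a 4 GiB static-offset bound, ~8 GiB of virtual reservation suffices for wasm32.
    assert!(wasm32_reservation(u32::MAX as u64) < 16u64 * (1 << 30));
    // No finite reservation covers a 64-bit index.
    assert_eq!(wasm64_reservation(1), None);
    println!("wasm32 guard region: {} bytes", wasm32_reservation(u32::MAX as u64));
}
```

This is why the per-access checks sketched earlier in the thread become the baseline for 64-bit memories.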

@lastmjs

lastmjs commented Jun 5, 2021

Any update on the progress here? I'm working on building applications for the DFINITY Internet Computer, and having wasm64 working could do wonders for applications scaling on the Internet Computer. Right now applications (canisters) are limited to 4 GB in size without doing some relatively complicated cross-canister scaling.

@aardappel

@lastmjs Memory64 has progressed to a stage 3 proposal, LLVM/WABT/Binaryen support has further matured, spec implementation is available. The biggest things still not finished are Emscripten support (which is in progress) and of course VM support (V8 is in progress, not aware of others that have started).

Wasmtime support would be great! Who's going to take it on? :)

@zeroexcuses

Are there any toy repos where someone has forked wasmtime to add wasm64 support? In my very limited understanding, the start of it is:

  1. change u32 to u64 here: https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wasmparser/src/primitives.rs#L260-L261
  2. modify all the functions regarding memarg at https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/wasm/src/code_translator.rs#L118
  3. remove the 32-bit bounds check (bad for security), or possibly add a 64-bit bounds check (bad for performance); here, I'm running 'trusted' wasm64 code

This is a bit above my current cranelift knowledge. Is there any repo/toolchain where someone forked all this and patched it?

@alexcrichton
Member

FWIW the wasmparser crate, part of wasm-tools, should already support wasm64 in that it implements validation and decoding support. What's not supported is Wasmtime's code_translator.rs or the supporting runtime pieces, since that's all geared towards 32-bit. AFAIK though there's no fork of Wasmtime with this implemented at this time (but I could be wrong!)

@zeroexcuses

The following two beliefs contradict:

  1. @alexcrichton : Because you are one of the top wasmtime committers, I want to believe "FWIW the wasmparser crate, part of wasm-tools, should already support wasm64 in that it implements validation and decoding support." is true

  2. https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wasmparser/src/primitives.rs#L260-L261 literally states:

#[derive(Debug, Copy, Clone)]
pub struct MemoryImmediate {
    /// Alignment, stored as `n` where the actual alignment is `2^n`
    pub align: u8,
    pub offset: u32,
    pub memory: u32,
}

Unless this is some type of weird encoding where actual_address = memory * 2^32 + offset, I don't see how wasmparser could possibly support wasm64.

Am I misunderstanding something fundamental? Does wasm64 not store an address as a single u64? If I am misunderstanding this, can you point me to the documentation on how wasm64 does store addresses?

@alexcrichton
Member

In that structure the align field doesn't need to change since it can already represent all reasonable alignments. The offset field is just a fixed offset encoded in the memarg structure. The memory64 proposal now indicates that this can be up to 64-bits, but I think at the time I implemented memory64 validation that wasn't clarified in the upstream proposal. The memory field is the index of the memory being used, and that does not change in the memory64 proposal.

That MemoryImmediate structure is an AST-level construct, not something used at runtime. It does not represent actual raw addresses, but rather it's the memarg from the spec on each memory-related instruction, describing which memory is being operated on, the alignment of the operation, and the fixed offset from the runtime-calculated address, if any.
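The memarg semantics described above, rendered as a small sketch (the field names mirror the quoted struct, but the logic is illustrative, not wasmparser's implementation):

```rust
// Sketch (assumptions): memarg is an AST-level construct. `align` is stored
// as log2, `offset` is a static addend, and `memory` names which memory the
// instruction touches; none of these are raw runtime addresses.

struct MemArg {
    align: u8,   // actual alignment is 2^align
    offset: u64, // static offset, up to 64-bit under memory64
    memory: u32, // index of the memory being accessed
}

/// Effective address = dynamic index operand + static offset (trapping on wrap).
fn effective_address(memarg: &MemArg, index: u64) -> Option<u64> {
    index.checked_add(memarg.offset)
}

fn main() {
    let m = MemArg { align: 3, offset: 8, memory: 0 };
    assert_eq!(1u64 << m.align, 8); // an 8-byte-alignment hint
    assert_eq!(effective_address(&m, 0x1_0000_0000), Some(0x1_0000_0008));
    assert_eq!(effective_address(&m, u64::MAX), None); // overflow traps
    println!("memarg for memory {} resolves fine", m.memory);
}
```

Note how no single field needs to hold a full 64-bit runtime address: the 64-bit index arrives as an operand on the value stack.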

@zeroexcuses

zeroexcuses commented Jul 21, 2021

Is https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wast/src/ast/expr.rs#L1235-L1246

// from wast crate
#[derive(Debug)]
pub struct MemArg<'a> {
    /// The alignment of this access.
    ///
    /// This is not stored as a log, this is the actual alignment (e.g. 1, 2, 4,
    /// 8, etc).
    pub align: u32,
    /// The offset, in bytes of this access.
    pub offset: u32,
    /// The memory index we're accessing
    pub memory: ast::ItemRef<'a, kw::memory>,
}

the right MemArg?

// =============================

It seems like, either way, I need to "fork the instruction set". Two options are:

  1. fork wast::Instruction https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wast/src/ast/expr.rs#L499
  2. fork wasmparser::Operator https://github.com/bytecodealliance/wasm-tools/blob/main/crates/wasmparser/src/primitives.rs#L350

I was wondering if you had advice on which route might be nicer (the two enums look very similar).

@alexcrichton
Member

Are you looking to implement wasm64? (sorry I'm not sure if you're looking to learn information about the state of things or whether you're looking to help push forward the state of things)

@zeroexcuses

zeroexcuses commented Jul 21, 2021

Are you looking to implement wasm64? (sorry I'm not sure if you're looking to learn information about the state of things or whether you're looking to help push forward the state of things)

I am looking to throw together a toy prototype with the following properties:

  • language looks like wasm, but uses 64-bit instead of 32-bit addresses
  • on x86_64, JITs via cranelift into native code
  • executes 'trusted' code (i.e. no bound checks)

The XY problem is that I am generating wasm32 not via LLVM/Cranelift, but by my own toy code generator. I want to be able to swap in something that allows for > 4GB memory.

EDIT: I acknowledge this is quite different from 'write a wasm64 backend that passes wasmtime's coding standards and plays nicely with the rest of the wasmtime toolchain'

@alexcrichton
Member

If your goal is just to use Cranelift, I don't think any changes are necessary; the rustc backend using Cranelift already targets x86_64 and works reasonably well. Otherwise, if you want to work with wasm, you can probably get away with a few small edits to code_translator.rs. Other than that though you're likely in territory I'm at least personally not able to help too much with.

@zeroexcuses

I don't think I explained this well. I have

  1. a toy language FooLang
  2. a toy compiler FooLang -> wast::{Module, Func, Instruction} -> wasmtime -> x86_64
  3. (2) however, is limited to 4GB memory; I want to eliminate this limitation
  4. I'm going for something like FooLang -> (something that looks like wasm64) -> cranelift_wasm -> x86_64

I am trying to figure out the minimal patch to { wast or wasmparser } + cranelift_wasm to (1) have the "wasm-like AST" be able to hold 64-bit addresses and (2) generate the corresponding instrs.

@alexcrichton
Member

Ah ok, unfortunately though I don't think there's a too-minimal-path. I think that wasm64 just needs to be implemented in cranelift-wasm.

@zeroexcuses

The current 'minimal' changes I see are:

  1. copy https://github.com/bytecodealliance/wasm-tools/tree/main/crates/wast -> wast64, change some u32 to u64 // this gives us an instruction set that supports 64-bit addresses
  2. copy https://github.com/bytecodealliance/wasm-tools/tree/main/crates/wasmparser -> wasmparser64, change some u32 to u64 // this gets us 64-bit addresses while parsing
  3. copy https://github.com/bytecodealliance/wasmtime/tree/main/cranelift/wasm -> cranelift-wasm64, add support for generating instrs for 64-bit addresses
  4. then, look at dependents of these 3 libs in the wasmtime tree, copy them, adding a -64 suffix

Is this approximately the 'minimal' path ?

@bjorn3
Contributor

bjorn3 commented Jul 22, 2021

I think all of them should support both 32-bit and 64-bit memories, especially because the multi-memory proposal allows mixing loads and stores to 32-bit and 64-bit memories within a single function.
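One way a translator could support both in one function, sketched here with invented types and names (this is not cranelift-wasm's API): each memory records its own index type, and 32-bit indices get zero-extended into a shared 64-bit addressing path.

```rust
// Sketch (assumptions): per-memory index types under multi-memory. A wasm32
// memory's index operand is an i32 and is zero-extended; a wasm64 memory's
// operand is used as-is. Both then share one bounds-check path.

#[derive(Clone, Copy)]
enum IndexType {
    I32,
    I64,
}

struct MemoryPlan {
    index_type: IndexType,
    len: u64,
}

fn load_index(plan: &MemoryPlan, raw_operand: u64) -> u64 {
    match plan.index_type {
        IndexType::I32 => raw_operand as u32 as u64, // zero-extend the i32 index
        IndexType::I64 => raw_operand,
    }
}

fn in_bounds(plan: &MemoryPlan, raw_operand: u64, access_size: u64) -> bool {
    let index = load_index(plan, raw_operand);
    index
        .checked_add(access_size)
        .map(|end| end <= plan.len)
        .unwrap_or(false)
}

fn main() {
    let mem32 = MemoryPlan { index_type: IndexType::I32, len: 1 << 16 };
    let mem64 = MemoryPlan { index_type: IndexType::I64, len: 1 << 33 };
    // The same function may touch both memories:
    assert!(in_bounds(&mem32, 0xfff0, 4));
    assert!(in_bounds(&mem64, 1 << 32, 4));
    assert!(!in_bounds(&mem32, 1 << 16, 1));
    println!("mixed 32/64-bit memory accesses handled");
}
```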

@zeroexcuses

I think all of them should support both 32-bit and 64-bit memories, especially because the multi-memory proposal allows mixing loads and stores to 32-bit and 64-bit memories within a single function.

In the general case, I agree that you are right. For my particular case, given:

I'm going for something like FooLang -> (something that looks like wasm64) -> cranelift_wasm -> x86_64

only having 64-bit addresses would be an acceptable first step.

@abrown
Contributor

abrown commented Jul 22, 2021

Why not just implement the proposal as specified, for both memory sizes? It doesn't seem like much more work, and you would probably get more help from the maintainers of all these libraries, since it helps them implement the proposal.

@zeroexcuses

Why not just implement the proposal as specified, for both memory sizes? It doesn't seem like much more work, and you would probably get more help from the maintainers of all these libraries, since it helps them implement the proposal.

  1. I have previously been playing around with wast & cranelift-jit, so I believe the hack I have outlined above is a matter of days, whereas I have no idea how much work full wasm64 is (given no one has done it, sounds like months?)

  2. I don't think it's honest to pretend to care about something I don't care about just to get help from maintainers.

@zeroexcuses

Does https://github.com/bytecodealliance/wasmtime/tree/main/crates/lightbeam work? I am trying to run the examples at:

Now, I can modify https://github.com/bytecodealliance/wasmtime/blob/main/crates/lightbeam/src/module.rs#L572 to map the enum to {}, but then it runs into the problem that nothing sets TranslatedModule::translated_code_section.

Thus, the question: is lightbeam currently in a working state, or is it broken?

@bjorn3
Contributor

bjorn3 commented Jul 26, 2021

Lightbeam is unmaintained, so it is probably broken.

@zeroexcuses

No warranty. Not liable for damages. Do not use this code. Only for educational purposes. Probably dangerous side effects.

I believe I got a basic "ret 42" to execute on lightbeam by copying/pasting module.rs and fixing the generated runtime errors:

const WAT: &str = r#"
(module
  (func $foo (result i32)
    (i32.const 42)))
"#;

fn main() -> anyhow::Result<()> {
    let data = wat::parse_str(WAT)?;

    let mut output = TranslatedModule::default();

    for payload in Parser::new(0).parse_all(&data) {
        println!("payload received: {:?}", payload);
        match payload? {
            Payload::TypeSection(s) => output.ctx.types = translate_sections::type_(s)?,
            Payload::ImportSection(s) => translate_sections::import(s)?,
            Payload::FunctionSection(s) => {
                output.ctx.func_ty_indices = translate_sections::function(s)?;
            }
            Payload::TableSection(s) => {
                translate_sections::table(s)?;
            }
            Payload::MemorySection(s) => {
                let mem = translate_sections::memory(s)?;

                if mem.len() > 1 {
                    Err(Error::Input(
                        "Multiple memory sections not yet implemented".to_string(),
                    ))?;
                }

                if !mem.is_empty() {
                    let mem = mem[0];
                    let limits = match mem {
                        MemoryType::M32 {
                            limits,
                            shared: false,
                        } => limits,
                        _ => Err(Error::Input("unsupported memory".to_string()))?,
                    };
                    if Some(limits.initial) != limits.maximum {
                        Err(Error::Input(
                            "Custom memory limits not supported in lightbeam".to_string(),
                        ))?;
                    }
                    output.memory = Some(limits);
                }
            }
            Payload::GlobalSection(s) => {
                translate_sections::global(s)?;
            }
            Payload::ExportSection(s) => {
                translate_sections::export(s)?;
            }
            Payload::StartSection { func, .. } => {
                translate_sections::start(func)?;
            }
            Payload::ElementSection(s) => {
                translate_sections::element(s)?;
            }
            Payload::DataSection(s) => {
                translate_sections::data(s)?;
            }
            Payload::CodeSectionStart { .. }
            | Payload::CustomSection { .. }
            | Payload::Version { .. } => {}

            Payload::CodeSectionEntry(function_body) => {
                let mut code_gen_session = CodeGenSession::new(1, &output.ctx, microwasm::I64);
                let mut func_idx = 0;

                let mut null_offset_sink = NullOffsetSink;
                let mut unimplemented_reloc_sink = translate_sections::UnimplementedRelocSink;
                let mut null_trap_sink = NullTrapSink {};

                let mut sinks = Sinks {
                    relocs: &mut unimplemented_reloc_sink,
                    traps: &mut null_trap_sink,
                    offsets: &mut null_offset_sink,
                };

                translate_wasm(&mut code_gen_session, sinks, func_idx, function_body);
                func_idx += 1;

                output.translated_code_section =
                    Some(code_gen_session.into_translated_code_section()?);
            }
            Payload::End => {}

            other => unimplemented!("can't translate {:?}", other),
        }
    }

    let translated = output.instantiate();

    let module = &translated.module;
    let func_idx = 0;
    if func_idx as usize >= module.ctx.func_ty_indices.len() {
        Err(ExecutionError::FuncIndexOutOfBounds)?;
    }
    let type_ = module.ctx.func_type(func_idx);
    let args = ();

    if (&type_.params[..], &type_.returns[..])
        != (<() as TypeList>::TYPE_LIST, <u32 as TypeList>::TYPE_LIST)
    {
        Err(ExecutionError::TypeMismatch)?;
    }

    println!("func_idx: {:?}", func_idx);
    let code_section = translated
        .module
        .translated_code_section
        .as_ref()
        .expect("no code section");
    let start_buf = code_section.func_start(func_idx as usize);

    let result: u32 = unsafe {
        args.call(
            <() as FunctionArgs<u32>>::into_func(start_buf),
            translated
                .context
                .as_ref()
                .map(|ctx| (&**ctx) as *const VmCtx as *const u8)
                .unwrap_or(std::ptr::null()),
        )
    };

    // let result: u32 = translated.execute_func(0, (5u32, 3u32))?;

    println!("f(5, 3) = {}", result);

    assert_eq!(result, 42);

    Ok(())
}

#[test]
fn test_00() {
    main().unwrap();
}

(adding 5 + 3, even with the right calling convention, unfortunately does not return 8 yet).

@zeroexcuses

I think I just got passing arguments and adding working. The issue appears to be an out-of-sync comment plus an off-by-one in argument passing on SysV. In particular, https://github.com/bytecodealliance/wasmtime/blob/main/crates/lightbeam/src/backend.rs#L587 needs the rdi register prepended to it.

I am now interested in throwing the entire wasm test suite at refactored-lightbeam and seeing what breaks. Is there a standard way of running the entire wasm test suite against it?

@alexcrichton
Member

I have an initial PR for implementing this in cranelift and wasmtime at #3153

@alexcrichton
Member

Added in #3153
