Draft of WASI support #610

Closed
wants to merge 14 commits into from

dicej commented Aug 5, 2022

This is a first pass at implementing support for WASI. It adds a new TeaVMTargetType of WASI, which is mostly the same as WEBASSEMBLY except it generates a module suitable for execution by a WASI-compatible runtime such as Wasmtime.

Currently supported features:

  • A simple allocator for supporting the component model canonical ABI calls
  • CLI argument marshaling
  • Standard in/out/err
  • Filesystem operations and I/O
  • System clock
  • Environment variables
  • System random numbers
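
The allocator itself isn't shown in this thread. As a rough illustration only: the canonical ABI has the guest supply a `realloc` function taking `(originalPtr, originalSize, alignment, newSize)` and returning a pointer. The sketch below implements that signature as a bump allocator that never frees anything, over a simulated linear memory. `BumpAllocator` and its members are hypothetical names. A real TeaVM implementation would operate on actual Wasm memory rather than a `byte[]`.

```java
/**
 * Hypothetical bump allocator with a canonical-ABI-style realloc entry point.
 * Linear memory is simulated by a byte[]; pointers are int offsets into it.
 */
final class BumpAllocator {
    private final byte[] memory;
    // Offset 0 is never handed out, so 0 can serve as the null pointer.
    private int next = 8;

    BumpAllocator(int size) {
        memory = new byte[size];
    }

    byte[] memory() {
        return memory;
    }

    /** Rounds value up to a multiple of align, which must be a power of two. */
    private static int alignUp(int value, int align) {
        return (value + align - 1) & -align;
    }

    /**
     * Same argument order as the canonical ABI's realloc:
     * (originalPtr, originalSize, alignment, newSize). Nothing is ever freed;
     * growing a block copies its bytes into a fresh one.
     */
    int realloc(int originalPtr, int originalSize, int alignment, int newSize) {
        if (alignment <= 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two: " + alignment);
        }
        if (originalPtr != 0 && newSize <= originalSize) {
            return originalPtr; // shrinking in place is always safe
        }
        int ptr = alignUp(next, alignment);
        if ((long) ptr + newSize > memory.length) {
            throw new OutOfMemoryError("bump allocator exhausted");
        }
        if (originalPtr != 0) {
            System.arraycopy(memory, originalPtr, memory, ptr, originalSize);
        }
        next = ptr + newSize;
        return ptr;
    }
}
```

A bump allocator keeps the code tiny at the cost of never reclaiming memory. That may be acceptable for short-lived argument buffers, but not for long-running modules.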

See tests/wasi/src/main/java/wasi/Test.java and tests/wasi/test.sh for examples.

Not yet implemented:

  • Networking (i.e. sockets)
  • Process exit, yield, etc.
  • DWARF debugging support
  • Better logging of uncaught exceptions

Note that the current implementation is probably overly paranoid in that all
pointer arguments passed to WASI functions are allocated by the aforementioned
allocator rather than on the Java heap. I've done this because I'm not clear on
when TeaVM guarantees that object addresses will not change. Thus I've assumed
that they could change at any time whatsoever, which is probably not true.

Also note that the filesystem implementation currently throws generic
IOExceptions and ErrnoExceptions rather than those specified by the standard
(e.g. FileNotFoundException). Additional work will be needed to throw the
correct exception based on the errno value.
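
The errno-to-exception mapping mentioned above might look roughly like the following sketch. `WasiErrors` is a hypothetical helper and is unrelated to the PR's `ErrnoException`. The numeric codes are taken from the `errno` enum in WASI `wasi_snapshot_preview1` and are worth double-checking against the spec. Anything unmapped falls back to a plain `IOException`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.NotDirectoryException;

/** Hypothetical helper mapping WASI errno values to standard Java exceptions. */
final class WasiErrors {
    // Codes as numbered in wasi_snapshot_preview1's errno enum (verify against the spec).
    static final int EACCES = 2;
    static final int EEXIST = 20;
    static final int ENOENT = 44;
    static final int ENOTDIR = 54;
    static final int ENOTEMPTY = 55;

    private WasiErrors() {}

    /** Returns the most specific IOException for errno, naming the path involved. */
    static IOException toException(int errno, String path) {
        switch (errno) {
            case ENOENT:
                return new FileNotFoundException(path);
            case EACCES:
                return new AccessDeniedException(path);
            case EEXIST:
                return new FileAlreadyExistsException(path);
            case ENOTDIR:
                return new NotDirectoryException(path);
            case ENOTEMPTY:
                return new DirectoryNotEmptyException(path);
            default:
                return new IOException(path + ": WASI errno " + errno);
        }
    }
}
```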

The main goal of this PR is to start a discussion about WASI support in TeaVM and answer questions such as:

  • Is WASI support a reasonable goal for the TeaVM project?
  • If so, is the approach of this PR roughly correct, or are significant changes needed?
  • What other features are missing to make this useful?

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

This is a first pass at implementing support for [WASI](https://wasi.dev/).  It
adds a new `TeaVMTargetType` of `WASI`, which is mostly the same as
`WEBASSEMBLY` except it generates a module suitable for execution by a
WASI-compatible runtime such as
[Wasmtime](https://github.com/bytecodealliance/wasmtime).

Currently supported features:

- A simple allocator for supporting the component model [canonical
  ABI](https://github.com/WebAssembly/component-model/blob/main/design/mvp/CanonicalABI.md)
  calls
- CLI argument marshaling
- Standard in/out/err
- Filesystem operations and I/O
- System clock
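
For context, WASI exposes CLI arguments through two calls. `args_sizes_get` reports the argument count and the total buffer size. `args_get` fills a caller-provided array of 32-bit pointers plus a buffer of NUL-terminated (conventionally UTF-8) strings. `environ_sizes_get`/`environ_get` use the same layout for `KEY=VALUE` entries. The sketch below simulates both the host and guest sides, with a little-endian `ByteBuffer` standing in for linear memory. `WasiArgs` and its methods are hypothetical and are not the PR's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the memory layout used by args_sizes_get/args_get. */
final class WasiArgs {
    private WasiArgs() {}

    /** Wasm linear memory is little-endian, so the simulated memory must be too. */
    static ByteBuffer newMemory(int size) {
        return ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** What args_sizes_get reports: {argc, total buffer bytes including NUL terminators}. */
    static int[] sizes(String[] args) {
        int bufSize = 0;
        for (String arg : args) {
            bufSize += arg.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return new int[] {args.length, bufSize};
    }

    /** Host side of args_get: one 32-bit pointer per arg in argv, strings in argvBuf. */
    static void write(ByteBuffer memory, int argvPtr, int argvBufPtr, String[] args) {
        int cursor = argvBufPtr;
        for (int i = 0; i < args.length; i++) {
            memory.putInt(argvPtr + 4 * i, cursor);
            for (byte b : args[i].getBytes(StandardCharsets.UTF_8)) {
                memory.put(cursor++, b);
            }
            memory.put(cursor++, (byte) 0);
        }
    }

    /** Guest side: follow each pointer and decode bytes up to the NUL terminator. */
    static List<String> read(ByteBuffer memory, int argvPtr, int argc) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < argc; i++) {
            int start = memory.getInt(argvPtr + 4 * i);
            int end = start;
            while (memory.get(end) != 0) {
                end++;
            }
            byte[] bytes = new byte[end - start];
            for (int j = 0; j < bytes.length; j++) {
                bytes[j] = memory.get(start + j);
            }
            result.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return result;
    }
}
```

In a real module, the guest would call `args_sizes_get` first, allocate `argc * 4` bytes for the pointer array and `bufSize` bytes for the strings, and then call `args_get` with those two pointers.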

See tests/wasi/src/main/java/wasi/Test.java and tests/wasi/test.sh for examples.

Not yet implemented:

- Networking (i.e. sockets)
- Environment variables
- System random numbers
- Process exit, yield, etc.
- [DWARF](https://dwarfstd.org/) debugging support
- Better logging of uncaught exceptions

Note that the current implementation is probably overly paranoid in that all
pointer arguments passed to WASI functions are allocated by the aforementioned
allocator rather than on the Java heap.  I've done this because I'm not clear on
when TeaVM guarantees that object addresses will not change.  Thus I've assumed
that they could change at any time whatsoever, which is probably not true.

Also note that the filesystem implementation currently throws generic
`IOException`s and `ErrnoException`s rather than those specified by the standard
(e.g. `FileNotFoundException`).  Additional work will be needed to throw the
correct exception based on the `errno` value.

The main goal of this PR is to start a discussion about WASI support in TeaVM
and answer questions such as:

- Is WASI support a reasonable goal for the TeaVM project?
- If so, is the approach of this PR roughly correct, or are significant changes needed?
- What other features are missing to make this useful?

Signed-off-by: Joel Dice <joel.dice@fermyon.com>

@konsoletyper
Owner

Let's start with a discussion: what is the purpose of running Java in WASI? Why not just run a JVM?

@konsoletyper
Owner

As for introducing a separate target for WASI: currently it looks quite dirty. I'd consider updating wasm-runtime.js to emulate WASI in the browser.

@jcaesar

jcaesar commented Aug 6, 2022

Let's start with a discussion: what is the purpose of running Java in WASI? Why not just run a JVM?

Haven't we already talked about this? My reasons are:

  • Ecosystem: I can offer a WASM/WASI plugin system for my application and don't have to worry whether it's C or Java running in there.
  • Lightness: JVMs are typically huge.
  • Performance encapsulation: Even if you lock down a class loader fully with a security manager, its classes can still create threads, allocate as much memory as they want, and run forever. WASM locks down more tightly.

@konsoletyper
Owner

This PR also lacks a way of running the classlib tests for the WASI target.

@konsoletyper
Owner

As for buffers: the TeaVM GC can move objects. However, it does not move any objects that are referenced from the stack. This means that for any non-asynchronous call it's OK to use a byte array on the heap instead of a custom allocator.

@dicej
Author

dicej commented Aug 6, 2022

@konsoletyper First, thanks for all the work you've put into TeaVM. The WebAssembly support in particular is quite impressive and useful. I've looked at other similar projects, but TeaVM seems to be the most robust and flexible.

Let's start with a discussion: what is the purpose of running Java in WASI? Why not just run a JVM?

Excellent question. @jcaesar mentioned a few reasons related to the precise control and security guarantees you get when embedding untrusted Wasm code in an application.

Another use case is lightweight, multi-tenant cloud computing tasks. Companies like Fastly, Shopify, and Fermyon (my current employer) are using WASI and the component model to host and run customer code in a secure, platform-, architecture-, and language-agnostic way. A traditional JVM is not a great fit for this for a few reasons:

  • Sandboxing code in a JVM (e.g. via SecurityManager) has a spotty track record, whereas the WebAssembly "shared nothing" approach to isolating modules and components is both simple and reliable.
  • The JVM is great for languages which were designed for it (e.g. Kotlin, Scala, Clojure, and Java itself), but not a great fit for others like C, Python, Ruby, JS, etc. GraalVM helps address this somewhat, although it's too heavyweight for many use cases.
  • The JVM is fundamentally bigger and more complicated than a minimal Wasm runtime, so it's not a great fit for embedded (e.g. IoT and edge computing) workloads. There are a few lightweight JVMs out there (I helped write one), but they don't help much with the sandboxing and language agnosticism goals mentioned above.
  • JIT compilation is not a great fit for FaaS (AKA "serverless") workloads where startup time is paramount. In the WebAssembly world, we can use tools like Wizer to pre-initialize modules and then use Wasmer or Wasmtime to easily pre-compile them to native code, resulting in sub-millisecond startup times even for managed languages such as C#.

As for introducing a separate target for WASI: currently it looks quite dirty. I'd consider updating wasm-runtime.js to emulate WASI in the browser.

To be clear, are you suggesting I should remove the WASI target and make the WEBASSEMBLY target generate a WASI-compatible module by default, which would rely on wasm-runtime.js to adapt it for use in the browser? If so, that sounds reasonable to me.

BTW, a WASI polyfill is already available: https://wasi.dev/polyfill/. Perhaps we could reuse that? I haven't looked at the code yet, though -- it might be too heavyweight for this purpose.

This PR also lacks a way of running the classlib tests for the WASI target.

Good point. I focused on testing the WASI-specific parts first, but I agree that we should run all the existing classlib tests. I'll put that on my TODO list.

As for buffers: the TeaVM GC can move objects. However, it does not move any objects that are referenced from the stack. This means that for any non-asynchronous call it's OK to use a byte array on the heap instead of a custom allocator.

Thanks, that's helpful. I'll update the code with that in mind.

@bjorn3

bjorn3 commented Aug 7, 2022

BTW, a WASI polyfill is already available: https://wasi.dev/polyfill/. Perhaps we could reuse that? I haven't looked at the code yet, though -- it might be too heavyweight for this purpose.

This polyfill is for an older version of WASI (wasi_unstable) and uses Emscripten behind the scenes, which means it is about 400 kB in size. I have written a pure JavaScript polyfill for a subset of the current WASI version (wasi_snapshot_preview1): https://github.com/bjorn3/browser_wasi_shim

I believe there are also other WASI JavaScript polyfills.

@syrusakbary

syrusakbary commented Aug 7, 2022

This PR is awesome. Great work on this @dicej. Adding some comments that I believe could be relevant:

it generates a module suitable for execution by a
WASI-compatible runtime

Can we make this PR runtime-agnostic? I believe it would be beneficial to have the runtime specified via an environment variable rather than statically defined. That way we can easily test with Wasmer, Wizer, or any other server-side Wasm VM.

BTW, a WASI polyfill is already available: https://wasi.dev/polyfill/. Perhaps we could reuse that? I haven't looked at the code yet, though -- it might be too heavyweight for this purpose.

Wasmer already ships the most popular WASI implementation for JS. It might be worth using it here (it's already used officially by the Ruby ecosystem): https://www.npmjs.com/package/@wasmer/wasi (especially because if tests pass when using Wasmer as a runtime in GitHub CI, we can be confident it will work exactly the same way in the browser). Thoughts?

dicej added 2 commits August 8, 2022 10:00
My previous commit was based on the assumption that objects allocated on the
Java heap could be moved by TeaVM at any time.  It turns out TeaVM will not move
objects which are referenced by stack variables, so we can use those instead of
`malloc`ed buffers.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Per review feedback, this removes `TeaVMTargetType.WASI` and makes
`TeaVMTargetType.WEBASSEMBLY` generate a WASI module unconditionally.  Such
modules won't run in a browser without a polyfill, which I plan to add in a
later commit.

Along the way, I noticed the `Console` intrinsics weren't working properly and
fixed them.

Finally, this adds a new `WasiRunStrategy` which runs the full TeaVM test suite
using either Wasmtime or Wasmer.  You can try it by running the following from
the test directory:

```
mvn -e install \
  -Dteavm.junit.js=false \
  -Dteavm.junit.js.runner=none \
  -Dteavm.junit.wasm=true \
  -Dteavm.junit.wasm.runner=wasi-wasmer
```

Specify `-Dteavm.junit.wasm.runner=wasi-wasmtime` to use Wasmtime.

Currently many of the tests are failing, but I haven't had a chance to
investigate yet or compare the results to the same tests running in a browser
using an unmodified TeaVM.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
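
Runtime selection driven by the `-Dteavm.junit.wasm.runner` values above could be kept runtime-agnostic along the lines of this sketch. `WasiRuntime` and `fromRunnerProperty` are hypothetical names. Only the property name and the `wasi-wasmtime`/`wasi-wasmer` values come from the commit message. The executable names assume the standard `wasmtime` and `wasmer` CLIs are on the `PATH`.

```java
import java.util.Locale;

/** Hypothetical mapping from the teavm.junit.wasm.runner property to a WASI runtime CLI. */
enum WasiRuntime {
    WASMTIME("wasmtime"),
    WASMER("wasmer");

    /** Name of the command-line executable to launch. */
    final String executable;

    WasiRuntime(String executable) {
        this.executable = executable;
    }

    /** Parses "wasi-wasmtime" or "wasi-wasmer" (case-insensitive); anything else is rejected. */
    static WasiRuntime fromRunnerProperty(String value) {
        if (value == null) {
            throw new IllegalArgumentException("teavm.junit.wasm.runner is not set");
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "wasi-wasmtime":
                return WASMTIME;
            case "wasi-wasmer":
                return WASMER;
            default:
                throw new IllegalArgumentException("unsupported WASI runner: " + value);
        }
    }

    /** Reads the runner from the JVM system properties, e.g. as set via mvn -D... */
    static WasiRuntime fromSystemProperties() {
        return fromRunnerProperty(System.getProperty("teavm.junit.wasm.runner"));
    }
}
```

With the mapping in one place, supporting another WASI runtime amounts to adding an enum constant and a `case` label.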
@jcaesar

jcaesar commented Aug 9, 2022

Thoughts?

How big is WasmerJS's WASI? From what I see, the average TeaVM module doesn't need much more than fd_write(2, …) for logging error messages. Maybe 1kB in JS? (Modules targeting the web also have some web-api imports, but they won't be available through WASI in any case.) Bumping up the size of all TeaVM modules by a mostly unused WASI may be wasteful.

@syrusakbary

WasmerJS's WASI is around 120 kB gzip-compressed.

From looking at the PR, it seems that the WASI requirements are mainly on the filesystem and clock:
https://github.com/konsoletyper/teavm/pull/610/files#diff-742a3b3321470e2a546d2b28641ed6fe885a0591c96e5fbec7bddd3e921fd15cR54-R119

To implement those properly, you will usually need an in-memory FS (which is where most of the bulk comes from, looking at Wasmer-JS).

dicej added 9 commits August 9, 2022 12:13
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Previously, TeaVM used imported functions for timezone resolution, char case
conversion, and floating point tests.  Since WASI doesn't support any of those
things, we implement them in Java as best we can.  Floating point tests are
easy; timezone resolution will have to wait for
WebAssembly/WASI#467 (we hard-code UTC for now); case
conversion can be done entirely in Java, although I've only handled ASCII
characters in this commit.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
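
The ASCII-only case conversion mentioned in this commit can be done in plain Java with no host import, roughly as below. A later commit in this PR reverted it in favor of the JS imports. `AsciiCase` is a hypothetical name. Characters outside `A`-`Z` and `a`-`z` pass through unchanged, which is exactly the limitation the commit message calls out.

```java
/**
 * Hypothetical ASCII-only case conversion in plain Java, needing no host import.
 * Characters outside 'A'..'Z' and 'a'..'z' are returned unchanged.
 */
final class AsciiCase {
    private static final int OFFSET = 'a' - 'A'; // 32 in ASCII

    private AsciiCase() {}

    static char toUpperCase(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - OFFSET) : c;
    }

    static char toLowerCase(char c) {
        return (c >= 'A' && c <= 'Z') ? (char) (c + OFFSET) : c;
    }

    /** Applies toUpperCase(char) to every UTF-16 unit; non-ASCII text is left as-is. */
    static String toUpperCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            chars[i] = toUpperCase(chars[i]);
        }
        return new String(chars);
    }
}
```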
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
One of the tests had been running for 90 minutes when I gave up on it.  Now we
time out any test that takes longer than 5 minutes.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
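
The commit doesn't show how the timeout is enforced. Assuming the runner launches the WASI runtime as a child process (an assumption, not something stated here), one plausible approach uses `Process.waitFor(long, TimeUnit)` followed by `destroyForcibly()`. `TimedRunner` is a hypothetical name.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical helper: runs a command (e.g. a WASI runtime executing a test module)
 * and kills it if it does not finish within the given timeout.
 */
final class TimedRunner {
    private TimedRunner() {}

    static int run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        // Child output goes straight to this process's stdout/stderr.
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (!process.waitFor(timeout, unit)) {
            process.destroyForcibly(); // don't leave a hung runtime behind
            process.waitFor();         // reap the killed process
            throw new TimeoutException(
                    String.join(" ", command) + " did not finish within " + timeout + " " + unit);
        }
        return process.exitValue();
    }
}
```

A caller would pass the runtime command line (for example `wasmtime` followed by a hypothetical `test.wasm` path) together with `5, TimeUnit.MINUTES` to match the limit described above.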
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
A previous commit provided stub implementations of `getNativeOffset`,
`toUpperCase`, and `toLowerCase`.  This reverts that change and instead uses the
JS versions in the browser.  It won't work in a WASI runtime without providing
the extra imports to the module, but there's currently no way to get the local
timezone in WASI, and I'm not yet ready to take on a proper implementation of
`toUpperCase` and `toLowerCase`.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
This uses https://github.com/bjorn3/browser_wasi_shim to emulate WASI features.
Note that I had to change the API for TeaVM `main` since `browser_wasi_shim`
requires that CLI arguments are known before the module is loaded.

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
@dicej
Author

dicej commented Aug 10, 2022

Status update: I've removed TeaVMTargetType.WASI and made TeaVMTargetType.WEBASSEMBLY generate a WASI module unconditionally (but see caveat below). I've also updated wasm-runtime.js to use @bjorn3's browser_wasi_shim, so these modules run fine in either a WASI-compatible runtime (e.g. Wasmer or Wasmtime) or a browser.

I've also added a new WasiRunStrategy class for executing the full test suite. It takes almost 3 hours to run on my machine, and there are a lot of failures, but that's comparable to a browser-based test suite run for the Wasm target.

Finally, I've implemented support for environment variables and random numbers, as well as pure Java implementations of isNaN, isInfinite, and isFinite.
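
Pure-Java replacements for `isNaN`, `isInfinite`, and `isFinite` can rely on IEEE 754 semantics alone. NaN is the only value not equal to itself, and the infinities compare equal to `Double.POSITIVE_INFINITY` and `Double.NEGATIVE_INFINITY`. The sketch below uses a hypothetical class `FloatChecks` and is not necessarily how the PR wrote them. It needs no host import.

```java
/** Hypothetical pure-Java floating-point classification, with no host imports. */
final class FloatChecks {
    private FloatChecks() {}

    /** NaN is the only double that compares unequal to itself. */
    static boolean isNaN(double v) {
        return v != v;
    }

    /** Exactly the two infinities; NaN compares unequal to both. */
    static boolean isInfinite(double v) {
        return v == Double.POSITIVE_INFINITY || v == Double.NEGATIVE_INFINITY;
    }

    /** Finite means neither NaN nor infinite. */
    static boolean isFinite(double v) {
        return !isNaN(v) && !isInfinite(v);
    }
}
```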

Caveat: There are still several functions in the class library which require non-WASI imports from the host, including local timezone resolution, character case conversion, and various math functions. Code which uses any of those features will result in a module that imports functions not provided by WASI and thus won't run on a WASI runtime unless those imports are explicitly satisfied (e.g. by an embedding application). For timezone resolution, we can use WebAssembly/WASI#467 if/when it is implemented. The other functions could in principle be implemented in pure Java with no host support, but I'll leave that as a future improvement.

@konsoletyper What do you think? Is there anything you'd like to see changed or added, or is this something you'd consider merging?

Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
@dicej
Author

dicej commented Aug 23, 2022

@konsoletyper I'm guessing you're busy and that this PR isn't a high priority for you right now, which is totally understandable.

For the time being, it might be best if I create a friendly fork of the TeaVM repo for further WASI development and publish Maven packages under a different name (e.g. teavm-wasi) and group for anyone to use. We'd like to start using this at Fermyon, and I know there are others who are interested in using it as well.

Thoughts?

@nickvidal

@dicej @konsoletyper we are interested in this PR.

@sdeleuze

Same here

@dicej
Author

dicej commented Sep 2, 2022

Per my above comment, I've created a friendly fork of this project and published an initial release to the Maven Central repository under the com.fermyon group.

I'll be using that fork for WASI and Component Model work going forward. Feel free to open issues and/or PRs there if you're interested in helping out.

@dcodeIO

dcodeIO commented Nov 8, 2022

I'd like to caution: Java fits Web APIs and JavaScript almost naturally due to its many similarities, an advantage that makes it a great source language for WebAssembly in actual browsers, since they share many concepts. Neither WASI nor the Component Model shares these Java or Web/JS concepts; afaict they are attempts to establish a new foundation where the disadvantages of the few blessed first-class languages become every other language's disadvantage over mere opinion. This, and especially the fact that this PR now proposes to unconditionally emit WASI modules even when targeting browsers, has a bunch of implications for anything with an almost natural fit for the Web platform. Here it effectively turns Java's good fit into concrete disadvantages, none of which is mentioned as a caveat by those now trying to persuade TeaVM into supporting (and then indirectly promoting) what is, imho, a fundamentally misguided and, frankly, questionable approach to Web standards that they are themselves responsible for.

FYI, take care :)

8 participants