-
-
Notifications
You must be signed in to change notification settings - Fork 681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch from Wasmer to Wasmtime #3349
Conversation
.appender("logFile") | ||
.build("wasmer_compiler_cranelift", LevelFilter::Warn), | ||
.appender("logPlugin") | ||
.build("wasmtime_wasi", LevelFilter::Warn), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go to logFile
or logPlugin
? On the one hand these are logs in response to plugin actions, on the other hand they are not logs directly produced by the plugin itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't matter that much, they go into the same file in the end. But might indeed be nicer to change it to logPlugin since then we'll see the plugin id too.
In Wasmer trying to create a file with a relative path creates it in the |
CI should be fixed now. |
ecdb67f
to
c23a4b5
Compare
(Rebased. Will likely keep doing this as soon as conflicts arise to avoid accumulating conflicts.) |
Given that this PR can break many behaviors for plugins on Zellij, we can commit to update Cranelift in Wasmer very soon, so it doesn't become a problem for the project |
Much appreciated @syrusakbary !! But since I don't want you to perform extra work, I will emphasize something I mentioned elsewhere: we will not consider any upgrade of wasmer that includes wasix or any sort of extra dependency that comes with wasix. Given that I understood this is a big ask for your project (totally legit ofc, every project and application has different requirements), I gave @bjorn3 the go-ahead for this work. If all checks out with it (I haven't gone over it yet) and we manage to iron out the differences with plugins you rightfully mentioned (some of them have already been brought up as points of order by @bjorn3 ) - we will go through with this change. |
Is not a big ask at all! Because WASIX is a superset of WASI, is actually feasible to only ship the base calls of WASI –for example– via a feature flag in Cargo. We can do that for you guys so you can upgrade Wasmer easily and without the burden of changing runtimes and plugins! I've created this issue so we can follow up on that quickly: wasmerio/wasmer#4722 |
Thanks @syrusakbary ! As mentioned here and elsewhere, a feature flag will probably not suffice. We also need things like dependencies that are only used in wasix not to be included at all (for our packagers). This would likely have to be an entirely different crate (though I of course do not know about your internals). But as I said - the work has already started here, so I intend to see it through. If we hit a road block that prevents us from migrating, or feel the migration is too risky - I'll be sure to check out the work in the linked issue to see if it would be a better path forward for us. Thanks for your efforts and I am sorry things have to be this way, but we've sounded this alarm before a few times and we have to think of our application first. |
@imsnif of course, no worries! Do what you think is best for the project. Just a minor clarification: on the Wasmer side, we can turn off the WASIX syscalls, dependencies and code paths incredibly easily via a feature flag, so a new crate shall not be needed if you would like to stick with Wasmer :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bjorn3 - first, thanks so much for your thorough work on this. This is a major infrastructure change for us and I do not want to take it lightly - so I hope you don't mind being patient with my review cycle.
I'm going on a short vacation tomorrow and didn't want to leave this hanging without a response while I'm away. So I added some comments about things that immediately jumped out at me to get us started. I will give this a more thorough shake-up and review when I get back around the beginning of June.
In Wasmer trying to create a file with a relative path creates it in the /host directory. In Wasmtime it will result in an error that there is no "pre-opened file descriptor" through which the file can be opened. As WASI doesn't have any such concept as a current working directory, the Wasmtime behavior is more natural. It does break the current released version of zjstatus however. dj95/zjstatus@25e3a56 works around this. I tried to work around the difference, but while it fixed this specific issue, it broke a lot more things. As such I would prefer documenting the difference rather than making Wasmtime behave the same way as Wasmer.
I've been thinking about this specific issue and while I hear you about the difficulty with the behavior change, I'm not enthusiastic about it. zjstatus
is just one example that we happened to catch, I'm sure there are more. Backwards compatibility of already-compiled plugins is a strong value for our ecosystem.
I'm not saying this is a deal-breaker, but I'd like to understand the trade-offs before going ahead with this. What other stuff breaks if we try to work around this problem? How big of a hack is it? Will it affect upgrades? What else?
@@ -84,7 +84,7 @@ fn start_zellij(channel: &mut ssh2::Channel) { | |||
) | |||
.unwrap(); | |||
channel.flush().unwrap(); | |||
std::thread::sleep(std::time::Duration::from_secs(1)); // wait until Zellij stops parsing startup ANSI codes from the terminal STDIN | |||
std::thread::sleep(std::time::Duration::from_secs(3)); // wait until Zellij stops parsing startup ANSI codes from the terminal STDIN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea why we need the increase here? I don't mind increasing it in in this case of the e2e tests if we need it, but I'd prefer to understand what happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without it, the first snapshot for several tests seem to be taken while still compiling wasm, which suggests that compiling is slower. This is surprising to me as locally it actually seemed to be faster. Maybe the singlepass compiler of Wasmtime is slower than the one of Wasmer and only the optimizing compiler is faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Winch is about 2-6 times slower compiling than Wasmer's Singlepass for non-trivial applications (>2Mb), you can check more examples such as time to compile Spidermonkey or ffmpeg on each runtime on this article:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All builtin plugins together add up to 2MB, but point taken. I agree Winch being a fair bit slower at compiling is likely the cause here. Caching compiled wasm modules between tests rather than recompiling every single time would likely help a lot. I haven't looked into how hard that would be, but Zellij already caches compiled wasm modules under normal operation. To be honest I'm surprised it isn't already cached between tests. Or maybe it is already getting cached, but would just need a warmup with a higher timeout before any tests run?
I just reproduced this behavior in standalone Wasmer. Turns out it is completely disregarding the intention of the WASI specification: fn main() {
println!("{:?}", std::fs::write("foo", ""));
println!("{:?}", std::fs::read("foo"));
println!("{:?}", std::fs::read_dir("/").map(|dir| dir.collect::<Vec<_>>()));
println!("{:?}", std::fs::read_dir(".").map(|dir| dir.collect::<Vec<_>>()));
println!("{:?}", std::fs::read_dir("/tmp/foo/x").map(|dir| dir.collect::<Vec<_>>()));
} when run in wasmtime with a
as I would have expected while wasmer prints
In other words rather than passing through the directories the user specified as pre-opened directories as seems to be intended by the WASI specification, it creates an in-memory virtual filesystem whose contents are discarded upon exit and inside this VFS it mounts the user specified directories. The pre-opened directories it passes are
It would probably be possible to emulate this with wasmtime-wasi, but I think it would only hide bugs inside plugins. From what I can tell before dj95/zjstatus@25e3a56 the |
2b68791
to
7d6821c
Compare
Hey @bjorn3 - I took this for a spin and am seeing some weird behavior. Could it be that a plugin's state is being reset when a new instance is loaded for some reason? To reproduce:
|
I can't reproduce this behavior locally. For me the counts are preserved as I switch between tabs. Peek.2024-06-07.17-36.webm |
Also when the second plugin instance is initially launched? That's when it gets erased for me. |
Yes, opening a new tab preserves the state too: Peek.2024-06-07.17-36.webm |
Alright, I'll do some troubleshooting next week and find out what's up. |
Hey, so first - I did some testing and this was a configuration error on my part. My apologies! Otherwise, I'm still testing this PR. I'm sorry for not getting to do a full review this week, it was on my schedule and got pushed off. I fully plan to do so next week. Thank you for your patience @bjorn3 ! |
This will enable PluginEnv to be the Store context when migrating to Wasmtime.
This will allow removing the Clone impl from PluginEnv when migrating to Wasmtime as required by the missing Clone impl on Wasmtime's WasiCtx.
Wasmtime requires storing the read/write end of the pipe outside of the WasiCtx. Passing PluginEnv to these functions allows storing them in the PluginEnv.
Let's do it before if we're already here. Do you think there are significant changes that will affect us? |
Updated. No api changes that affects us. Only had to ensure the new std feature of wasmtime is enabled. |
By the way, I forgot to mention that I disabled the "pooling allocator" too. It is a performance optimization for when you quickly drop and recreate instances of a wasm module. For example creating an instance for each request in an http server. Are plugin instances meant to be quickly destroyed again after invoking a command inside them or do they live for a while in Zellij? |
They definitely live for a while. There can be cases where a plugin is spun up to deal with a request (eg. through a CLI pipe) and then kills itself, but I don't think there's a use-case for it then spinning itself up again multiple times. |
So where things are now: I'm going to give stacab a few days to see if he can test this with some Go plugins, and if all is well or he doesn't have time, I'll give it one more shakeup with the new version and merge if all is well. |
Alright. Things look good, multiple people have also tested this manually in various scenarios, so I'm giving this the go. I would like to emphasize that this is not a criticism of wasmer in any way or form. Wasmer has been a great runtime for us over the years and has served us well. A heartfelt thank you to the maintainers. Thank you very much for your work on this @bjorn3 ! |
@bjorn3 - I hope it's alright to consult you about this after we've already merged. Working with
I'm using Any ideas? |
Of course.
I will look into it. Do you have a backtrace for the crash? |
Thank you! I unfortunately do not have a backtrace, but I'll try to keep an eye on this and produce one if it happens again. |
Looks like the assertion in question is a debug assertion, so it would only be hit during development: https://github.com/bytecodealliance/wasmtime/blob/cedf9aa0f9620391cab465361b474b149654e88d/crates/wasmtime/src/runtime/type_registry.rs#L740 I will ask someone familiar with this code if they know what is wrong. |
Hi @fitzgen - thanks for the very quick response! I unfortunately do not have a test case or steps to reproduce. Application-wise, this seems to happen pretty randomly (happened to me when I opened a new tab in Zellij which in this context causes new plugins to be instantiated). It happens to me occasionally while developing, I'll try to keep an eye out for the trace next time. Since this only happens in development though - on my side at least it's an extremely minor inconvenience and nothing to worry about. |
Hey @bjorn3 and @fitzgen - this now happened to me in release code:
I don't customarily run with |
Unfortunately, it is hard to debug this issue with so little information. Can you confirm whether you are using Wasm GC types (and specifically the I will do another review of some potentially relevant code paths. |
I think I've figured out what this is actually. Will link the PR when I have it up. |
Much appreciated @fitzgen ! I can totally understand this is not trivial to troubleshoot with so little information. I'm working with the Zellij main branch as a matter of course and have now made sure to also have As for specific questions, I did not do the integration myself - so I think @bjorn3 can answer them best. But hopefully as you say you figured it out and this will not be needed. |
No wasm GC types are used in the external interface as it is using wasip1 just like the Wasmer code. And as most (all) plugins are written in Rust, there is also no GC types usage expected internally within the plugins. |
Thanks for providing additional info. Fix coming soon, I've managed to reproduce locally and verify the fix. Aside: it might be neat to install a custom panic handler to Zellij that captures the stack trace and panic message and logs them to a file or something like that. Could make debugging this kind of bug (whether in Wasmtime, another dep, or Zellij itself) easier in the future. Anyways, thanks for your patience while we figured this out! |
I totally agree. We actually do have a panic hook that is tied to some other internal app logic. I very much agree we should get it to do the right thing in this case. Like everything, it's a matter of developer capacity which is something we're unfortunately in short supply of. Thanks for the quick turnaround on this! |
@fitzgen which PR fixed this bug? |
The bug was GHSA-7qmx-3fpx-r45m so it ended up not having an official PR for it but rather a direct merge |
Opened #3685 to update Wasmtime. |
* Remove ForeignFunctionEnv wrapper around PluginEnv This will enable PluginEnv to be the Store context when migrating to Wasmtime. * Pass PluginEnv by value to load_plugin_instance This will allow removing the Clone impl from PluginEnv when migrating to Wasmtime as required by the missing Clone impl on Wasmtime's WasiCtx. * Avoid passing a Store around when an Engine is enough * Pass PluginEnv to the wasi read/write functions Wasmtime requires storing the read/write end of the pipe outside of the WasiCtx. Passing PluginEnv to these functions allows storing them in the PluginEnv. * Migrate to Wasmtime * Switch from wasi-common to wasmtime-wasi * Reduce verbosity of wasmtime_wasi logs * Increase startup delay To wait for all plugins to be compiled. * Disable some wasmtime features * Update to Wasmtime 21.0.1
Zellij is currently unable to update Wasmer as the currently used version is the last one without mandatory wasix support which Zellij doesn't want to adopt for various reasons. In the mean time a CVE has been found in the Cranelift version used by this Wasmer version. While it has been worked around in #2830, future CVE's may not be possible to workaround without updating Cranelift and thus Wasmer. This PR switches from Wasmer to Wasmtime to avoid all these problems.
Each individual commit builds. All commits except for the last two commits keep using Wasmer and thus could land without the actual switch to Wasmtime if preferred.