Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Where do we define the Wasm "container" boundry #619

Closed
jsturtevant opened this issue Jun 12, 2024 · 9 comments
Closed

Where do we define the Wasm "container" boundry #619

jsturtevant opened this issue Jun 12, 2024 · 9 comments
Labels
question Further information is requested

Comments

@jsturtevant
Copy link
Contributor

We are doing a bunch of set up work for the wasm runtime inside the container and using rust log crate. We are effectively still operating as the "shim" until we invoke the Wasm binary for example

let status = start_func.call_async(&mut store, &[], &mut []).await;

This leads to some confusion from an end user perspective in some scenarios. For instance if a user is using a Rust wasm application and they set RUST_LOG on their podspec, then it will switch logging for everything in

fn run_wasi(&self, ctx: &impl RuntimeContext, stdio: Stdio) -> Result<i32>;
.

This could also happen when we start to enable tracing in #582

Since there is some additional set up of the wasm runtime, this leads an issue of where the boundary should live for the container vs wasm.

@jsturtevant
Copy link
Contributor Author

some further conversation and context in https://cloud-native.slack.com/archives/C06PC7JA1EE/p1717097248450109

@Mossaka
Copy link
Member

Mossaka commented Jun 12, 2024

Let me clarify why we are seeing "if a user is using a Rust wasm application and they set RUST_LOG on their podspec, then it will switch logging for everything in run_wasi"

There are two shim processes when started:

  • Parent shim process: started by containerd and uses containerd's env vars (e.g. RUST_LOG from containerd)
  • Child shim process: started by containerd and inherits parent shim process's env vars (e.g. RUST_LOG from containerd)

When the child shim process invokes youki's ContainerBuilder::build() it reads the env vars from the Container Spec, hence from the PodSepc, and changes the process's env vars (e.g. this line).

This happens prior to run_wasi. Hence everything after run_wasi is using the env vars from PodSpec.


Back to your question: "where the boundary should live for the shim vs wasm?". I think from the user's perspective, they would have assumed that their application environment variables (i.e. from PodSpec) should only be seen by their application (i.e. WASI environment consumed by Wasm modules).

The next question is how do we achieve that?

I am thinking about either

  1. moving Wasm runtime (Spin, Wasmtime, WasmEdge, etc.) out of youki's LibcontainerExecutor so that the runtimes won't be able to see env vars from PodSpec.
  2. telling youki to not mutate main process's env vars (i.e. disable the code from here) and passing env vars to the Wasm runtime's WASI context. This routes PodSpec env vars directly into WASI environemnt that only wasm modules can see

@kate-goldenring
Copy link
Contributor

I am concerned about these options. If we do not allow for configuring the Wasm runtime/Spin engine via container env vars, how will we do app level execution configuration? For example, how do we configure which port for a Spin app to listen on?

I'd prefer to continue to set these env vars on the Wasm runtime and tell the users a user story wherein for Wasm, the container env vars configure the Wasm runtime, especially given that Spin apps often contain multiple Wasm instances. Would setting container env vars mean setting those env vars in each instance's environment? This is part of the reason Spin encourages using configuration variables over environment variables.

I do understand that it is a different user story than that of containers. If there is a way we can configure the Spin runtime and Spin app env vars at a container level that would be a double win.

@jsturtevant
Copy link
Contributor Author

jsturtevant commented Jun 17, 2024

Would setting container env vars mean setting those env vars in each instance's environment?

right now, spin doesn't use the env's in the same way as wasmtime shim, so each instance only gets the the values in spin-toml as defined in https://developer.fermyon.com/spin/v2/variables


I think one of the challenges is there is a additional component in scope compared to the traditional container environment. One of the goals of runwasi is wrap the wasm execution in a "container" to provide additional security and smoother transition coming from the expectations of using containers.

The additional component comes from the fact that wasm has two parts: a host (spin or wasmtime) and guest (wasm binary). They both have unique configuration values beyond the container. With our current set up we run both in the "container" which means from an OCI perspective we don't really have the ability to distinguish between the two.

Should the host run inside the context of the container? I believe it should. A concrete example is given by @kate-goldenring : the host needs to know the ip and port to bind to.

How do we distinguish between host configuration and wasm configuration? After reading up on Spins approach to configuration. I am under the impression that we are not doing the right thing with the wasmtime shim:

let envs: Vec<_> = std::env::vars().collect();
. Spin has a nice way of solving this via the application variables, I am not sure if wasmtime does.

I am still unsure the correct solution across all shims... Maybe it is unique choice for each shim?

@kate-goldenring
Copy link
Contributor

kate-goldenring commented Jun 17, 2024

right now, spin doesn't use the env's in the same way as wasmtime shim, so each instance only gets the the values in spin-toml as defined in https://developer.fermyon.com/spin/v2/variables

Spin does also support environment variables. Similar to wasmtime run --env Foo=bar, spin up --env Foo=bar. But, as you mention, the values are static rather than dynamic; however, the static nature of environment variables in containers is normal.

We could also support the --env container arg to get environment variables injected into Wasm instances. Or, we could determine a prefixing syntax for determining which envs are for the runtime and should not be exposed to the guest. Or, we could move away from env vars in the guest and use something like application variables. Alternatively, we could support adding a .env file to the OCI artifacts and import those. I am throwing out ideas but i am also reluctant to drop support for envs for components executed by the wasmtime shim. So maybe I am coming to the same summary that "Maybe it is a unique choice for each shim?"

@jsturtevant
Copy link
Contributor Author

We could also support the --env container arg to get environment variables injected into Wasm instances. Or, we could determine a prefixing syntax for determining which envs are for the runtime and should not be exposed to the guest. Or, we could move away from env vars in the guest and use something like application variables. Alternatively, we could support adding a .env file to the OCI artifacts and import those. I am throwing out ideas but i am also reluctant to drop support for envs for components executed by the wasmtime shim. So maybe I am coming to the same summary that "Maybe it is a unique choice for each shim?"

The ideas are great, I agree it does come down to the shim (but maybe there is opportunity for Wasm and/or OCI to provide a way to standardize this longer term) and I also don't want to drop support the envs fully from wasmtime shim.

Having it be a unique choice for each shim, seems like a good approach for now.

@kate-goldenring
Copy link
Contributor

kate-goldenring commented Jun 18, 2024

@jsturtevant @Mossaka @devigned @cpuguy83 in our runwasi meeting this morning, we discussed this more synchronously, settling on a potential approach of setting all container env vars as application variables in the Spin shim. After some thinking, i found a caveat to this approach: this only works if you use the environment variable provider for application variables, which can also be set through other providers such as Azure Key Vault or Vault. This is because Spin currently only supports one application provider per app.

In our discussion on the sandbox API, @devigned pointed out that in general we should take the approach that is the most container-like to users. On that note, how do we handle injection of cluster information environment variables such as KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, OTHER_SERVICE_HOST, etc? To ensure that components can access these, they would need to be injected as environment variables rather than application variables. This is because if a user were to choose to use the Vault application variables provider, there would be no way to set these values as they are passed as container envs. We could have an allow list of envs we will inject, or we could inject everything into every component (unless it has a SPIN prefix to indicate it should only be used for the runtime). Or we could find a way to support multiple application variable providers so some variables (such as the cluster information ones) could be grabbed from the environment variable provider and others from another provider (say Vault).

Short term, if it is a defined list of environment variable suffixes and prefixes we can expect from Kubernetes, I think we may be best suited to set them as environment variables in all components

[[EDIT]]: Spin already supports multiple variables providers. [[variables_provider]] is a list; they get queried in order until one returns a value, so our approach of setting all container envs as application variables and leaving it up to components to select which ones to allow holds!

@devigned
Copy link
Contributor

\o/ for multiple variable providers!

@Mossaka
Copy link
Member

Mossaka commented Sep 17, 2024

I believe #668 has resolvevd the biggest concern of this issue - the concern for environment variables and configuration.

@Mossaka Mossaka closed this as completed Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants