Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wit-bindgen does not include reference to Store in generated host interfaces #313

Closed
stevelr opened this issue Sep 14, 2022 · 8 comments
Closed

Comments

@stevelr
Copy link

stevelr commented Sep 14, 2022

There are some use cases that require a host to know some information about the module that invoked it. Examples include access to secrets, or maintaining separation of data in a multi-tenant server. How the host handles a call at runtime could depend on the access rights of the user that loaded the module, or perhaps capabilities that were attached to the module itself (for example, wasmcloud attaches a JWT with signed capability claims to a custom section of the wasm module).

It's entirely possible that I've overlooked something and wit-bindgen isn't the best way to address this. To clarify intent, I'll go into a little more detail on the use case and challenge.

Suppose you want to use a KV-like WIT interface to allow webassembly modules to query a secrets store on a multi-tenant server. When the module requests access to a secret, the host must have some way to determine whether access should be allowed. If the api signature is just get(key) -> value, the host doesn't have enough information to resolve it. Given that a module's code may be "untrusted", we cannot allow parameters of the request to declare its own access rights, and we certainly don't want to compile credentials into the module. A multi-tenant server could manage access by placing a user-id in the Store at the time the module is loaded, and use that data at runtime to lookup access rights before executing the query. The mechanism used by the host for managing user ids or determining access rights is beyond the scope of the component model and WIT. However, for the host to properly execute the request from the module, it must have some trusted information about the user or account performing the request. The Store is a natural place to store that info, and it's trusted because it's not directly accessible by the module. If a reference to Store is generated as an extra parameter to the host call, the host can easily check permissions and decide how to process the request.

This issue made the same request, and was rejected because it appeared that there was a need for the host to call back into the module, in violation of the component model design. This proposal makes the same request, but the use case is slightly different. The host needs access to the Store not so that it can invoke the module, but so that it can get an identity token from the Store.

Ugly Work-arounds

There are some work-arounds to accomplish the goal, but they are much less desirable.

  • Add the Store parameter by manually generating the host binding interfaces - or hand-editing the generated code. This is messy, error-prone, and difficult to maintain.
  • Change all incoming interfaces and outgoing interfaces for the module to include an additional session token or context parameter. Since the user session information can be added to the Store, it can also be pulled from the Store to pass into the module with all invocations, so that it can be passed out again when the module calls the host. This is undesirable because it pollutes unrelated interfaces by adding identity or session information, complicates module implementation by requiring all methods to pass through an extra parameter, and makes the interfacess less portable by encoding information about how some particular host wants to represent sessions or identity. That goes against the principles of WIT.

Misc details:

  • wasmtime TypedFunc.call and .call_async require a mutable reference to Store. When that module in turn calls out to the host via a generated WIT binding, it is that outgoing interface that I'm proposing needs a read reference to the same Store. If the Store is accessed read-only, I don't think it violates the exclusive access guaranteed by the mut reference.
  • The host would have to ensure that there are no reentrant calls to the module. This is already a requirement.
  • The host cannot cause the module's memory to grow. I'm not sure of the details, but this requirement may imply that the reference to Store must be read-only, or at least that it doesn't grow. For any use case I can think of where the host would want to update module-specific data structures, the host could maintain a hashtable where the immutable key is in Store, so Store could still be read-only and the hashtable in host memory can grow arbitrarily.

Thanks for considering this.

@stevelr
Copy link
Author

stevelr commented Sep 14, 2022

  • It might be possible to implement this inside add_to_linker - at least in Rust - where Store or some part of it is passed into the closure that invokes the host bindings. It would then be possible for wit-bindgen to generate two different implementations of add_to_linker: one that adds Store params and one that doesn't, and so there would be no code change for anybody that doesn't need the Store parameter, and, as a bonus, no change would be required to the WIT source code. It would be a matter of the host selecting which function to call when adding the bindings to the linker.

@alexcrichton
Copy link
Member

In #268 as you pointed out it is different in that was about capabilities of what to do with the wasm runtime, whereas what you're searching for here is access to trusted data set by the embedder the guest can't modify. That should already be possible though because Wasmtime-based embeddings are modeled as:

impl TheWitBindgenGeneratedTrait for MyState {
    // ...
}

where you then call add_to_linker and project from &mut T within Store<T> into &mut MyState and trait methods are called on MyState. This means the signature for get in your KV scenario is:

trait KvStore {
    fn get(&mut self, key: &str) -> String;
}

Here you can store the trusted information in the MyState or whatever you implement the trait for:

struct MyDatabase {
    user_id: u32,
}

impl KvStore for MyDatabase {
    fn get(&mut self, key: &str) -> String {
        println!("getting {key} for user {}", self.user_id);
        unimplemented!()
    }
}

Does that work for your use case? Your state within Store<T> could either be Store<MyDatabase> or it could be something like Store<(WasiCtx, MyDatabase)> if you want to implement both WASI calls and your own calls.

@r12f
Copy link

r12f commented Oct 2, 2022

Hi Alex, I have met similar issues, when using wit-bindgen. The model we are currently using as you mentioned above is actually great, which allows us to directly access the contexts and tables that we need for different capabilities, and I like it a lot. The only problem I am hitting is when I need to access the context for WASI.

Say, I created a API to bind TCP socket, and I would like to add it into the WASI table, just like the preopened sockets. The WASI context is saved in the store, but i could not get it back, so I ended up doing this: r12f@7899ffb.

In short, the code change is this:

-                            let (host, tables) = get(caller.data_mut());
+                            let (host, _wasi_ctx, tables) = get(caller.data_mut());

With this, I could add the socket into the WASI table in the API:

    fn tcp_connect(
        &self,
        wasi_ctx: &mut WasiCtx,
        remote_endpoint: &str,
    ) -> Result<socket::RawFd> {
        // ...
        Self::allow_socket_in_wasi(wasi_ctx, socket_fd, wasi_socket); // Add in table.
    }

Not sure if I am doing it in the right way or if it makes sense. If so, maybe we could consider adding a flag, say expose_wasi: true, to enable this behavior.

@alexcrichton
Copy link
Member

@r12f for your use case you'd want:

struct MyState {
    wasi: WasiCtx,
    // ... other user-defined state you need
}

and with Store<MyState> you'd have:

impl TheWitBindgenGeneratedTrait for MyState {
    // ...
}

followed with:

wasmtime_wasi::add_to_linker(&mut your_linker, |state| &mut state.wasi) along with the_wit_bindgen_module::add_to_linker(&mut your_linker, |state| state)

Does that make sense?

@r12f
Copy link

r12f commented Oct 4, 2022

Hi Alex, yep I understand what you are saying and if we only have 1 host component, it would makes total sense. However, considering we have multiple components, each components could have their own state, while WASI is kinda shared among all the components, which is why I was thinking about exposing it separately.

Say we have 2 resources in wit file: resource-a and resource-b. After bind gen, we would get these 2 link functions below. And this is I like the most - type U: ResouceX splits the per resources/class data, making each class can only access its own data and won't be able to mess up others'.

pub fn add_to_linker<T, U>(
        linker: &mut wasmtime::Linker<T>,
        get: impl Fn(&mut T) -> (&mut U, &mut IcmpTables<U>) + Send + Sync + Copy + 'static,
    ) -> anyhow::Result<()>
    where
        U: ResourceA,
        T: Send

pub fn add_to_linker<T, U>(
        linker: &mut wasmtime::Linker<T>,
        get: impl Fn(&mut T) -> (&mut U, &mut IcmpTables<U>) + Send + Sync + Copy + 'static,
    ) -> anyhow::Result<()>
    where
        U: ResourceB,
        T: Send

And then we initialize the store with the context below:

pub struct StoreContext {
    pub wasi: WasiCtx,
    pub resource_a_context: ResourceAContext,
    pub resource_b_context: ResourceBContext,
}

pub struct ResourceAContext {
    pub data: ResourceA,
    pub table: ResourceATable<ResourceA>,
}

With this structure, each resources would have its own context - data and table. And now the question is how could we expose WASI as a common resource.

One way is of course using the approach I am currently doing, change the get function to return WASI reference. The other way would be making ResourceA and ResourceB holding a mutable reference to WasiCtx. It might be doable with maybe some lifetime tweaks, but the code could be messy. Hence, in the end, I went with my current approach to make the code easier to maintain.

But there could be an even better way that I haven't seen yet. If so, that would be awesome!

@alexcrichton
Copy link
Member

Ah yeah once resources come into play then everything gets weird. The generated APIs for resources have not really received much feedback/testing/etc, it's still largely in the proof-of-concept stage and will have a lot of churn.

If it worked, which I realize that it doesn't today, would it solve your use case if the host API could take a wasi resource directly? For example your function which uses resource-a needs to access WASI, so would it make sense for one of the arguments to that method to be a WASI file or directory or socket or something like that? If so I think that's the best way to model this going forward (and once resources are fully fleshed out). Otherwise though this is definitely good feedback to figure out how to get the embedder APIs flexible enough here.

Sorry I realize this isn't a great answer, but it's tough to talk about concrete changes given the flux resources are in right now. I just made a PR to remove resources and will add them back in once they're specified upstream in the component-model, so the best answer for now is the best thing to do is to try to distill the fundamentals of what your use case requires and ensure that it informs the design of resources at the root to ensure there's a path to accomodate it here in wit-bindgen's code generation.

@r12f
Copy link

r12f commented Oct 4, 2022

If it worked, which I realize that it doesn't today, would it solve your use case if the host API could take a wasi resource directly?

Yeah, I think it is pretty close! There are 2 types of functions in the resource:

  • Static methods, mostly like factories, such as new/create-x. In this case, we could pass the WasiCtx or table in, so we can create and track things.
  • Resource methods which is tied to a specific resource instance. In this case, a specific WASI file object or whatever we put in the table would makes total sense, just like you says : D
    • This is also the part confuses me - I would expect the get function returns 2 things essentially - class object data and instance data, however the type that the table takes is also the class.
    • E.g. instead of ResourceA and ResourceATable<ResourceA>, I was actually expecting ResourceA and ResourceATable<ResourceAInstance>. Not sure if anything I missed.

Sorry I realize this isn't a great answer, but it's tough to talk about concrete changes given the flux resources are in right now.

Your answer is absolutely awesome, as always! And yea! I totally agree with you. If it is not finalized, then it might be hard to get any concrete changes out. In the meantime, I will continue to use my change as the short-term workaround. And I am looking forward to the final design coming out very very much!

@alexcrichton
Copy link
Member

I'm going to close this for now to revisit in the future as necessary with a concrete implementation for resources. I do think that this is going to be somewhat tricky to implement but at the same time I'm hopeful it's also not the end of the world to implement. How to bind resources in Wasmtime is an open question at this point and I think it will be easier to consider this with code to work with rather than the current state of "it's not actually implemented".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants