Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(wasi): don't fail select() on empty subscriptions #1580

Closed
wants to merge 1 commit into from

Conversation

haraldh
Copy link
Contributor

@haraldh haraldh commented Jun 10, 2022

return Ok(()) rather than Err(EINVAL)

Signed-off-by: Harald Hoyer harald@profian.com

return `Ok(())` rather than `Err(EINVAL)`

Signed-off-by: Harald Hoyer <harald@profian.com>
Copy link
Collaborator

@Thomasdezeeuw Thomasdezeeuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this one...

I think we'll want to match other OSes, which would wait "forever" (no timeout), when hitting this case. I'm mostly concerned about two cases:

  • Users that expect polling an empty poll to sleep/poll forever until, e.g., a process signal wakes them.
  • Users calling poll in a loop.

The case would be broken, as we won't sleep forever, and for the second case we'll create an infinite loop.

@haraldh
Copy link
Contributor Author

haraldh commented Jun 13, 2022

Well, there are no other sources which could interrupt this call... no signals, no other threads.

@Thomasdezeeuw
Copy link
Collaborator

Well, there are no other sources which could interrupt this call... no signals, no other threads.

I think that's only currently the case. I think wasm/wasi could be expanded to support both signals and/or threads (in some fashion).

@haraldh
Copy link
Contributor Author

haraldh commented Jun 13, 2022

so what? change the wasi poll? loop forever in mio? leave the error as is?

@Thomasdezeeuw
Copy link
Collaborator

so what? change the wasi poll? loop forever in mio? leave the error as is?

Good question(s), don't have a good answer without some more information. For now, I think our best bed it not doing anything in Mio, i.e. return the error by the OS (well, WASI runtime).

Next, we'll have to find the related use cases where this branch could be hit. Than we'll compare what happens on other platforms and see if we can get a single uniform (and desired) behaviour.

@npmccallum
Copy link

@Thomasdezeeuw It is impossible to "match other OSes."

POSIX states the following for poll():

If none of the defined events have occurred on any selected file descriptor, poll() shall wait
at least timeout milliseconds for an event to occur on any of the selected file descriptors. If
the value of timeout is 0, poll() shall return immediately. If the value of timeout is -1, poll()
shall block until a requested event occurs or until the call is interrupted.

In contrast, poll_oneoff() in WASI has no timeout parameter. This implies an ambiguity as to whether poll_oneoff() should wait forever (timeout = -1) or return immediately (timeout = 0). The WASI poll_oneoff() definition does not define the behavior when no subscriptions are provided.

Further, wasmtime appears to return EINVAL, which is neither return immediately nor sleep indefinitely. Hence, the most popular implementation choses neither of the POSIX options.

@sunfishcode Do you have any suggestions here?

@npmccallum
Copy link

@Thomasdezeeuw please note that even if wasm gains threads, the "wait forever" behavior still never makes sense because wasm does not have, and will not gain, support for signals. So I suspect that "return immediately" is the only logical behavior.

@sunfishcode
Copy link
Contributor

sunfishcode commented Jun 16, 2022

I can also add that it seems very unlikely that Wasm or WASI will ever add Unix-style asynchronous signals or thread cancellation, because they'd create many problems for JITs, and they'd interact in complex ways with host code called by Wasm code.

so what? change the wasi poll? loop forever in mio? leave the error as is?

The public API that calls this code seems to be Poll::poll, where my read of the documentation says that a call with an empty events should block indefinitely.

WASI's documentation could be more clear here, but the current intent is that poll_oneoff should not have a way to enter an indefinite sleep, because without asynchronous signals or thread cancellation, there's no way it could ever be woken up.

So my question is, who is calling Poll::poll with no events, and what are they expecting it to do? Are they expecting it to block indefinitely, and perhaps also expecting that there will be some way to wake it up? Or are they expecting something else?

@kulakowski-wasm
Copy link

I do want to highlight one other perspective.

It's not a priori clear to me that WASI should prevent or detect all "useless" behavior by its clients at the API level. Doing so may have a cost in either the ergonomics or the performance of the API for non-"useless" cases.

By analogy, deadlocking is useless, and yet typical mutex APIs do prevent deadlock by their construction. But they may provide other tools or diagnostics to detect it.

And so I second sunfishcode's question: why call Poll::poll(on nothing)?

If we knew that, I think it'd be easier to answer whether that particular call site of WASI poll should bear the costs of checking for no events, or whether WASI poll should detect that case itself.

@Thomasdezeeuw
Copy link
Collaborator

So my question is, who is calling Poll::poll with no events, and what are they expecting it to do? Are they expecting it to block indefinitely, and perhaps also expecting that there will be some way to wake it up? Or are they expecting something else?

@sunfishcode @kulakowski-wasm
I say would in most cases calling Poll::poll with no events is a bug in the calling code, but I could be wrong here. Most runtimes should check if their process/Future/what-ever list is empty (as that most likely means Mio's registered resources list is empty) before calling Poll::poll, because it means they can shutdown.

It's not a priori clear to me that WASI should prevent or detect all "useless" behavior by its clients at the API level. Doing so may have a cost in either the ergonomics or the performance of the API for non-"useless" cases.

@kulakowski-wasm
To be clear I'm not arguing for that, it's just that the docs in the WASI spec are a little light on the documentation of edge cases. For Mio it's a priority to achieve (well, attempt as much as possible) a good cross-platform API. But, as pointed out previously, without threads and process signals the use case(s) for calling Poll::poll without registering anything are slim.

So, I think we have three choices:

  • Change nothing, i.e. return what the WASI implementation returns (EINVAL for wasmtime).
  • Change to return immediately (this pr currently), possibly creating a infinite loop.
  • Change to add a "infinite" timeout event.

@Thomasdezeeuw
Copy link
Collaborator

I've opened #1585 to document the situation until we can make a decision on this.

@notgull
Copy link

notgull commented Jun 19, 2022

If calling Poll::poll without any targets is a bug, shouldn't it return an error? It feels counterintuitive to say "if this behavior which should only be called in an invalid state occurs, the bet option is to hang the program indefinitely." If I were writing a runtime based on mio, this behavior would be difficult to debug. Moreover, I'd hate to imagine the type of programmer to intentionally rely on this behavior, and whether this would be considered a "breaking change" when it was already broken in the first place.

In my opinion this should be an open issue for discussion.

@Thomasdezeeuw
Copy link
Collaborator

If calling Poll::poll without any targets is a bug, shouldn't it return an error? It feels counterintuitive to say "if this behavior which should only be called in an invalid state occurs, the bet option is to hang the program indefinitely."

It's not necessarily a bug though, if you have a process that support signals is valid to wait for a process signal using Poll::poll.

If I were writing a runtime based on mio, this behavior would be difficult to debug.

This is a bit of a straw man argument as this doesn't depend on Mio at all but rather on the implementation. For example, it should be fairly easy to determine the number of process/tasks/Future in a debuggable runtime, but that's not related to Mio.

Moreover, I'd hate to imagine the type of programmer to intentionally rely on this behavior, and whether this would be considered a "breaking change" when it was already broken in the first place.

It wasn't broken, WASI is just a different environment to the one Mio is usually run in (i.e. an OS process).

In my opinion this should be an open issue for discussion.

That's what we're doing right here... discussing this issue.

@notgull
Copy link

notgull commented Jun 19, 2022

I wasn't aware that this use case included signals as a halting mechanism. In that case, I feel it's perfectly reasonable to "halt until signal on signal-supported platforms, or error out on ones that don't."

@npmccallum
Copy link

@haraldh What prompted putting this patch in mio rather than handling it a layer above?

@haraldh
Copy link
Contributor Author

haraldh commented Jun 21, 2022

@haraldh What prompted putting this patch in mio rather than handling it a layer above?

Well, I would have put it in tokio, but there is no book keeping of the number of subscribed interests to check for subscriptions.is_empty().

diff --git a/tokio/src/io/driver/mod.rs b/tokio/src/io/driver/mod.rs
index 7d8650a3..98bc14d6 100644
--- a/tokio/src/io/driver/mod.rs
+++ b/tokio/src/io/driver/mod.rs
@@ -168,6 +168,13 @@ impl Driver {
         match self.poll.poll(&mut events, max_wait) {
             Ok(_) => {}
             Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
+            #[cfg(target_os = "wasi")]
+            Err(e) if e.kind() == io::ErrorKind::InvalidInput => {
+                // FIXME: number of Interests != 0
+                if false {
+                    return Err(e);
+                }
+            }
             Err(e) => return Err(e),
         }

@sunfishcode
Copy link
Contributor

@haraldh I think we want to go up another level and understand why self.poll.poll here is getting called with an empty event list in the first place. If that's intended, what is the expected behavior? Is it expected to block indefinitely, fail, or succeed?

I expect we can change Wasmtime's poll_oneoff to meet your needs in this situation, but I want to understand what the application ultimately wants it to do.

@haraldh
Copy link
Contributor Author

haraldh commented Jun 21, 2022

@sunfishcode that is a good question, which we can ask @Thomasdezeeuw :)

@haraldh
Copy link
Contributor Author

haraldh commented Jun 21, 2022

Reading up on epoll_wait, which I think is what mio tries to resemble somehow, there are several cases, where the list subscriptions can be empty:

  1. you want to be waken up by any signal
  2. another threads adds interests

From the epoll_wait man page:

Note that it is possible to call epoll_wait() on an epoll instance whose interest list is currently empty (or whose interest list becomes empty because file descriptors are closed or removed from the interest in another thread). The call will block until some file descriptor is later added to the interest list (in another thread) and that file descriptor becomes ready.

@haraldh
Copy link
Contributor Author

haraldh commented Jun 21, 2022

@Thomasdezeeuw wrote:

@sunfishcode @kulakowski-wasm I say would in most cases calling Poll::poll with no events is a bug in the calling code, but I could be wrong here. Most runtimes should check if their process/Future/what-ever list is empty (as that most likely means Mio's registered resources list is empty) before calling Poll::poll, because it means they can shutdown.

Well, it seems all the tokio tests in this patch are affected and call poll with an empty list.

@haraldh
Copy link
Contributor Author

haraldh commented Jun 21, 2022

Ok, if the signal feature is activated in tokio, it always has a unix socket pair fd registered as a signal handler.

   2: tokio::io::poll_evented::PollEvented<E>::new_with_interest_and_handle
             at ./src/io/poll_evented.rs:110:28
   3: tokio::signal::unix::driver::Driver::new
             at ./src/signal/unix/driver.rs:77:24
   4: tokio::runtime::driver::create_signal_driver
             at ./src/runtime/driver.rs:60:22
   5: tokio::runtime::driver::create_io_stack
             at ./src/runtime/driver.rs:25:50
   6: tokio::runtime::driver::Driver::new
             at ./src/runtime/driver.rs:170:52
   7: tokio::runtime::builder::Builder::build_basic_runtime

and this one is missing in wasi, cause we disable the signal module in our PR.

@haraldh
Copy link
Contributor Author

haraldh commented Jun 21, 2022

Or the Waker is registered (mio::Waker::new() with eventfd).

@haraldh
Copy link
Contributor Author

haraldh commented Jun 21, 2022

so, this is the call trace:

impl Park for Driver {
    fn park(&mut self) -> io::Result<()> {
        self.turn(None)?;
        Ok(())
    }

and turn() calls self.poll.poll(), which is mio poll and because we have no Waker the subscriptions are empty.

@Thomasdezeeuw
Copy link
Collaborator

@haraldh I think this should be solved in Tokio. If the runtime doesn't have any processes to run, nor registered any I/O sources and can't be awoken by a process signal or another thread (like the case for WASM/WASI) then I think the runtime should return/shutdown as nothing else needs to/can be done.

@haraldh
Copy link
Contributor Author

haraldh commented Jun 23, 2022

ok, will do a follow-up tokio PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants