Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-introduces internal fsapi.File with non-blocking methods #1613

Merged
merged 1 commit into from
Aug 7, 2023

Conversation

codefromthecrypt
Copy link
Contributor

Folks aren't currently on the same page on whether or not users should be able to implement the timeout parameter on File.Poll or whether it should always be externally controlled via a sleep loop which uses poll (immediate). Until we are sure internally, we shouldn't expose this for implementation, as it would be confusing for the same reason.

This migrates the non-blocking functionality back into the internal package.

One future way out could be to allow the user to decide if the backend is allowed to use the timeout parameter or not. Just as they control the FS, they could control a poller, which would then be able to span across both normal files and sockets. One could then be able to choose in the case of a single file and a specific known wasm, simply use the native timeout. In other words, since there are valid alternate answers, even though Go doesn't support custom pollers we could, if we decide that native polling is basically not allowed. Alternatively, we could force it to never be allowed by removing the timeout parameter (making it a poll immediate). Anyway, I will leave that decision up to @ncruces and @evacchi who will take responsibility of whichever decision is made.

// - No-op files, such as those which read from /dev/null, should return
// immediately true, as data will never become available.
// - See /RATIONALE.md for detailed notes including impact of blocking.
Poll(flag Pflag, timeoutMillis int32) (ready bool, errno experimentalsys.Errno)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the thing that tripped the whole debate was me noticing a PR that basically obviated this parameter based on concerns about blocking. Rather than get through that debate, which seems still murky, this moves the whole feature set internal. This allows users to still supply filesystems even though there remain plenty of blocking methods, such as Read.

@codefromthecrypt
Copy link
Contributor Author

the main impact of this change is now you cannot really wrap all functions used by wasi anymore. However, it is still better than not exposing the ones that are stable. I wasn't aware the File.Poll interface wasn't ready until #1606 which sadly happened after I promised publicly we would release our file api. 🤷 I tried..

Folks aren't currently on the same page on whether or not users should
be able to implement the timeout parameter on `File.Poll` or whether it
should always be externally controlled via a sleep loop which uses poll
(immediate). Until we are sure internally, we shouldn't expose this for
implementation, as it would be confusing for the same reason.

This migrates the non-blocking functionality back into the internal
package.

One future way out could be to allow the user to decide if the backend
is allowed to use the timeout parameter or not. Just as they control the
FS, they could control a poller, which would then be able to span across
both normal files and sockets. One could then be able to choose in the
case of a single file and a specific known wasm, simply use the native
timeout. In other words, since there are valid alternate answers, even
though Go doesn't support custom pollers we could, if we decide that
native polling is basically not allowed. Alternatively, we could force
it to never be allowed by removing the timeout parameter (making it a
poll immediate). Anyway, I will leave that decision up to @ncruces and
@evacchi who will take responsibility of whichever decision is made.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
Copy link
Collaborator

@achille-roussel achille-roussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, non-blocking I/O on file systems belongs to the realm of advanced optimizations, which most programs will never successfully leverage, so starting with exposing a blocking-only API seems like it will still unlock most use cases that need to support customizing the file system layer.

I think it's a much safer move to test the experimental sys.File as a blocking API only, and introduce non-blocking on file system operations only when we have a clear use case for it. Take Go as an example, even if the language is designed for concurrency, the runtime still performs blocking I/O on files and simply spawns more threads on demand because the async I/O facilities are usually too limited to be useful as a general purpose solution (in other words, they tend to create more problems than they solve).

Some more details about my experience on the topic:

  • Read-heavy file I/O tend to benefit from the kernel page cache, in which case the reads barely ever block because the data usually resides in memory; the optimization path for those programs tend to be more layers of caches in user space, using mmap, or specializations like sendfile/splice, rarely is async I/O the right answer.

  • Applications doing write-heavy operations often use write-optimized data structures such as write-ahead logs, in which cases the opportunities for non-blocking I/O are very limited; writes hit the page cache, and the only operations that would be truly blocking are synchronization points like fsync/fdatasync, in which case blocking the current thread is usually acceptable. Once again, the complexity of syncing with async I/O rarely justifies the investment.

I hope the perspective is useful, I want to emphasize that I'm in favor of this change, I think that a simpler API will help address most of the reasons why we wanted to expose the file system API in the first place, support for non-blocking file I/O will likely deserve more research to get right 👍

@codefromthecrypt
Copy link
Contributor Author

codefromthecrypt commented Aug 7, 2023

I hear you @achille-roussel, though we have spent a long time on this and I think that there are impacts by not doing it.. basically it will never be done.

The main cons are..

  • no one can implement non-blocking files except us, like via x/sys or anything wouldn't work.
  • this means users cannot wrap our real FS impl without erasing non-blocking
  • this also will complicate, if not prevent effective use of sockets (though this can be worked around via pre-opens similar to stdio, which wazero can also control)

TL;DR; I think it will make people have to use external projects that completely re-implement the whole syscall layer, if they want to affect non-blocking implementation. Worst case is no one will move this past the finish line later, so it will permanently reside in this state.

@codefromthecrypt
Copy link
Contributor Author

As most of the downside are limited to pre-opens which are special cased anyway, I think we're good here. It can be re-introduced later and meanwhile not block wazero 1.4

@codefromthecrypt codefromthecrypt merged commit 009ee70 into main Aug 7, 2023
@codefromthecrypt codefromthecrypt deleted the fsapi-file branch August 7, 2023 07:50
jerbob92 pushed a commit to jerbob92/wazero that referenced this pull request Aug 9, 2023
…abs#1613)

Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Jeroen Bobbeldijk <jeroen@klippa.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants