Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: Fully sync surface for SyncAccessHandles #323

Closed
jlongster opened this issue Aug 16, 2021 · 21 comments
Closed

Request: Fully sync surface for SyncAccessHandles #323

jlongster opened this issue Aug 16, 2021 · 21 comments
Assignees
Labels
access handles Issues related to the new Access Handles proposal

Comments

@jlongster
Copy link

jlongster commented Aug 16, 2021

I was reading the AccessHandle proposal, and looked at the code example here: https://github.com/WICG/file-system-access/blob/main/AccessHandle.md#proposed-api

The sync version available in a worker context looks like this:

const handle = await file.createSyncAccessHandle();

You don't need to await that right? The sync APIs should never need to be awaited.

tomayac added a commit to tomayac/native-file-system that referenced this issue Aug 16, 2021
@rstz
Copy link
Contributor

rstz commented Aug 16, 2021

The code example is correct, that is, you need to await the createSyncAccessHandle() method in the implementation that's currently landed in Chrome Canary.
The "SyncAccessHandle" is not fully synchronous, only read/write are synchronous in order get better performance in some Wasm use cases. Other operations, such as close(), truncate() and getSize() on a SyncAccessHandle are asynchronous, see the IDL definition for details. This is different in StorageFoundation, where all methods of the sync API are synchronous.

@tomayac
Copy link
Contributor

tomayac commented Aug 16, 2021

Oh, the naming then is misleading, especially for folks coming from Node.js, where, e.g., file system operations exist as foo() and fooSync(). Will close my PR then.

@jlongster
Copy link
Author

That's really too bad. Why is it async? If we are in a worker we have no need for it to be async.

It makes it very hard to use the API from WASM projects compiled with C, where everything is sync. It will require the whole mess of spawning a worker and using Atomics.wait to convert everything async to sync, and even reads/writes with have to go through that layer because the work controls the files, introducing overhead everywhere.

@rstz
Copy link
Contributor

rstz commented Aug 16, 2021

Pinging @fivedots for more context on this decision.

@rstz rstz added the access handles Issues related to the new Access Handles proposal label Aug 16, 2021
@fivedots
Copy link
Collaborator

Thank you for your feedback on this and other issues!

There’s a few reasons the SyncAccessHandle has a mix of sync and async methods. Overall, sync methods that deal with the host’s file system are discouraged (although, of course, possible in a worker context), and this was mentioned a few times by people that have more experience with spec’ing new APIs (it is also mentioned in the Web Platform Design Principles guide). There’s also tooling that helps with dealing with async calls from Wasm, like Asyncify and our PThreadFS prototype, which should simplify porting to Wasm on top of AccessHandles. The last reason is that there are active efforts on the Wasm side to natively support async calls with little overhead, hopefully meaning that async calls will not be problematic in the medium term.

Considering that, we decided on direct symmetry between the async and sync surface except on the critical read/write methods, where the overhead of something like Asyncify would be too much. The hope is that this leads to a balance between supporting new use cases now and future-proofing the API.

@jlongster
Copy link
Author

Thanks for explaining!

I still hope you reconsider for the following reasons:

  • I find it strange to desire symmetry between the async and sync surface, when it seeks like it should be the other way around. All the sync methods should have symmetry, and all the async methods should have symmetry. It's very confusing for a "sync" method to need await, and I believe a lot of people are going to get tripped up on that at first.
  • If the only reason to have it is symmetry between the methods, that's not a very strong argument for it when there are several technical reasons why all of them should be sync
  • Asyncify greatly bloats the binary size. SQLite is ~1MB compiled, but when compiled with Asyncify it's ~1.7MB, almost twice as big. Supposedly work is being done to reduce it, but that's still theoretical and it's not clear how much it will reduce
  • There's also a significant perf hit which is even more crucial. I haven't benchmarked myself, but someone else in the community saw around 1.6x perf hit. Even if that can reduce, even a small hit is really unfortunate.
  • I haven't looked at the PThreadFS prototype, but I would guess it's using a similar technique that I use in absurd-fs which is to do all the FS calls in a separate process/thread and use something like Atomics.wait to block on it, turning an async call into a sync one. That is still introducing quite a bit of overhead!
  • Going back to asyncify -- because it turns sync APIs into async, it forces the library itself to be async. Meaning a sql.js lib that uses it would need all async APIs like await stmt.run(). SQLite being sync is a major feature and it's really unfortunate to lose that, only because of internal details being leaked.

When we are in a worker, please give us fully sync APIs! It simplifies things SO much.

Here's an example of how much is simplifies things. I wrote a webkitFileSystem backend for absurd-sql, which provides fully sync APIs. Here's all the code (this backend is not usable unfortunately because it has no locking): https://gist.github.com/jlongster/c92ac81c402de59d8556b69a4168d4e1

Compare that with the IndexedDB backend that needs to do all the work of using Atomics.wait to turn an async API into a sync one: https://github.com/jlongster/absurd-sql/tree/master/src/indexeddb

The PThreadFS work might hide away that complexity, but that overhead is still there. Even though you made read/write sync, because we opened the files on a separate thread all the reads/writes must now be proxied over to that thread and back. Nothing is going to be as fast as purely sync APIs, and unless there's a strong technical reason to not provide them, I suggest making all of them sync.

@Taytay
Copy link

Taytay commented Aug 21, 2021

@fivedots : I'm a contributor to Sql.js and am following @jlongster's work with intense interest. I just gave him a 👍 for his comments above, but that doesn't capture how strongly I agree with his statements. :)

I am of course very interested in the future of both WebAssembly and Sql.js in particular, and after reading your proposal, I was thrilled to see that this is a primary use case for your work! As @jlongster mentioned, its synchronous interface and performant nature are both very appreciated features of the library. I see that you've obviously considered the performance implications. To quote from the proposal:

The reason for offering a Worker-only synchronous interface, is that consuming asynchronous APIs from Wasm has severe performance implications (more details here). Since this overhead is most impactful on methods that are called often, we've only made read() and write() synchronous. This allows us to keep a simpler mental model (where the sync and async handle are identical, except reading and writing) and reduce the number of new sync methods, while avoiding the most important perfomance penalties.

I think you bring up good points about performance in that reading and writing calls will benefit from the performance more than the "less-often-called" handle creation. However, the fact that a call to create a handle cannot be synchronous means that the primary WebAssembly use case for this proposal will now have significant complexity added along with the significant consequences to the entire user-level API that @jlongster mentions. I also don't know (@jlongster can likely correct me) if the assumption about the handle creation being such a rare occurrence is a good one when it comes to simulating an underlying filesystem for SQLite. At the very least, making such a fundamental operation async guarantees that almost all users of these file APIs will likely need to be asynchronous themselves.

I also read the reasoning in the doc you linked: https://docs.google.com/document/d/1lsQhTsfcVIeOW80dr467Auud_VCeAUv2ZOkC63oSyKo/edit#heading=h.tb2xzhzdt9qr

This section in particular:

Our goal is to deprecate the synchronous OPFS interface at this point. In order to be able to do so, we will

  • Explicitly communicate to web developers that the synchronous interface is to be avoided outside of very specific WebAssembly use cases.
  • Commit to providing a polyfill for the sync APIs using the async APIs and the Wasm support.
  • Communicate a concrete deprecation plan as soon as possible
    Alternatives considered
    Offering an asynchronous interface only
    The main argument against a synchronous interface is that synchronous APIs will be made obsolete through improvements in WebAssembly and native code. Our counterpoint is that these improvements will be done over a multi-year time span of uncertain length. A synchronous interface could unlock important use cases much faster and be deprecated later on.

This feels like the sort of "very specific WebAssembly" use case they had in mind, is it not? At the very least, I think a strong case could be made.

I know that these APIs need to be designed to be safe/intuitive for the consumer around for the long haul, and I am so glad that all of the authors involved are designing around the WebAssembly and Sql.js use case in particular! But as the authors of the doc mentioned, the time-frames involved that will make these asynchronous APIs a "non-issue" are multi-year and unknown. I am concerned that this decision will have significant negative consequences in one of the primary use cases of this proposal.

@hcldan
Copy link

hcldan commented Sep 20, 2021

I also would like to see the sync apis explicitly supported without threat of future deprecation, as well as sync apis for every call (including handle creation) for the new handles. Making people use Asyncify for a few async operations forces the application to depend on a pretty complex and messy work-around for no apparently good reason.

Let the WASM folks have the sync methods in the web workers.

@hcldan
Copy link

hcldan commented Sep 20, 2021

@fivedots I would like to point out, because it is often not talked about, that the WASM filesystem relies on a fully memory-backed synchronous file system on the main UI thread (at least with pthreads). Data persistence is done via a call to syncfs where async stuff can happen.

We are looking at ways to pitch in and help here... but I have to say, the churn involved with the SF apis being merged and losing access to sync apis gives me great pause to say the least...

@fivedots fivedots self-assigned this Sep 30, 2021
@fivedots fivedots changed the title Unnecessary await in code example for sync AccessHandle Request: Fully sync surface for SyncAccessHandles Oct 1, 2021
@fivedots
Copy link
Collaborator

fivedots commented Oct 4, 2021

Thank you for the detailed replies.

For clarification, are we talking about a fully synchronous surface for access handles or generally for the Origin Private File System (OPFS)? I.e. fully sync access handles would give you methods like sync createSyncAccessHandle(), sync truncate(), sync close(), etc., while a fully sync OPFS would include the previous list plus methods like sync getFileHandle(), sync getDirectoryHandle(), sync removeEntry(), etc.

My understanding based on your concerns, is that you’d prefer a fully sync OPFS. Just providing a fully sync AccessHandle would not be that useful, since you’d still need to pay the overhead of something like Asyncify or PThreadFS for filesystem-like operations e.g. opening/creating a file, moving it, deleting it.

Having more clarity on this would help us evaluate who the right stakeholders are and what the right approach is!

@hcldan
Copy link

hcldan commented Oct 4, 2021

I suppose it would require a fully sync OPFS surface area. Our short-term goal is to be able to use the OPFS apis in WASM worker threads directly until the FS abstractions are reliable. One day they will be and it will be awesome, but for now there's too much churn and uncertainty.

@hcldan
Copy link

hcldan commented Oct 15, 2021

@fivedots any more info on this? Am I going to be able to count on a sync surface area for OPFS?

@fivedots
Copy link
Collaborator

A fully sync surface is a larger project, so I’ll add @mkruisselbrink to get his point of view and make him aware of the feature request.

In my opinion, If we had a fully sync OPFS, it would make sense that it would also include a fully sync Access Handle. That being said, I’d consider that a separate effort from our current focus on introducing Access Handles. We’ve gotten positive feedback during our Origin Trial on the current interface, and it would make sense to work on shipping that while also discussing a future fully sync surface. That way we can quickly cover an initial set of use cases and continue iterating to provide better support to you.

@hcldan
Copy link

hcldan commented Oct 20, 2021

@fivedots I understand... I might ask, though, that you continue the SF origin trial longer to give the people who have been using the SF sync interface from WASM a path for migration. Right now I'm going to be completely broken when the origin trial ends.

@hcldan
Copy link

hcldan commented Oct 28, 2021

@fivedots @mkruisselbrink any news? do I have a bridge until the OPFS stuff has a full sync surface area?

@fivedots
Copy link
Collaborator

Sadly, given the infrastructure of Origin Trials, that won’t be possible. Origin Trials are used to experiment and gather feedback on possible new features. Each time they are extended, an explicit experimental goal and approvals from Chrome API Owners are required. It is not possible to indefinitely leave the trial running until a particular feature (i.e. sync OPFS in this case) lands.

@hcldan
Copy link

hcldan commented Oct 29, 2021

@fivedots well then I will need a sync OPFS layer quickly after Jan 4, as after that, the next chrome build will remove the feature we've been working with. Can we help make that happen? where can we pitch in? A sync OPFS surface area is really needed by us and clearly by others in this thread as well.

@fivedots
Copy link
Collaborator

fivedots commented Nov 3, 2021

I understand there's a real use case for a sync surface, but we can't commit to implementing the feature request by the time the Storage Foundation Origin Trial expires. My recommendation would be to look into other ways of addressing your use case. One option is to look into PthreadFS as an intermediate step. Feedback or direct contributions there would be very welcome. Another one would be to rely on Asyncify to wrap the async methods.

@hcldan
Copy link

hcldan commented Nov 3, 2021

We have looked at Asyncify and it blows up our build.
We have also looked at PthreadFS but we can't have the FS in memory, it's too big.

@hcldan
Copy link

hcldan commented Nov 16, 2021

Another bump here... we are seeing that the async nature of OPFS and the entry navigation are a significant impact
emscripten-core/emscripten#15041 (comment)

I would greatly appreciate any effort that could be made to make an all sync surface area for OPFS.
WASM could be significantly faster...

@mkruisselbrink
Copy link
Contributor

Closing this in favor of whatwg/fs#7, as that will be the new home of access handles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
access handles Issues related to the new Access Handles proposal
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants