Add Reference / Release for all objects #15
Is this blocked by #9?
It is. We discussed it a bunch F2F; I remember thinking I should do the pull request after that, but I don't remember the details of the discussion. That was a crazy week :)
I'm coming to the close of a change to switch wgpu-rs to use reference counting; at that point, both implementations should have very similar reference-semantics requirements. I have worked to remove Arc wherever I can and do as little synchronization as possible internally, and I've also worked to make the lifetime requirements as low as reasonably possible in terms of overhead. I also believe that future optimizations, if applicable, will mostly be about avoiding internal reference counting during command buffer submission, which isn't going to be exposed to the user.

So, here are my thoughts. The boxed / owned objects at the end in my impl will be (from what I believe so far, and I am close to done): adapter, queue, render / command / bundle encoders, shader modules, and command buffers. These should not have exposed retain/release. Everything else should. The rationale:

- Shader modules are only used read-only, and are not retained after pipeline layout creation, so they need no reference counting at all, not even internal reference counting.
- Adapters, according to the spec, should only be used prior to device initialization and then not used again; we can take advantage of that. On native, adapters will be implicitly destroyed when they are used to create a device, so they don't need reference counting.
- Command encoders must already be externally synchronized and are implicitly destroyed on finish. I believe this is how it works today in wgpu-rs, but if it doesn't, this is how it should work--you can't finish a command encoder more than once. So they don't need reference counting.
- Command buffers, similarly, can only be used once, and you can't do anything with them except submit to a queue. So they will be implicitly destroyed on queue submission (the implementation may keep them alive for various reasons, but this doesn't need to be exposed to the user, and might not be done with reference counting). Therefore, they also don't need reference counting.
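The single-use encoder semantics above (finish may succeed only once, after which the encoder is implicitly destroyed) can be sketched roughly like this. Everything here is hypothetical -- the `My*` names are illustrative stand-ins, not the actual webgpu.h API:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical types -- not the actual webgpu.h definitions. */
typedef struct MyCommandBuffer { int unused; } MyCommandBuffer;

typedef struct MyCommandEncoder {
    bool finished; /* set once Finish has consumed the encoder */
} MyCommandEncoder;

MyCommandEncoder *myDeviceCreateCommandEncoder(void) {
    return calloc(1, sizeof(MyCommandEncoder));
}

/* Finish may succeed exactly once; afterwards the encoder is dead from
 * the API's point of view (implicitly destroyed), so it never needs
 * reference counting. A real implementation would reclaim the
 * allocation here; this sketch only marks it consumed so that the
 * double-finish check stays well-defined. */
MyCommandBuffer *myCommandEncoderFinish(MyCommandEncoder *enc) {
    if (enc == NULL || enc->finished)
        return NULL; /* error: encoder already consumed */
    enc->finished = true;
    return calloc(1, sizeof(MyCommandBuffer));
}
```

Because the encoder can only be consumed once, single ownership falls out naturally and no retain/release pair is needed for it.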
Finally, queues. Queues unfortunately do need some sharing and internal synchronization; however, this is only needed for the "raw" queue. This is because surface presentation doesn't take an explicit queue, so it must hold a reference to the queue internally; and the Vulkan spec requires queues to be externally synchronized for both presentation and submission. However, this should ideally only be needed when actually running the native present / submit commands. The only case where tracking or user information needs to be updated on queues is during buffer map on init, which I would like to explicitly take a queue (there is one other place that uses the implicit device queue, which is command encoder creation, and which IMO should be performed on the queue rather than the device; but that's not relevant here as it only needs read-only access to the inner queue).

So, tl;dr: I believe there is a very realistic path forward to where WebGPU external (not internal) queues don't need reference counting, so we should not add reference counting to them. The other main consequence of this from a C perspective is that operations that explicitly use a queue should be externally synchronized; otherwise it's basically the same as it is currently.
Additional note from discussion with @kvark, who asked to really confirm there are no future compatibility constraints: it is possible that we could avoid reference counting on QuerySet, ComputePipeline, and RenderBundle, even though we can't do it right now, so we should also not provide inc/dec methods on those. And I haven't gotten to Surface yet, so hold off on adding inc/dec methods for that until we confirm whether an Arc-free implementation is possible.

Essentially, anything that is referenced by a RenderBundle is going to have to be reference counted, because there's no conceivable future where render bundle lifetimes can be tracked by anything sane that's not tracing GC (which is a no-go for C) or reference counting. This is because they're expected to live across multiple queue submissions, so there's no convenient way to figure out when they can be destroyed. This includes anything referenced by either a BindGroup or RenderPass, so basically any resource that hasn't already been explicitly mentioned (except maybe Surface--I still need to work that one out; but in any case Surface/Swapchain didn't even exist in this original proposal).

However, for ordinary command encoders, one can imagine WebGPU changing to add APIs that allow avoiding reference counting and tracking by submission index instead. QuerySets are only directly referenced by ordinary command encoders, so it is conceivable that in the future we would not need refcounting there. Similarly, render bundles can't (yet?) reference other render bundles, so it's also conceivable for them. And of course render bundles can't reference compute pipelines, so those also conceivably don't need refcounting (but... it seems pretty likely that a ComputeBundle will be added at some point, so that one seems like a technicality). So that's why, if we really want to be future-compatibility safe, we shouldn't have inc/dec for them.
For non-refcounted types, how do they get freed? Would they have a separate destroy function? Also, if we plan to omit ref/release methods for some types now that might change post-1.0, it needs to be non-breaking to add them later.
All in all, having an inconsistent memory management interface across the objects seems unfortunate. I understand the reasoning behind why some objects need internal refcounting and others don't, but it's not strictly necessary that the external refcounting and the internal refcounting be the same thing. (I believe Dawn uses the same refcount field for internal and external references, but that doesn't have to be the case.) We can have internal refcounting without refcounting existing in the C API at all.
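The separation being described -- an internal refcount as an implementation detail, with only a single-shot drop in the public C API -- can be sketched like this. All names here are hypothetical, assumed for illustration only:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object with an *internal* refcount that is never
 * exposed through the public C API (illustrative, not webgpu.h). */
typedef struct MyBuffer {
    int internal_refs;  /* implementation detail */
    bool user_dropped;  /* the single external "reference" */
} MyBuffer;

MyBuffer *myCreateBuffer(void) {
    MyBuffer *b = calloc(1, sizeof *b);
    b->internal_refs = 1; /* the user's handle counts as one ref */
    return b;
}

/* Internal-only helpers: e.g. an in-flight queue submission keeps
 * the buffer alive without the user ever seeing a refcount. */
void myInternalAddRef(MyBuffer *b) { b->internal_refs++; }

void myInternalRelease(MyBuffer *b) {
    if (--b->internal_refs == 0)
        free(b);
}

/* The only lifetime function exposed to C users: single-shot drop.
 * Dropping twice would be a user error in this model. */
void myBufferDrop(MyBuffer *b) {
    if (b->user_dropped) return;
    b->user_dropped = true;
    myInternalRelease(b);
}
```

The point of the design is that the user-facing contract ("call drop exactly once") is independent of whether the implementation refcounts internally.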
Single-shot release.
It would be (that's the intent of what I've been talking about).
This is true, which is why I'm trying to differentiate between "internal refcounting that's an implementation detail" and "internal reference counting that's pretty much inevitable for any C API." For the latter, not exposing the refcounting is just strictly worse for performance, especially because we already need an allocation for every object for other reasons in C. But if people would prefer the simplicity of uniform resource management for every object, and don't want refcounting exposed, that's fine too.
And I also don't think it's super random what resources would have refcounting exposed (I'm including ComputePipeline even though strictly speaking it doesn't have a render bundle equivalent yet). These intrinsically need refcounting (because they're transitively referenced by render bundles):
And here's what doesn't intrinsically need refcounting (everything else):
Subjectively, this all makes a lot of sense, and the only weird one is QuerySet (I'm not actually sure why QuerySet can't be used in a render bundle, but I'm assuming there's a good reason?). I don't think I would be terribly confused about what needs reference counting and what doesn't.
With the single-shot release design where we can easily add refcounting to an object later, I am relatively comfortable making it not totally uniform across the API. The difference between "refcounted" and "refcounted with a maximum count of 1" is nice and small. Anyway, I'd like to let Corentin have the actual opinions when he's back and has time to look at this.
Just so I'm clear: What makes it strictly worse? Is it that if the client application wants to use refcounting, they'll have to use their own on top of the internal one?
For now, maybe we could try to limit the number of functions we add on top of the browser API? We could start with single-shot release. If we wait to expose refcounting, it's backwards compatible to add it later.
Yes, that. And just to be clear--I'd be totally fine not exposing these initially, since it's backwards compatible to do so later. I am more saying, for the particular types I listed, there isn't really any conceivable technical decision we could make that would constrain how we could optimize the C interface, so exposing them wouldn't be a backwards compatibility hazard. Of course, as you've both mentioned, there are other concerns besides backwards compatibility here (like API surface), so something like user studies (e.g. to see how many end up wrapping the types in refcounts themselves) would be beneficial. I know a huge percentage of all wgpu-rs users do this in Rust, but a lot of that is due to parallel rendering, which AFAIK isn't sound in Dawn at the moment.
I think there's value in having a uniform API surface for memory management, with as few weird special cases as possible. Metal has some things that are autoreleased and some that aren't, and it has been a constant pain to deal with. Special-casing encoders might be fine, though. Since most of the objects are refcounted in wgpu, especially the high-frequency ones, what is the concern with just adding refcounting to all of them? I understand minimalism is good, but I think it is dwarfed by consistency for developers. And the additional code and overhead for objects that don't strictly need refcounting seems small.
Refcounting has been super useful in Chromium for at least …
The last comment on this is almost 3 months old. A resolution here would be helpful.
To try and make forward progress, I updated this PR to add reference / release for all objects, except encoders, which get a single delete function. I think there is value in having CommandBuffer / RenderBundle support multiple refs because they are reusable (or potentially reusable in the future), so they don't require single ownership.
@litherum this issue is as old as the project itself! What users require today is being able to delete objects. #100 adds the ability to drop objects, which addresses this need. Our long-standing position was that ref-counting is an implementation detail, and we'd not want it exposed to users. Some objects are refcounted in our implementation (see @pythonesque's elaboration), some aren't. And even for those that are, it's easier to build a solid implementation when we have full control of those reference/release calls internally. So we think that we should proceed with #100 or a variation of it, since it has the semantics we all agree on, and it addresses an immediate need. Then we have this big transition internally in …
And that's why we're waiting. IMO it's misleading to add "drop" to the API when we plan to revisit it in the future. Better to say "please check what your local WebGPU implementation exposes". @pythonesque described that most objects in wgpu need to be refcounted, including the most performance-sensitive ones, so I don't see why we can't make the others refcounted just for convenience. Not everyone has the luxury of working in a language with a borrow checker that lets you use references to stack variables everywhere.
We would like to have a webgpu header that is actually usable, one that both wgpu-native and Dawn implement. The earlier, the better. Having a way to release objects is a requirement for this. So I don't think "that's why we're waiting" is the best way forward. If you don't like the "drop" method, we could agree on it being "release", just without the corresponding acquire.
"drop" means "I'm done with it; feel free to delete it whenever you feel like deleting it"? If you "drop" something while it has an outstanding callback, will the callback be called?
For us "drop" essentially means "this handle is done". That is, the object itself may very well live longer, issue pending callbacks, etc., but the user will not use this handle directly any more.
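These "handle is done, but the object may outlive it" semantics can be sketched as follows. This is a hypothetical mock (the `My*` names and the integer status are assumptions, not webgpu.h): a pending map callback holds an internal reference, so dropping the user handle neither fires nor cancels the callback -- it still runs later, with the status it would normally have had:

```c
#include <stdlib.h>

/* Illustrative sketch, not the real webgpu.h API. */
typedef void (*MyMapCallback)(int status, void *userdata);

typedef struct MyBuffer {
    int refs;
    MyMapCallback pending_cb;
    void *pending_userdata;
} MyBuffer;

static void myBufferRelease(MyBuffer *b) {
    if (--b->refs == 0)
        free(b);
}

MyBuffer *myCreateBuffer(void) {
    MyBuffer *b = calloc(1, sizeof *b);
    b->refs = 1; /* the user's handle */
    return b;
}

void myBufferMapAsync(MyBuffer *b, MyMapCallback cb, void *userdata) {
    b->refs++; /* the pending callback keeps the object alive */
    b->pending_cb = cb;
    b->pending_userdata = userdata;
}

/* User says "this handle is done"; the object may live on. */
void myBufferDrop(MyBuffer *b) { myBufferRelease(b); }

/* Later, when the map actually completes (e.g. during event
 * processing), the callback fires with its normal status, then the
 * pending-callback reference is dropped. */
void myCompleteMap(MyBuffer *b, int status) {
    MyMapCallback cb = b->pending_cb;
    void *ud = b->pending_userdata;
    b->pending_cb = NULL;
    cb(status, ud);
    myBufferRelease(b);
}

/* Example callback used below: records the status it was called with. */
int my_last_status = -1;
void myRecordStatus(int status, void *userdata) {
    (void)userdata;
    my_last_status = status;
}
```

The contrasting design mentioned in the next comment (failing all callbacks at drop time) would instead invoke `pending_cb` with a failure status from inside `myBufferDrop`.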
oh interesting, our implementation calls all the callbacks with some failure/unknown status when "drop" happens. I don't remember right now if there was a reason behind this. @Kangz ?
Also hopeful for some progress here soonish - I'm trying to use webgpu.h and finding the need for this when doing things outside of one-shot hello world demos where unbounded growth is fine.
All callbacks must be called "eventually". This is really important because of manual memory management: an outstanding callback should be able to keep memory alive and free it on callback (even if the callback "fails"). (Related note in the last paragraph below.) After a short discussion with Austin, Austin has me agreeing that the liveness semantics in this C API should match the JS API, so that e.g. blink::GPUBuffer exactly holds a WGPUBuffer and can drop it when GC'd.
Initially this seemed surprising to me, but iiuc the difference just boils down to when the callback gets "failed" (called with failed status): inside drop() (== refcount reaching 0), inside destroy(), or inside ProcessEvents(). IMO in all cases it should happen in ProcessEvents. But even though it's deferred, that doesn't mean we actually have to keep the originating object alive, e.g.:
IMO it should never be "valid" (defined behavior) for any wgpu function to be called with any WGPU object argument that has been dropped by the client program. (In Dawn, for example, I would like a runtime debug ASSERT for this (because due to implementation details the pointer technically stays valid as long as there are internal refs) - this way ownership errors are reliably caught in Debug+ASAN builds.)

Therefore, without a ref to WGPUInstance you can't call wgpuInstanceProcessEvents, so you wouldn't be able to get any callbacks to be called. It doesn't really matter whether any objects have internal refs back to WGPUInstance. Since we guarantee all callbacks will be called, upon dropping the last ref to WGPUInstance all callbacks should be failed. (This is better than debug-asserting there are no pending callbacks, because it's not easy to know whether there are any.)
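The "fail all pending callbacks on the last instance release" behavior can be sketched with a hypothetical mock (the `My*` names and status values are assumptions, not webgpu.h). The instance tracks its pending callbacks; since event processing requires a live ref to the instance, releasing the last ref is the last chance to honor the "all callbacks must be called eventually" guarantee:

```c
#include <stdlib.h>

/* Illustrative sketch, not the real webgpu.h API. */
typedef void (*MyCallback)(int status, void *userdata);
enum { MY_STATUS_SUCCESS = 0, MY_STATUS_INSTANCE_DROPPED = 1 };

typedef struct MyPending {
    MyCallback cb;
    void *userdata;
    struct MyPending *next;
} MyPending;

typedef struct MyInstance {
    int refs;
    MyPending *pending; /* callbacks not yet delivered */
} MyInstance;

MyInstance *myCreateInstance(void) {
    MyInstance *inst = calloc(1, sizeof *inst);
    inst->refs = 1;
    return inst;
}

/* Register a callback to be delivered later by event processing. */
void myInstanceTrackCallback(MyInstance *inst, MyCallback cb, void *ud) {
    MyPending *p = calloc(1, sizeof *p);
    p->cb = cb;
    p->userdata = ud;
    p->next = inst->pending;
    inst->pending = p;
}

/* On the last release, every still-pending callback is invoked with a
 * failure status, so none of them are silently lost. */
void myInstanceRelease(MyInstance *inst) {
    if (--inst->refs != 0) return;
    while (inst->pending) {
        MyPending *p = inst->pending;
        inst->pending = p->next;
        p->cb(MY_STATUS_INSTANCE_DROPPED, p->userdata);
        free(p);
    }
    free(inst);
}

/* Example callback used below: counts failed invocations. */
int my_failed_calls = 0;
void myCountFailure(int status, void *userdata) {
    (void)userdata;
    if (status == MY_STATUS_INSTANCE_DROPPED)
        my_failed_calls++;
}
```

This matters for manual memory management: a callback's userdata can own memory, and failing the callback (rather than dropping it) gives that memory a well-defined place to be freed.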
Oh yeah, and a further thing: since I think C drop should be 1:1 with JS object garbage collection, there should be no observable effects from dropping any object. For example:
Dropping instances may be a special case. It's impossible in JS, so it doesn't present issues implementing the JS API on top of the C API. And if you drop the instance you can't get any callbacks, so relatively little is observable anyway. (The return values of …)
Which would be implemented by having the ArrayBuffer shared-own its underlying memory, either indirectly by shared-owning a WGPUBuffer, or by some other mechanism (like shared-owning the IPC shared memory).
+1 to what Kai explained. Ideally webgpu.h implements the JS semantics, where you can't observe GC happening, so that callbacks don't get called on user-side drop, but only when they would normally have been called. (JS calling …) The …
Are there any updates? It's been about 4 months.
FYI @jimblandy, this is the issue I mentioned. |
@kainino0x @cwfitzgerald PTAL again at the last commit there!
Last commit LGTM; one comment on an earlier one. EDIT: wrong PR for that comment
Clean rebased past #182, landing now