
Add Reference / Release for all objects #15

Merged · 1 commit into webgpu-native:main from ref-release · May 24, 2023

Conversation

@Kangz (Collaborator) commented Oct 17, 2019

No description provided.

@Kangz requested a review from kvark on October 17, 2019.
@grovesNL (Member)

Is this blocked by #9?

@Kangz (Collaborator, Author) commented Oct 17, 2019

It is. We discussed it a bunch face-to-face; I remember thinking I should do the pull request after that, but I don't remember the details of the discussion. That was a crazy week :)

@Kangz changed the base branch from master to main on July 2, 2020.
@pythonesque commented Aug 12, 2021

I'm close to finishing a change that switches wgpu-rs to use reference counting; at that point, both implementations should have very similar reference-semantics requirements. I have worked to remove Arc wherever I can and to do as little synchronization as possible internally, and I've also worked to keep the lifetime requirements as cheap as reasonably possible in terms of overhead. I also believe that future optimizations, if applicable, will mostly be about avoiding internal reference counting during command buffer submission, which isn't going to be exposed to the user. So, here are my thoughts:

In my implementation, the boxed / owned objects will end up being (from what I can tell so far, and I am close to done): adapter, queue, render / command / bundle encoders, shader modules, and command buffers. These should not have exposed retain/release. Everything else should.

The rationale:

Shader modules are only used read-only, and are not retained after pipeline layout creation (so they need no reference counting at all, not even internal reference counting).

Adapters, according to the spec, should only be used prior to device initialization and then not used again; we can take advantage of that. On native, adapters will be implicitly destroyed when they are used to create a device, therefore they don't need reference counting.

Command encoders must already be externally synchronized and are implicitly destroyed on finish. I believe this is how it works today in wgpu-rs, but if it doesn't, this is how it should work--you can't finish a command encoder more than once. So they don't need reference counting.

Command buffers, similarly, can only be used once, and you can't do anything with them except submit to a queue. So they will be implicitly destroyed on queue submission (the implementation may keep them alive for various reasons but this doesn't need to be exposed to the user, and might not be done with reference counting). Therefore, they also don't need reference counting.

Finally, queues. Queues unfortunately do need some sharing and internal synchronization; however, this is only needed for the "raw" queue. This is because surface presentation does not take an explicit queue, so the surface must hold a reference to the queue internally, and the Vulkan spec requires queues to be externally synchronized for both presentation and submission. However, this should ideally only be needed when actually running the native present / submit commands. The only case where tracking or user information needs to be updated on a queue is buffer mapping at initialization, which I would like to explicitly take a queue (there is one other place that uses the implicit device queue, which is command encoder creation and which IMO should be performed on the queue rather than the device, but that's not relevant here as it only needs read-only access to the inner queue).

So, tl;dr: I believe there is a very realistic path forward to where WebGPU external (not internal) queues don't need reference counting, so we should not add reference counting to them. The other main consequence of this from a C perspective is that operations that explicitly use a queue should be externally synchronized; otherwise it's basically the same as it is currently.
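
To make the proposed split concrete, here is a minimal usage sketch from the caller's side, assuming entry-point names follow this PR's Reference / Release convention; the exact signatures, descriptor fields, and nullability of descriptors are illustrative assumptions, not the agreed header:

```c
#include <webgpu/webgpu.h>  // assumed header path

// Sketch of the proposed model: refcounted resources (Buffer, Texture, BindGroup, ...)
// expose Reference/Release, while encoders and command buffers are single-use and
// consumed implicitly by Finish/Submit.
static void record_and_submit(WGPUDevice device, WGPUQueue queue) {
    WGPUBufferDescriptor bufDesc = {
        .size = 256,
        .usage = WGPUBufferUsage_Vertex | WGPUBufferUsage_CopyDst,
    };
    WGPUBuffer buffer = wgpuDeviceCreateBuffer(device, &bufDesc);
    wgpuBufferReference(buffer);   // hypothetical: a second owner keeps the buffer alive

    WGPUCommandEncoder enc = wgpuDeviceCreateCommandEncoder(device, NULL /* default descriptor, assumed nullable */);
    // ... record passes that reference `buffer` ...
    WGPUCommandBuffer cmd = wgpuCommandEncoderFinish(enc, NULL);
    // Under the proposal, `enc` is implicitly destroyed by Finish and `cmd` is
    // implicitly destroyed by Submit, so neither needs Reference/Release.
    wgpuQueueSubmit(queue, 1, &cmd);

    wgpuBufferRelease(buffer);     // hypothetical: drop one of the two references
    wgpuBufferRelease(buffer);     // drop the last one; the buffer may now be freed
}
```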

@pythonesque commented Aug 12, 2021

Additional note from discussion with @kvark , who asked to really confirm there are no future compatibility constraints: it is possible that we could avoid reference counting on QuerySet, ComputePipeline, and RenderBundle, even though we can't do it right now, so we should also not provide inc/dec methods on those. And I haven't gotten to Surface yet, so hold off on adding inc/dec methods for that until we confirm whether an Arc-free implementation is possible.

Essentially, anything that is referenced by a RenderBundle is going to have to be reference counted, because there's no conceivable future where render bundle lifetimes can be tracked by anything sane that's not tracing GC (which is a no-go for C) or reference counting. This is because they're expected to live across multiple queue submissions, so there's no convenient way to figure out when they can be destroyed. This includes anything referenced by either a BindGroup or RenderPass, so basically any resource that hasn't already been explicitly mentioned (except maybe Surface--I still need to work that one out; but in any case Surface/Swapchain didn't even exist in this original proposal).

However, for ordinary command encoders, one can imagine WebGPU changing to add APIs allowing avoiding reference counting and tracking by submission index instead. QuerySets are only directly referenced by ordinary command encoders, so it is conceivable that in the future we would not need refcounting there. Similarly, render bundles can't (yet?) reference other render bundles, so it's also conceivable for them. And of course render bundles can't reference compute pipelines, so they also conceivably don't need refcounting (but... it seems pretty likely that a ComputeBundle will be added at some point, so that one seems like a technicality). So that's why if we really want to be future compatibility safe, we shouldn't have inc/dec for them.

@kainino0x (Collaborator)

For non-refcounted types, how do they get freed? Would they have a separate free method or would they simply have a single-shot release method with no reference method?

Also, if we plan to omit ref/release methods for some types now and that might change post-1.0, it needs to be non-breaking to add them later.

@kainino0x (Collaborator)

All in all, having an inconsistent memory management interface across the objects seems unfortunate. I understand the reasoning behind why some objects need internal refcounting and others don't, but it's not strictly necessary that the external refcounting and the internal refcounting be the same thing. (I believe Dawn uses the same refcount field for internal and external references, but that doesn't have to be the case.) We can have internal refcounting without refcounting existing in the C API at all.

@pythonesque

> For non-refcounted types, how do they get freed? Would they have a separate free method or would they simply have a single-shot release method with no reference method?

Single-shot release.

> Also, if we plan to omit ref/release methods for some types now and that might change post-1.0, it needs to be non-breaking to add them later.

It would be (that's the intent of what I've been talking about).

> All in all, having an inconsistent memory management interface across the objects seems unfortunate. I understand the reasoning behind why some objects need internal refcounting and others don't, but it's not strictly necessary that the external refcounting and the internal refcounting be the same thing. (I believe Dawn uses the same refcount field for internal and external references, but that doesn't have to be the case.) We can have internal refcounting without refcounting existing in the C API at all.

This is true, which is why I'm trying to differentiate between "internal refcounting that's an implementation detail" and "internal reference counting that's pretty much inevitable for any C API." For the latter, not exposing the refcounting is just strictly worse for performance, especially because we already need an allocation for every object for other reasons in C. But if people would prefer the simplicity of uniform resource management for every object, and don't want refcounting exposed, that's fine too.
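
As a hedged illustration of the "strictly worse for performance" point: if the C API only offers a single-shot release, an application that wants shared ownership has to layer its own counter (and usually its own allocation) on top of whatever the implementation already keeps internally. The wrapper below is purely hypothetical application code, and the single-shot release name is an assumption:

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <webgpu/webgpu.h>  // assumed header path

// App-side wrapper that duplicates refcounting the implementation may already
// be doing internally for WGPUBuffer.
typedef struct {
    WGPUBuffer handle;
    atomic_int refs;
} AppBuffer;

static AppBuffer *app_buffer_wrap(WGPUBuffer handle) {
    AppBuffer *b = malloc(sizeof *b);
    b->handle = handle;
    atomic_init(&b->refs, 1);
    return b;
}

static void app_buffer_retain(AppBuffer *b) {
    atomic_fetch_add_explicit(&b->refs, 1, memory_order_relaxed);
}

static void app_buffer_release(AppBuffer *b) {
    if (atomic_fetch_sub_explicit(&b->refs, 1, memory_order_acq_rel) == 1) {
        wgpuBufferRelease(b->handle);  // assumed single-shot release entry point
        free(b);
    }
}
```

If the header exposed the implementation's own Reference/Release instead, the extra allocation and the second atomic counter would be unnecessary.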

@pythonesque commented Aug 12, 2021

And I also don't think it's super random what resources would have refcounting exposed (I'm including ComputePipeline even though strictly speaking it doesn't have a render bundle equivalent yet).

These intrinsically need refcounting (because they're transitively referenced by render bundles):

  • Device
  • ComputePipeline, RenderPipeline, PipelineLayout, BindGroupLayout
  • BindGroup, Buffer, TextureView, Texture, Sampler

And here's what doesn't intrinsically need refcounting (everything else):

  • Adapter, Surface, Queue
  • QuerySet
  • CommandEncoder, RenderBundle, RenderBundleEncoder, RenderPassEncoder, ComputePassEncoder, CommandBuffer

Subjectively, this all makes a lot of sense, and the only weird one is QuerySet (I'm not actually sure why QuerySet can't be used in a render bundle, but I'm assuming there's a good reason?). I don't think I would be terribly confused about what needs reference counting and what doesn't.
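
In header terms, the split above might look roughly like the following; the names are modeled on this PR's Reference / Release convention, and every declaration here is a hypothetical sketch rather than the agreed API:

```c
// Intrinsically refcounted (transitively reachable from a render bundle):
WGPU_EXPORT void wgpuBufferReference(WGPUBuffer buffer);
WGPU_EXPORT void wgpuBufferRelease(WGPUBuffer buffer);
WGPU_EXPORT void wgpuTextureReference(WGPUTexture texture);
WGPU_EXPORT void wgpuTextureRelease(WGPUTexture texture);
WGPU_EXPORT void wgpuBindGroupReference(WGPUBindGroup bindGroup);
WGPU_EXPORT void wgpuBindGroupRelease(WGPUBindGroup bindGroup);
// ... Device, Sampler, TextureView, pipelines and layouts likewise ...

// Not intrinsically refcounted: single-shot release only, no Reference.
WGPU_EXPORT void wgpuCommandEncoderRelease(WGPUCommandEncoder commandEncoder);
WGPU_EXPORT void wgpuQuerySetRelease(WGPUQuerySet querySet);
WGPU_EXPORT void wgpuRenderBundleRelease(WGPURenderBundle renderBundle);
```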

@kainino0x (Collaborator)

With the single-shot release design where we can easily add refcounting to an object later, I am relatively comfortable making it not totally uniform across the API. The difference between "refcounted" and "refcounted with a maximum count of 1" is nice and small.
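
A quick sketch of why this is non-breaking, under the same hypothetical names as above: a caller that never calls Reference and releases each object exactly once behaves identically whether Release is a single-shot free or a decrement to zero.

```c
#include <webgpu/webgpu.h>  // assumed header path

// Valid under both semantics: exactly one release per created object, no Reference calls.
static void use_sampler_once(WGPUDevice device) {
    WGPUSamplerDescriptor desc = {0};  // fields elided; a real descriptor sets filters, address modes, etc.
    WGPUSampler sampler = wgpuDeviceCreateSampler(device, &desc);
    // ... bind and use the sampler ...
    wgpuSamplerRelease(sampler);  // single-shot free today, refcount 1 -> 0 later: same observable result
}
```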

Anyway, I'd like to let Corentin have the actual opinions when he's back and has time to look at this.

@kainino0x (Collaborator)

> For the latter, not exposing the refcounting is just strictly worse for performance, especially because we already need an allocation for every object for other reasons in C.

Just so I'm clear: What makes it strictly worse? Is it that if the client application wants to use refcounting, they'll have to use their own on top of the internal one?

@grovesNL (Member)

For now, maybe we could try to limit the number of functions we add on top of the browser API? We could start with single-shot release/free/drop for all objects and consider exposing reference/retain functionality later.

If we wait to expose reference, we could look at some real use cases to better understand refcounting design trade-offs, e.g. typical performance improvements of shared refcounts for expected usage of WebGPU, effects on developer experience, whether many libraries would benefit from this, whether the increased API surface is worth it, etc.

@pythonesque commented Aug 15, 2021

> Just so I'm clear: What makes it strictly worse? Is it that if the client application wants to use refcounting, they'll have to use their own on top of the internal one?

Yes, that.

And just to be clear--I'd be totally fine not exposing these initially since it's backwards compatible to do so later. I am more saying, for the particular types I listed, there isn't really any conceivable technical decision we could make that would constrain how we could optimize the C interface, so exposing them wouldn't be a backwards compatibility hazard.

Of course as you've both mentioned, there are other concerns besides backwards compatibility here (like API surface) so something like user studies (e.g. to see how many end up wrapping the types in refcounts themselves) would be beneficial. I know a huge percentage of all wgpu-rs users do this in Rust, but a lot of that is due to parallel rendering which AFAIK isn't sound in Dawn at the moment.

@Kangz (Collaborator, Author) commented Aug 17, 2021

I think there's value in having a uniform API surface for memory management, with as few weird special cases as possible. Metal has some things that are autoreleased and some that aren't, and it has been a constant pain to deal with. Special-casing encoders might be fine though.

Since most of the objects are refcounted in wgpu, especially the high-frequency ones, what is the concern with just adding refcounting to all of them? I understand minimalism is good, but I think it is dwarfed by consistency for developers. And the additional code and overhead for objects like QuerySet, ShaderModule and friends is really tiny.

> Of course as you've both mentioned, there are other concerns besides backwards compatibility here (like API surface) so something like user studies (e.g. to see how many end up wrapping the types in refcounts themselves) would be beneficial. I know a huge percentage of all wgpu-rs users do this in Rust, but a lot of that is due to parallel rendering which AFAIK isn't sound in Dawn at the moment.

Refcounting has been super useful in Chromium for at least Device and Texture to handle the mess of lifetimes in Chrome's guts. Without refcounting we would have had to box the handles in the equivalent of an Rc (and most likely even Arc).

@Kangz mentioned this pull request on Aug 23, 2021.
@litherum

The last comment on this is almost 3 months old. A resolution here would be helpful.

@Kangz (Collaborator, Author) commented Nov 18, 2021

To try to make forward progress, I updated this PR to add reference / release for all objects except encoders, which get a single delete function. I think there is value in having CommandBuffer / RenderBundle support multiple refs because they are reusable (or potentially reusable in the future), so they don't require single ownership.

@kvark (Collaborator) commented Nov 22, 2021

@litherum this issue is as old as the project itself!

What users require today is being able to delete objects. #100 adds the ability to drop objects, which addresses this need. I think the semantics of this operation are non-controversial; only the naming is. And the naming is debated because it may or may not imply refcounting, which we haven't yet found consensus on.

Our long-standing position has been that ref-counting is an implementation detail, and we'd not want it exposed to users. Some objects are refcounted in our implementation (see @pythonesque's elaboration above), some aren't. And even for those that are, it's easier to build a solid implementation when we have full control of those reference/release operations internally.

So we think that we should proceed with #100 or a variation of it, since it has semantics we all agree on, and it addresses an immediate need. Then there is a big transition that needs to happen internally in wgpu, led by @pythonesque , before we can conclude the refcounting story.

@Kangz (Collaborator, Author) commented Nov 22, 2021

And that's why we're waiting. IMO it's misleading to add "drop" to the API when we plan to revisit it in the future; better to say "please check what your local WebGPU implementation exposes". @pythonesque described that most objects in wgpu need to be refcounted, including the most performance-sensitive ones, so I don't see why we can't make the others refcounted just for convenience. Not everyone has the luxury of a language with a borrow checker that lets them pass references to stack variables everywhere.

@kvark (Collaborator) commented Nov 22, 2021

We would like to have a webgpu header that is actually usable, one that both wgpu-native and Dawn implement. The earlier, the better. Having a way to release objects is a requirement for this. So I don't think "that's why we're waiting" is the best way forward. If you don't like the "drop" method, we could agree on it being "release", just without the corresponding acquire.

@litherum commented Nov 30, 2021

"drop" means "I'm done with it; feel free to delete it whenever you feel like deleting it"? If you "drop" something while it has an outstanding callback, will the callback be called?

@kvark (Collaborator) commented Nov 30, 2021

For us "drop" essentially means "this handle is done". That is, the object itself may very well live longer, issue pending callbacks, etc, but the user will not use this handle directly any more.

@austinEng (Collaborator)

> issue pending callbacks

Oh interesting, our implementation calls all the callbacks with some failure/unknown status when "drop" happens. I don't remember right now whether there was a reason behind this. @Kangz ?

@benvanik commented Dec 3, 2021

Also hopeful for some progress here soonish - I'm trying to use webgpu.h and finding the need for this when doing anything beyond one-shot hello-world demos where unbounded growth is fine.

@kainino0x (Collaborator) commented Dec 4, 2021

"drop" means "I'm done with it; feel free to delete it whenever you feel like deleting it"? If you "drop" something while it has an outstanding callback, will the callback be called?

All callbacks must be called "eventually". This is really important because of manual memory management: an outstanding callback should be able to keep memory alive and free it on callback (even if the callback "fails"). (Related note in the last paragraph below.)


After a short discussion with Austin, he has me agreeing that the liveness semantics in this C API should match the JS API, so that e.g. blink::GPUBuffer holds exactly one WGPUBuffer and can drop it when GC'd.

> Oh interesting, our implementation calls all the callbacks with some failure/unknown status when "drop" happens. I don't remember right now whether there was a reason behind this.

Initially this seemed surprising to me, but iiuc the difference just boils down to when the callback gets "failed" (called with failed status): inside drop() (== refcount reaching 0), inside destroy(), or inside ProcessEvents(). IMO in all cases it should happen in ProcessEvents. But even though it's deferred, that doesn't mean we actually have to keep the originating object alive, e.g.:

  • A WGPURequestDeviceCallback should be successfully handled even if the WGPUAdapter reference has already been dropped. This should be a common pattern in JS, and a semi common pattern in C.
  • A WGPUBufferMapCallback should be successfully handled even if the WGPUBuffer reference has already been dropped. This isn't a common pattern in any case, but should work in JS, so it should work in C. In JS, you would keep the mapping alive as long as any ArrayBuffer returned from getMappedRange is alive (no choice here because otherwise GC becomes observable). In C there is no destructor on the raw pointer returned by wgpuBufferGetMappedRange, so doing this would constitute a programming error resulting in a memory leak.

IMO it should never be "valid" (defined behavior) for any wgpu function to be called with any WGPU object argument that has been dropped by the client program. (In Dawn for example I would like a runtime debug ASSERT for this (because due to implementation details the pointer technically stays valid as long as there are internal refs) - this way ownership errors are reliably caught in Debug+ASAN builds.)

Therefore without a ref to WGPUInstance you can't call wgpuInstanceProcessEvents, so you wouldn't be able to get any callbacks to be called. It doesn't really matter whether any objects have internal refs back to WGPUInstance.

Since we guarantee all callbacks will be called, upon dropping the last ref to WGPUInstance all callbacks should be failed. (This is better than debug asserting there are no pending callbacks, because it's not easy to know whether there are any.)
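
A hedged sketch of the first bullet above, written against the callback-style entry points of the webgpu.h of this era (signatures may differ slightly; the adapter release name is the one this PR proposes): the callback owns its userdata and frees it whenever it eventually runs, even though the adapter reference is dropped immediately after the request.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <webgpu/webgpu.h>  // assumed header path

// Guaranteed to be called eventually (possibly with a failure status), so it
// can safely own and free `userdata`.
static void on_device(WGPURequestDeviceStatus status, WGPUDevice device,
                      char const *message, void *userdata) {
    char *tag = userdata;
    if (status != WGPURequestDeviceStatus_Success) {
        fprintf(stderr, "[%s] requestDevice failed: %s\n", tag,
                message ? message : "(no message)");
    }
    (void)device;  // a real program would store or release the device here
    free(tag);     // safe because the callback is guaranteed to run eventually
}

static void request_device_and_drop_adapter(WGPUAdapter adapter) {
    char *tag = strdup("main-device-request");
    wgpuAdapterRequestDevice(adapter, NULL /* default descriptor, assumed nullable */,
                             on_device, tag);
    // Per the semantics above, dropping the adapter reference here must not
    // prevent the callback from eventually being delivered (e.g. from
    // wgpuInstanceProcessEvents), nor force it to fail.
    wgpuAdapterRelease(adapter);  // release name per this PR's proposal
}
```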

@kainino0x (Collaborator)

Oh yeah, and a further thing: since I think C drop should be 1:1 with JS object garbage collection, there should be no observable effects from dropping any object. For example:

  • Dropping an object shouldn't result in callback failure
  • Dropping a device shouldn't destroy a buffer which would result in map callback failure

Dropping instances may be a special case. It's impossible in JS so it doesn't present issues implementing the JS API on top of the C API. And if you drop the instance you can't get any callbacks so relatively little is observable anyway. (The return values of wgpuBufferGetMappedRange and create/finish calls are examples of things that are still observable.)

@kainino0x (Collaborator) commented Dec 4, 2021

> In JS, you would keep the mapping alive as long as any ArrayBuffer returned from getMappedRange is alive (no choice here because otherwise GC becomes observable).

Which would be implemented by having the ArrayBuffer shared-own its underlying memory, either indirectly by shared-owning a WGPUBuffer, or by some other mechanism (like shared-owning the IPC shared memory).

@Kangz (Collaborator, Author) commented Dec 6, 2021

+1 to what Kai explained. Ideally webgpu.h implements the JS semantics where you can't observe GC happening, so callbacks don't get called on user-side drop, but only when they would have normally been called. (JS calling .destroy can cancel some callbacks, but that's well defined and doesn't expose GC.)

The Unknown status was also meant to deal with the network transparency of WebGPU. If the GPU process gets disconnected, the callbacks should still be called, but with Unknown. Though maybe it could just be DeviceLost for both an actual device loss and the GPU process crashing.

@litherum

Are there any updates? It's been about 4 months.

@Kangz (Collaborator, Author) commented Mar 30, 2022

FYI @jimblandy, this is the issue I mentioned.

@Kangz (Collaborator, Author) commented May 22, 2023

@kainino0x @cwfitzgerald PTAL again at the last commit there!

@kainino0x removed the request for review from kvark on May 22, 2023.
@kainino0x (Collaborator) left a review comment:

Last commit LGTM; one comment on an earlier one. EDIT: wrong PR for that comment.

@kainino0x approved these changes on May 22, 2023.
@kainino0x (Collaborator)

Cleanly rebased past #182, landing now.

@kainino0x merged commit 2451303 into webgpu-native:main on May 24, 2023.
@Kangz deleted the ref-release branch on May 24, 2023.
Labels: lifetimes (Lifetimes of object and memory allocations)