Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display list culling prototype based on buffer offsets #38189

Closed
wants to merge 2 commits into from

Conversation

flar
Copy link
Contributor

@flar flar commented Dec 11, 2022

This is a variant of #38304 that uses offsets of the ops within the buffer rather than indices to control which ops re rendered or culled. The description that follows is mostly the same as the other PR.

This PR creates the underpinnings for a mechanism that can be used to cull DisplayLists based on the offsets of the rendering operations that are desired. Such a list of offsets could be obtained from the RTree that the DisplayList maintains as long as it is annotated with the offset into the buffer of the DLOp that generated the rectangle in the tree.

Note that only the offsets of the rendering ops are required.

The following ops will cull themselves if they are not needed:

  • all rendering ops (Draw...Op)
  • all saveLayers with their matching restores
  • all save operations with their matching restores
  • all transform ops
  • all clip ops

The following ops will not cull themselves and will always be executed. The tracking to prevent them from being executed would require a lot of overhead that is unlikely to save much time:

  • all set/clear attribute calls, most of which assign a simple value into a field in the dispatcher, some of which may cause a shared object to be created, though.

The code has been tested by setting the "next_rendering_offset" to the current offset or to 0 and all of the tests seem to pass which indicates that either of these techniques can be used in the un-culled Dispatch case. To enable culling, a list of offsets would have to be provided and the line of code in DisplayList::Dispatch which talks about how to update next_rendering_offset would need to be updated to fill in the offset of the next rendering primitive in the list whenever we pass and execute the previous primitive.

The performance of the overhead of this mechanism has not (yet) been measured.

@flar
Copy link
Contributor Author

flar commented Dec 12, 2022

Proceeding further with this prototype would like require landing the code that merges the bounds/rtree generation into the builder process so that we can remember the buffer offsets for the various primitives that triggered entries in the rtree.

I looked briefly at a workaround, but the bounds/rtree calculators that are currently used don't have access to the offsets of the primitives that they are measuring and potentially recording and we won't really know at construction/build time which primitives will have an associated rect in the tree. The main issues seem to be:

  • Primitives which flood the clip due to modifying transparent black often simply record a flag rather than add a bounds
  • saveLayer usually doesn't add a bounds, and this prototype doesn't need them to do so, but in some cases of having attributes that flood the enclosing clip, a saveLayer might add the clip bounds to the accumulator.

Because of the indeterminate nature of the above optional bounds cases, the builder can't simply remember all of the offsets proactively since not every offset is associated with an entry in the tree.

@flar
Copy link
Contributor Author

flar commented Dec 12, 2022

Looking for feedback from @ColdPaleLight and @endless7

@flar
Copy link
Contributor Author

flar commented Dec 12, 2022

It occurs to me that a variation of this prototype could be based on the "index" of the rendering primitives - i.e. counting records rather than offsets. That variation would require the rendering ops (or any op that has a bounding box in the rtree) to increment the cur_index rather than having that done in the enclosing loop.

@flar
Copy link
Contributor Author

flar commented Dec 13, 2022

I have a prototype of the "index" version that I've also plumbed to remember the indices of the ops that have a rectangle in the rtree so that it can actually perform an RTree search and identify the ops that need to be culled. I need to modify the rendering code to pass in the cull_rect at render time to see if it is working, though. I'll submit that as a new PR so that the 2 implementations can be compared.

@ColdPaleLight
Copy link
Member

I have a prototype of the "index" version that I've also plumbed to remember the indices of the ops that have a rectangle in the rtree so that it can actually perform an RTree search and identify the ops that need to be culled. I need to modify the rendering code to pass in the cull_rect at render time to see if it is working, though. I'll submit that as a new PR so that the 2 implementations can be compared.

The implementation in SkPicture is very similar to this, and @endless7 has tried this solution before. The conclusion at that time is that this solution requires a one-to-one correspondence between the rects in RTree and the Ops in DisplayList (they are already in one-to-one correspondence in Skia), but now there is not a one-to-one correspondence in DisplayList.

c.f.
https://github.com/google/skia/blob/b37516f2cf040e02783810a6c587555cbff559c1/src/core/SkRecordDraw.cpp#L26-L47

@flar
Copy link
Contributor Author

flar commented Dec 13, 2022

The implementation in SkPicture is very similar to this, and @endless7 has tried this solution before. The conclusion at that time is that this solution requires a one-to-one correspondence between the rects in RTree and the Ops in DisplayList (they are already in one-to-one correspondence in Skia), but now there is not a one-to-one correspondence in DisplayList.

c.f. https://github.com/google/skia/blob/b37516f2cf040e02783810a6c587555cbff559c1/src/core/SkRecordDraw.cpp#L26-L47

This whole PR is to demonstrate that that conclusion is not accurate. Neither of these prototypes necessarily require that condition.

saveLayer operations can have a rect or not, they will still be executed appropriately.

Rendering operations require an op, though, (unless they are a NOP or clipped out) and that is the one glitch left. Right now if a rendering operation has an attribute which causes it to flood the clip area, and no clip has been established and no bounds were handed to the Builder, then it will only set a flag. For RTree generation we should simply require a bounds to be handed to the Builder and then there will be a 1:1 correspondence between rendering ops and rects and the requirements of these 2 prototypes will be satisfied.

@flar
Copy link
Contributor Author

flar commented Dec 14, 2022

Waiting for #34365 to land before I do much more work on this concept. It will be much easier to track ops vs rects once that PR lands.

One issue that I see is that sometimes adding a DL will add a bunch of rects to the RTree and they might all show up in the query that is trying to do render clipping. We might want to have a way to query the RTree for embedder overlay detection that includes those sub-rects and another way that is doing render culling that only queries on the bounds. This could all be done with metadata and filtering in the methods that do the query without having to compute multiple RTrees, but given that dispatching a sub-DL from a parent-DL can pass along the culling, we only need its rect to appear once in the query list for render culling.

@flar
Copy link
Contributor Author

flar commented Dec 15, 2022

The "op index" variant of this PR has been submitted as #38304

@flar flar changed the title Display list culling prototype Display list culling prototype based on buffer offsets Dec 15, 2022
@flar flar marked this pull request as draft December 15, 2022 01:10
@flar
Copy link
Contributor Author

flar commented Dec 22, 2022

Closing in favor of #38429

@flar flar closed this Dec 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants