Use instanced array buffer instead of UBO for canvas item batching #70065
This simplifies the generated shader code, which improves both runtime performance and shader compile times on low-end devices. However, the main purpose of this PR is to get OpenGL rendering working on Apple devices again.
Fixes: #68760
May fix: #68980
May fix: #68819
May fix: #67595
Background
This started as a search for a workaround to the issues we were facing on Apple devices, but as it turns out, this new approach can also be more efficient on some devices, especially low-end ones.
Basically, this PR switches from using a giant Uniform Buffer Object to using a giant Vertex Buffer Object. Instead of dynamically indexing into an array of uniforms in the shader (using the instance ID as the index), we push instance-rate attributes to the shader. This improves codegen because there is no dynamic indexing.
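To make the switch concrete, here is a minimal C++ sketch of the CPU side of such a setup. It assumes a hypothetical InstanceData record (Godot's actual layout is not described here) and hypothetical helper names. Each field is described as one instance-rate attribute; in OpenGL each entry would be passed to glVertexAttribPointer(index, components, GL_FLOAT, GL_FALSE, sizeof(InstanceData), offset cast to a pointer) followed by glVertexAttribDivisor(index, 1). The fetch_element helper emulates the rule the hardware uses to pick which record an attribute reads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance record (not Godot's actual layout). Every field is
// a vec4 so that each one maps to exactly one attribute slot.
struct InstanceData {
    float xform_a[4];   // first half of a packed 2D transform
    float xform_b[4];   // second half of a packed 2D transform
    float modulate[4];  // RGBA color
    float dst_rect[4];  // x, y, width, height
};

// One instance-rate attribute, mirroring the arguments of
// glVertexAttribPointer and glVertexAttribDivisor.
struct InstanceAttrib {
    unsigned index;      // attribute location in the shader
    int components;      // number of floats read
    std::size_t offset;  // byte offset of the field inside InstanceData
    unsigned divisor;    // 1 = advance to the next record once per instance
};

// Describe the four instance attributes starting at a given location.
std::vector<InstanceAttrib> instance_attribs(unsigned first_location) {
    // OpenGL 3.3 and OpenGL ES 3.0 guarantee at least 16 attribute slots.
    assert(first_location + 3 < 16);
    return {
        {first_location + 0, 4, offsetof(InstanceData, xform_a), 1},
        {first_location + 1, 4, offsetof(InstanceData, xform_b), 1},
        {first_location + 2, 4, offsetof(InstanceData, modulate), 1},
        {first_location + 3, 4, offsetof(InstanceData, dst_rect), 1},
    };
}

// Which buffer element an attribute reads for a given vertex and instance:
// divisor 0 reads per vertex, divisor N > 0 advances once every N instances.
std::size_t fetch_element(unsigned divisor, std::size_t vertex_id, std::size_t instance_id) {
    return divisor == 0 ? vertex_id : instance_id / divisor;
}
```

Because this fetch happens before the shader runs, the shader body only ever sees one record, so no dynamic indexing appears in the generated code.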
With Uniform Buffers we could bind a specific range; with Vertex Buffers we can't, but we can specify an offset when binding the buffer. Unfortunately, this means we need to rebind the buffer for every batch. In my testing this doesn't carry a noticeable performance penalty.
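A minimal sketch of the per-batch offset bookkeeping, assuming batches are written back to back into one shared instance buffer (Batch, batch_byte_offset and batch_offsets are hypothetical names). A UBO sub-range can be bound with glBindBufferRange, but for vertex attributes the offset has to go into the pointer argument of glVertexAttribPointer, so the attributes are re-specified before each batch's draw call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical batch: a run of consecutive records in the shared instance buffer.
struct Batch {
    std::size_t first_instance;  // index of the batch's first record
    std::size_t instance_count;  // number of records (instances) in the batch
};

// Byte offset of a batch's first record. This value would be passed as the
// pointer argument of glVertexAttribPointer when re-specifying the attributes.
std::size_t batch_byte_offset(const Batch& batch, std::size_t stride) {
    return batch.first_instance * stride;
}

// Pack batches back to back and return the byte offset to bind for each one.
std::vector<std::size_t> batch_offsets(const std::vector<std::size_t>& counts,
                                       std::size_t stride) {
    std::vector<std::size_t> offsets;
    std::size_t first = 0;
    for (std::size_t count : counts) {
        assert(count > 0 && "empty batches should not be drawn");
        offsets.push_back(batch_byte_offset(Batch{first, count}, stride));
        first += count;
    }
    return offsets;
}
```

In a draw loop, each offset would be applied by re-specifying the instance attributes and then issuing something like glDrawArraysInstanced with that batch's instance count. glDrawArraysInstancedBaseInstance would avoid the rebinding, but it is only core from OpenGL 4.2 and is not part of OpenGL ES 3.0.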
By way of background, the approach we were using is often referred to as "vertex pulling", i.e. we were using an index to pull vertex data from some buffer. This contrasts with "vertex pushing", where attributes push vertex data to the shader (so the shader only sees the data for the current vertex). On modern desktop hardware, vertex pulling can be far more efficient because it often lets you get by with much less data.
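The difference can be illustrated on the CPU with a toy C++ emulation (the InstanceData fields and all function names are hypothetical). shade_pull mimics the old approach, where the shader receives the whole array and indexes it by instance ID; shade_push mimics the new approach, where the fetch stage hands the shader only the current instance's record. Both produce the same output; what changes is where the indexing happens.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, minimal per-instance record and vertex type.
struct InstanceData { float offset_x, offset_y; };
struct Vec2 { float x, y; };

// Vertex pulling: the "shader" gets the whole array plus an index and fetches
// its own record with a dynamic index (like the old UBO approach).
Vec2 shade_pull(const std::vector<InstanceData>& all, std::size_t instance_id, Vec2 v) {
    assert(instance_id < all.size());
    const InstanceData& d = all[instance_id];
    return {v.x + d.offset_x, v.y + d.offset_y};
}

// Vertex pushing: the "shader" only ever sees the current instance's record.
Vec2 shade_push(const InstanceData& d, Vec2 v) {
    return {v.x + d.offset_x, v.y + d.offset_y};
}

// Emulated draw of one vertex per instance using pulling.
std::vector<Vec2> draw_pull(const std::vector<InstanceData>& buffer, Vec2 v) {
    std::vector<Vec2> out;
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        out.push_back(shade_pull(buffer, i, v));  // shader indexes the array itself
    }
    return out;
}

// Emulated draw of one vertex per instance using pushing.
std::vector<Vec2> draw_push(const std::vector<InstanceData>& buffer, Vec2 v) {
    std::vector<Vec2> out;
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        // The fetch stage (divisor 1) selects record i before the shader runs.
        out.push_back(shade_push(buffer[i], v));
    }
    return out;
}
```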
In our case, since we only need information on a per-vertex or per-instance basis, OpenGL's built-in support for vertex attributes and instance attributes works just fine. Instancing does have some overhead; however, in my testing the benefits appear to outweigh the costs.
Performance analysis
I analyzed rendering performance as thoroughly as I could. For the most part, this means I profiled GPU operations using the best tools available for each platform. On macOS I was limited to printing FPS, which is understandably a poor way of measuring GPU performance.
All measurements were taken with the Project Manager (on macOS, using --print-fps).
** This device is a 2014 Chromebook with ChromeOS wiped
Given these results, I feel comfortable saying that this PR will run at least as fast as the previous approach and may even run faster on some devices.
Further work
I briefly tried two promising improvements that did not show a performance gain on my main development device (though they may help on other devices). I'm noting the approaches here for future reference:
Using glDrawArrays instead of glDrawArraysInstanced. Doing so resulted in no performance benefit. This may be because the drivers on my development device already optimize for this situation. This should be tested on a wider variety of hardware.