Add selection outline shaders #72
Omits out-of-view line emission, tested up to 34% faster
dc54b2a to 9478df9
Major changes, including VAO/buf obj. rewrite
So they don't have to be recreated for every layer
9478df9 to c8fb6b4
I did run into an issue where the program stalls when you select a pixel in certain images. The image in question is 7680x4320.
if (i == wid * chunkSize) {
    chunkIndices[wid] = 0;
    next[wid] = 0;
}
The statements in this condition can be moved before the loop.
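A minimal sketch of that suggestion, written in Go for illustration (the snippet under review may well be shader code). Only `wid`, `chunkSize`, `chunkIndices`, and `next` come from the snippet; the function name `countSelected`, the `selected` slice, and the loop body are hypothetical.

```go
package main

import "fmt"

// countSelected is a hypothetical stand-in for the per-worker loop.
// The worker's slots are reset once before the loop, which replaces the
// `i == wid*chunkSize` check that would otherwise run on every iteration.
func countSelected(selected []bool, wid, chunkSize int, chunkIndices, next []int) {
	chunkIndices[wid] = 0
	next[wid] = 0

	start := wid * chunkSize
	end := start + chunkSize
	if end > len(selected) {
		end = len(selected)
	}
	for i := start; i < end; i++ {
		if selected[i] {
			chunkIndices[wid]++ // assumed loop body: count selected texels
		}
	}
}

func main() {
	selected := []bool{true, false, true, true, false, true}
	chunkIndices := []int{9, 9}
	next := []int{9, 9}
	countSelected(selected, 0, 3, chunkIndices, next)
	countSelected(selected, 1, 3, chunkIndices, next)
	fmt.Println(chunkIndices, next)
}
```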
 }

 // Render draws the ui.Component
-func (l Layer) Render(view sdl.FRect) {
+func (l Layer) Render(view sdl.FRect, program gfx.Program) {
We should use pointer method receivers, so that the receiver type is the same for all the methods. This method and Destroy still have value receivers.
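A small sketch of what consistent pointer receivers could look like. The `Layer` type here is a hypothetical stand-in with made-up fields, and the real parameters of `Render` (`sdl.FRect`, `gfx.Program`) are left out to keep the example self-contained.

```go
package main

import "fmt"

// Layer is a hypothetical stand-in for the reviewed type.
type Layer struct {
	renders   int
	destroyed bool
}

// Render uses a pointer receiver, so the update is visible to the caller
// instead of being applied to a copy.
func (l *Layer) Render() { l.renders++ }

// Destroy uses a pointer receiver as well, matching the other methods.
func (l *Layer) Destroy() { l.destroyed = true }

func main() {
	l := &Layer{}
	l.Render()
	l.Render()
	l.Destroy()
	fmt.Println(l.renders, l.destroyed)
}
```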
if iv.selLayer != nil {
    p.X -= iv.selLayer.area.X
    p.Y -= iv.selLayer.area.Y
    return iv.selLayer.SelectTexel(p)
It makes more sense to invert this condition: by checking the unusual case first, where it's nil, we can keep the main logic as far left as possible.
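Roughly what the inverted check could look like. The types, the return values, and the `errNoLayer` sentinel are assumptions made so the sketch compiles on its own; only the field names and the `SelectTexel` call come from the snippet.

```go
package main

import (
	"errors"
	"fmt"
)

// point, layer, and imageView are hypothetical stand-ins for the real types.
type point struct{ X, Y int32 }

type layer struct{ area point }

// SelectTexel is a stub that simply echoes the layer-local point.
func (l *layer) SelectTexel(p point) (point, error) { return p, nil }

type imageView struct{ selLayer *layer }

var errNoLayer = errors.New("no selection layer") // assumed error value

func (iv *imageView) selectTexel(p point) (point, error) {
	// Handle the unusual case first and return early.
	if iv.selLayer == nil {
		return point{}, errNoLayer
	}
	// Main logic stays unindented.
	p.X -= iv.selLayer.area.X
	p.Y -= iv.selLayer.area.Y
	return iv.selLayer.SelectTexel(p)
}

func main() {
	iv := &imageView{selLayer: &layer{area: point{X: 2, Y: 3}}}
	fmt.Println(iv.selectTexel(point{X: 5, Y: 7}))
}
```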
These changes bring back the dotted outlining of arbitrary selections. On the more basic side, the "BufferArray" object was rewritten to focus exclusively on VAOs (Vertex Array Objects), and a purer "BufferObject" was created that handles OpenGL buffer objects in a more generic sense. The reason is that when generating the collection of selected-texel point coordinates, the compute shaders write vertex information into an SSBO (Shader Storage Buffer Object). The underlying buffer object here is no different from any other buffer object, so the VAO needs to be able to switch from the traditional VBO (Vertex Buffer Object) that it has by default to the SSBO data, and treat that data as vertex information.
As for the main feature of this PR, selection outlines have made a return. The first major push of this change was implementing a geometry shader that converts the coordinate of a selected texel into line vertices for where the outlines should be placed. This was a great improvement, but what was left (99% of the slowness) was the way those texel coordinates were emitted into the rendering pipeline in the first place. We first wrote a CPU-parallel algorithm that operates over a selection texture (binary information on whether each texel is selected) in 3 steps:
This implementation was very good in comparison to the old sequential approach, where one thread would simply loop over an entire set containing the selections and populate a fixed array to send to the rendering pipeline. In our testing, we saw a 105%-107% speedup in extreme cases. The only problem with this approach is that it doesn't take advantage of the GPU: all of these calculations rely on CPU parallelism, which is very limited.
So, mainly wanting to grow our OpenGL knowledge, we implemented the same algorithm in compute shaders. The only issue was that the prefix sum step was incredibly slow when run sequentially on a single worker, so I had to implement a rather messy parallel prefix sum algorithm in another compute shader. (I don't know how to do it better within the limitations of GLSL; it left all of the shaders with 2 copies of essentially the same thing.)
The result is a lightning-fast algorithm that in small cases performs similarly to the original CPU-parallel solution, but in extreme cases can be up to 10x faster. The run time depends heavily on how chunkSize is set. Right now it is hard-coded at 256, but this could be improved by intelligently deciding what it should be for a given image.
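For reviewers unfamiliar with this kind of chunked emission, here is a minimal CPU sketch in Go. It assumes the common count, exclusive prefix sum, scatter structure, which may not match the PR's three steps exactly. The `compact` function and its flat `[]bool` input are hypothetical, and the prefix sum here is a plain sequential loop rather than the parallel one used in the compute shaders.

```go
package main

import (
	"fmt"
	"sync"
)

// compact returns the indices of all selected entries, in order.
// chunkSize must be positive.
func compact(selected []bool, chunkSize int) []int {
	numChunks := (len(selected) + chunkSize - 1) / chunkSize

	// forEachChunk runs fn once per chunk, each on its own goroutine.
	forEachChunk := func(fn func(wid, start, end int)) {
		var wg sync.WaitGroup
		for wid := 0; wid < numChunks; wid++ {
			start := wid * chunkSize
			end := start + chunkSize
			if end > len(selected) {
				end = len(selected)
			}
			wg.Add(1)
			go func(wid, start, end int) {
				defer wg.Done()
				fn(wid, start, end)
			}(wid, start, end)
		}
		wg.Wait()
	}

	// Step 1 (assumed): each worker counts the selected texels in its chunk.
	counts := make([]int, numChunks)
	forEachChunk(func(wid, start, end int) {
		for i := start; i < end; i++ {
			if selected[i] {
				counts[wid]++
			}
		}
	})

	// Step 2 (assumed): an exclusive prefix sum turns counts into write offsets.
	offsets := make([]int, numChunks)
	total := 0
	for wid, c := range counts {
		offsets[wid] = total
		total += c
	}

	// Step 3 (assumed): each worker writes its indices starting at its offset,
	// so no two workers touch the same output slot.
	out := make([]int, total)
	forEachChunk(func(wid, start, end int) {
		next := offsets[wid]
		for i := start; i < end; i++ {
			if selected[i] {
				out[next] = i
				next++
			}
		}
	})
	return out
}

func main() {
	selected := []bool{false, true, true, false, true, false, false, true}
	fmt.Println(compact(selected, 3))
}
```

The offsets from the prefix sum are what let every chunk write independently, which is why that step sits on the critical path when it runs on a single worker.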
Closes #10
Closes #11
Closes #44
Closes #46