Profiling performance of rendering #664

marcus-bigrep · 2021-01-13T13:50:05Z

I deal with big data, and sometimes preview rendering is quite slow (< 1 FPS).
So I have now done some profiling, and the results are strange.
Take the attached 3mf and the code adjustments at the bottom of this issue. Then slice the sample and render the preview of the gcode.
The log states that (preview) rendering takes <0.3 sec, but I measure >1 sec if I time it manually with a stopwatch (from "start rendering" to "end rendering"). In particular, >1 sec is the real rendering time.
Do you know where the time is spent?

https://github.com/Ultimaker/Uranium/blob/master/UM/Qt/Bindings/MainWindow.py

    import time

    def _render(self):
        tstart = time.perf_counter()
        Logger.log("w", "start rendering")

        if self._full_render_required:
            renderer = self._app.getRenderer()
            view = self._app.getController().getActiveView()
            renderer.beginRendering()
            view.beginRendering()
            renderer.render()
            view.endRendering()
            renderer.endRendering()
            self._full_render_required = False
            self.renderCompleted.emit()
        else:
            self._app.getRenderer().reRenderLast()

        tend = time.perf_counter()
        Logger.log("w", "end rendering %s", str(tend - tstart))

UMS5_BigRepManifold.zip


nallath · 2021-01-14T10:11:23Z

I don't really understand how you timed this. When profiling code like this, I'd recommend using something like https://github.com/pyutils/line_profiler since it shows per line how much time was spent on it.
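For a first pass without extra dependencies, function-level profiling from the standard library can stand in for line_profiler. This is a minimal sketch, not the setup used in this thread: `profile_call` is a hypothetical helper, and cProfile reports per-function rather than per-line timings. It only sees time spent inside the profiled Python call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    Hypothetical helper for illustration; it is not part of Uranium or Cura.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Collect the ten most expensive entries, sorted by cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(sorted, range(100000, 0, -1))
    print(report)
```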

marcus-bigrep · 2021-01-14T10:53:47Z

Thanks for the advice, I'll do that as well.
Though, I don't think it is about digging deeper into this function.
It seems that there is something in the message handling and Qt which takes a lot of time. This is a big black hole that I don't understand.
In particular, it is not about this code. This code is fast.

Bottom line of this investigation:
The log message is handled asynchronously and has a latency of >0.7 sec.
So the question is: what happens in those 0.7 sec? Do you have an idea?

nallath · 2021-01-14T13:17:25Z

I'm not sure what part of the messaging that could be. It has to be something with OpenGL, since the rendering is so much faster if you only display a few lines.

But I do have the feeling that you might be on to something here that I missed the last few times I tried profiling. The problem is, I don't know what I missed...

marcus-bigrep · 2021-01-14T13:30:10Z

That is exactly why I stumbled upon it. I hacked in some caching of VBOs.
Then the render method takes 0.03 sec (i.e. 6x faster), but the overall duration is still >1 sec.

nallath · 2021-01-14T14:08:15Z

I managed to find what is causing it: it's the frame swap. I don't know why that would take so much longer than the actual rendering...

marcus-bigrep · 2021-01-14T15:39:30Z

Where is it in the code?

nallath · 2021-01-14T16:01:45Z

It's in Qt itself. The QQuickWindow has a number of stages: https://doc.qt.io/qt-5/qquickwindow.html#RenderStage-enum

I've been digging around in the Qt documentation, and it seems that the swap stage is when the queued OpenGL calls are actually made. So in that sense it makes sense that the swap takes a long time.
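One way to see the swap cost in the log could be to time the gap between the `afterRendering` and `frameSwapped` signals of the QQuickWindow. The following is only a sketch under assumptions: `SwapTimer` is a hypothetical class, the hook-up shown in the comments has not been tested against Uranium, and both signals may be emitted on the render thread, so a direct connection is probably needed.

```python
import time


class SwapTimer:
    """Records the time between 'rendering done' and 'frame swapped'.

    Hypothetical helper, not part of Uranium. Assumed hook-up, where
    `window` is the QQuickWindow:
        window.afterRendering.connect(timer.on_after_rendering)
        window.frameSwapped.connect(timer.on_frame_swapped)
    """

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._render_done = None
        self.swap_durations = []

    def on_after_rendering(self):
        # The scene graph has finished issuing its render commands.
        self._render_done = self._clock()

    def on_frame_swapped(self):
        # The frame has been swapped; ignore swaps without a matching render.
        if self._render_done is None:
            return
        self.swap_durations.append(self._clock() - self._render_done)
        self._render_done = None

    def average(self):
        """Mean swap duration in seconds, or 0.0 if nothing was recorded."""
        if not self.swap_durations:
            return 0.0
        return sum(self.swap_durations) / len(self.swap_durations)
```

The clock is injectable so that the class can be checked without a running Qt application.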

I've hooked it up to QtCreator to debug this a bit, and this is a screenshot from what the scenegraph is doing:

The bar before the swap is the actual render (which is 56.8 ms).

marcus-bigrep · 2021-01-15T10:09:15Z

What surprises me is that caching the VBOs has so little impact on the performance.
I am now looking for some OpenGL profilers to get a better understanding.

nallath · 2021-01-15T11:45:20Z

The thing is that we use a shader to get the 3D effect of the layers. We could consider pre-generating the verts and sending those to be rendered. It might make it faster at the expense of more pre-computation.
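To make the pre-generation idea concrete, here is a rough sketch of building the vertices and triangles of one tube segment on the CPU. It is not Cura's shader logic: `tube_segment` and its parameters are hypothetical, the tube is left open at both ends, and no normals or colors are produced.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalized(v):
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / length for c in v)


def tube_segment(p0, p1, radius, sides=6):
    """Return (vertices, triangles) of an open tube around the line p0 -> p1."""
    direction = _normalized(tuple(b - a for a, b in zip(p0, p1)))
    # Pick a helper vector that is not parallel to the tube axis.
    helper = (0.0, 0.0, 1.0) if abs(direction[2]) < 0.9 else (1.0, 0.0, 0.0)
    u = _normalized(_cross(direction, helper))
    v = _cross(direction, u)  # unit length, since direction and u are orthonormal

    # One ring of `sides` vertices around each end point.
    vertices = []
    for center in (p0, p1):
        for i in range(sides):
            angle = 2.0 * math.pi * i / sides
            vertices.append(tuple(
                center[k] + radius * (math.cos(angle) * u[k] + math.sin(angle) * v[k])
                for k in range(3)))

    # Two triangles per side connect the first ring to the second ring.
    triangles = []
    for i in range(sides):
        j = (i + 1) % sides
        triangles.append((i, j, sides + i))
        triangles.append((j, sides + j, sides + i))
    return vertices, triangles
```

Each segment then costs `2 * sides` vertices and `2 * sides` triangles, which is where the pre-computation and memory cost would come from.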

marcus-bigrep · 2021-01-15T11:51:42Z

Still, my gut feeling is that VBO caching should have a significant effect even with the shaders.

But your idea is good and Cura's old approach is exactly this. I'll try VBO caching with the old approach.
(The issue with the old approach is that pre-generating is done in Python which is damn slow.)

nallath · 2021-01-15T12:23:46Z

We could also do the pre-generating in C++ and make a Python binding.

I do agree with your gut feeling though. The VBO caching should have more effect.

marcus-bigrep · 2021-01-19T08:26:07Z

Update from my side:

I have profiled it with NVIDIA Nsight, and >90% of the time goes into rendering the tubes. In particular, this supports the observation that VBO caching has little effect.
I was wrong that the tubes were generated on the CPU in a former version. In fact, there is a compatibility mode which renders pure lines. That mode is fast, though.

I'll hack in the tubes on the CPU now and do some benchmarking.

marcus-bigrep · 2021-01-26T14:55:49Z

What I have done now:
I have "exported" the preview to an STL file. This file is huge (>3 GB). Of course, you could store it more efficiently (triangle strips etc.).
However, colors and vertex normals have to be added on top, and then you likely end up with even more memory consumption.
My graphics card has 4 GB of RAM and won't handle it.

Long story short, I don't think that creating the tubes up front and then caching them will work out.
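The orders of magnitude above can be checked with some quick arithmetic. This is a sketch under assumptions: it assumes a binary STL (80-byte header, 4-byte triangle count, 50 bytes per triangle), and the segment count, number of sides, and per-vertex layout are made-up values, not measurements from the sample file.

```python
BINARY_STL_HEADER_BYTES = 84    # 80-byte header + 4-byte triangle count
BINARY_STL_TRIANGLE_BYTES = 50  # 12 float32 values + 2-byte attribute field


def tube_triangles(n_segments, sides):
    """Triangles of open tubes with two triangles per side and segment."""
    return n_segments * 2 * sides


def binary_stl_bytes(n_triangles):
    """Size of a binary STL file holding n_triangles."""
    return BINARY_STL_HEADER_BYTES + BINARY_STL_TRIANGLE_BYTES * n_triangles


def unindexed_buffer_bytes(n_triangles, bytes_per_vertex=28):
    """Size of an unindexed vertex buffer with three vertices per triangle.

    The assumed 28 bytes per vertex are 12 (position) + 12 (normal) + 4 (RGBA8).
    """
    return n_triangles * 3 * bytes_per_vertex


if __name__ == "__main__":
    triangles = tube_triangles(n_segments=5_000_000, sides=6)  # hypothetical
    print(triangles, binary_stl_bytes(triangles), unindexed_buffer_bytes(triangles))
```

With these made-up numbers, 60 million triangles take about 3 GB as binary STL and about 5 GB as an unindexed buffer with normals and colors. Sharing vertices through an index buffer would reduce that.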

When the frame rate drops to around 1 FPS, one could show a simple view (i.e. lines) while navigating and render the ordinary view afterwards, once the scene is "static". However, my resources are slipping away and I won't manage this in the short term.
@nallath Maybe it's something for the ordinary Cura development? ;-)
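The "lines while navigating, tubes when static" idea could be sketched as a small idle timer. Everything here is hypothetical: `PreviewDetail`, the mode names, and the idle threshold are placeholders, and nothing like this exists in Cura as far as this thread shows.

```python
import time


class PreviewDetail:
    """Chooses a cheap view while the camera moves and the full view when idle."""

    def __init__(self, idle_seconds=0.3, clock=time.monotonic):
        self._idle_seconds = idle_seconds
        self._clock = clock
        self._last_move = None

    def camera_moved(self):
        # Call this from whatever handles camera input.
        self._last_move = self._clock()

    def mode(self):
        """Return "lines" shortly after a camera move, otherwise "tubes"."""
        if self._last_move is None:
            return "tubes"
        if self._clock() - self._last_move < self._idle_seconds:
            return "lines"
        return "tubes"
```

The renderer would ask `mode()` before each frame and would need one extra redraw once the idle time has passed.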

nallath · 2021-01-27T09:10:50Z

Could you share the project file to create the tubes file (and the tubes file itself)? It would help me to convince people that this is worth the effort.

marcus-bigrep · 2021-02-01T20:51:55Z

(1)
Sorry for not answering earlier. I have added a simple file that generates an STL file of tubes from a txt file.
Furthermore, you have to insert

    with open(r"c:\dev\ttt.txt", "a") as file: numpy.savetxt(file, data)

in LayerPolygon.__init__

tube_to_stl.zip

(2) I had a look at Simplify3D and PrusaSlicer. They are about 10x faster.
We should definitely look into that! IMHO, this is the way to go.

bremco · 2021-07-26T09:17:09Z

Did a PR (both Uranium, for the actual VAO issues and Cura, for the shader implementation the fix implies) that might partially(?) fix this. Not sure if people get notifications if you just mention an issue, so here is a message as well :-)

Ghostkeeper · 2021-11-15T14:27:07Z

With Bremco's PR I observed a 4-fold performance increase in a large layer view. Whatever benchmarks you have now will be invalidated with the next release. Perhaps we can then consider this issue fixed?

Ghostkeeper changed the title ~~Profile Rendering~~ Profiling performance of rendering Jan 25, 2021

This was referenced Jul 26, 2021

Graphic Buffers Update #714

Closed

Graphic Buffers Update Ultimaker/Cura#10188

Closed

This was referenced Aug 9, 2021

Graphics Buffer Update -- New PR Ultimaker/Cura#10259

Merged

Graphics Buffer Update -- New PR #721

Merged
