CanvasItem clip children is extremely slow #79439

Open
djrain opened this issue Jul 13, 2023 · 10 comments

Comments

@djrain

djrain commented Jul 13, 2023

Godot version

4.1 stable

System information

macOS, Android, iOS - all renderers

Issue description

We're making a mobile game, and we were getting poor performance on Android. After some testing we found that having even a small number of nodes with clipping enabled was the culprit, making this feature borderline unusable. Turning clipping off brought the game back up to acceptable FPS.

The slowness happens in all renderers and not just on mobile. Having more children being clipped does not seem to matter much - only the total number of nodes with clipping enabled.

I would normally assume that clipping has a significant performance impact, but this blog post stated that clipping happens "at literally no cost". That sounded too good to be true, but given that claim, this result is very unexpected.

If a notable performance hit is indeed expected when using this, it would be nice to have that documented.

Steps to reproduce

Run the MRP's main scene. Press Enter to toggle clipping on and off, and observe the huge FPS change (note that V-Sync is off).
On my M1 Mac it drops from around 1300 FPS to about 75.

Minimal reproduction project

ClipChildrenSlow.zip

@Calinou
Member

Calinou commented Jul 13, 2023

I can confirm this on 4.1.stable (Linux, GeForce RTX 4090 with NVIDIA 535.54.03).

1152×548

| Clipping disabled | Clipping enabled |
|---|---|
| 11471 FPS (0.09 mspf) | 1902 FPS (0.53 mspf) |

3840×2160 (disabled stretch mode)

| Clipping disabled | Clipping enabled |
|---|---|
| 5711 FPS (0.18 mspf) | 1478 FPS (0.68 mspf) |

3840×2160 (canvas_items stretch mode, makes clipped sprites larger)

| Clipping disabled | Clipping enabled |
|---|---|
| 4114 FPS (0.24 mspf) | 844 FPS (1.18 mspf) |
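
Since FPS is not linear in cost, the overhead is easier to compare as added frame time. A minimal sketch using the figures reported above; the helper name `frame_time_ms` is only illustrative and not part of Godot:

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# (label, FPS with clipping disabled, FPS with clipping enabled),
# copied from the measurements in this comment.
measurements = [
    ("1152x548", 11471, 1902),
    ("3840x2160, stretch disabled", 5711, 1478),
    ("3840x2160, canvas_items stretch", 4114, 844),
]

for label, fps_off, fps_on in measurements:
    added = frame_time_ms(fps_on) - frame_time_ms(fps_off)
    print(f"{label}: clipping adds about {added:.2f} ms per frame")
```

This works out to roughly 0.44 ms, 0.50 ms and 0.94 ms of extra frame time for the three configurations.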

What's interesting is that while GPU usage goes up with clipping enabled, power consumption goes down (despite both cases running at the same GPU core and memory clocks):

[Screenshots: clipping disabled (Screenshot_20230714_000037) vs. clipping enabled (Screenshot_20230714_000050)]

@djrain
Author

djrain commented Jul 17, 2023

It's quite a bummer to have such a useful feature hindered by poor performance :( @Calinou can we tag someone who might have some ideas?


@clayjohn
Member

This highlights an important difference between mobile and desktop GPUs. Due to a difference in architecture, mobile GPUs pay a much larger performance penalty for every pixel touched and for switching render targets.

CanvasItem clipping requires switching render targets twice (once to the back buffer, then back to the front buffer), and the way it is implemented now requires a full-screen copy (it touches a lot of pixels).

On desktop this is essentially free as the cost of switching render targets and copying pixels is super low.

Ultimately the render target switching can't be reduced so clipping will always be somewhat expensive on mobile.

Right now we always copy the full front buffer when doing the clipping, but we really only need to do that when mipmaps are enabled. We can use a tighter clipping rect to reduce the cost of copying the pixels, but I doubt even that optimization will be enough to make this efficient on mobile devices.
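
To give a feel for how much a smaller copy region could save, here is a rough sketch that only counts pixels. It is not Godot's renderer code; the `Rect` class, the `pixels_copied` helper, the screen size and the sprite bounds are all assumptions made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int

    def area(self) -> int:
        return max(self.width, 0) * max(self.height, 0)

    def intersection(self, other: "Rect") -> "Rect":
        """Overlapping region of two rects (zero-sized if they do not overlap)."""
        left = max(self.x, other.x)
        top = max(self.y, other.y)
        right = min(self.x + self.width, other.x + other.width)
        bottom = min(self.y + self.height, other.y + other.height)
        return Rect(left, top, max(right - left, 0), max(bottom - top, 0))


def pixels_copied(screen: Rect, clip_bounds: Rect, use_bounds: bool) -> int:
    """Pixels touched by one back buffer copy under this simplified model."""
    if not use_bounds:
        return screen.area()  # full-screen copy
    return screen.intersection(clip_bounds).area()  # copy only the clipped region


screen = Rect(0, 0, 1920, 1080)            # hypothetical viewport size
sprite_bounds = Rect(100, 100, 128, 128)   # hypothetical clipping node bounds

full_px = pixels_copied(screen, sprite_bounds, use_bounds=False)
bounded_px = pixels_copied(screen, sprite_bounds, use_bounds=True)
print(f"full copy: {full_px} px, bounded copy: {bounded_px} px")
```

The model ignores the cost of switching render targets, which is paid per clipping node regardless of how many pixels are copied.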

@djrain
Author

djrain commented Sep 11, 2023

> On desktop this is essentially free

But that doesn't seem to be the case: both Calinou and I observed a significant FPS drop on desktop.

@PickleJesus123

PickleJesus123 commented Oct 10, 2023

My performance issues were coming specifically from CanvasGroup, not CanvasItem.

Perhaps there should be a 'warning label' in the Godot documentation about using CanvasGroups in mobile projects?

As a new transplant from Unity, I started 'intuitively' using CanvasGroups all throughout my project. It wasn't until significantly later that I realized it was killing my app's performance. I worry that a lot of other newbies will do the same thing, and that may lead to doubts about Godot's capabilities.

@clayjohn
Member

Thinking more about this, I can think of two things to explore to improve performance:

  1. As mentioned above, we need to ensure we use the smallest clipping rect possible to reduce the number of pixels being copied to the backbuffer
  2. We might be able to do some smart caching of CanvasGroups and cache the results within the CanvasGroup if none of the child nodes has changed. Right now it is totally dynamic, which saves on memory, but ends up doing the same calculations every frame.

In the short term we may indeed want a clear warning in the documentation, as the current design is very bad for mobile and that won't change without drastic intervention.
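
The caching idea in point 2 is essentially a dirty flag. A minimal sketch of that pattern, with no Godot API involved; `CachedClipGroup`, `mark_dirty` and the render callback are hypothetical names used only for illustration:

```python
class CachedClipGroup:
    """Hypothetical stand-in for a clipped group that caches its rendered result."""

    def __init__(self, render_fn):
        self._render_fn = render_fn  # the expensive clip-and-composite step
        self._cache = None
        self._dirty = True           # nothing cached yet, so the first draw renders
        self.render_count = 0

    def mark_dirty(self):
        """Call whenever a child changes (transform, texture, visibility, ...)."""
        self._dirty = True

    def draw(self):
        """Return the cached result, re-rendering only if something changed."""
        if self._dirty:
            self._cache = self._render_fn()
            self.render_count += 1
            self._dirty = False
        return self._cache


group = CachedClipGroup(lambda: "clipped image")

for _ in range(60):   # 60 frames with no changes: rendered once, reused 59 times
    group.draw()

group.mark_dirty()    # a child changed
group.draw()          # one more render

print(group.render_count)  # 2
```

The trade-off is the one described above: the cached result costs memory for as long as it is kept, in exchange for skipping the repeated work on frames where nothing changed.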

@saletrak

saletrak commented Nov 6, 2023

@clayjohn Hi, tell me if I'm wrong, but what do you think about making a clipping inside SubViewport, to generate clipped image only once and display it as a texture?

@clayjohn
Member

clayjohn commented Nov 7, 2023

> @clayjohn Hi, tell me if I'm wrong, but what do you think about making a clipping inside SubViewport, to generate clipped image only once and display it as a texture?

That's definitely an optimization you can do today. It is slightly more cumbersome than using clip_children directly, but it allows you to cache the results of clip_children, which will be a net win for performance. In cases where you don't need the clip_children node and its child nodes to change every frame, it is definitely best to cache the results in a SubViewport and apply the SubViewport's texture directly.

@insomniacUNDERSCORElemon

Could this be done by only updating when needed? For instance, only updating:

  • on every animation frame (for parent or child nodes) if one is present
    • this means that a continuous animation (or interpolation itself) will be more taxing than a discrete animation (especially the fewer frames it has)
    • animated images (/sequences) will update at their framerate (low fps (12fps?) or dynamic fps (twos, threes, and fours) could help here)
      • dynamic I could see being clunky if it needs post-import work
    • excluding translation of the parent
  • on state changes (state machine)
  • on visibility/reorder changes with child nodes
    • shouldn't be needed for visibility of the parent node (at least not immediately freed), but I guess that might be a memory thing
  • on parent scale, camera zoom, or resolution change
    • perhaps skew and rotation, at different intervals (by unit change or a set global interval like 15fps or even 1fps) depending on desired quality
    • in all cases here, potentially based on performance/demand
  • on any transforms of child nodes
  • on other changes (color, modulation?) particularly if partial transparency is involved
  • shader changes, particularly vertex?
  • a custom global max update speed might make stylistic sense for interpolated animations (or zooming) for clips
    • especially with other aspects of the animation (like parent movement) being faster and smoother
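
The "max update speed" item in the list above could look roughly like the throttle below. This is only a sketch of the idea, not an existing engine feature; `ThrottledUpdater` and its parameters are made-up names:

```python
class ThrottledUpdater:
    """Allows at most `max_updates_per_second` refreshes of a cached clip result."""

    def __init__(self, max_updates_per_second: float):
        self.min_interval = 1.0 / max_updates_per_second
        self._elapsed = self.min_interval  # so the very first tick triggers an update
        self.updates = 0

    def tick(self, delta: float) -> bool:
        """Advance by `delta` seconds; return True if the clip should be redrawn."""
        self._elapsed += delta
        if self._elapsed >= self.min_interval:
            self._elapsed = 0.0
            self.updates += 1
            return True
        return False


# Hypothetical numbers: the game renders at 60 FPS, clips refresh at up to 12 FPS.
updater = ThrottledUpdater(max_updates_per_second=12)
for _ in range(60):
    updater.tick(1.0 / 60.0)

print(f"clip redrawn on {updater.updates} of 60 frames")
```

Movement of the parent could still happen every frame, since only the clipped contents would be held back to the lower rate.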

Assuming there isn't something blocking this that I'm missing, it would make static art the default (and updating based on what is added), solving proposal 8747.

Also mentioned in the PR (cascaded) above:

  1. instancing optimization would be very useful particularly for tilemaps and spawning, which need it due to rendering the same scene multiple times over (perhaps animations making that a bit more tricky).

  2. MSAA really gives a massive limitation on clipping instances w/the PR (only rendering 22-25% of the instances compared to 4.2.2's implementation, at least for me w/a 1050Ti where MSAA is a major limit there) though I have not tested if downscaling could be a better alternative to get AA.


I would say that perhaps clipping could be skipped in certain scenarios, though that would probably be mostly used for eyes (when the iris is not near the edge) and maybe a few other character/dynamic things and probably not much else (most interesting art done with this will always need clipping). Though in cases where that is viable, it could be turned off by animating/scripting the value.

In a similar vein, could partial/incremental updates be a thing, particularly for less complex setups/untextured polygons?

Status: For team assessment