Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very high CPU usage with empty example using just default plugins #1462

Closed
nehalem501 opened this issue Feb 17, 2021 · 37 comments
Closed

Very high CPU usage with empty example using just default plugins #1462

nehalem501 opened this issue Feb 17, 2021 · 37 comments
Labels
C-Performance A change motivated by improving speed, memory usage or compile times O-MacOS Specific to the MacOS (Apple) desktop operating system

Comments

@nehalem501
Copy link

nehalem501 commented Feb 17, 2021

Bevy version

Tested on both 0.4 and recent master a895256.

Operating system & version

macOS 10.14 x86_64 (MacBook Pro 2014, 2c/4t 2.8GHz Intel Core i5).
Latest nightly rustc, with the default rust linker (I haven't installed lld).

What you did

I tested the following code on both versions mentioned above, in debug and release modes.

main.rs:

use bevy::prelude::*;

fn main() {
    App::build()
        .add_plugins(DefaultPlugins)
        .add_plugin(bevy::diagnostic::FrameTimeDiagnosticsPlugin::default())
        .add_plugin(bevy::diagnostic::LogDiagnosticsPlugin::default())
        .run();
}

Cargo.toml:

[package]
name = "city_simulator"
version = "0.1.0"
authors = ["nehalem501"]
edition = "2018"

[dependencies]
bevy = { git = "https://github.com/bevyengine/bevy", rev = "a8952569253c5712e890e250cd10c999c1a6b379" }

What you expected to happen

Have the app window show app with low CPU usage.

What actually happened

I have 100% CPU usage with this code when compiled in debug mode, and around 50% CPU usage in release mode. This seems very excessive for drawing an empty window and displaying a few log lines in a terminal.
Am I doing something wrong ?

Additional information

These are the logs in debug mode, vsync seems to work:

Feb 17 09:27:18.040  INFO bevy_diagnostic::log_diagnostics_plugin: frame_time  : 0.017584    (avg 0.016661)
Feb 17 09:27:18.040  INFO bevy_diagnostic::log_diagnostics_plugin: fps         : 59.463637   (avg 59.926370)
Feb 17 09:27:18.040  INFO bevy_diagnostic::log_diagnostics_plugin: frame_count : 227.000000  (avg 227.000000)

And these the logs in release mode:

Feb 17 09:44:54.987  INFO bevy_diagnostic::log_diagnostics_plugin: frame_time  : 0.017124    (avg 0.016655)
Feb 17 09:44:54.987  INFO bevy_diagnostic::log_diagnostics_plugin: fps         : 60.254282   (avg 60.020315)
Feb 17 09:44:54.987  INFO bevy_diagnostic::log_diagnostics_plugin: frame_count : 1605.000000  (avg 1605.000000)

Screenshot of System Monitor with the debug version:
Screenshot 2021-02-17 at 09 29 39

Screenshot of System Monitor with the release version:
Screenshot 2021-02-17 at 13 34 36

@Ratysz Ratysz added C-Performance A change motivated by improving speed, memory usage or compile times O-MacOS Specific to the MacOS (Apple) desktop operating system labels Feb 17, 2021
@nehalem501
Copy link
Author

This is what the "Instruments" app outputs when running the Time Profiler on my release binary, might be helpful.

Screenshot 2021-02-17 at 17 06 03

Screenshot 2021-02-17 at 17 06 22

@malj
Copy link

malj commented Feb 18, 2021

Similar issue on Windows 10.0.19042 Build 19042 with Bevy 4.0: around 30-40% CPU usage and 40-50% integrated Intel GPU usage (instead of using dedicated Nvidia GPU) on a blank window, causing laptop fans to start spinning like crazy:

use bevy::prelude::*;

fn main() {
    App::build().add_plugins(DefaultPlugins).run();
}

@nehalem501
Copy link
Author

nehalem501 commented Feb 19, 2021

After further testing, I seem to have similar symptoms when just using MinimalPlugins.

use bevy::prelude::*;

fn main() {
    App::build().add_plugins(MinimalPlugins).run();
}

@cart
Copy link
Member

cart commented Feb 19, 2021

Can you add the frame time and log diagnostic plugins and respond with your framerate? I have a feeling that you aren't getting frame limited and so the cpu is just spinning:

.add_plugin(FrameTimeDiagnosticsPlugin::default())

When bevy is frame limited (and built in release mode) it is much kinder to cpus.

@cart
Copy link
Member

cart commented Feb 19, 2021

Oh wait you already did that :)

@cart
Copy link
Member

cart commented Feb 19, 2021

Just ran the code in this issue's description (using the master branch) and got the following results:

image

@malj I expect Bevy 0.4 to not be frame limited with the example you posted due to us moving to Mailbox vsync instead of Fifo vsync. We have since reverted that change on master until we can sort out a better frame limiting approach.

However @nehalem501 clearly has frame limiting, so something else is going on there. Maybe it related to macs? If I find time i'll try testing that out.

@nehalem501
Copy link
Author

I actually have similar results on my Linux machine. According to the logs, frame limiting seems to work.

Same code as in my initial bug report, running a release build with git bevy revision a895256.

Ubuntu 20.04 x86_64 with a 16t/8c AMD CPU, Nvidia proprietary GPU drivers, X11 (Unity DE).

Rust version:

nightly-x86_64-unknown-linux-gnu (default)
rustc 1.52.0-nightly (a143517d4 2021-02-16)

What top -i displays:

top - 21:24:06 up 14 days,  3:21,  4 users,  load average: 1,07, 2,81, 1,89
Tasks: 404 total,   2 running, 402 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5,3 us,  1,1 sy,  0,0 ni, 93,6 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :  16025,8 total,   1943,8 free,  11102,8 used,   2979,1 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.   3865,4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                    
2856197 tomek     20   0 1906112 211756 191644 R  71,1   1,3   1:58.92 bevy_test   

logs:

Feb 19 21:26:57.014  INFO bevy_diagnostic::log_diagnostics_plugin: frame_time   : 0.016570    (avg 0.016663)
Feb 19 21:26:57.014  INFO bevy_diagnostic::log_diagnostics_plugin: fps          : 59.994043   (avg 59.985289)
Feb 19 21:26:57.014  INFO bevy_diagnostic::log_diagnostics_plugin: frame_count  : 18939.000000  (avg 18939.000000)

@samnm
Copy link

samnm commented Mar 3, 2021

I have been experiencing the same thing on OSX.

This seems to be the biggest problem area:
Screen Shot 2021-03-03 at 12 38 33 PM
Note this is running on bevy-ecs-v2 and I also updated futures-lite, async-executor, and async-channel just in case.

I would love to help but I have no idea what the source of the problem could be. Happy to continue investigating if anyone wants to point me in the right direction.

@samnm
Copy link

samnm commented Mar 5, 2021

On @cart's recommendation I am testing performance for task_pool pre #932 and #892.

Here is my diff on main:

index db98543b..ecc5f457 100644
--- a/crates/bevy_tasks/src/task_pool.rs
+++ b/crates/bevy_tasks/src/task_pool.rs
@@ -205,22 +205,15 @@ impl TaskPool {
                 // this so we must convert to 'static here to appease the compiler as it is unable to
                 // validate safety.
                 let fut: Pin<&mut (dyn Future<Output = Vec<T>>)> = fut;
-                let fut: Pin<&'static mut (dyn Future<Output = Vec<T>> + 'static)> =
+                let fut: Pin<&'static mut (dyn Future<Output = Vec<T>> + Send + 'static)> =
                     unsafe { mem::transmute(fut) };

                 // The thread that calls scope() will participate in driving tasks in the pool forward
                 // until the tasks that are spawned by this scope() call complete. (If the caller of scope()
                 // happens to be a thread in this thread pool, and we only have one thread in the pool, then
                 // simply calling future::block_on(spawned) would deadlock.)
-                let mut spawned = local_executor.spawn(fut);
-                loop {
-                    if let Some(result) = future::block_on(future::poll_once(&mut spawned)) {
-                        break result;
-                    };
-
-                    self.executor.try_tick();
-                    local_executor.try_tick();
-                }
+                let spawned = executor.spawn(fut);
+                future::block_on(spawned)
             }
         })
     }

And performance for default plugins does improve - not as much as I hoped - from ~130% of CPU to ~75%. Good news is that bevy_render::render_graph::render_graph_schedule_executor_system went from 40% of CPU time to 0.6%!
Screen Shot 2021-03-05 at 3 12 19 PM

I will try to find the other offenders considering how much CPU time an empty project is still using.

Note: I am not sure of the significance of changing local_executor.spawn(fut) to executor.spawn(fut) but it was required to stop the app from locking.

@cart
Copy link
Member

cart commented Mar 5, 2021

I'm still not fully convinced that "combined os-reported cpu usage percentage" is by itself a good metric to optimize for. Ex: if we do multi-threading each frame, work will be spread out across all cores. That means each frame we're using 800% cpu on an 8 core machine when all cores are being utilized (vs 100% cpu if we only single thread). Depending on how the os samples and aggregates these numbers, we might have a situation where multithreading is faster / less power hungry / consumes fewer resources overall across a given slice of time, but the opposite is reported.

Before making design decisions based on these numbers, we might want to determine how these metrics are calculated and whether or not they are a good measure for realtime multithreaded apps.

@cart
Copy link
Member

cart commented Mar 5, 2021

briefly puts on world-weary project owner hat
however given the frequency of percent-cpu-usage-related issues, maybe we should optimize for public perception anyway, despite the facts of the situation

@samnm
Copy link

samnm commented Mar 5, 2021

How about CPU temps? 😝

I am definitely not getting 60 fps on a high end macbook pro with really simple bevy projects so something is wrong.

@cart
Copy link
Member

cart commented Mar 5, 2021

What is "really simple" in this context? Can you share a link or describe your use case? There are a number of knowm performance limitations that wouldn't surface in "empty projects", which is what this issue is specifically about.

@samnm
Copy link

samnm commented Mar 5, 2021

That is fair, it's likely my performance issues are more related to other problems. I am just rendering text and UI quads. Does that mean that empty projects get something into a pathological state and real projects are taxing enough that this performance problem doesn't arise?

@nehalem501
Copy link
Author

I agree that using CPU percentage for measuring performance of big multithreaded apps is not very serious. But an empty project with just the default plugins shouldn't make a laptop fan spin at full speed.

If I write minimal C code to show a window and call glClear at 60hz, I expect it to not be very demanding and not use 60% of my CPU.
Bevy is of course much more complex, so I expect more bloat than this very simple example, but if I write a game, I want most resources to be used on this game's complex algorithms, not on the framework.

Please don't view this as a rant, I'm just trying to understand what is going on.

@samnm
Copy link

samnm commented Mar 5, 2021

Also it's worth noting that the numbers for render_graph_schedule_executor_system performance were through Instruments time profiler so they are likely reasonably accurate, and a relative change of 40% of the measured CPU time to 0.6% is still significant even if the metric isn't perfect.

@cart
Copy link
Member

cart commented Mar 5, 2021

I am just rendering text and UI quads.

Theres a lot of related questions here like "how much text", how many ui elements, and how deeply nested the ui is (the "flexbox" impl we currently use for layout gets really slow with deeply nested elements). But long story short: there is plenty of room for improvement in these areas (and we do plan on improving them asap). But also tangential to the current conversation. Fps isn't the problem we're having with the "empty project", overly-engaging the cpu is (at least ... im relatively certain). @nehalem501 I'm assuming if you disable vsync, you still get high framerates (in the thousands)?

But an empty project with just the default plugins shouldn't make a laptop fan spin at full speed.

Yup fans engaging on an empty app is a real indicator of problems (and is a problem in its own regard). Can't chalk that up to "measurement methodology" 😄

Also it's worth noting that the numbers for render_graph_schedule_executor_system performance were through Instruments time profiler so they are likely reasonably accurate, and a relative change of 40% of the measured CPU time to 0.6% is still significant even if the metric isn't perfect.

Thanks for pointing that out.

@nehalem501
Copy link
Author

I’ve made some additional testing with the default plugins empty app. Disabling vsync on a release build gives me around 130-150 fps, with 70-80% CPU usage. I didn't expect huge numbers, but seems low even for an integrated GPU.

Next, I’ve tried something different, with just the minimal plugins, so no window, no rendering, …, just the following code:

use bevy::prelude::*;

fn main() {
    App::build()
        .add_plugins(MinimalPlugins)
        .run();
}

This gives me around 130% CPU usage on macOS. I understand MinimalPlugins is just bevy ECS and the timing code. It would seem like there is a busy loop somewhere. This is compiled in release mode, using the latest bevy master revision when writing this comment (2e72755).

I’ve tried the same code on Linux as well, this is the result I have on a Linux machine (same results as on macOS):

$ top -i

top - 22:35:58 up  5:49,  2 users,  load average: 1,81, 2,03, 1,12
Tasks: 373 total,   2 running, 371 sleeping,   0 stopped,   0 zombie
%Cpu(s):  9,6 us,  4,3 sy,  0,0 ni, 84,2 id,  0,0 wa,  0,0 hi,  1,9 si,  0,0 st
MiB Mem :  32129,6 total,  23316,1 free,   3770,0 used,   5043,6 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.  27597,9 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                    
1112966 tomek     20   0 1092652   4964   4616 R 195,7   0,0   0:26.68 bevy_test  

@malj
Copy link

malj commented Mar 19, 2021

@cart Just a quick update re Windows, I switched to the master branch instead of 4.0 release and tested a debug build with the diagnostic plugins:

fn main() {
    App::build()
        .add_plugins(DefaultPlugins)
        .add_plugin(FrameTimeDiagnosticsPlugin::default())
        .add_plugin(LogDiagnosticsPlugin::default())
        .run();
}

The frame rate seems to be capped to the monitor's native 144Hz refresh rate:

Mar 19 12:00:50.879  INFO bevy diagnostic: frame_time                      :    0.007018s (avg 0.007111s)
Mar 19 12:00:50.880  INFO bevy diagnostic: fps                             :  123.963971  (avg 134.590363)
Mar 19 12:00:50.880  INFO bevy diagnostic: frame_count                     :   82.000000  (avg 82.000000)
Mar 19 12:00:51.882  INFO bevy diagnostic: frame_time                      :    0.006591s (avg 0.007244s)
Mar 19 12:00:51.883  INFO bevy diagnostic: fps                             :  143.368251  (avg 140.644364)
Mar 19 12:00:51.884  INFO bevy diagnostic: frame_count                     :  220.000000  (avg 220.000000)
Mar 19 12:00:52.877  INFO bevy diagnostic: frame_time                      :    0.007169s (avg 0.007804s)
Mar 19 12:00:52.878  INFO bevy diagnostic: fps                             :  135.339058  (avg 134.739928)
Mar 19 12:00:52.879  INFO bevy diagnostic: frame_count                     :  352.000000  (avg 352.000000)
Mar 19 12:00:53.882  INFO bevy diagnostic: frame_time                      :    0.006921s (avg 0.008344s)
Mar 19 12:00:53.883  INFO bevy diagnostic: fps                             :  124.564724  (avg 134.120765)
Mar 19 12:00:53.884  INFO bevy diagnostic: frame_count                     :  479.000000  (avg 479.000000)
Mar 19 12:00:54.880  INFO bevy diagnostic: frame_time                      :    0.006932s (avg 0.006883s)
Mar 19 12:00:54.881  INFO bevy diagnostic: fps                             :  142.198965  (avg 143.845727)
Mar 19 12:00:54.881  INFO bevy diagnostic: frame_count                     :  605.000000  (avg 605.000000)
Mar 19 12:00:55.882  INFO bevy diagnostic: frame_time                      :    0.006147s (avg 0.006920s)
Mar 19 12:00:55.882  INFO bevy diagnostic: fps                             :  142.362940  (avg 143.413999)
Mar 19 12:00:55.883  INFO bevy diagnostic: frame_count                     :  749.000000  (avg 749.000000)

...but the high resource usage issue seems to persist: 25-30% on both Intel i7 8750H CPU and Nvidia RTX 2060 mobile GPU. The release build seems to reduce the CPU usage to about 10-15%, but the GPU usage remains the same, which seems really a lot for an empty 1920x1080 window.

image

@nehalem501
Copy link
Author

@malj Could you try running this code on Windows, to see if you have also high CPU usage without the GPU involved ?

fn main() {
    App::build()
        .add_plugins(MinimalPlugins)
        .run();
}

I haven't looked at the GPU usage, but if you have the same problem as me on the MinimalPlugins example above, it would seem that there are two separate performance issues on CPU and GPU usage.

@malj
Copy link

malj commented Mar 19, 2021

@nehalem501 Sure, the CPU usage is about the same with minimal plugins:

image

Edit: curiously, in this case the CPU usage remains the same even in the release build.

@malj
Copy link

malj commented Mar 20, 2021

Re multithreading, this is the workload distribution across cores (the first half of each graph is while Bevy is running, the second half is after terminating the process).

Default plugins:
default_plugins

Minimal plugins:
minimal_plugins

@aloelazoe
Copy link

i ran into this issues too when running examples on macos with latest commit 3e285d5
i also noticed something that seems kinda strange, maybe it could be helpful for determining the cause of this issue. so when bevy window was on top of everything and fully visible cpu and energy usage was very high, when the window was open but fully covered with another window the usage went down almost to zeroes, but when the window was minimized or put on another desktop it used a lot, just like when it was fully visible

@DanilaMihailov
Copy link

I have the same issue, bevy 0.5 and macOS 11.2.2. Instruments Time Profiler looks the same. Using DefaultPlugins and empty app with default settings for WindowDescriptor.

@geNAZt
Copy link

geNAZt commented May 13, 2021

I also have seen some high CPU usage on windows using the many_sprite example ( with vsync disabled since i only have a 60hz monitor ) reaching ~120 FPS. Pulling a CPU profile i see that it spends much time in clearing shaders:

Screenshot 2021-05-13 160603

The example has been built with this config:

[target.x86_64-pc-windows-msvc]
linker = "rust-lld.exe"
rustflags = ["-Zshare-generics=off"]

# Enable optimizations for dependencies (incl. Bevy), but not for our code:
[profile.dev.package."*"]
opt-level = 3
debug = true

# Maybe also enable only a small amount of optimization for our code:
[profile.dev]
opt-level = 1
debug = true

Resulting in this output

fps_task_manager

@cart
Copy link
Member

cart commented May 14, 2021

@geNAZt

I consider that to be a separate issue. The many sprites example is a intended to be stressful (and we are already in the process of optimizing it). This thread is about "baseline" cpu usage when nothing is happening.

@Frank-Buss
Copy link

Is there a solution for this problem? The simple breakout example needs already up to 20% CPU usage on my PC in release mode, and a 3D game I'm developing needs about 80% CPU usage for drawing 10 boxes in 3D mode with one light. This is unusable for bigger games.

@Frank-Buss
Copy link

I calculated a flamegraph and a lot of time was spent in the systems loop. So I thought a short timeout could improve it. I added this line in stage.rs in the bevy_esc module:

thread::sleep(time::Duration::from_millis(10));

at the start of the while run_stage_loop. Then the CPU usage for my game drops from 80% to 20%. Framerate also drops from 60 Hz to 7 Hz (at least when measured as described here: https://stackoverflow.com/questions/63590141/why-does-bevy-have-trouble-drawing-my-many-spritecomponents ), but this wouldn't be that much important for my game, there are nearly no animations. Is this possible without changing the Bevy library? Or is the problem already solved in Bevy 0.6?

@bjorn3
Copy link
Contributor

bjorn3 commented Oct 31, 2021

Which system takes a lot of time? Pretty much everything is done using systems, including rendering. Of course sleeping inside a system will drop the framerate. It makes it take longer before the render system gets a chance to run.

@Frank-Buss
Copy link

I exported it to webassembly as well:
https://frank-buss.de/water-sort-0.1/
Looks like most of the time is spent in bind_vertex_array, which sounds strange, because there are not that much vertexes:
https://i.imgur.com/5yp6iLm.png
Full source code (use the "3d" branch) :
https://github.com/FrankBuss/water-sort

@roughbits01
Copy link

Just wanted to say that I'm having the same issue (~100% CPU usage with the empty app) on macOS.

@jjant
Copy link

jjant commented Dec 26, 2021

Same here, 150%+ and heating up.

@erikkrieg
Copy link

Sorry to throw another anecdote onto the fire, but I ran the 3d_scene and breakout examples compiled from both v0.5.0 and the main branch (d44c3cd at the time) and very quickly my 2020 MacBook Pro's fans were blaring and the laptop grew too hot to keep on my lap.

@lancecarlson
Copy link

Just ran the breakout app from the main branch on my macbook pro MacBook Pro (Retina, 15-inch, Mid 2015) running Monterey. I noticed the fans were going crazy and checked the activity monitor. CPU was stuck around 185-200%. Did not expect this!

@luke-biel
Copy link
Contributor

I checked that localy - ~30% utilization on release vs 200% on debug. With 100k cubes spawned I had ~100fps release vs 0.25fps debug.

For me it's all fine on release but I guess there is a room for improvement on debug.

@viktorstrate
Copy link

Enabling high optimizations for dependencies in debug mode, lowered the CPU usage substantially for me.
https://bevy-cheatbook.github.io/pitfalls/performance.html

@alice-i-cecile
Copy link
Member

Closed by #3974.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-Performance A change motivated by improving speed, memory usage or compile times O-MacOS Specific to the MacOS (Apple) desktop operating system
Projects
None yet
Development

No branches or pull requests