MacOS CPU usage increased by 8x in v0.4.4 #470

Closed
brettchalupa opened this issue Aug 4, 2024 · 9 comments

Comments

@brettchalupa

brettchalupa commented Aug 4, 2024

I noticed that a simple Macroquad example running on macOS 14.5 with an Apple M2 Pro chip was using a lot of CPU. I checked out some previous versions of Macroquad and saw that CPU usage was much lower in older versions. Using git bisect, I found that not-fl3/macroquad@93b8af2 introduced the high CPU usage; that commit contains patch-level upgrades of miniquad from 0.4.2 to 0.4.3 (which was yanked) and of macroquad_macro from 0.1.7 to 0.1.8.

So I ran a git bisect on Miniquad, using cargo run --release --example triangle as my test for CPU usage. Some data points:

  • v0.4.0, 36dd9d9 - CPU @ 11% for cargo run --release --example triangle
  • v0.4.5, a962bfd - CPU @ 80% for cargo run --release --example triangle

Through the bisect, I landed on and confirmed that commit 833799e increases CPU load on macOS by about 8x. The regression is first noticeable in the v0.4.4 release of Miniquad.
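
The bisect above is a binary search for the first commit where CPU usage jumps. A minimal sketch of that search logic, assuming the commits are ordered oldest to newest, the first is known good, the last is known bad, and a hypothetical `is_bad` check (for example, "the triangle example uses more than some CPU threshold") stays bad once it flips:

```rust
/// Binary search for the first "bad" commit, like `git bisect`.
/// Assumes `commits[0]` is good, the last entry is bad, and once a commit
/// is bad every later one is bad too. Returns `None` if there is nothing to search.
fn first_bad<T, F>(commits: &[T], is_bad: F) -> Option<usize>
where
    F: Fn(&T) -> bool,
{
    if commits.len() < 2 {
        return None;
    }
    // Invariant: commits[good] is good and commits[bad] is bad.
    let (mut good, mut bad) = (0, commits.len() - 1);
    while bad - good > 1 {
        let mid = good + (bad - good) / 2;
        if is_bad(&commits[mid]) {
            bad = mid;
        } else {
            good = mid;
        }
    }
    Some(bad)
}

fn main() {
    // Hypothetical (commit, CPU %) pairs, oldest first; not real measurements.
    let commits = [("c0", 11), ("c1", 12), ("c2", 10), ("c3", 80), ("c4", 79)];
    if let Some(i) = first_bad(&commits, |&(_, cpu)| cpu > 40) {
        println!("first bad commit: {}", commits[i].0);
    }
}
```

The commit labels and CPU readings in `main` are placeholders; git bisect performs the same halving, with you (or a script) acting as `is_bad` at each step.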

Repro steps

  1. Clone the repo
  2. Check out the commit just before it: git checkout 833799ec32883cdd7465a5f8ddc80e2203dbcbc4^1
  3. Run cargo run --release --example triangle to see the baseline CPU usage
  4. Check out the high CPU usage commit: git checkout 833799ec32883cdd7465a5f8ddc80e2203dbcbc4
  5. Run cargo run --release --example triangle to see the CPU usage increase

Additional info

Happy to help test or debug, especially if access to macOS is limited. @birhburh, pinging you here too since you authored the commit. Let me know if I can help in any way. Thanks!

brettchalupa changed the title from "MacOS got 8x slower in v0.4.4" to "MacOS CPU usage increased by 8x in v0.4.4" Aug 4, 2024
@birhburh
Contributor

birhburh commented Aug 4, 2024

@brettchalupa, thanks!
Yes, I checked, and this is also true for me on Intel.
I'll look into this.

@VanjaRo

VanjaRo commented Aug 8, 2024

It seems that setting the config field blocking_event_loop to true on macOS fixes the problem. Maybe the run method of the NSApplication class was using the same mechanism under the hood, or maybe not. For the dynamic quad example, however, the problem persists, and there the blocking... config value is obviously not the solution.
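
To illustrate why a blocking loop can matter for CPU usage, here is a self-contained sketch that uses a std::sync::mpsc channel as a stand-in for the OS event queue. This is an assumption for illustration, not miniquad's event loop: the polling loop checks for events and would redraw on every pass, while the blocking loop sleeps inside `recv` until an event arrives. The `Event` type and helper names are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Stand-in for window-system events (hypothetical, for illustration only).
enum Event {
    Redraw,
    Quit,
}

/// Polling loop: never waits, so between events it spins as fast as the CPU allows.
fn polling_loop(rx: &Receiver<Event>) -> u64 {
    let mut iterations = 0;
    loop {
        iterations += 1;
        match rx.try_recv() {
            Ok(Event::Quit) | Err(TryRecvError::Disconnected) => return iterations,
            // Either an event arrived or nothing did; a polling loop would draw a frame either way.
            Ok(Event::Redraw) | Err(TryRecvError::Empty) => {}
        }
    }
}

/// Blocking loop: the thread sleeps inside `recv` until the next event arrives.
fn blocking_loop(rx: &Receiver<Event>) -> u64 {
    let mut iterations = 0;
    loop {
        iterations += 1;
        match rx.recv() {
            Ok(Event::Redraw) => {} // draw a frame here
            Ok(Event::Quit) | Err(_) => return iterations,
        }
    }
}

/// Spawns a thread that sends five Redraw events `gap` apart, then Quit.
fn spawn_events(gap: Duration) -> Receiver<Event> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for _ in 0..5 {
            thread::sleep(gap);
            let _ = tx.send(Event::Redraw);
        }
        let _ = tx.send(Event::Quit);
    });
    rx
}

fn main() {
    let gap = Duration::from_millis(20);
    let polled = polling_loop(&spawn_events(gap));
    let blocked = blocking_loop(&spawn_events(gap));
    println!("polling loop iterations:  {polled}");
    println!("blocking loop iterations: {blocked}");
}
```

With five redraw events and one quit event, the blocking loop runs exactly six iterations, while the polling loop spins for the entire time it waits. A loop that has to redraw every frame cannot simply block, so it needs some other way to wait between frames, such as syncing to the display refresh.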

@VanjaRo

VanjaRo commented Aug 8, 2024

Another guess, which came up while I was searching for a similar event loop, is that the application may be running at unlimited FPS, which leads to many useless iterations.
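
If the loop really does run at unlimited FPS, one common mitigation is to cap the frame rate so the thread sleeps instead of immediately starting the next frame. This is a swapped-in technique shown for illustration (a software frame limiter); it is not the approach taken in this thread, which relies on syncing to the screen refresh rate. A minimal sketch with a hypothetical `run_capped` helper:

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `frames` iterations of a render loop capped at `fps` frames per second.
/// Instead of spinning, it sleeps until each frame's deadline.
/// Returns the total elapsed time.
fn run_capped(fps: u32, frames: u32, mut draw: impl FnMut(u32)) -> Duration {
    let frame_time = Duration::from_secs(1) / fps;
    let start = Instant::now();
    let mut next_deadline = start + frame_time;
    for frame in 0..frames {
        draw(frame);
        let now = Instant::now();
        if now < next_deadline {
            // Give the CPU back instead of starting the next frame immediately.
            thread::sleep(next_deadline - now);
        }
        // Advance by a fixed step so the average rate stays at `fps`.
        next_deadline += frame_time;
    }
    start.elapsed()
}

fn main() {
    let mut drawn = 0;
    let elapsed = run_capped(60, 30, |_| drawn += 1);
    println!("drew {drawn} frames in {elapsed:?}");
}
```

Advancing `next_deadline` by a fixed step, rather than sleeping a fixed amount after each frame, keeps the average rate at `fps` even when individual frames take different amounts of time to draw.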

@birhburh
Contributor

birhburh commented Aug 9, 2024

@VanjaRo, thanks for the link!
I'll look into this approach if my solution doesn't work.
Right now I'm trying to implement a custom NSView with OpenGL support, as GLFW does.
It uses the flushBuffer method, which should use less CPU because it syncs to the screen refresh rate:
https://developer.apple.com/documentation/appkit/nsopenglcontext/1436211-flushbuffer?language=objc
But I haven't spent much time on it this week.
Somehow a basic test with this approach redraws only the background and not the geometry, so I'm still debugging.

@birhburh
Contributor

https://github.com/birhburh/miniquad/tree/macos_prototype
Everything should now work with the OpenGL and Metal backends: CPU usage is as low as before, and even resizing works.
If you can, please test it on ARM Apple laptops ;-)
I'm not making a PR yet, though.
I still need to fix the Metal backend when used with macroquad (it again draws only a quarter of the image),
test the blocking event loop,
and do some cleanup.

birhburh added a commit to birhburh/miniquad that referenced this issue Aug 11, 2024
- Reverted macos.rs to this implementation:
not-fl3#443 (it's probably better to compare the changes against this PR rather than against the latest commit to see what was added on top of it)
- Now using a custom NSView with NSOpenGLContext instead of NSOpenGLView
- For the Metal backend, using redraw instead of setNeedsDisplay, because for some reason it reduces CPU usage (I couldn't find anything in the MTKView docs about it enabling vsync like this)
- Fixed freezing on resize by drawing in draw_rect, which is called during "live resize"™. I don't like this approach, but it blocks the main event loop while resizing, so I think there won't be any concurrency issues with the OpenGL state
- Reduced CPU usage when the window is occluded
- Added comments to hacky places; there are lots of them, imo

Fixes:
- not-fl3#455
- not-fl3#470
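
The "Reduced CPU usage when the window is occluded" item can be illustrated with a small sketch: while the window is hidden, frames are skipped instead of drawn. The event names are hypothetical; on macOS, occlusion changes are reported through NSWindow's occlusion state, but this sketch does not show how the commit actually detects them. Skipping the draw only saves CPU if the loop also waits (on vsync or a blocking event read) rather than spinning.

```rust
/// Hypothetical window events, for illustration only.
#[derive(Clone, Copy)]
enum WindowEvent {
    Frame,
    Occluded,
    Visible,
}

/// Counts how many frames are actually drawn for a sequence of events,
/// skipping all drawing while the window is occluded.
fn frames_drawn(events: &[WindowEvent]) -> usize {
    let mut visible = true;
    let mut drawn = 0;
    for event in events {
        match event {
            WindowEvent::Occluded => visible = false,
            WindowEvent::Visible => visible = true,
            // Draw only when the window can actually be seen.
            WindowEvent::Frame if visible => drawn += 1,
            // Occluded: skip the frame entirely.
            WindowEvent::Frame => {}
        }
    }
    drawn
}

fn main() {
    let (f, o, v) = (WindowEvent::Frame, WindowEvent::Occluded, WindowEvent::Visible);
    let events = [f, f, o, f, f, f, v, f];
    println!("drew {} of 6 frames", frames_drawn(&events));
}
```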
not-fl3 pushed a commit that referenced this issue Aug 11, 2024
@brettchalupa
Author

brettchalupa commented Aug 12, 2024

@birhburh I tested out your macos_prototype branch on your fork on my M2 Pro chip, and when I run cargo run --release --example triangle there's a segfault crash:

Translated Report from crash
-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process:               triangle [47308]
Path:                  /Users/USER/*/triangle
Identifier:            triangle
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        zsh [47022]
User ID:               501

Date/Time:             2024-08-12 10:37:38.6855 -0400
OS Version:            macOS 14.5 (23F79)
Report Version:        12
Anonymous UUID:        A67C31B0-B597-DA45-EC6F-2A73A4792CC2

Sleep/Wake UUID:       B9E3C94B-1877-4E66-80F8-7C52E1748C9E

Time Awake Since Boot: 2600000 seconds
Time Since Wake:       6974 seconds

System Integrity Protection: enabled

Crashed Thread:        0  main  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000001
Exception Codes:       0x0000000000000001, 0x0000000000000001

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [47308]

VM Region Info: 0x1 is not in any region.  Bytes before following region: 4376936447
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      104e2c000-104e88000    [  368K] r-x/r-x SM=COW  /Users/USER/*/triangle

Thread 0 Crashed:: main Dispatch queue: com.apple.main-thread
0   libobjc.A.dylib               	       0x184eb7fb4 objc_retain + 8
1   Foundation                    	       0x1864be0f0 -[NSCFTimer initWithFireDate:interval:target:selector:userInfo:repeats:] + 184
2   Foundation                    	       0x1864bdf04 +[NSTimer(NSTimer) timerWithTimeInterval:target:selector:userInfo:repeats:] + 104
3   triangle                      	       0x104e32e8c 0x104e2c000 + 28300
4   triangle                      	       0x104e30550 0x104e2c000 + 17744
5   triangle                      	       0x104e34d38 0x104e2c000 + 36152
6   dyld                          	       0x184f060e0 start + 2360

Thread 1:
0   libsystem_pthread.dylib       	       0x185289d20 start_wqthread + 0

Thread 2:
0   libsystem_pthread.dylib       	       0x185289d20 start_wqthread + 0


Thread 0 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000001   x1: 0x1800600002199d17   x2: 0x0000000000000020   x3: 0x0000000000000001
    x4: 0x0000000000000005   x5: 0x0000000020200000   x6: 0x0000000000000001   x7: 0x0000000000000960
    x8: 0x0000000186def000   x9: 0x0000600002f88000  x10: 0x0000000000000700  x11: 0x0000000000000020
   x12: 0x0000000000000001  x13: 0x00000000fffffe38  x14: 0x00000000000007fb  x15: 0x00000000a00e3ffb
   x16: 0x0000000184eb7fac  x17: 0x00000001eebf13b0  x18: 0x0000000000000000  x19: 0x0000000000000001
   x20: 0x0000000000000001  x21: 0x00000001e59f6690  x22: 0x509b000129127650  x23: 0xb8c37c692394fd82
   x24: 0x0000600002f88700  x25: 0x0000000000000000  x26: 0x0000000000000001  x27: 0x000000016afcd048
   x28: 0x0000600002fa87e8   fp: 0x000000016afc7ba0   lr: 0x00000001864be0f0
    sp: 0x000000016afc7b30   pc: 0x0000000184eb7fb4 cpsr: 0x00001000
   far: 0x0000000000000001  esr: 0x92000006 (Data Abort) byte read Translation fault

Binary Images:
       0x107cec000 -        0x107d57fff com.apple.AppleMetalOpenGLRenderer (1.0) <c032830a-fbed-356d-a4a7-acd24492f336> /System/Library/Extensions/AppleMetalOpenGLRenderer.bundle/Contents/MacOS/AppleMetalOpenGLRenderer
       0x1051bc000 -        0x1051c7fff libobjc-trampolines.dylib (*) <9381bd6d-84a5-3c72-b3b8-88428afa4782> /usr/lib/libobjc-trampolines.dylib
       0x104e2c000 -        0x104e87fff triangle (*) <bc583cd3-531c-3b5d-af2d-5121dd44c0eb> /Users/USER/*/triangle
       0x184eb0000 -        0x184effd83 libobjc.A.dylib (*) <b326b2c3-1069-3d17-b49d-9dcb24efec6f> /usr/lib/libobjc.A.dylib
       0x186445000 -        0x1870a2fff com.apple.Foundation (6.9) <99e0292d-7873-3968-9c9c-5955638689a5> /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
       0x184f00000 -        0x184f88a17 dyld (*) <37bbc384-0755-31c7-a808-0ed49e44dd8e> /usr/lib/dyld
               0x0 - 0xffffffffffffffff ??? (*) <00000000-0000-0000-0000-000000000000> ???
       0x185288000 -        0x185294fff libsystem_pthread.dylib (*) <386b0fc1-7873-3328-8e71-43269fd1b2c7> /usr/lib/system/libsystem_pthread.dylib

External Modification Summary:
  Calls made by other processes targeting this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by this process:
    task_for_pid: 0
    thread_create: 0
    thread_set_state: 0
  Calls made by all processes on this machine:
    task_for_pid: 19
    thread_create: 0
    thread_set_state: 7

VM Region Summary:
ReadOnly portion of Libraries: Total=930.4M resident=0K(0%) swapped_out_or_unallocated=930.4M(100%)
Writable regions: Total=1.1G written=0K(0%) resident=0K(0%) swapped_out=0K(0%) unallocated=1.1G(100%)

                                VIRTUAL   REGION 
REGION TYPE                        SIZE    COUNT (non-coalesced) 
===========                     =======  ======= 
Accelerate framework               256K        2 
Activity Tracing                   256K        1 
CG image                           144K        1 
ColorSync                          576K       28 
CoreAnimation                       16K        1 
CoreGraphics                        16K        1 
Foundation                          16K        1 
Kernel Alloc Once                   32K        1 
MALLOC                             1.1G       47 
MALLOC guard page                  192K       12 
STACK GUARD                         32K        2 
Stack                             9248K        3 
Stack Guard                       56.0M        1 
VM_ALLOCATE                        560K       13 
__AUTH                            1119K      229 
__AUTH_CONST                      18.4M      396 
__CTF                               824        1 
__DATA                            6148K      384 
__DATA_CONST                      20.4M      401 
__DATA_DIRTY                      1079K      136 
__FONT_DATA                        2352        1 
__GLSLBUILTINS                    5174K        1 
__LINKEDIT                       533.1M        4 
__OBJC_RO                         71.9M        1 
__OBJC_RW                         2199K        1 
__TEXT                           397.3M      414 
dyld private memory                272K        2 
mapped file                       75.2M       17 
shared memory                      864K       14 
===========                     =======  ======= 
TOTAL                              2.3G     2116 

Let me know if there's more useful info to provide.

Edit: I'm also seeing a segfault on the latest miniquad master after #475 was merged (commit: 30b4e17ece36d93988e65d8e57227c21f62b4002)

@brettchalupa
Author

@birhburh, given that your PRs have addressed this, is it good to close, or is there more to be done here? Awesome work!

@birhburh
Contributor

@brettchalupa, thanks, I think it can be closed.

@not-fl3
Owner

not-fl3 commented Aug 13, 2024

🎉

Great job @birhburh !

not-fl3 closed this as completed Aug 13, 2024