-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - Monomorphize various things #1914
Conversation
crates/bevy_app/src/app.rs
Outdated
} | ||
|
||
// Outlined to ensure that a lot of drop glue is codegened inside bevy_app instead of the game. | ||
unsafe fn drop_app(app: &mut App) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a // SAFE
comment explaining why this is safe to use.
It's pretty obvious why it is safe, but I'd argue that every use of unsafe
needs to have a // SAFETY
comment to go along with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More specifically the format is:
/// # Safety
/// <explination>
unsafe fn drop_app(app: &mut App) {}
crates/bevy_app/src/app.rs
Outdated
pub world: World, | ||
pub runner: Box<dyn Fn(App)>, | ||
pub schedule: Schedule, | ||
pub world: ManuallyDrop<World>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little conflicted here.
While utilizing ManuallyDrop<T>
helps reduce the binary size, I don't know if it's worth the extra complexity.
I'm not completely against using it, I just feel it's a little counter-intuitive and unnecessary for the only benefit being a smaller executable.
(Again, we need to consider a space vs performance trade off)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha my comment is related but you submitted while I was still typing it :)
crates/bevy_app/src/app.rs
Outdated
pub world: World, | ||
pub runner: Box<dyn Fn(App)>, | ||
pub schedule: Schedule, | ||
pub world: ManuallyDrop<World>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a drop in code clarity / friendliness for a relatively "core" type that I expect people to look at on a regular basis (and also an increase in unsafe).
Can we discuss a bit what this ~17k wins us? If this were optimizing for web/wasm deployments (which have tight network constraints / lots of people that care about load times), that would make more sense to me. But this is specifically optimizing dynamically linked outputs, which generally won't be published.
This change also only affects "top level" apps, so this won't come into play for "dynamically linked plugins" if we decide to go that route in the future.
So I think the intent here is to optimize "iterative compile times" when using --features dynamic
. If so, it would be good to measure how much we're winning here and weight that against the negatives pointed out above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the intent here is to optimize "iterative compile times" when using
--features dynamic
.
Mostly correct, though it should also save some time without --features dynamic
too. Before this PR the drop glue would be generated both in the game (for unwinding on panics) and in bevy_app (for dropping at the end of AppBuilder::run
). After this PR it is only generated in bevy_app.
be6ed55
to
41d9335
Compare
41d9335
to
e5ba919
Compare
e5ba919
to
39d71e2
Compare
I have dropped the drop glue commit and will instead wait for rust-lang/rust#84175. 39d71e2 (Use a function instead of a closure for the add_system_to_stage error path) is still part of this PR. |
bors r+ |
Based on #1910 This shrinks breakout from 310k to 293k. Most of the win is in outlining the drop glue of `App`. The other two commits save about 800 bytes total when using two empty systems and two simple resources. After this PR the full disassembly for ```rust fn main() { App::build().run(); } ``` is about as minimal as it gets, so pretty much all other costs scale linear in the amount of resources, systems, etc. ```asm 0000000000001100 <_ZN4core3ptr54drop_in_place$LT$bevy_app..app_builder..AppBuilder$GT$17h76850422c20653deE>: 1100: ff 25 52 21 00 00 jmpq *0x2152(%rip) # 3258 <_ZN60_$LT$bevy_app..app..App$u20$as$u20$core..ops..drop..Drop$GT$4drop17h67d177ae549d917bE@Base> 1106: cc int3 1107: cc int3 1108: cc int3 1109: cc int3 110a: cc int3 110b: cc int3 110c: cc int3 110d: cc int3 110e: cc int3 110f: cc int3 0000000000001110 <_ZN8breakout4main17h7cbe07b319de1042E>: 1110: 53 push %rbx 1111: 48 81 ec 00 03 00 00 sub $0x300,%rsp 1118: 48 8d 5c 24 08 lea 0x8(%rsp),%rbx 111d: 48 89 df mov %rbx,%rdi 1120: ff 15 3a 21 00 00 callq *0x213a(%rip) # 3260 <_ZN8bevy_app3app3App5build17h8b0ea6be9050d6ccE@Base> 1126: 48 89 df mov %rbx,%rdi 1129: ff 15 39 21 00 00 callq *0x2139(%rip) # 3268 <_ZN8bevy_app11app_builder10AppBuilder3run17hfc8cf50692acdbdeE@Base> 112f: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi 1134: ff 15 1e 21 00 00 callq *0x211e(%rip) # 3258 <_ZN60_$LT$bevy_app..app..App$u20$as$u20$core..ops..drop..Drop$GT$4drop17h67d177ae549d917bE@Base> 113a: 48 81 c4 00 03 00 00 add $0x300,%rsp 1141: 5b pop %rbx 1142: c3 retq 1143: 48 89 c3 mov %rax,%rbx 1146: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi 114b: e8 b0 ff ff ff callq 1100 <_ZN4core3ptr54drop_in_place$LT$bevy_app..app_builder..AppBuilder$GT$17h76850422c20653deE> 1150: 48 89 df mov %rbx,%rdi 1153: e8 18 01 00 00 callq 1270 <_Unwind_Resume@plt> 1158: 0f 0b ud2 115a: cc int3 115b: cc int3 115c: cc int3 115d: cc int3 115e: cc int3 115f: cc int3 0000000000001160 <main>: 1160: 48 83 ec 08 sub $0x8,%rsp 1164: 48 89 f1 mov %rsi,%rcx 1167: 48 63 d7 movslq %edi,%rdx 116a: 48 8d 05 9f ff ff ff lea -0x61(%rip),%rax # 1110 <_ZN8breakout4main17h7cbe07b319de1042E> 1171: 48 89 04 24 mov %rax,(%rsp) 1175: 48 8d 35 94 1e 00 00 lea 0x1e94(%rip),%rsi # 3010 <__init_array_end> 117c: 48 89 e7 mov %rsp,%rdi 117f: ff 15 eb 20 00 00 callq *0x20eb(%rip) # 3270 <_ZN3std2rt19lang_start_internal17he77194431b0ee4a2E@Base> 1185: 59 pop %rcx 1186: c3 retq 1187: cc int3 1188: cc int3 1189: cc int3 118a: cc int3 118b: cc int3 118c: cc int3 118d: cc int3 118e: cc int3 118f: cc int3 0000000000001190 <_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h83a5b8d55f23dff8E.llvm.909376793398482062>: 1190: 48 83 ec 08 sub $0x8,%rsp 1194: 48 8b 3f mov (%rdi),%rdi 1197: e8 54 ff ff ff callq 10f0 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h6e238af75680eb28E> 119c: 31 c0 xor %eax,%eax 119e: 59 pop %rcx 119f: c3 retq 00000000000011a0 <_ZN4core3ops8function6FnOnce40call_once$u7b$$u7b$vtable.shim$u7d$$u7d$17hb05d591cd29dea4fE.llvm.909376793398482062>: 11a0: 48 83 ec 08 sub $0x8,%rsp 11a4: 48 8b 3f mov (%rdi),%rdi 11a7: e8 44 ff ff ff callq 10f0 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h6e238af75680eb28E> 11ac: 31 c0 xor %eax,%eax 11ae: 59 pop %rcx 11af: c3 retq 00000000000011b0 <_ZN4core3ptr85drop_in_place$LT$std..rt..lang_start$LT$$LP$$RP$$GT$..$u7b$$u7b$closure$u7d$$u7d$$GT$17he9aeeba375093b99E.llvm.909376793398482062>: 11b0: c3 retq 11b1: cc int3 11b2: cc int3 11b3: cc int3 11b4: cc int3 11b5: cc int3 11b6: cc int3 11b7: cc int3 11b8: cc int3 11b9: cc int3 11ba: cc int3 11bb: cc int3 11bc: cc int3 11bd: cc int3 11be: cc int3 11bf: cc int3 ```
Pull request successfully merged into main. Build succeeded: |
Based on bevyengine#1910 This shrinks breakout from 310k to 293k. Most of the win is in outlining the drop glue of `App`. The other two commits save about 800 bytes total when using two empty systems and two simple resources. After this PR the full disassembly for ```rust fn main() { App::build().run(); } ``` is about as minimal as it gets, so pretty much all other costs scale linear in the amount of resources, systems, etc. ```asm 0000000000001100 <_ZN4core3ptr54drop_in_place$LT$bevy_app..app_builder..AppBuilder$GT$17h76850422c20653deE>: 1100: ff 25 52 21 00 00 jmpq *0x2152(%rip) # 3258 <_ZN60_$LT$bevy_app..app..App$u20$as$u20$core..ops..drop..Drop$GT$4drop17h67d177ae549d917bE@Base> 1106: cc int3 1107: cc int3 1108: cc int3 1109: cc int3 110a: cc int3 110b: cc int3 110c: cc int3 110d: cc int3 110e: cc int3 110f: cc int3 0000000000001110 <_ZN8breakout4main17h7cbe07b319de1042E>: 1110: 53 push %rbx 1111: 48 81 ec 00 03 00 00 sub $0x300,%rsp 1118: 48 8d 5c 24 08 lea 0x8(%rsp),%rbx 111d: 48 89 df mov %rbx,%rdi 1120: ff 15 3a 21 00 00 callq *0x213a(%rip) # 3260 <_ZN8bevy_app3app3App5build17h8b0ea6be9050d6ccE@Base> 1126: 48 89 df mov %rbx,%rdi 1129: ff 15 39 21 00 00 callq *0x2139(%rip) # 3268 <_ZN8bevy_app11app_builder10AppBuilder3run17hfc8cf50692acdbdeE@Base> 112f: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi 1134: ff 15 1e 21 00 00 callq *0x211e(%rip) # 3258 <_ZN60_$LT$bevy_app..app..App$u20$as$u20$core..ops..drop..Drop$GT$4drop17h67d177ae549d917bE@Base> 113a: 48 81 c4 00 03 00 00 add $0x300,%rsp 1141: 5b pop %rbx 1142: c3 retq 1143: 48 89 c3 mov %rax,%rbx 1146: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi 114b: e8 b0 ff ff ff callq 1100 <_ZN4core3ptr54drop_in_place$LT$bevy_app..app_builder..AppBuilder$GT$17h76850422c20653deE> 1150: 48 89 df mov %rbx,%rdi 1153: e8 18 01 00 00 callq 1270 <_Unwind_Resume@plt> 1158: 0f 0b ud2 115a: cc int3 115b: cc int3 115c: cc int3 115d: cc int3 115e: cc int3 115f: cc int3 0000000000001160 <main>: 1160: 48 83 ec 08 sub $0x8,%rsp 1164: 48 89 f1 mov %rsi,%rcx 1167: 48 63 d7 movslq %edi,%rdx 116a: 48 8d 05 9f ff ff ff lea -0x61(%rip),%rax # 1110 <_ZN8breakout4main17h7cbe07b319de1042E> 1171: 48 89 04 24 mov %rax,(%rsp) 1175: 48 8d 35 94 1e 00 00 lea 0x1e94(%rip),%rsi # 3010 <__init_array_end> 117c: 48 89 e7 mov %rsp,%rdi 117f: ff 15 eb 20 00 00 callq *0x20eb(%rip) # 3270 <_ZN3std2rt19lang_start_internal17he77194431b0ee4a2E@Base> 1185: 59 pop %rcx 1186: c3 retq 1187: cc int3 1188: cc int3 1189: cc int3 118a: cc int3 118b: cc int3 118c: cc int3 118d: cc int3 118e: cc int3 118f: cc int3 0000000000001190 <_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h83a5b8d55f23dff8E.llvm.909376793398482062>: 1190: 48 83 ec 08 sub $0x8,%rsp 1194: 48 8b 3f mov (%rdi),%rdi 1197: e8 54 ff ff ff callq 10f0 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h6e238af75680eb28E> 119c: 31 c0 xor %eax,%eax 119e: 59 pop %rcx 119f: c3 retq 00000000000011a0 <_ZN4core3ops8function6FnOnce40call_once$u7b$$u7b$vtable.shim$u7d$$u7d$17hb05d591cd29dea4fE.llvm.909376793398482062>: 11a0: 48 83 ec 08 sub $0x8,%rsp 11a4: 48 8b 3f mov (%rdi),%rdi 11a7: e8 44 ff ff ff callq 10f0 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h6e238af75680eb28E> 11ac: 31 c0 xor %eax,%eax 11ae: 59 pop %rcx 11af: c3 retq 00000000000011b0 <_ZN4core3ptr85drop_in_place$LT$std..rt..lang_start$LT$$LP$$RP$$GT$..$u7b$$u7b$closure$u7d$$u7d$$GT$17he9aeeba375093b99E.llvm.909376793398482062>: 11b0: c3 retq 11b1: cc int3 11b2: cc int3 11b3: cc int3 11b4: cc int3 11b5: cc int3 11b6: cc int3 11b7: cc int3 11b8: cc int3 11b9: cc int3 11ba: cc int3 11bb: cc int3 11bc: cc int3 11bd: cc int3 11be: cc int3 11bf: cc int3 ```
Based on bevyengine#1910 This shrinks breakout from 310k to 293k. Most of the win is in outlining the drop glue of `App`. The other two commits save about 800 bytes total when using two empty systems and two simple resources. After this PR the full disassembly for ```rust fn main() { App::build().run(); } ``` is about as minimal as it gets, so pretty much all other costs scale linear in the amount of resources, systems, etc. ```asm 0000000000001100 <_ZN4core3ptr54drop_in_place$LT$bevy_app..app_builder..AppBuilder$GT$17h76850422c20653deE>: 1100: ff 25 52 21 00 00 jmpq *0x2152(%rip) # 3258 <_ZN60_$LT$bevy_app..app..App$u20$as$u20$core..ops..drop..Drop$GT$4drop17h67d177ae549d917bE@Base> 1106: cc int3 1107: cc int3 1108: cc int3 1109: cc int3 110a: cc int3 110b: cc int3 110c: cc int3 110d: cc int3 110e: cc int3 110f: cc int3 0000000000001110 <_ZN8breakout4main17h7cbe07b319de1042E>: 1110: 53 push %rbx 1111: 48 81 ec 00 03 00 00 sub $0x300,%rsp 1118: 48 8d 5c 24 08 lea 0x8(%rsp),%rbx 111d: 48 89 df mov %rbx,%rdi 1120: ff 15 3a 21 00 00 callq *0x213a(%rip) # 3260 <_ZN8bevy_app3app3App5build17h8b0ea6be9050d6ccE@Base> 1126: 48 89 df mov %rbx,%rdi 1129: ff 15 39 21 00 00 callq *0x2139(%rip) # 3268 <_ZN8bevy_app11app_builder10AppBuilder3run17hfc8cf50692acdbdeE@Base> 112f: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi 1134: ff 15 1e 21 00 00 callq *0x211e(%rip) # 3258 <_ZN60_$LT$bevy_app..app..App$u20$as$u20$core..ops..drop..Drop$GT$4drop17h67d177ae549d917bE@Base> 113a: 48 81 c4 00 03 00 00 add $0x300,%rsp 1141: 5b pop %rbx 1142: c3 retq 1143: 48 89 c3 mov %rax,%rbx 1146: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi 114b: e8 b0 ff ff ff callq 1100 <_ZN4core3ptr54drop_in_place$LT$bevy_app..app_builder..AppBuilder$GT$17h76850422c20653deE> 1150: 48 89 df mov %rbx,%rdi 1153: e8 18 01 00 00 callq 1270 <_Unwind_Resume@plt> 1158: 0f 0b ud2 115a: cc int3 115b: cc int3 115c: cc int3 115d: cc int3 115e: cc int3 115f: cc int3 0000000000001160 <main>: 1160: 48 83 ec 08 sub $0x8,%rsp 1164: 48 89 f1 mov %rsi,%rcx 1167: 48 63 d7 movslq %edi,%rdx 116a: 48 8d 05 9f ff ff ff lea -0x61(%rip),%rax # 1110 <_ZN8breakout4main17h7cbe07b319de1042E> 1171: 48 89 04 24 mov %rax,(%rsp) 1175: 48 8d 35 94 1e 00 00 lea 0x1e94(%rip),%rsi # 3010 <__init_array_end> 117c: 48 89 e7 mov %rsp,%rdi 117f: ff 15 eb 20 00 00 callq *0x20eb(%rip) # 3270 <_ZN3std2rt19lang_start_internal17he77194431b0ee4a2E@Base> 1185: 59 pop %rcx 1186: c3 retq 1187: cc int3 1188: cc int3 1189: cc int3 118a: cc int3 118b: cc int3 118c: cc int3 118d: cc int3 118e: cc int3 118f: cc int3 0000000000001190 <_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h83a5b8d55f23dff8E.llvm.909376793398482062>: 1190: 48 83 ec 08 sub $0x8,%rsp 1194: 48 8b 3f mov (%rdi),%rdi 1197: e8 54 ff ff ff callq 10f0 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h6e238af75680eb28E> 119c: 31 c0 xor %eax,%eax 119e: 59 pop %rcx 119f: c3 retq 00000000000011a0 <_ZN4core3ops8function6FnOnce40call_once$u7b$$u7b$vtable.shim$u7d$$u7d$17hb05d591cd29dea4fE.llvm.909376793398482062>: 11a0: 48 83 ec 08 sub $0x8,%rsp 11a4: 48 8b 3f mov (%rdi),%rdi 11a7: e8 44 ff ff ff callq 10f0 <_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h6e238af75680eb28E> 11ac: 31 c0 xor %eax,%eax 11ae: 59 pop %rcx 11af: c3 retq 00000000000011b0 <_ZN4core3ptr85drop_in_place$LT$std..rt..lang_start$LT$$LP$$RP$$GT$..$u7b$$u7b$closure$u7d$$u7d$$GT$17he9aeeba375093b99E.llvm.909376793398482062>: 11b0: c3 retq 11b1: cc int3 11b2: cc int3 11b3: cc int3 11b4: cc int3 11b5: cc int3 11b6: cc int3 11b7: cc int3 11b8: cc int3 11b9: cc int3 11ba: cc int3 11bb: cc int3 11bc: cc int3 11bd: cc int3 11be: cc int3 11bf: cc int3 ```
Based on #1910
This shrinks breakout from 310k to 293k. Most of the win is in outlining the drop glue of
App
. The other two commits save about 800 bytes total when using two empty systems and two simple resources.After this PR the full disassembly for
is about as minimal as it gets, so pretty much all other costs scale linear in the amount of resources, systems, etc.