Breakout example crashes when limited to a single CPU #405
Breakout works for me on two cores with commit …; this was introduced by PR 384. lscpu output:
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Pentium(R) CPU G3420 @ 3.20GHz
Stepping: 3
CPU MHz: 3137.792
CPU max MHz: 3200.0000
CPU min MHz: 800.0000
BogoMIPS: 6400.69
Virtualization: VT-x
L1d cache: 64 KiB
L1i cache: 64 KiB
L2 cache: 512 KiB
L3 cache: 3 MiB
NUMA node0 CPU(s): 0,1
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT disabled
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer xsave rdrand lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust erms invpcid xsaveopt dtherm arat pln pts md_clear flush_l1d
```
I'll look at it, probably an easy fix. (It should make three 1-thread task pools in the low-core-count case.)
For a workaround, grab the PR or temporarily change your `main()` to something like this:

```rust
use bevy::app::DefaultTaskPoolOptions;
use bevy::prelude::*;

fn main() {
    App::build()
        .add_resource(DefaultTaskPoolOptions::with_num_threads(4))
        .add_default_plugins()
        // ... the rest of your existing setup ...
        .run();
}
```
Ok, got to the bottom of this and have some interesting results to share.

**What is the root cause?**

There are two issues:
1. The arithmetic that assigns cores to task pools underflows when the core count is low, which is the debug-mode subtraction overflow crash.
2. The parallel executor blocks inside the task pool while waiting on systems, so with too few threads in the pool it deadlocks and the example hangs.
**Why wasn't this happening with rayon?**

It actually can happen with rayon too. Here is a standalone example that deadlocks a single-threaded rayon pool the same way:
```rust
fn main() {
    rayon::ThreadPoolBuilder::default()
        .num_threads(1) // <-- (1) Force only a single thread in the pool
        .build_global()
        .unwrap();

    let (tx, rx) = crossbeam_channel::unbounded();
    rayon::scope(|scope| {
        scope.spawn(|_scope| {
            println!("Inner scope started"); // <-- (3) This line never runs because no thread is available
            tx.send("hi").unwrap();
        });
        rx.recv().unwrap(); // <-- (2) This line executes first and blocks the pool's only thread
    })
}
```

While on the surface it may seem like "always put at least 2 threads in the pool" would solve the problem, this is not a general solution: if you have N tasks blocking on a resource at the same time, you must have at least N+1 threads in the pool.

**What to do about this?**

Two possible solutions:
1. Guarantee the pool always has more threads than there can be simultaneously blocked tasks (for example by growing the pool when tasks block).
2. Don't block inside the pool at all: make the waiting asynchronous (await), so a waiting task yields its thread instead of holding it.
As a game engine we have strict latency requirements and we know our workload, so we should really be shooting for option 2. In short, this problem exists with both schedulers, but the new scheduler is async-aware and permits the use of await, which lets us solve it properly. Near term, we need to replace the blocking code in parallel_executor.rs with a non-blocking alternative (i.e. await). Long term, end users will need to be careful to avoid having systems block on other systems.
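For illustration, here is a minimal sketch of option 2 (not Bevy code; it assumes the async-executor, async-channel, and futures-lite crates): the same spawn-then-wait shape as the rayon example above, except the wait is an `await`, so a single thread can still drive the spawned task and nothing deadlocks.

```rust
use async_executor::Executor;
use futures_lite::future;

fn main() {
    // No dedicated worker threads at all; the main thread drives everything.
    let executor = Executor::new();
    let (tx, rx) = async_channel::unbounded();

    let sender = executor.spawn(async move {
        println!("Inner task started"); // runs even though there is only one thread
        let _ = tx.send("hi").await;
    });

    // Awaiting the receive yields back to the executor instead of blocking the
    // thread, so the sender task above gets to run and the message arrives.
    let msg = future::block_on(executor.run(async move {
        let msg = rx.recv().await.unwrap();
        sender.await;
        msg
    }));
    println!("Received: {}", msg);
}
```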
I think a solution for the "spawn work and wait for completion" case is to let the spawning task also do work while it waits.
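A rough sketch of that idea (a hypothetical `scope_and_help` helper, assuming the async-executor and futures-lite crates, not an existing Bevy API): the scope spawns its tasks and then the calling thread runs the executor until they are all finished, so the caller contributes work instead of just blocking.

```rust
use std::future::Future;

use async_executor::Executor;
use futures_lite::future;

/// Hypothetical helper: spawn a batch of futures, then let the calling thread
/// drive the executor until all of them have completed.
fn scope_and_help<T, F>(executor: &Executor<'static>, futures: Vec<F>) -> Vec<T>
where
    T: Send + 'static,
    F: Future<Output = T> + Send + 'static,
{
    let tasks: Vec<_> = futures.into_iter().map(|f| executor.spawn(f)).collect();
    // The calling thread becomes a worker: `run` keeps executing queued tasks
    // until the future that collects all the results completes.
    future::block_on(executor.run(async move {
        let mut results = Vec::with_capacity(tasks.len());
        for task in tasks {
            results.push(task.await);
        }
        results
    }))
}

fn main() {
    let executor = Executor::new();
    // Two pieces of work produced from the same closure so they share a type.
    let work: Vec<_> = (1..=2).map(|i| async move { i * 10 }).collect();
    assert_eq!(scope_and_help(&executor, work), vec![10, 20]);
}
```

This is essentially what letting the spawning thread "tick" the executor amounts to: the wait itself drives queued tasks forward, so a pool with zero dedicated threads still makes progress.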
Stjepan also recently deprecated multitask in favor of async-executor. The changes are very minimal: Ticker is removed, and Executor now has async functions for ticking and running tasks. I think the async functions allow composing Executors, which is kinda neat, and could allow the spawning/waiting thread to tick the executor while waiting, ensuring progress for a task pool.
I think (2) is reasonable given that as a game engine we have pretty tight control over the execution environment. We can (and usually should) provide whatever abstractions people need to interact with the outside world. Ex: networking, asset management, file IO. That being said, we might also want to consider @kabergstrom's approach. Would that give us a nice way to support single-threaded environments? Ex: the "main" thread can do work, we spawn zero threads for each thread pool, and the main thread cranks through all of the work?
It should be worth trying to replace the blocking calls in parallel_executor.rs with non-blocking alternatives. A more long-term solution would be to experiment with fully async schedulers: spawn the entire stage's worth of systems, with system tasks awaiting their borrow acquisition and dependencies. @aclysma has prototyped something similar in an unrelated project, so it's not outright impossible.
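To make that direction concrete, here is a toy sketch (hypothetical, assuming async-executor, async-channel, and futures-lite; not the actual proposal): every system in a stage is spawned up front as a task, and a system with a dependency awaits a completion signal rather than having the executor block for it.

```rust
use async_executor::Executor;
use futures_lite::future;

fn main() {
    let executor = Executor::new();
    // A one-shot style channel standing in for "system A has finished".
    let (a_done_tx, a_done_rx) = async_channel::bounded::<()>(1);

    // "System A" has no dependencies and signals completion when it returns.
    let a = executor.spawn(async move {
        println!("system A");
        let _ = a_done_tx.send(()).await;
    });

    // "System B" depends on A: it awaits the signal instead of blocking a
    // pool thread, so it never wedges the executor while it waits.
    let b = executor.spawn(async move {
        let _ = a_done_rx.recv().await;
        println!("system B");
    });

    // Drive the whole "stage" to completion on the current thread.
    future::block_on(executor.run(async move {
        a.await;
        b.await;
    }));
}
```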
I agree, a more polished version of the scheduler would allow the caller to be a worker thread. This would avoid having to ship workload from one thread to another, and that has definite benefits. That said, even if the calling thread were allowed to be a worker (as is the case with rayon), a task pool configured for 1 worker would still have just one worker (in this case the calling thread only, as @cart described). This would still deadlock, just as rayon deadlocks in this case (see the example code above).

While I mentioned those prototypes in Discord, at this point I'm not recommending them as a good path.
I have been thinking about this more. I personally haven't seen anyone deadlock an ECS before, and most of them run on rayon, which the example code shows is clearly possible to deadlock. So I've been asking myself: why doesn't this happen more often? I think the answer is that writing code where one system blocks on another system completing is a very unusual thing to do. Generally speaking, people rely on the ECS to schedule their systems, which means systems just do their work and return without having to deal with synchronization primitives themselves. So at this point I think we need to focus on making ParallelExecutor not block.
Hence "experiment". It would be necessary to come up with something a lot less naive than what I sketched out, an approach that takes the concerns you listed into consideration. We'll either land close to where we are now, rendering the idea moot, or stumble onto interesting API and/or performance gains.
It is indeed, to the point where I haven't seen that in the wild, ever; nor is it the issue here.
What are your thoughts on making the scheduling into a non-blocking task?
Still looking at options, I've considered a few:
One of the reasons I like (3) is that it reminds me of Naughty Dog's GDC talk on fibers. Essentially they use a counter that can be awaited until it hits 0. I think this maps well to systems depending on other systems to run. https://www.gdcvault.com/play/1022186/Parallelizing-the-Naughty-Dog-Engine

All of these options greedily execute a DAG of tasks and have the same potential issues I mentioned above. I'm thinking now that bevy's original solution probably had them too, so we probably won't be any worse off than we were.
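For reference, a counter that can be awaited until it hits zero could look roughly like this (a hypothetical sketch assuming the event-listener crate, not Bevy's implementation): dependents await the counter and are woken when the last dependency decrements it to zero.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

use event_listener::Event;

/// An awaitable counter: `wait` resolves once the count reaches zero.
#[derive(Clone)]
struct CountdownCounter {
    count: Arc<AtomicUsize>,
    event: Arc<Event>,
}

impl CountdownCounter {
    fn new(count: usize) -> Self {
        Self {
            count: Arc::new(AtomicUsize::new(count)),
            event: Arc::new(Event::new()),
        }
    }

    /// Decrement the counter; wake all waiters when it reaches zero.
    fn decrement(&self) {
        if self.count.fetch_sub(1, Ordering::AcqRel) == 1 {
            self.event.notify(usize::MAX);
        }
    }

    /// Wait without blocking the thread until the counter reaches zero.
    async fn wait(&self) {
        while self.count.load(Ordering::Acquire) > 0 {
            let listener = self.event.listen();
            // Re-check after registering the listener to avoid a lost wakeup.
            if self.count.load(Ordering::Acquire) == 0 {
                break;
            }
            listener.await;
        }
    }
}
```

A system task could then `wait().await` on the counters of the systems it depends on and call `decrement()` on its own counter when it finishes.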
- Use saturating_sub to avoid overflow in core assignment to task pools
- Temporarily force 4 minimum threads to avoid examples stalling
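To illustrate the first bullet (with hypothetical numbers and a made-up helper name, not the actual Bevy code): a plain `usize` subtraction panics on underflow in debug builds, while `saturating_sub` clamps at zero, so a low core count no longer crashes the core-assignment math.

```rust
/// How many cores remain for the next task pool after some have been assigned.
fn remaining_cores(total_cores: usize, already_assigned: usize) -> usize {
    // `total_cores - already_assigned` would panic with "attempt to subtract
    // with overflow" in debug builds when already_assigned > total_cores;
    // saturating_sub clamps the result to 0 instead.
    total_cores.saturating_sub(already_assigned)
}

fn main() {
    assert_eq!(remaining_cores(1, 2), 0); // single-core machine: no crash
    assert_eq!(remaining_cores(8, 2), 6);
}
```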
When messing around with Bevy I tried to run the breakout example on a single core. In debug mode this crashes with a subtraction overflow.
Specifically:
While the crash only happens on master and not with the stable release (which makes me suspect it is related to the recent task queue replacing rayon), the example simply hangs on 2 or 3 cores as well.