Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task System for Bevy #384

Merged
merged 18 commits into from
Aug 29, 2020
Merged

Task System for Bevy #384

merged 18 commits into from
Aug 29, 2020

Conversation

lachlansneff
Copy link
Contributor

@lachlansneff lachlansneff commented Aug 28, 2020

Currently bevy depends on rayon for multithreaded dispatching of systems. This PR replaces rayon with a custom designed task system that consists of several threaded "TaskPools."

Why replace Rayon

  • Rayon has long-standing performance issues with smaller workloads on machines with more than a few cores. (cpu usage #111)
  • Rayon is not async-friendly
  • Rayon is somewhat of a closed box, it owns the threads and it would be more difficult to upstream changes that are tuned for games as it’s a general purpose library
  • Rayon has a lot more in it than we really need. Our alternative has fewer dependencies, compiles faster, and is substantially less code

Bevy Tasks

bevy_tasks does three things:

  • Fire-and-forget `static tasks
  • Fork-join for non-static tasks
  • Chunked parallel iteration of slices

These are enough to replace all current rayon functionality within bevy, as well as to provide additional functionality that can partially supplant parts of rayon's parallel iteration support.

Exposed API

Future work:

  • Continue distributing more work onto task pools (e.g. asset handlers).
  • More optimization of cpu usage.

Resolves #318

lachlansneff and others added 14 commits August 23, 2020 16:30
Move the github.com/lachlansneff/small-tasks prototype
into bevy/crates/bevy_tasks. This is intended to replace
rayon to provide better idling behavior.

Initial commit

Make scope return a vector of results

Add par_chunk_map(_mut) and par_splat_map(_mut) apis onto slices

Make calling thread do work until complete

Remove current-thread doing work

Ignore clion/intellij project files

Remove dependency on futures-util

Set up a builder for constructing the TaskPool with customizable thread count, stack size, and thread name. Add a couple examples to demonstrate busy and idle workloads.

More detail on thread name doc comments and static build() function on TaskPool

Update to README (add description, status, license)

Reflect rayon's ThreadPoolBuilder api in TaskPoolBuilder

Co-authored-by: Philip Degarmo <aclysma@gmail.com>
Add test of global pool

Formatting
…or (#1)

Add a task newtype so that we don't expose multitask::Task
Remove some of the README/license info that was present in the prototype repo
Add some doc comments
* Remove generics from TaskPool newtypes and some of the task API
Add IO, AsyncCompute and Compute TaskPools
Move TaskPool setup from bevy_ecs to bevy_app
ParallelExecutorOptions is essentially replaced by DefaultTaskPoolOptions

* Pull TaskPool and related types out of bevy_tasks/lib.rs into a separate module
Add a prelude to bevy_tasks
Update the version of bevy_tasks to match other crates

* Assert percent of cores >= 0
@bgourlie
Copy link
Contributor

bgourlie commented Aug 28, 2020

Our alternative... compiles faster

Nice! Any idea by how much?

@cart
Copy link
Member

cart commented Aug 28, 2020

Just to crudely illustrate how much of an improvement this is, when I run the breakout example I get the following cpu usage (average across cores):

rayon 1.3: ~50%
rayon 1.4: ~40%
bevy_tasks: ~20%

@cart
Copy link
Member

cart commented Aug 28, 2020

This also cuts our dependency count by 7 and significantly decreases compile times.

For example, before breakout clean-compiled in release mode on my machine in 2 minutes 3 seconds

Now it compiles in 1 minute 34 seconds!

@cart
Copy link
Member

cart commented Aug 28, 2020

Actually I ran a few more tests and I dont think the difference is quite that stark. Probably closer to 5-10 seconds.

@kabergstrom
Copy link

Great stuff!
I think it would be very cool to be able to spawn tasks on a thread-local executor too. Sometimes you have things that are !Send, and you can then still use them in an async context by wrapping them in an Rc or something. It should be quite easy to support:

  • Create a thread-local var like this:
thread_local! {
    static EXECUTOR: RefCell<LocalExecutor> = RefCell::new(LocalExecutor::new());
}

To protect from people trying to spawn_local on a non-worker thread, may want the thread_local to be an Option<RefCell<LocalExecutor>>?

Copy link
Member

@cart cart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work. A big thanks to everyone involved!

@@ -13,6 +13,8 @@ keywords = ["bevy"]
# bevy
bevy_derive = { path = "../bevy_derive", version = "0.1" }
bevy_ecs = { path = "../bevy_ecs", version = "0.1" }
bevy_tasks = { path = "../bevy_tasks" }
num_cpus = "1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bevy_tasks also relies on num_cpus. this seems like functionality that is relevant enough to consumers that we could just provide a bevy_tasks api.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@@ -63,6 +63,12 @@ impl App {
}

pub fn run(mut self) {
// Setup the default bevy task pools
self.resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced bevy_tasks should be hard-coded into apps, especially given that we're just adding resources. Can we make this a plugin that we add with add_default_plugins()?

Copy link
Contributor

@aclysma aclysma Aug 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making a plugin in bevy_tasks will cause circular dependencies (app depends on ecs, ecs depends on tasks). Where to do you want it to be?

use bevy_ecs::Resources;
use bevy_tasks::{AsyncComputeTaskPool, ComputeTaskPool, IOTaskPool, TaskPoolBuilder};

fn clamp_usize(value: usize, min: usize, max: usize) -> usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clamp is a big enough missing piece in the rust std that it probably makes sense to expose it in bevy_math. Can we add a generic version there and then consume it here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll copy/paste num-traits implementation (licensed MIT) with a note as to where it came from

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://doc.rust-lang.org/std/cmp/trait.Ord.html#method.clamp

it's available on nightly, is it possible to switch to the default impl for nightly users?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh I didn't know that. Definitely good to know. In general I would prefer to wait for nightly features to land in stable. Otherwise we'll need to maintain multiple code paths, which makes it harder to debug and repro potential issues.

crates/bevy_app/src/task_pool_options.rs Outdated Show resolved Hide resolved
crates/bevy_app/src/task_pool_options.rs Show resolved Hide resolved
crates/bevy_tasks/README.md Show resolved Hide resolved
let task_pool = TaskPool::new();
let outputs = v.par_splat_map(&task_pool, None, |numbers| -> i32 { numbers.iter().sum() });

println!("outputs: {:?}", outputs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we swap the println!() calls in these tests for asserts on output values?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

/// Vec<Task<T>> contained within TaskPoolInner
executor: Arc<multitask::Executor>,

///
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: empty docs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix

crates/bevy_tasks/src/task_pool.rs Show resolved Hide resolved
crates/bevy_tasks/src/task_pool.rs Show resolved Hide resolved
@cart
Copy link
Member

cart commented Aug 28, 2020

@kabergstrom yeah that would be a welcome feature. when this gets merged, can you create an issue for it? id prefer to keep this pr constrained to its current scope.

@lachlansneff some of the schedule tests are failing because ComputeTaskPool doesn't exist. we'll need to fix them before merging.

@aclysma
Copy link
Contributor

aclysma commented Aug 29, 2020

Just to crudely illustrate how much of an improvement this is, when I run the breakout example I get the following cpu usage (average across cores):

rayon 1.3: ~50%
rayon 1.4: ~40%
bevy_tasks: ~20%

On a 3950x with 32 cores, I see a bigger difference (and rayon 1.3 and 1.4 give the same results for me):
Debug mode: 900% -> 110%
Release mode: 450% -> 40%

@aclysma
Copy link
Contributor

aclysma commented Aug 29, 2020

I think I addressed everything - I PR'ed it to @lachlansneff's branch so he can review it before we ask @cart to take another look.

* Address some feedback for the bevy_tasks PR
 - Add a general clamp fn to bevy_math
 - Expose num_cpus in bevy_tasks
 - Fill out empty doc comment

* Add comments and clean up pinning in bevy_tasks, fix a couple tests
Cargo.toml Outdated Show resolved Hide resolved
crates/bevy_app/Cargo.toml Outdated Show resolved Hide resolved
aclysma and others added 2 commits August 29, 2020 11:01
@aclysma
Copy link
Contributor

aclysma commented Aug 29, 2020

@cart I think this is ready for a second look, the two main issues remaining are:

  • “I'm not convinced bevy_tasks should be hard-coded into apps...”
  • “Ordering determining "precedence" is maybe a little hard to reason about...“

@cart
Copy link
Member

cart commented Aug 29, 2020

Ok I think this ready to merge now. The "bevy_tasks hard coded into apps" issue can be resolved separately. I'd rather merge this and then iterate than wait longer.

One more huge thanks to everyone involved. This is a big step forward!

@cart cart merged commit 17e7642 into bevyengine:master Aug 29, 2020
@CleanCut CleanCut mentioned this pull request Aug 29, 2020
@Moxinilian Moxinilian added C-Enhancement A new feature C-Performance A change motivated by improving speed, memory usage or compile times labels Aug 30, 2020
mrk-its pushed a commit to mrk-its/bevy that referenced this pull request Oct 6, 2020
Add bevy_tasks crate to replace rayon
BD103 added a commit to BD103/bevy that referenced this pull request Apr 9, 2024
I chose to remove the extended documentation for `TaskPool::executor`. It references `TaskPoolInner` which was added in bevyengine#384 when `bevy_tasks` was first created, but that struct was later removed in bevyengine#2250. James may be able to rewrite the comment, but I do not know enough about the task pool to write anything.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-Enhancement A new feature C-Performance A change motivated by improving speed, memory usage or compile times
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task System for Bevy
8 participants