[Merged by Bors] - use try_send to replace send.await, unbounded channel should always b… #7745

Closed


shuoli84
Contributor

…e sendable, this improves performance

Objective

  • From previous experience, .await is not free. Profiling I did about half a year ago also showed Bevy's multithreaded executor spending a lot of cycles on ArcWaker.

Solution

  • This PR replaces sender.send().await with sender.try_send() to cut some future/await overhead.

Benchmarked on empty_systems:

➜ critcmp send_base send_optimize
group                                         send_base                              send_optimize
-----                                         ---------                              -------------
empty_systems/000_systems                     1.01      2.8±0.03ns        ? ?/sec    1.00      2.8±0.02ns        ? ?/sec
empty_systems/001_systems                     1.00      5.9±0.21µs        ? ?/sec    1.01      5.9±0.23µs        ? ?/sec
empty_systems/002_systems                     1.03      6.4±0.26µs        ? ?/sec    1.00      6.2±0.19µs        ? ?/sec
empty_systems/003_systems                     1.01      6.5±0.17µs        ? ?/sec    1.00      6.4±0.20µs        ? ?/sec
empty_systems/004_systems                     1.03      7.0±0.24µs        ? ?/sec    1.00      6.8±0.18µs        ? ?/sec
empty_systems/005_systems                     1.04      7.4±0.35µs        ? ?/sec    1.00      7.2±0.21µs        ? ?/sec
empty_systems/010_systems                     1.00      9.0±0.28µs        ? ?/sec    1.00      9.1±0.80µs        ? ?/sec
empty_systems/015_systems                     1.01     10.9±0.36µs        ? ?/sec    1.00     10.8±1.29µs        ? ?/sec
empty_systems/020_systems                     1.12     12.7±0.67µs        ? ?/sec    1.00     11.3±0.37µs        ? ?/sec
empty_systems/025_systems                     1.12     14.6±0.39µs        ? ?/sec    1.00     13.0±1.02µs        ? ?/sec
empty_systems/030_systems                     1.12     16.2±0.39µs        ? ?/sec    1.00     14.4±0.37µs        ? ?/sec
empty_systems/035_systems                     1.19     18.2±0.97µs        ? ?/sec    1.00     15.3±0.48µs        ? ?/sec
empty_systems/040_systems                     1.12     20.6±0.58µs        ? ?/sec    1.00     18.3±1.87µs        ? ?/sec
empty_systems/045_systems                     1.18     22.7±0.57µs        ? ?/sec    1.00     19.2±0.46µs        ? ?/sec
empty_systems/050_systems                     1.03     21.9±0.92µs        ? ?/sec    1.00     21.3±0.96µs        ? ?/sec
empty_systems/055_systems                     1.13     25.7±1.00µs        ? ?/sec    1.00     22.8±0.50µs        ? ?/sec
empty_systems/060_systems                     1.35     30.0±2.57µs        ? ?/sec    1.00     22.2±1.04µs        ? ?/sec
empty_systems/065_systems                     1.28     31.7±0.76µs        ? ?/sec    1.00     24.8±0.79µs        ? ?/sec
empty_systems/070_systems                     1.33    36.8±10.37µs        ? ?/sec    1.00     27.6±0.55µs        ? ?/sec
empty_systems/075_systems                     1.25     38.0±0.83µs        ? ?/sec    1.00     30.3±0.63µs        ? ?/sec
empty_systems/080_systems                     1.33     41.7±1.22µs        ? ?/sec    1.00     31.4±1.01µs        ? ?/sec
empty_systems/085_systems                     1.27     45.6±2.54µs        ? ?/sec    1.00     35.8±4.06µs        ? ?/sec
empty_systems/090_systems                     1.29     48.3±5.33µs        ? ?/sec    1.00     37.6±5.32µs        ? ?/sec
empty_systems/095_systems                     1.16     45.7±0.97µs        ? ?/sec    1.00     39.4±2.75µs        ? ?/sec
empty_systems/100_systems                     1.14     49.5±4.26µs        ? ?/sec    1.00     43.5±1.06µs        ? ?/sec
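
A note on reading the table: the first number in each column appears to be that measurement's ratio to the faster of the two runs, so 1.00 marks the winner. The sketch below recomputes that ratio from the printed times. The function names are hypothetical, and because the printed times are already rounded, a recomputed ratio can differ from the table in the last digit.

```rust
/// Ratio of one measurement to the faster of the two runs.
/// Assumption: this mirrors how the leading number in each column is derived.
pub fn relative_to_best(time: f64, other: f64) -> f64 {
    let best = time.min(other);
    time / best
}

/// Formats the ratio with two decimals, matching the table's precision.
pub fn format_ratio(time: f64, other: f64) -> String {
    format!("{:.2}", relative_to_best(time, other))
}
```

For the 060_systems row, `format_ratio(30.0, 22.2)` gives the 1.35 shown under send_base, and `format_ratio(22.2, 30.0)` gives the 1.00 shown under send_optimize.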

@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels Feb 19, 2023
@alice-i-cecile
Member

Looks good: are there any measurable changes on animated_foxes? That's a more realistic example, and I'd love to see the distribution of frame times before and after.

@james7132
Member

Makes sense to me. This synchronously attempts to push onto the internal ConcurrentQueue without yielding to the async executor. This would put contention only on the channel and not the executor. Nice work investigating this!

This does make the task now a fully synchronous one: it never yields. There may be some headroom to cut in looking for a way to mix synchronous and asynchronous tasks.

This likely doesn't have much of an impact on our current examples, as the slowdown is likely contention dependent, but this will change as we add more systems to the engine.

@shuoli84
Contributor Author

Here is a benchmark I just ran on my Windows gaming machine. The code that prints these percentile values is here: https://github.com/shuoli84/bevy/blob/executor_channel_send_opt_bench/examples/animation/animated_fox.rs#L143

PR:

2023-02-20T02:40:26.324662Z  INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "Windows 11 Pro", kernel: "22621", cpu: "Intel(R) Core(TM) i9-9900KF CPU @ 3.60GHz", core_count: "8", memory: "15.9 GiB" }
2023-02-20T02:40:26.342248Z  INFO bevy_input::gamepad: Gamepad { id: 0 } Connected
2023-02-20T02:40:26.437791Z  INFO naga::back::spv::writer: Skip function Some("mesh_normal_local_to_world")
2023-02-20T02:40:26.438019Z  INFO naga::back::spv::writer: Skip function Some("sign_determinant_model_3x3")
2023-02-20T02:40:26.438227Z  INFO naga::back::spv::writer: Skip function Some("mesh_tangent_local_to_world")
2023-02-20T02:40:26.438567Z  INFO naga::back::spv::writer: Skip function Some("skin_model")
2023-02-20T02:40:26.690288Z  INFO naga::back::spv::writer: Skip function Some("mesh_normal_local_to_world")
2023-02-20T02:40:26.690418Z  INFO naga::back::spv::writer: Skip function Some("sign_determinant_model_3x3")
2023-02-20T02:40:26.690631Z  INFO naga::back::spv::writer: Skip function Some("mesh_tangent_local_to_world")
2023-02-20T02:40:31.342537Z  INFO animated_fox::diagnostic: min:531us p50:1149us p90:2863us max:7079us
2023-02-20T02:40:36.348494Z  INFO animated_fox::diagnostic: min:546us p50:1192us p90:2790us max:7709us
2023-02-20T02:40:41.349897Z  INFO animated_fox::diagnostic: min:618us p50:1160us p90:2884us max:7431us
2023-02-20T02:40:46.353189Z  INFO animated_fox::diagnostic: min:564us p50:1291us p90:2803us max:7010us
2023-02-20T02:40:51.357689Z  INFO animated_fox::diagnostic: min:615us p50:1220us p90:2820us max:7082us
2023-02-20T02:40:56.357926Z  INFO animated_fox::diagnostic: min:512us p50:1185us p90:2770us max:7269us
2023-02-20T02:41:01.364985Z  INFO animated_fox::diagnostic: min:562us p50:1136us p90:2808us max:7297us
2023-02-20T02:41:06.365066Z  INFO animated_fox::diagnostic: min:599us p50:1142us p90:2670us max:7168us
2023-02-20T02:41:11.367200Z  INFO animated_fox::diagnostic: min:607us p50:1320us p90:2792us max:7316us
2023-02-20T02:41:16.368412Z  INFO animated_fox::diagnostic: min:591us p50:1142us p90:2777us max:7399us
2023-02-20T02:41:21.368891Z  INFO animated_fox::diagnostic: min:497us p50:1186us p90:2824us max:6994us
2023-02-20T02:41:26.373178Z  INFO animated_fox::diagnostic: min:585us p50:1164us p90:2696us max:7166us
2023-02-20T02:41:31.373573Z  INFO animated_fox::diagnostic: min:554us p50:1150us p90:2709us max:7335us
2023-02-20T02:41:36.373897Z  INFO animated_fox::diagnostic: min:630us p50:1376us p90:2839us max:6938us
2023-02-20T02:41:41.374206Z  INFO animated_fox::diagnostic: min:537us p50:1189us p90:2797us max:7133us
2023-02-20T02:41:46.374365Z  INFO animated_fox::diagnostic: min:539us p50:1179us p90:2761us max:6970us
2023-02-20T02:41:51.374704Z  INFO animated_fox::diagnostic: min:449us p50:772us p90:1836us max:7043us


Baseline:

2023-02-20T02:45:27.962135Z  INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "Windows 11 Pro", kernel: "22621", cpu: "Intel(R) Core(TM) i9-9900KF CPU @ 3.60GHz", core_count: "8", memory: "15.9 GiB" }
2023-02-20T02:45:27.977233Z  INFO bevy_input::gamepad: Gamepad { id: 0 } Connected
2023-02-20T02:45:28.014561Z  INFO naga::back::spv::writer: Skip function Some("mesh_normal_local_to_world")
2023-02-20T02:45:28.014794Z  INFO naga::back::spv::writer: Skip function Some("sign_determinant_model_3x3")
2023-02-20T02:45:28.015004Z  INFO naga::back::spv::writer: Skip function Some("mesh_tangent_local_to_world")
2023-02-20T02:45:28.028847Z  INFO naga::back::spv::writer: Skip function Some("mesh_normal_local_to_world")
2023-02-20T02:45:28.029065Z  INFO naga::back::spv::writer: Skip function Some("sign_determinant_model_3x3")
2023-02-20T02:45:28.029261Z  INFO naga::back::spv::writer: Skip function Some("mesh_tangent_local_to_world")
2023-02-20T02:45:28.029582Z  INFO naga::back::spv::writer: Skip function Some("skin_model")
2023-02-20T02:45:32.977173Z  INFO animated_fox::diagnostic: min:608us p50:1192us p90:2877us max:7172us
2023-02-20T02:45:37.980240Z  INFO animated_fox::diagnostic: min:623us p50:1254us p90:2841us max:6966us
2023-02-20T02:45:42.980550Z  INFO animated_fox::diagnostic: min:597us p50:1461us p90:2887us max:7528us
2023-02-20T02:45:47.985037Z  INFO animated_fox::diagnostic: min:560us p50:1286us p90:2841us max:7105us
2023-02-20T02:45:52.990842Z  INFO animated_fox::diagnostic: min:599us p50:1240us p90:2806us max:7415us
2023-02-20T02:45:57.998088Z  INFO animated_fox::diagnostic: min:666us p50:1366us p90:2895us max:7102us
2023-02-20T02:46:03.002441Z  INFO animated_fox::diagnostic: min:558us p50:1419us p90:2862us max:7019us
2023-02-20T02:46:08.002955Z  INFO animated_fox::diagnostic: min:638us p50:1490us p90:2871us max:6909us
2023-02-20T02:46:13.010620Z  INFO animated_fox::diagnostic: min:636us p50:1382us p90:2939us max:7283us
2023-02-20T02:46:18.014608Z  INFO animated_fox::diagnostic: min:545us p50:1345us p90:2854us max:7284us
2023-02-20T02:46:23.017687Z  INFO animated_fox::diagnostic: min:670us p50:1424us p90:2832us max:6996us
2023-02-20T02:46:28.022218Z  INFO animated_fox::diagnostic: min:667us p50:1555us p90:2832us max:7142us
2023-02-20T02:46:33.023185Z  INFO animated_fox::diagnostic: min:624us p50:1565us p90:2974us max:7097us
2023-02-20T02:46:38.025517Z  INFO animated_fox::diagnostic: min:605us p50:1687us p90:2900us max:6848us
2023-02-20T02:46:43.026081Z  INFO animated_fox::diagnostic: min:671us p50:1358us p90:2846us max:6832us
2023-02-20T02:46:48.026145Z  INFO animated_fox::diagnostic: min:664us p50:1326us p90:2822us max:7321us
2023-02-20T02:46:53.026370Z  INFO animated_fox::diagnostic: min:557us p50:1374us p90:2911us max:7007us
2023-02-20T02:46:58.026779Z  INFO animated_fox::diagnostic: min:653us p50:1587us p90:2878us max:7326us
2023-02-20T02:47:03.033789Z  INFO animated_fox::diagnostic: min:524us p50:1297us p90:2806us max:7150us

@shuoli84
Contributor Author

Eyeballing the benchmark above, p50 looks better, p90 looks slightly better, and max is about the same?
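
Rather than eyeballing, the comparison could be made numerically. The following is a minimal sketch, assuming the `min:..us p50:..us p90:..us max:..us` line format shown in the logs above. `Sample`, `parse_sample`, and `mean_p50` are hypothetical helpers, not part of the linked benchmark code.

```rust
/// One `animated_fox::diagnostic` sample, in microseconds.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Sample {
    pub min: u64,
    pub p50: u64,
    pub p90: u64,
    pub max: u64,
}

/// Parses the `min:..us p50:..us p90:..us max:..us` fields of a log line.
/// Returns `None` if any field is missing or malformed.
pub fn parse_sample(line: &str) -> Option<Sample> {
    let field = |key: &str| -> Option<u64> {
        let prefix = format!("{}:", key);
        line.split_whitespace()
            .find_map(|tok| tok.strip_prefix(prefix.as_str()))?
            .strip_suffix("us")?
            .parse()
            .ok()
    };
    Some(Sample {
        min: field("min")?,
        p50: field("p50")?,
        p90: field("p90")?,
        max: field("max")?,
    })
}

/// Mean of the p50 values over all lines that parse; `None` if none do.
pub fn mean_p50(lines: &[&str]) -> Option<f64> {
    let p50s: Vec<u64> = lines
        .iter()
        .filter_map(|l| parse_sample(l))
        .map(|s| s.p50)
        .collect();
    if p50s.is_empty() {
        return None;
    }
    Some(p50s.iter().sum::<u64>() as f64 / p50s.len() as f64)
}
```

Feeding each run's log lines to `mean_p50` (and to analogous helpers for p90 and max) gives one number per run to compare. Lines without the diagnostic fields, such as the naga messages, are skipped.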

@shuoli84
Contributor Author

ping

@alice-i-cecile alice-i-cecile added the D-Complex Quite challenging from either a design or technical perspective. Ask for help! label Feb 21, 2023
Member

@alice-i-cecile alice-i-cecile left a comment

This LGTM, but I'd like to get another expert's opinion here before merging.

@shuoli84
Contributor Author

Ping~ The change is safe: try_send fails only if the channel is full or closed, and an unbounded channel is never full, as stated in https://docs.rs/async-channel/latest/async_channel/struct.Sender.html#method.is_full.
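
To illustrate the two failure modes mentioned above, here is a small sketch. It uses the standard library's `std::sync::mpsc::sync_channel` as a stand-in, not `async_channel` itself, because it needs no external crate. Its `TrySendError` has `Full` and `Disconnected` variants, which play the role of the "full" and "closed" cases. `describe` and `demo` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Labels the outcome of a non-blocking send attempt.
pub fn describe<T>(result: Result<(), TrySendError<T>>) -> &'static str {
    match result {
        Ok(()) => "sent",
        Err(TrySendError::Full(_)) => "full",
        Err(TrySendError::Disconnected(_)) => "closed",
    }
}

/// Walks a capacity-1 channel through all three outcomes.
pub fn demo() -> Vec<&'static str> {
    let (tx, rx) = sync_channel::<u32>(1);
    let mut out = Vec::new();
    out.push(describe(tx.try_send(1))); // buffer has room
    out.push(describe(tx.try_send(2))); // buffer is at capacity
    let _ = rx.recv(); // drain the buffered value
    drop(rx); // no receiver remains
    out.push(describe(tx.try_send(3))); // channel is closed
    out
}
```

With an unbounded channel the "full" branch cannot occur, which leaves a closed channel as the only way `try_send` can fail.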

@james7132 james7132 added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Feb 27, 2023
@james7132
Member

A thought came up: this used to be a bounded channel to minimize allocator use by said channel during the course of using the executor, which is why the await is there, but it seems like that might have disappeared during stageless. Might be worth testing if bringing back that bounded channel (with a capacity based on the total number of systems) will work with try_send as well, as it should never become full.
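
The capacity argument can be sketched as follows, again with the standard library's bounded `sync_channel` standing in for a bounded `async_channel`. This assumes each system reports completion at most once per run, so a channel sized to the number of systems always has a free slot. `run_reports` is a hypothetical name, and the threads are only a stand-in for executor tasks.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Spawns `system_count` workers that each report completion once via
/// `try_send` on a channel with capacity `system_count`.
/// Returns `(failed_sends, received_reports)`.
pub fn run_reports(system_count: usize) -> (usize, usize) {
    let (tx, rx) = sync_channel::<usize>(system_count);

    let handles: Vec<_> = (0..system_count)
        .map(|i| {
            let tx = tx.clone();
            // Each worker sends exactly one completion message.
            thread::spawn(move || tx.try_send(i).is_err())
        })
        .collect();

    let failed = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&failed| failed)
        .count();

    // Drop the last sender so the receiving iterator terminates.
    drop(tx);
    let received = rx.iter().count();
    (failed, received)
}
```

Even if every worker sends before anything is received, no send should fail, because the number of in-flight messages never exceeds the capacity.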

Contributor

@hymm hymm left a comment

The change makes sense to me. I see a smaller but noticeable change on empty_systems. busy_systems, contrived, and many_foxes are all within noise for me.

group                                           main-for-try-send                       try-send
-----                                           -----------------                       --------
busy_systems/01x_entities_03_systems            1.00     30.2±1.08µs        ? ?/sec     1.01     30.5±0.80µs        ? ?/sec
busy_systems/01x_entities_06_systems            1.09     47.3±0.77µs        ? ?/sec     1.00     43.2±0.88µs        ? ?/sec
busy_systems/01x_entities_09_systems            1.02     65.9±1.96µs        ? ?/sec     1.00     64.5±0.73µs        ? ?/sec
busy_systems/01x_entities_12_systems            1.02     82.3±1.60µs        ? ?/sec     1.00     81.1±1.10µs        ? ?/sec
busy_systems/01x_entities_15_systems            1.00     99.4±3.30µs        ? ?/sec     1.00     99.7±6.23µs        ? ?/sec
busy_systems/02x_entities_03_systems            1.01     44.8±1.38µs        ? ?/sec     1.00     44.5±1.02µs        ? ?/sec
busy_systems/02x_entities_06_systems            1.00     74.7±1.35µs        ? ?/sec     1.01     75.1±3.72µs        ? ?/sec
busy_systems/02x_entities_09_systems            1.00    105.2±2.06µs        ? ?/sec     1.01    106.1±8.94µs        ? ?/sec
busy_systems/02x_entities_12_systems            1.01    132.1±2.66µs        ? ?/sec     1.00    131.0±2.88µs        ? ?/sec
busy_systems/02x_entities_15_systems            1.00    160.2±1.54µs        ? ?/sec     1.00    159.4±2.79µs        ? ?/sec
busy_systems/03x_entities_03_systems            1.00     57.7±0.85µs        ? ?/sec     1.03     59.2±1.51µs        ? ?/sec
busy_systems/03x_entities_06_systems            1.00    101.8±1.40µs        ? ?/sec     1.03    105.2±1.40µs        ? ?/sec
busy_systems/03x_entities_09_systems            1.02    149.5±7.04µs        ? ?/sec     1.00    146.2±3.83µs        ? ?/sec
busy_systems/03x_entities_12_systems            1.00    181.6±2.73µs        ? ?/sec     1.02    185.8±4.07µs        ? ?/sec
busy_systems/03x_entities_15_systems            1.00   226.7±13.24µs        ? ?/sec     1.01    228.3±3.48µs        ? ?/sec
busy_systems/04x_entities_03_systems            1.00     70.1±1.22µs        ? ?/sec     1.02     71.6±1.40µs        ? ?/sec
busy_systems/04x_entities_06_systems            1.01    128.7±4.14µs        ? ?/sec     1.00    127.2±5.93µs        ? ?/sec
busy_systems/04x_entities_09_systems            1.00    178.7±2.76µs        ? ?/sec     1.04    185.5±7.96µs        ? ?/sec
busy_systems/04x_entities_12_systems            1.00    232.6±7.49µs        ? ?/sec     1.02    237.8±4.35µs        ? ?/sec
busy_systems/04x_entities_15_systems            1.00   289.0±11.74µs        ? ?/sec     1.03    296.3±8.86µs        ? ?/sec
busy_systems/05x_entities_03_systems            1.04     83.3±1.28µs        ? ?/sec     1.00     80.5±2.34µs        ? ?/sec
busy_systems/05x_entities_06_systems            1.05    154.8±2.74µs        ? ?/sec     1.00    146.9±1.35µs        ? ?/sec
busy_systems/05x_entities_09_systems            1.05    219.0±4.51µs        ? ?/sec     1.00    208.6±3.86µs        ? ?/sec
busy_systems/05x_entities_12_systems            1.05    286.1±8.59µs        ? ?/sec     1.00    271.9±5.12µs        ? ?/sec
busy_systems/05x_entities_15_systems            1.01    354.1±8.31µs        ? ?/sec     1.00   351.4±11.61µs        ? ?/sec
contrived/01x_entities_03_systems               1.04     27.1±1.07µs        ? ?/sec     1.00     26.0±0.93µs        ? ?/sec
contrived/01x_entities_06_systems               1.08     36.5±0.58µs        ? ?/sec     1.00     33.9±0.97µs        ? ?/sec
contrived/01x_entities_09_systems               1.03     48.2±1.86µs        ? ?/sec     1.00     46.8±0.95µs        ? ?/sec
contrived/01x_entities_12_systems               1.01     57.8±0.96µs        ? ?/sec     1.00     57.0±1.22µs        ? ?/sec
contrived/01x_entities_15_systems               1.06     71.5±8.13µs        ? ?/sec     1.00     67.8±1.87µs        ? ?/sec
contrived/02x_entities_03_systems               1.06     32.3±0.92µs        ? ?/sec     1.00     30.4±0.91µs        ? ?/sec
contrived/02x_entities_06_systems               1.00     51.1±1.04µs        ? ?/sec     1.04     53.4±4.80µs        ? ?/sec
contrived/02x_entities_09_systems               1.06     69.8±1.92µs        ? ?/sec     1.00     65.8±1.65µs        ? ?/sec
contrived/02x_entities_12_systems               1.01     82.3±3.38µs        ? ?/sec     1.00     81.7±0.82µs        ? ?/sec
contrived/02x_entities_15_systems               1.02     98.9±1.89µs        ? ?/sec     1.00     97.0±0.93µs        ? ?/sec
contrived/03x_entities_03_systems               1.00     38.2±0.74µs        ? ?/sec     1.01     38.5±1.64µs        ? ?/sec
contrived/03x_entities_06_systems               1.00     58.4±0.71µs        ? ?/sec     1.05     61.4±1.25µs        ? ?/sec
contrived/03x_entities_09_systems               1.00     80.4±1.01µs        ? ?/sec     1.04     83.3±1.95µs        ? ?/sec
contrived/03x_entities_12_systems               1.00    105.2±2.57µs        ? ?/sec     1.01    105.8±2.45µs        ? ?/sec
contrived/03x_entities_15_systems               1.00    127.0±3.00µs        ? ?/sec     1.01    127.8±5.86µs        ? ?/sec
contrived/04x_entities_03_systems               1.06     48.9±2.82µs        ? ?/sec     1.00     45.9±0.97µs        ? ?/sec
contrived/04x_entities_06_systems               1.03     74.9±1.93µs        ? ?/sec     1.00     72.6±1.86µs        ? ?/sec
contrived/04x_entities_09_systems               1.07    107.1±6.36µs        ? ?/sec     1.00    100.3±2.62µs        ? ?/sec
contrived/04x_entities_12_systems               1.00    129.1±3.09µs        ? ?/sec     1.02    132.2±7.67µs        ? ?/sec
contrived/04x_entities_15_systems               1.00    154.8±4.68µs        ? ?/sec     1.01    155.9±3.50µs        ? ?/sec
contrived/05x_entities_03_systems               1.02     52.6±1.96µs        ? ?/sec     1.00     51.6±1.06µs        ? ?/sec
contrived/05x_entities_06_systems               1.06     85.8±5.89µs        ? ?/sec     1.00     80.9±0.84µs        ? ?/sec
contrived/05x_entities_09_systems               1.04    117.1±4.09µs        ? ?/sec     1.00    112.4±2.93µs        ? ?/sec
contrived/05x_entities_12_systems               1.00    151.0±3.90µs        ? ?/sec     1.05   158.1±10.09µs        ? ?/sec
contrived/05x_entities_15_systems               1.00    181.1±7.14µs        ? ?/sec     1.02    185.1±9.66µs        ? ?/sec
empty_systems/000_systems                       1.00      4.9±0.21ns        ? ?/sec     1.05      5.2±0.29ns        ? ?/sec
empty_systems/001_systems                       1.00     10.1±0.40µs        ? ?/sec     1.03     10.5±0.44µs        ? ?/sec
empty_systems/002_systems                       1.00     14.4±0.82µs        ? ?/sec     1.00     14.3±0.46µs        ? ?/sec
empty_systems/003_systems                       1.00     15.4±0.23µs        ? ?/sec     1.01     15.6±0.75µs        ? ?/sec
empty_systems/004_systems                       1.01     15.5±0.17µs        ? ?/sec     1.00     15.4±0.33µs        ? ?/sec
empty_systems/005_systems                       1.03     16.0±0.39µs        ? ?/sec     1.00     15.5±0.27µs        ? ?/sec
empty_systems/010_systems                       1.01     17.5±0.30µs        ? ?/sec     1.00     17.4±0.50µs        ? ?/sec
empty_systems/015_systems                       1.05     22.4±0.39µs        ? ?/sec     1.00     21.3±0.53µs        ? ?/sec
empty_systems/020_systems                       1.06     24.3±0.33µs        ? ?/sec     1.00     23.0±0.39µs        ? ?/sec
empty_systems/025_systems                       1.01     26.6±0.79µs        ? ?/sec     1.00     26.4±0.80µs        ? ?/sec
empty_systems/030_systems                       1.04     29.2±0.62µs        ? ?/sec     1.00     28.1±0.66µs        ? ?/sec
empty_systems/035_systems                       1.03     31.4±0.67µs        ? ?/sec     1.00     30.6±0.57µs        ? ?/sec
empty_systems/040_systems                       1.04     34.5±0.66µs        ? ?/sec     1.00     33.2±0.68µs        ? ?/sec
empty_systems/045_systems                       1.03     36.9±0.62µs        ? ?/sec     1.00     35.8±0.49µs        ? ?/sec
empty_systems/050_systems                       1.02     38.6±1.02µs        ? ?/sec     1.00     37.7±0.96µs        ? ?/sec
empty_systems/055_systems                       1.00     41.0±0.75µs        ? ?/sec     1.00     41.1±0.89µs        ? ?/sec
empty_systems/060_systems                       1.05     45.5±1.76µs        ? ?/sec     1.00     43.5±1.06µs        ? ?/sec
empty_systems/065_systems                       1.05     47.7±0.61µs        ? ?/sec     1.00     45.5±1.12µs        ? ?/sec
empty_systems/070_systems                       1.04     50.0±0.74µs        ? ?/sec     1.00     47.9±1.08µs        ? ?/sec
empty_systems/075_systems                       1.04     52.4±0.53µs        ? ?/sec     1.00     50.5±0.70µs        ? ?/sec
empty_systems/080_systems                       1.11     58.6±0.99µs        ? ?/sec     1.00     53.0±0.63µs        ? ?/sec
empty_systems/085_systems                       1.06     60.1±0.78µs        ? ?/sec     1.00     56.9±0.71µs        ? ?/sec
empty_systems/090_systems                       1.07     63.9±1.54µs        ? ?/sec     1.00     59.9±1.50µs        ? ?/sec
empty_systems/095_systems                       1.07     65.5±0.81µs        ? ?/sec     1.00     61.5±0.54µs        ? ?/sec
empty_systems/100_systems                       1.06     68.6±1.49µs        ? ?/sec     1.00     64.6±0.93µs        ? ?/sec

@james7132
Member

bors r+

bors bot pushed a commit that referenced this pull request Feb 27, 2023
@bors bors bot changed the title use try_send to replace send.await, unbounded channel should always b… [Merged by Bors] - use try_send to replace send.await, unbounded channel should always b… Feb 27, 2023
@bors bors bot closed this Feb 27, 2023
bors bot pushed a commit that referenced this pull request Feb 27, 2023
# Objective
This is a follow-up to #7745. An unbounded `async_channel` occasionally allocates whenever it exceeds the capacity of the current buffer in its internal linked list. This is avoidable.

This also used to be a bounded channel before stageless, which was introduced in #4919.

## Solution
Use a bounded channel to avoid allocations on system completion.

This shouldn't conflict with #7745, as it's impossible for the scheduler to exceed the channel capacity, even if somehow every system completed at the same time.
Shfty pushed a commit to shfty-rust/bevy that referenced this pull request Mar 19, 2023
Shfty pushed a commit to shfty-rust/bevy that referenced this pull request Mar 19, 2023