
Add particles stress test #78

Merged
merged 11 commits into gschup:main on Nov 1, 2023
Conversation

johanhelsing
Collaborator

@johanhelsing johanhelsing commented Oct 25, 2023

Objective

We need some easy way of checking for performance regressions.

Solution

  • Add an example that spawns particles when either of the players holds down a key.
  • Pass most parameters through command-line args

Depends on #76 and #77

todo:

  • checksumming
  • figure out why it desyncs
  • command-line options for using reflect-based snapshotting instead?

@johanhelsing johanhelsing mentioned this pull request Oct 25, 2023
@bushrat011899
Contributor

This looks great! Once #76 is merged I'll have a more detailed look into how this benchmark works.

@bushrat011899
Contributor

I ran some profiling and I can see where the slowdown is coming from: the AddRollbackCommand needs exclusive world access, and each call had to sort the whole list of Rollback components, as you pointed out. I updated this PR with the changes I have since made upstream, and that lag spike now appears to be non-existent.
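
The fix described above can be sketched in plain Rust (simplified, hypothetical types, not the actual bevy_ggrs implementation): instead of re-sorting the whole list on every insert, a monotonically increasing index makes registering a rollback entity O(1) while still giving every entity a stable, deterministic order.

```rust
use std::collections::HashMap;

// Illustrative sketch only: a stable order index handed out at registration
// time, so no sorting pass is ever needed.
#[derive(Default)]
struct OrderIndex {
    order: HashMap<u64, u64>, // entity id -> stable order index
    next: u64,
}

impl OrderIndex {
    // O(1): assign the next index to a newly registered rollback entity.
    fn push(&mut self, entity: u64) -> u64 {
        let index = self.next;
        self.next += 1;
        self.order.insert(entity, index);
        index
    }

    // Look up the stable order index previously assigned to an entity.
    fn order(&self, entity: u64) -> u64 {
        self.order[&entity]
    }
}

fn main() {
    let mut index = OrderIndex::default();
    assert_eq!(index.push(42), 0);
    assert_eq!(index.push(7), 1);
    assert_eq!(index.order(42), 0);
}
```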

@johanhelsing
Collaborator Author

johanhelsing commented Oct 26, 2023

I added checksumming to Velocity, and unfortunately this example desyncs if I spawn quite a lot of particles.

I've also been experimenting a bit with bevy_gaff on this version. I fixed a bunch of non-determinism issues in bevy_xpbd (locally), but I'm still left with some strange rare desyncs there that I don't understand. So I'm thinking perhaps we still have a bug in bevy_ggrs?

EDIT:

Run like this to easily trigger a few rollbacks

cargo watch -cx "run --release --example particles -- --local-port 7000 --players localhost 127.0.0.1:7001 --input-delay 0 --desync-detection-interval 1 --rate 1"
cargo watch -cx "run --release --example particles -- --local-port 7001 --players 127.0.0.1:7000 localhost --input-delay 0 --desync-detection-interval 1 --rate 1"

Seems to happen on rollbacks (push or release button)

@johanhelsing johanhelsing marked this pull request as ready for review October 26, 2023 08:20
@johanhelsing
Collaborator Author

Another interesting thing is that it only seems to happen if rollbacks actually spawn new particles. I added an N key for a no-op, it doesn't actually change anything, but still triggers a rollback because the input is different, and remapped spawning to the space key.

  • Releasing space doesn't trigger desyncs
  • Neither does pressing or releasing N
  • Pushing down space usually triggers desyncs, but not always

@bushrat011899 (Contributor) left a comment


Really like this stress test so far! It's highlighted some very interesting behaviour regarding desync detection and has already helped me solve at least one performance regression (the sort_unchecked in the previous PR).

@@ -43,3 +44,7 @@ path = "examples/box_game/box_game_spectator.rs"
[[example]]
name = "box_game_synctest"
path = "examples/box_game/box_game_synctest.rs"

[[example]]
Contributor


It would be nice to add a short blurb in the README.md explaining how to run this stress test:

cargo run --release --example particles -- --local-port 7000 --players localhost 127.0.0.1:7001 --input-delay 0 --desync-detection-interval 1 --rate 1
cargo run --release --example particles -- --local-port 7001 --players 127.0.0.1:7000 localhost --input-delay 0 --desync-detection-interval 1 --rate 1

Collaborator Author

I put in a clap help section instead, which I think makes it easy to discover (and keep in sync, since it's in the source file).

dev/bevy_ggrs particles-stress-test↑1 ~1 →1 …1 cargo run --release --example particles
    Finished release [optimized] target(s) in 0.31s
     Running `target\release\examples\particles.exe`
error: the following required arguments were not provided:
  --local-port <LOCAL_PORT>

Usage: particles.exe --local-port <LOCAL_PORT>

For more information, try '--help'.
error: process didn't exit successfully: `target\release\examples\particles.exe` (exit code: 2)
dev/bevy_ggrs particles-stress-test↑1 ~1 →1 …1 cargo run --release --example particles -- --help
    Finished release [optimized] target(s) in 0.30s
     Running `target\release\examples\particles.exe --help`
Stress test for bevy_ggrs

## Basic usage:

Player 1:

cargo run --release --example particles -- --local-port 7000 --players localhost 127.0.0.1:7001

Player 2:

cargo run --release --example particles -- --local-port 7001 --players 127.0.0.1:7000 localhost

Usage: particles.exe [OPTIONS] --local-port <LOCAL_PORT>

Options:
  -l, --local-port <LOCAL_PORT>
          The udp port to bind to for this peer

  -p, --players <PLAYERS>...
          Address and port for the players. Order is significant. Put yourself as "localhost".

          e.g. `--players localhost 127.0.0.1:7001`

  -s, --spectators <SPECTATORS>...
          Address and port for any spectators

  -i, --input-delay <INPUT_DELAY>
          How long inputs should be kept before they are deployed. A low value, such as 0, will result in low latency, but plenty of rollbacks

          [default: 2]

  -d, --desync-detection-interval <DESYNC_DETECTION_INTERVAL>
          How often the clients should exchange and compare checksums of state

          [default: 10]

      --continue-after-desync
          Whether to continue after a detected desync; the default is to panic

  -n, --rate <RATE>
          How many particles to spawn per frame

          [default: 100]

  -f, --fps <FPS>
          Simulation frame rate

          [default: 60]

      --max-prediction <MAX_PREDICTION>
          How far ahead we should simulate when we don't get any input from a player

          [default: 8]

      --reflect
          Whether to use reflect-based rollback. This is much slower than the default clone/copy-based rollback

  -h, --help
          Print help (see a summary with '-h')

examples/stress_tests/particles.rs (resolved)
.rollback_component_with_copy::<Ttl>()
.rollback_resource_with_clone::<ParticleRng>()
.checksum_component_with_hash::<Velocity>()
// todo: ideally we'd also be doing checksums for Transforms, but that's
Contributor

Agreed. Perhaps a ComponentChecksumCustomHashPlugin which would accept a function fn<H: Hasher>(&C, &mut H) could be used to retroactively add hashing to types which don't support it?
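
A minimal std-only sketch of that idea (all names here are hypothetical, nothing from the actual bevy_ggrs API): storing a user-supplied hashing function lets checksums be added retroactively to components that don't implement `Hash`. Taking `&mut dyn Hasher` sidesteps the fact that fn pointers can't be generic over `H: Hasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Hypothetical plugin shape: holds the custom per-component hash function.
struct ComponentChecksumCustomHashPlugin<C> {
    hash_fn: fn(&C, &mut dyn Hasher),
}

// Example component without a Hash impl (f32 fields can't derive Hash).
struct Transform2 {
    x: f32,
    y: f32,
}

// User-supplied function: hash the raw bits of each float.
fn hash_transform(t: &Transform2, hasher: &mut dyn Hasher) {
    hasher.write_u32(t.x.to_bits());
    hasher.write_u32(t.y.to_bits());
}

fn main() {
    let plugin = ComponentChecksumCustomHashPlugin {
        hash_fn: hash_transform,
    };
    let mut hasher = DefaultHasher::new();
    (plugin.hash_fn)(&Transform2 { x: 1.0, y: -2.5 }, &mut hasher);
    println!("checksum part: {:x}", hasher.finish());
}
```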

}

fn update_particles(mut particles: Query<(&mut Transform, &mut Velocity)>, args: Res<Args>) {
let time_step = 1.0 / args.fps as f32; // todo: replace with bevy_ggrs resource?
Contributor

On this theme, I think it would be good to replace the Time resource during advance-frame with one whose time bevy_ggrs controls manually. Then, systems already using the canonical Bevy Time resource would automatically be rollback-aware.
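
A rough sketch of what such a manually driven clock might look like (hypothetical type, not an actual bevy_ggrs or Bevy API): the framework advances it once per simulated frame, including resimulated frames after a rollback, so delta-time-based systems stay deterministic.

```rust
// Sketch only: a clock the rollback framework controls explicitly.
#[derive(Default)]
struct RollbackTime {
    elapsed: f64,
    delta: f64,
}

impl RollbackTime {
    // Called once per simulated frame, including rollback resimulation,
    // so every frame sees the same fixed delta regardless of wall time.
    fn advance_frame(&mut self, fps: u32) {
        self.delta = 1.0 / fps as f64;
        self.elapsed += self.delta;
    }
}

fn main() {
    let mut time = RollbackTime::default();
    for _ in 0..60 {
        time.advance_frame(60);
    }
    // After 60 frames at 60 fps, roughly one simulated second has elapsed.
    assert!((time.elapsed - 1.0).abs() < 1e-9);
}
```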

examples/stress_tests/particles.rs (outdated, resolved)
@bushrat011899
Contributor

* Releasing space doesn't trigger desyncs

I'm still testing with this, trying to work out what's going wrong, and unfortunately I've seen it trigger desyncs purely on the release of the spacebar. I've also tested with a much coarser hashing function (hashing only the integer component of velocity) with no difference in desync behaviour. I suspect the issue is related to the creation or removal of rollback entities on predicted frames. I want to resolve this, since I see it as a real blocker to more advanced future functionality (e.g. desync resolution through resynchronisation).

@bushrat011899
Contributor

OK, it absolutely has to do with entities only and nothing to do with the velocity component. I created an EntityChecksumPlugin:

use std::hash::{BuildHasher, Hash, Hasher};

use bevy::prelude::*;

use crate::{ChecksumFlag, ChecksumPart, Rollback, RollbackOrdered, SaveWorld, SaveWorldSet};

pub struct EntityChecksumPlugin;

impl EntityChecksumPlugin {
    #[allow(clippy::type_complexity)]
    pub fn update(
        mut commands: Commands,
        rollback_ordered: Res<RollbackOrdered>,
        components: Query<&Rollback, (With<Rollback>, Without<ChecksumFlag<Entity>>)>,
        mut checksum: Query<&mut ChecksumPart, (Without<Rollback>, With<ChecksumFlag<Entity>>)>,
    ) {
        let mut hasher = bevy::utils::FixedState.build_hasher();

        let mut result = 0;

        for &rollback in components.iter() {
            let mut hasher = hasher.clone();

            // Hashing the rollback index ensures this hash is unique and stable
            rollback_ordered.order(rollback).hash(&mut hasher);

            // XOR chosen over addition or multiplication as it is closed on u64 and commutative
            result ^= hasher.finish();
        }

        // Hash the XOR'ed result to break commutativity with other types
        result.hash(&mut hasher);

        let result = ChecksumPart(hasher.finish() as u128);

        trace!(
            "Rollback Entities have checksum {:X}",
            result.0
        );

        if let Ok(mut checksum) = checksum.get_single_mut() {
            *checksum = result;
        } else {
            commands.spawn((result, ChecksumFlag::<Entity>::default()));
        }
    }
}

impl Plugin for EntityChecksumPlugin {
    fn build(&self, app: &mut App) {
        app.add_systems(SaveWorld, Self::update.in_set(SaveWorldSet::Checksum));
    }
}

I added it as the only system contributing to the checksum, and the same desync behaviour was observed. I'm about to go to bed, but that leaves three possibilities:

  1. Particles are being spawned when they shouldn't be.
  2. Particles aren't being despawned when they should be.
  3. The value in RollbackOrdered is not synchronised.

Writing this out, I now know it has to be 3. I think if a rollback entity is added on a predicted frame and is then rolled back, it'll stay in the order resource and break sync with all clients. In the morning I'll make a PR that ensures rollback entities created pre-emptively are removed from RollbackOrdered.

@bushrat011899
Contributor

I believe I have the desync issues with this branch solved with #82! It was a combination of what I suspected (rollback entities being de/spawned during a rollback messing up the RollbackOrdered ordering) and a completely different issue where Bevy would choose a different generation value for entities spawned during a rollback, destroying Rollback ordering for all clients that had to roll back.

I tested with a rebase and couldn't induce a desync. I'm honestly shocked that hashing a f32 velocity value by its bits wasn't the cause of the desync issues!

@johanhelsing
Collaborator Author

johanhelsing commented Oct 31, 2023

I'm honestly shocked that hashing a f32 velocity value by its bits wasn't the cause of the desync issues!

As far as I've understood, it's fine as long as we don't get any NaN values; in fact, we rely on it for determinism? I guess the hasher could check for NaN and just produce a deterministic bogus checksum (or warn)?

@gschup
Owner

gschup commented Oct 31, 2023

We should definitely avoid hashing the bits of a float just like that. Special cases like NaN (both quiet and signalling) and Inf can become troublesome.

https://reference.opcfoundation.org/Core/Part6/v104/docs/5.2.2.3#:~:text=The%20floating%2Dpoint%20type%20supports,000000000000F8FF)%20or%20(0000C0FF).

@gschup
Owner

gschup commented Oct 31, 2023

On the other hand, working with floats in Rust has proven to be surprisingly deterministic in my experience.

@johanhelsing
Collaborator Author

johanhelsing commented Oct 31, 2023

We should definitely avoid hashing the bits of a float just like that. Special cases like NaN (both quiet and signalling) and Inf can become troublesome.

https://reference.opcfoundation.org/Core/Part6/v104/docs/5.2.2.3#:~:text=The%20floating%2Dpoint%20type%20supports,000000000000F8FF)%20or%20(0000C0FF).

I'm of two minds... aren't you in a bit of a pickle if your floats have reached Inf or NaN anyway? Wouldn't it be good to know?

@gschup
Owner

gschup commented Oct 31, 2023

I'm of two minds... aren't you in a bit of a pickle if your floats have reached inf or nan anyway? Wouldn't it be good to know?

For our purposes, definitely. I think adding a NaN/Inf check just before hashing the bits should be sufficient for this example.

@johanhelsing
Collaborator Author

For our purposes, definitely. I think adding a NaN/Inf check just before hashing the bits should be sufficient for this example.

I added an assert and a comment.
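
For reference, such an assert-before-hashing helper might look roughly like this (a sketch, not the exact code added in the PR):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Refuse to checksum non-finite floats, then hash the raw bits.
// Bit-identical floats are exactly what the determinism contract requires,
// but NaN payloads and infinities signal a simulation bug, so we panic
// loudly instead of hashing them silently.
fn checksum_f32<H: Hasher>(value: f32, hasher: &mut H) {
    assert!(value.is_finite(), "cannot checksum a NaN or infinite float");
    value.to_bits().hash(hasher);
}

fn main() {
    let mut hasher = DefaultHasher::new();
    checksum_f32(1.25, &mut hasher);
    println!("checksum part: {:x}", hasher.finish());
}
```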

@gschup
Owner

gschup commented Oct 31, 2023

So this stress test is ready to be merged, then?

@gschup gschup merged commit bf239c5 into gschup:main Nov 1, 2023
1 check passed