Add particles stress test #78
This looks great! Once #76 is merged I'll have a more detailed look into how this benchmark works.
I ran some profiling and I can see where the slowdown is coming from: the …
Force-pushed from e6bb352 to 0ae106f.
I added checksumming to …
I've also been experimenting a bit with bevy_gaff on this version. I fixed a bunch of non-determinism issues in bevy_xpbd (locally), but I'm still left with some strange, rare desyncs there that I don't understand. So perhaps we still have a bug in bevy_ggrs?
EDIT: Run like this to easily trigger a few rollbacks: …
It seems to happen on rollbacks (pushing or releasing the button).
Another interesting thing is that it only seems to happen if rollbacks actually spawn new particles. I added an …
Really like this stress test so far! It's highlighted some very interesting behaviour regarding desync detection and has already helped me solve at least one performance regression (`sort_unchecked` in the previous PR).
@@ -43,3 +44,7 @@ path = "examples/box_game/box_game_spectator.rs"
[[example]]
name = "box_game_synctest"
path = "examples/box_game/box_game_synctest.rs"

[[example]]
It would be nice to add a short blurb in the README.md explaining how to run this stress test:
cargo run --release --example particles -- --local-port 7000 --players localhost 127.0.0.1:7001 --input-delay 0 --desync-detection-interval 1 --rate 1
cargo run --release --example particles -- --local-port 7001 --players 127.0.0.1:7000 localhost --input-delay 0 --desync-detection-interval 1 --rate 1
I put in a clap help section instead, which I think makes it easy to discover (and keep in sync, since it's in the source file).
dev/bevy_ggrs> cargo run --release --example particles
Finished release [optimized] target(s) in 0.31s
Running `target\release\examples\particles.exe`
error: the following required arguments were not provided:
--local-port <LOCAL_PORT>
Usage: particles.exe --local-port <LOCAL_PORT>
For more information, try '--help'.
error: process didn't exit successfully: `target\release\examples\particles.exe` (exit code: 2)
dev/bevy_ggrs> cargo run --release --example particles -- --help
Finished release [optimized] target(s) in 0.30s
Running `target\release\examples\particles.exe --help`
Stress test for bevy_ggrs
## Basic usage:
Player 1:
cargo run --release --example particles -- --local-port 7000 --players localhost 127.0.0.1:7001
Player 2:
cargo run --release --example particles -- --local-port 7001 --players 127.0.0.1:7001 localhost
Usage: particles.exe [OPTIONS] --local-port <LOCAL_PORT>
Options:
-l, --local-port <LOCAL_PORT>
The udp port to bind to for this peer
-p, --players <PLAYERS>...
Address and port for the players. Order is significant. Put yourself as "localhost".
e.g. `--players localhost 127.0.0.1:7001`
-s, --spectators <SPECTATORS>...
Address and port for any spectators
-i, --input-delay <INPUT_DELAY>
How long inputs should be kept before they are deployed. A low value, such as 0 will result in low latency, but plenty of rollbacks
[default: 2]
-d, --desync-detection-interval <DESYNC_DETECTION_INTERVAL>
How often the clients should exchange and compare checksums of state
[default: 10]
--continue-after-desync
Whether to continue after a detected desync, the default is to panic
-n, --rate <RATE>
How many particles to spawn per frame
[default: 100]
-f, --fps <FPS>
Simulation frame rate
[default: 60]
--max-prediction <MAX_PREDICTION>
How far ahead we should simulate when we don't get any input from a player
[default: 8]
--reflect
Whether to use reflect-based rollback. This is much slower than the default clone/copy-based rollback
-h, --help
Print help (see a summary with '-h')
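The `--reflect` option above contrasts reflect-based rollback with the default clone/copy-based rollback. As a conceptual sketch only (this is not how bevy_ggrs stores snapshots internally), clone/copy-based rollback amounts to saving a copy of each registered value every frame and, on rollback, restoring the copy taken at the target frame. `SnapshotBuffer` and the `Ttl` stand-in below are hypothetical; the capacity would presumably need to cover the prediction window (compare `--max-prediction`).

```rust
use std::collections::VecDeque;

/// Hypothetical per-frame snapshot store illustrating clone/copy-based rollback.
struct SnapshotBuffer<T: Clone> {
    capacity: usize,
    // (frame, state) pairs, oldest first.
    frames: VecDeque<(u32, T)>,
}

impl<T: Clone> SnapshotBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { capacity, frames: VecDeque::with_capacity(capacity) }
    }

    /// Store a copy of `state` for `frame`, evicting the oldest snapshot when full.
    fn save(&mut self, frame: u32, state: &T) {
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
        }
        self.frames.push_back((frame, state.clone()));
    }

    /// Return a copy of the state saved at `frame` and drop every newer snapshot,
    /// since those frames are about to be re-simulated.
    fn load(&mut self, frame: u32) -> Option<T> {
        let idx = self.frames.iter().position(|(f, _)| *f == frame)?;
        self.frames.truncate(idx + 1);
        Some(self.frames[idx].1.clone())
    }
}

/// Hypothetical stand-in for the example's `Ttl` component (assumed to be a `Copy` counter).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Ttl(u32);
```

Restoring frame 3 also discards the snapshot for frame 4, because that frame will be re-simulated and saved again.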
.rollback_component_with_copy::<Ttl>()
.rollback_resource_with_clone::<ParticleRng>()
.checksum_component_with_hash::<Velocity>()
// todo: ideally we'd also be doing checksums for Transforms, but that's
Agreed. Perhaps a `ComponentChecksumCustomHashPlugin`, which would accept a function `fn<H: Hasher>(&C, &mut H)`, could be used to retroactively add hashing to types which don't support it?
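A minimal std-only sketch of that proposal, with hypothetical names throughout (it is not an existing bevy_ggrs API). One wrinkle: a plain `fn` pointer cannot itself be generic over `H: Hasher`, so this sketch fixes the hasher type as a parameter of the wrapper. `DefaultHasher` stands in for whatever fixed-seed hasher the checksum plugins actually use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical wrapper: checksums a type `C` that has no `Hash` impl,
/// using a user-supplied hashing function.
struct CustomHashChecksum<C, H: Hasher + Default> {
    hash_fn: fn(&C, &mut H),
}

impl<C, H: Hasher + Default> CustomHashChecksum<C, H> {
    fn new(hash_fn: fn(&C, &mut H)) -> Self {
        Self { hash_fn }
    }

    /// Order-independent checksum: hash each value with a fresh hasher, then XOR.
    fn checksum(&self, values: &[C]) -> u64 {
        values.iter().fold(0, |acc, value| {
            let mut hasher = H::default();
            (self.hash_fn)(value, &mut hasher);
            acc ^ hasher.finish()
        })
    }
}

/// Stand-in for a component with `f32` fields, which therefore cannot derive `Hash`.
struct Position {
    x: f32,
    y: f32,
}

/// Custom hashing function: feeds the raw bit pattern of each field to the hasher.
fn hash_position(p: &Position, hasher: &mut DefaultHasher) {
    hasher.write_u32(p.x.to_bits());
    hasher.write_u32(p.y.to_bits());
}
```

Because the per-value hashes are combined with XOR, the checksum does not depend on query iteration order.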
}

fn update_particles(mut particles: Query<(&mut Transform, &mut Velocity)>, args: Res<Args>) {
    let time_step = 1.0 / args.fps as f32; // todo: replace with bevy_ggrs resource?
On this theme, I think it would be good to replace the `Time` resource during advance-frame with one whose time bevy_ggrs controls manually. Then systems already using Bevy's canonical `Time` resource would automatically be rollback-aware.
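A minimal std-only sketch of that idea, assuming a hypothetical `RollbackTime` type (not an existing bevy_ggrs or Bevy API): elapsed time is derived from the frame counter and a fixed step computed from the frame rate, so restoring the frame number on rollback restores the clock as well, and every peer sees the same time for the same frame.

```rust
use std::time::Duration;

/// Hypothetical rollback-aware clock driven purely by simulated frames.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RollbackTime {
    frame: u32,
    step: Duration,
}

impl RollbackTime {
    /// Fixed step derived from the frame rate (panics if `fps` is 0).
    fn from_fps(fps: u32) -> Self {
        Self { frame: 0, step: Duration::from_secs(1) / fps }
    }

    /// Called once per simulated frame, including re-simulated frames.
    fn advance(&mut self) {
        self.frame += 1;
    }

    /// Elapsed time depends only on the frame number, never on wall-clock time.
    fn elapsed(&self) -> Duration {
        self.step * self.frame
    }

    /// Per-frame delta, playing the role of `1.0 / args.fps as f32` in `update_particles`.
    fn delta_seconds(&self) -> f32 {
        self.step.as_secs_f32()
    }
}
```

Rolling back is then just restoring a saved copy of the struct, like any other rollback resource.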
I'm still testing this to work out what's going wrong and, unfortunately, I've seen it trigger desyncs purely on releasing the spacebar. I've also tested a much coarser hashing function (hashing only the integer component of velocity), with no difference in desync behaviour. I suspect the issue is related to the creation or removal of rollback entities on predicted frames. I want to resolve this, since I see it as a real blocker for more advanced future functionality (e.g., desync resolution through resynchronisation).
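The coarser-hash experiment can be sketched as a custom `Hash` impl that only feeds the integer part of each velocity component to the hasher. The 2D `Velocity` layout and the `coarse_checksum` helper are assumptions for illustration. Since this change made no difference to the desyncs, it pointed away from floating-point noise in velocities as the cause.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the example's `Velocity` component (2D layout assumed).
struct Velocity {
    x: f32,
    y: f32,
}

/// Coarse hash: only the integer part of each component contributes, so
/// differences in the fractional part cannot change the checksum.
impl Hash for Velocity {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // `trunc` discards the fractional part; the `as i32` cast saturates
        // out-of-range values and maps NaN to 0.
        (self.x.trunc() as i32).hash(state);
        (self.y.trunc() as i32).hash(state);
    }
}

/// Hypothetical helper: checksum a single velocity with a fresh hasher.
fn coarse_checksum(v: &Velocity) -> u64 {
    let mut hasher = DefaultHasher::new();
    v.hash(&mut hasher);
    hasher.finish()
}
```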
OK, it is definitely to do with entities only and has nothing to do with the `Velocity` component. I created an `EntityChecksumPlugin`:

use std::hash::{BuildHasher, Hash, Hasher};

use bevy::prelude::*;

use crate::{ChecksumFlag, ChecksumPart, Rollback, RollbackOrdered, SaveWorld, SaveWorldSet};

pub struct EntityChecksumPlugin;

impl EntityChecksumPlugin {
    #[allow(clippy::type_complexity)]
    pub fn update(
        mut commands: Commands,
        rollback_ordered: Res<RollbackOrdered>,
        components: Query<&Rollback, (With<Rollback>, Without<ChecksumFlag<Entity>>)>,
        mut checksum: Query<&mut ChecksumPart, (Without<Rollback>, With<ChecksumFlag<Entity>>)>,
    ) {
        let mut hasher = bevy::utils::FixedState.build_hasher();
        let mut result = 0;

        for &rollback in components.iter() {
            let mut hasher = hasher.clone();

            // Hashing the rollback index ensures this hash is unique and stable
            rollback_ordered.order(rollback).hash(&mut hasher);

            // XOR chosen over addition or multiplication as it is closed on u64 and commutative
            result ^= hasher.finish();
        }

        // Hash the XOR'ed result to break commutativity with other types
        result.hash(&mut hasher);

        let result = ChecksumPart(hasher.finish() as u128);

        trace!("Rollback Entities have checksum {:X}", result.0);

        if let Ok(mut checksum) = checksum.get_single_mut() {
            *checksum = result;
        } else {
            commands.spawn((result, ChecksumFlag::<Entity>::default()));
        }
    }
}

impl Plugin for EntityChecksumPlugin {
    fn build(&self, app: &mut App) {
        app.add_systems(SaveWorld, Self::update.in_set(SaveWorldSet::Checksum));
    }
}

I then added it as the only system contributing to the checksum, and the same desync behaviour is observed. I'm about to go to bed, but that leaves 3 possibilities:
Writing this out, I now know it has to be 3. I think that if a rollback entity is added on a predicted frame and that frame is then rolled back, the entity stays in the order resource and breaks sync with all clients. In the morning I'll make a PR that just ensures rollback entities created pre-emptively are removed from …
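To make the suspected failure mode concrete, here is a small std-only model (all names hypothetical; this is not bevy_ggrs's `RollbackOrdered` API) of a registry that hands out rollback order indices, checksummed with the same XOR-then-rehash scheme as `EntityChecksumPlugin` above, with `DefaultHasher` standing in for `FixedState`. If an index handed out on a mispredicted frame is not reclaimed when that frame is rolled back, the re-simulated spawn receives a different index than on a peer that predicted correctly, and the checksums diverge.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// XOR the per-entity hashes (order-independent), then rehash the result.
fn entity_checksum(orders: &[u64]) -> u64 {
    let mut acc = 0u64;
    for order in orders {
        let mut hasher = DefaultHasher::new();
        order.hash(&mut hasher);
        acc ^= hasher.finish();
    }
    let mut hasher = DefaultHasher::new();
    acc.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical registry of rollback order indices.
struct OrderRegistry {
    next: u64,
    live: Vec<u64>,
}

impl OrderRegistry {
    fn new() -> Self {
        Self { next: 0, live: Vec::new() }
    }

    /// Register a new rollback entity and return its order index.
    fn spawn(&mut self) -> u64 {
        let order = self.next;
        self.next += 1;
        self.live.push(order);
        order
    }

    /// Snapshot is simply the next index to be handed out.
    fn snapshot(&self) -> u64 {
        self.next
    }

    /// Roll back: forget every entity registered after the snapshot and
    /// reclaim their indices so re-simulated spawns get the same ones.
    fn rollback_to(&mut self, snapshot: u64) {
        self.live.retain(|&order| order < snapshot);
        self.next = snapshot;
    }

    fn checksum(&self) -> u64 {
        entity_checksum(&self.live)
    }
}
```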
I believe #82 solves the desync issues on this branch! It was a combination of what I suspected (rollback entities being de/spawned during a rollback messing up the …). I tested with a rebase and couldn't induce a desync. I'm honestly shocked that hashing a …
As far as I understand, it's fine as long as we don't get any NaN values; in fact, we rely on it for determinism? I guess the hasher could check for NaN and just produce a deterministic bogus checksum (or warn)?
We should definitely avoid hashing the bits of a float just like that. Special cases like NaN (both quiet and signalling) and Inf can become troublesome.
On the other hand, working with floats in Rust has proven to be surprisingly deterministic in my experience.
I'm of two minds... aren't you in a bit of a pickle if your floats have reached Inf or NaN anyway? Wouldn't it be good to know?
For our purposes, definitely. I think adding a NaN/Inf check just before hashing the bits should be sufficient for this example.
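A minimal sketch of the NaN/Inf guard suggested above, using only std. The function and error names are hypothetical, and `DefaultHasher` stands in for whatever fixed-seed hasher the checksum plugin actually uses: non-finite values are rejected so they surface as an error, and anything else is hashed by its raw bit pattern.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Returned when a value to be checksummed is NaN or infinite.
#[derive(Debug, Clone, Copy)]
struct NonFiniteError(f32);

/// Hash an `f32` by its bit pattern, refusing NaN (quiet or signalling) and ±Inf.
fn hash_finite_f32<H: Hasher>(value: f32, hasher: &mut H) -> Result<(), NonFiniteError> {
    if !value.is_finite() {
        return Err(NonFiniteError(value));
    }
    hasher.write_u32(value.to_bits());
    Ok(())
}

/// Example use for a hypothetical 2D velocity.
fn checked_velocity_checksum(x: f32, y: f32) -> Result<u64, NonFiniteError> {
    let mut hasher = DefaultHasher::new();
    hash_finite_f32(x, &mut hasher)?;
    hash_finite_f32(y, &mut hasher)?;
    Ok(hasher.finish())
}
```

Whether to panic, warn, or emit a deterministic bogus checksum on error is then the caller's choice.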
Force-pushed from 608a160 to 480d8d0.
I added an …
So this stress test is ready to be merged, then?
Objective
We need an easy way to check for performance regressions.
Solution
Depends on #76 and #77
todo: