Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flush commands after every mutation in WorldEntityMut #16219

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

nakedible
Copy link
Contributor

@nakedible nakedible commented Nov 3, 2024

Objective

Solution

  • Allow WorldEntityMut to exist even when it refers to a despawned entity by allowing EntityLocation to be marked invalid
  • Add checks to all methods to panic if trying to access a despawned entity
  • Flush command queue after every operation that might trigger hooks or observers
  • Update entity location always after flushing command queue

Testing

  • Added test cases for currently broken behaviour
  • Added test cases that flushes happen in all operations
  • Added test cases to ensure hooks and commands are run exactly in correct order when nested

Todo:

  • Write migration guide
  • Add tests that using EntityWorldMut on a despawned entity panics
  • Add tests that commands are flushed after every operation that is supposed to flush them
  • Add tests that hooks, observers and their spawned commands are run in the correct order when nested

Migration Guide

Previously EntityWorldMut triggered command queue flushes in unpredictable places, which could interfere with hooks and observers. Now the command queue is flushed always immediately after any call in EntityWorldMut that spawns or despawns an entity, or adds, removes or replaces a component. This means hooks and observers will run their commands in the correct order.

As a side effect, there is a possibility that a hook or observer could despawn the entity that is being referred to by EntityWorldMut. This could already currently happen if an observer was added while keeping an EntityWorldMut referece and would cause unsound behaviour. If the entity has been despawned, calling any methods which require the entity location will panic. This matches the behaviour that Commands will panic if called on an already despawned entity. In the extremely rare case where taking a new EntityWorldMut reference or otherwise restructuring the code so that this case does not happen is not possible, there's a new is_despawned method that can be used to check if the referred entity has been despawned.

@alice-i-cecile alice-i-cecile added this to the 0.16 milestone Nov 3, 2024
@alice-i-cecile alice-i-cecile added C-Bug An unexpected or incorrect behavior A-ECS Entities, components, systems, and events P-High This is particularly urgent, and deserves immediate attention M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide labels Nov 3, 2024
@alice-i-cecile
Copy link
Member

alice-i-cecile commented Nov 3, 2024

Yes please, I'd like a small Migration Guide note here.

@alice-i-cecile alice-i-cecile added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed P-High This is particularly urgent, and deserves immediate attention labels Nov 3, 2024
Copy link
Contributor

@maniwani maniwani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is pretty much what I was thinking. I assume branch prediction should hide any overhead from this since the condition should basically never be true.

I think hitting this case should issue a warning. And we might want to have a #[track_caller] on the affected EntityWorldMut methods so that anyone who sees the warning knows where to start looking.

Something like...

#[inline(never)]
#[cold]
fn on_entity_invalid(entity: Entity, caller: &'static Location) {
    warn!("error[B<insert-error-number-here>]: {caller} tried to modify the entity {:?}, but it no longer exists. See: https://bevyengine.org/learn/errors/b<insert-error-number-here>", entity);
}

@maniwani
Copy link
Contributor

maniwani commented Nov 3, 2024

Ah, wait. Since EntityWorldMut can be converted into the other kinds of entity references, we may need to add the invalid location check to them as well (or make the conversion fallible).

Some of the UnsafeEntityCell methods those references use probably assume its location is valid.

@nakedible
Copy link
Contributor Author

warn or warn_once?

@Maximetinu
Copy link
Contributor

Maximetinu commented Nov 5, 2024

Hello! 👋

I lack context about the changes in this PR, but by petition of @alice-i-cecile , I checked out this branch to see if I can still reproduce the crash of flush() needs to be called before this operation is legal that I was hitting in my code

The answer is that no, I cannot.

The thing is, I can't repro it in bevy main neither, so this bug must have been fixed at some point between Bevy 0.14.2 (the version where I encountered it) and current Bevy main atow

Here, for reference:


this is my minimal reproduceable example, on top of Bevy 0.14.2:

https://github.com/Maximetinu/bevy/commits/minimal-reproduceable-bug-on-remove-hook/

see changes in latest commit

running that modified example does reproduce the bug

notice that the cause is just spawning an empty entity during a on_remove hook


This is my minimal reproduceable example cherry-picked to this branch:

https://github.com/Maximetinu/bevy/commits/world-entity-mut-flushes-checking-if-panic-keeps-happening/

see changes in latest commit, they are the same as in the branch above

running that modified example does not reproduce the bug

so, this branch works 🥳


However, bevy main also works:

https://github.com/Maximetinu/bevy/commits/reproducing-on-remove-hook-panic-on-bevy-main/

which means that, well, this branch does not fix per se this bug, because it is already fixed on Bevy main

but at least it means that it's not re-introducing it neither 🙂


Maybe my minimal reproduceable example can help to write a test covering from this specific scenario

I hope this helps!

@alice-i-cecile
Copy link
Member

I'd love to see that turned into a test, either in this PR or separately :)

@Maximetinu
Copy link
Contributor

I'd love to see that turned into a test, either in this PR or separately :)

I have made a test for this, that fails at 0.14.2 but passes in current bevy main, so it should cover from this regression

#[test]
fn test_spawn_entities_during_on_remove_hook() {
    #[derive(Component)]
    struct MyComponent;

    App::new()
        .add_plugins(MinimalPlugins)
        .add_systems(Startup, |world: &mut World| {
            world.register_component_hooks::<MyComponent>().on_remove(
                |mut world, _entity, _component_id| {
                    world.commands().spawn_empty();
                },
            );
        })
        .add_systems(
            Update,
            |mut commands: Commands,
                my_component_q: Query<Entity, With<MyComponent>>,
                mut exit_event: EventWriter<AppExit>,
                mut after_first_frame: Local<bool>| {
                for entity in &my_component_q {
                    commands.entity(entity).despawn();
                }

                commands.spawn(MyComponent);

                if *after_first_frame {
                    exit_event.send(AppExit::Success);
                }

                *after_first_frame = true;
            },
        )
        .run();
}

It's a bit of an unusual test (I'm not even sure if MinimalPlugins can be used in a test like this). It's more an integration test rather than a unit test. I don't know if there is some place inside Bevy repo where more of these tests exist. If so, I can open a PR to add it.

@alice-i-cecile
Copy link
Member

Let's spin this out into a new issue :)

@nakedible
Copy link
Contributor Author

@maniwani Updated design (just quick notes for now):

  • Handle also UnsafeEntityCell
  • Instead of setting location entirely to invalid, instead set both archetype_id and table_id to "empty", but keep archetype_row and table_row as invalid. That allows us to behave more like an empty entity for most APIs.

@nakedible
Copy link
Contributor Author

nakedible commented Nov 17, 2024

I changed this pull to instead panic rather than warn and try to behave if nothing is wrong. The reason for this is that if we only try to warn, there's a quite wide ranging knock-off effect into UnsafeEntityCell and a lot of other places which store EntityLocation. If we instead panic, the effects (and the extra runtime check) are limited to just EntityWorldMut. Also, Bevy already opts to panicing if using commands to modify an entity that has been despawned (https://bevyengine.org/learn/errors/b0003/) so this is similar behaviour. There should be nobody depending on this behaviour currently as calling any of the methods after the entity has been despawned by hooks would have already been unsound in the past – so if it happens, then the panics will reveal places where there has been already been a bug.

@nakedible
Copy link
Contributor Author

@alice-i-cecile @maniwani I'd love to get feedback if the new panicing approach is good or bad. Also, is it reasonable to add a panics section to all the methods which might panic for this. It should truly be a corner case when this happens, so I'm not sure if it's sensible to spam it everywhere (and it cannot be put into some of the non-fallible From conversions).

@nakedible
Copy link
Contributor Author

Implementation of panic is still placeholderish, just to note.

@alice-i-cecile
Copy link
Member

Also, Bevy already opts to panicing if using commands to modify an entity that has been despawned

For what it's worth, this behavior is widely hated :)

I think I'm broadly on board with panicking here though, especially as a first pass. This really is an edge case, and I think that trying to figure out better ways to prevent this bad state from ever arising is more useful than bubbling the error up.

@nakedible nakedible marked this pull request as ready for review November 24, 2024 22:40
@nakedible nakedible force-pushed the world-entity-mut-flushes branch 2 times, most recently from b90ed0b to 4668d66 Compare November 24, 2024 23:38
@nakedible
Copy link
Contributor Author

If #16499 gets merged before this one, one of the test cases will break, as it should when the ordering of hooks and observers is changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Bug An unexpected or incorrect behavior M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged
Projects
None yet
4 participants