Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize memory size of SparseArray and integer-based ECS IDs #3678

Closed
wants to merge 71 commits into from

Conversation

james7132
Copy link
Member

@james7132 james7132 commented Jan 15, 2022

Objective

Fixes #1558. Fixes #4820. SparseArray is found everywhere in ECS storage, and it's currently backed by a Vec of Option<usize>, where 7 of it's 16 bytes goes unused. This can be quite wasteful given the potential size of this Vec.

Solution

Use nonmax types to utilize rustc's Option<NonZero*> optimizations where possible without impacting existing implementations surrounding the changed IDs. Further reduce this by using NonMaxU32 for another 50% reduction in memory usage on 64-bit platforms. Most of these types shouldn't have more than u32::MAX in any given world.

Pros

  • Cuts the size of Option<ID> down to 4 bytes. Saving 75% memory on SparseArray and other areas where these IDs are optionally available.
  • Reducing the memory usage may improve cache utilization for types using these values.

Cons

  • Adds one extra XOR operation on every conversion to an index. This would be a +1/-1 on every conversion to an index if NonZero* types are used.
  • ID creation is faillable, and may lead to panics. This impact is reduced either by using new_unchecked and the fact that the ID would need to overflow the underlying type (typically u32).

Additional Impact

This changes the underlying size and binary representation of many of these types. Any system dependent on the Hash implementation of these types may be impacted.

Open Questions

Is it possible to apply the same optimization to Entity? EDIT: This is done in #3029.

@github-actions github-actions bot added the S-Needs-Triage This issue needs to be labelled label Jan 15, 2022
@james7132 james7132 added C-Performance A change motivated by improving speed, memory usage or compile times A-ECS Entities, components, systems, and events and removed S-Needs-Triage This issue needs to be labelled labels Jan 15, 2022
@DJMcNab
Copy link
Member

DJMcNab commented Jan 15, 2022

The version for Entity would be #3029, I think

The question which comes to mind is do we want to do an operation to get the id, or is it better just to 'ignore' the null item

In theory the pointer stored could be actually one item before the start of the allocated area; then accessing an index would just be normal indexing, but the 0 index would be invalid. My first impression of that is disgust; but on the other hand it is quite a neat solution :P.

@james7132 james7132 marked this pull request as ready for review January 15, 2022 10:59
@james7132
Copy link
Member Author

james7132 commented Jan 16, 2022

In theory the pointer stored could be actually one item before the start of the allocated area; then accessing an index would just be normal indexing, but the 0 index would be invalid. My first impression of that is disgust; but on the other hand it is quite a neat solution :P.

This isn't too uncommon in other projects. IIRC, the .NET VM's object model relies on a pointer with non-zero base offset for their object model to optimize vtable access.

However, if we do this, I'd say we should make SparseArray private. That kind of implementation detail shouldn't be in a public interface.

@bjorn3
Copy link
Contributor

bjorn3 commented Jan 16, 2022

Rc also uses a pointer to the wrapped data rather than to the reference counts which are before the wrapped data.

Copy link
Member

@DJMcNab DJMcNab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is just a review from the changes in the latest commit)

crates/bevy_ecs/src/archetype.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/archetype.rs Outdated Show resolved Hide resolved
@james7132
Copy link
Member Author

james7132 commented Jan 30, 2022

This seems to have sizable amount of overlap with #2104. Considering either closing this or having it supercede that implementation. The perf implications shown in that PR are not encouraging though. Nevermind, misread the benchmarks backwards.

@alice-i-cecile
Copy link
Member

alice-i-cecile commented Feb 4, 2022

I prefer this PR to #2104: the macro approach is cleaner.

james7132 and others added 25 commits January 6, 2023 21:10
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
…#7106)

# Objective

`SystemParam` `Local`s documentation currently leaves out information that should be documented.
- What happens when multiple `SystemParam`s within the same system have the same `Local` type.
- What lifetime parameter is expected by `Local`.
 
## Solution

- Added sentences to documentation to communicate this information.
- Renamed `Local` lifetimes in code to `'s` where they previously were not. Users can get complicated incorrect suggested fixes if they pass the wrong lifetime. Some instance of the code had `'w` indicating the expected lifetime might not have been known to those that wrote the code either.

Co-authored-by: iiYese <83026177+iiYese@users.noreply.github.com>
# Objective
Doc comment mentions turbo which is a sensitivity that doesn't exist.

## Solution

Change the comment to "Extreme" which does exist
(github made me type out a message for the commit which looked like it was for the pr, sorry)

# Objective

- Add a way to get all of the input devices of an `Axis`, primarily useful for looping through them

## Solution

- Adds `Axis<T>::devices()` which returns a `FixedSizeIterator<Item = &T>`
- Adds a (probably unneeded) `test_axis_devices` test because tests are cool.

---

## Changelog

- Added `Axis<T>::devices()` method

## Migration Guide

Not a breaking change.
# Objective

- The doctest for `Mut::map_unchanged` uses a fake function `set_if_not_equal` to demonstrate usage.
- Now that bevyengine#6853 has been merged, we can use `Mut::set_if_neq` directly instead of mocking it.
…evyengine#6919)

Spiritual successor to bevyengine#5205.
Actual successor to bevyengine#6865.

# Objective

Currently, system params are defined using three traits: `SystemParam`, `ReadOnlySystemParam`, `SystemParamState`. The behavior for each param is specified by the `SystemParamState` trait, while `SystemParam` simply defers to the state.

Splitting the traits in this way makes it easier to implement within macros, but it increases the cognitive load. Worst of all, this approach requires each `MySystemParam` to have a public `MySystemParamState` type associated with it.

## Solution

* Merge the trait `SystemParamState` into `SystemParam`.
* Remove all trivial `SystemParam` state types. 
  * `OptionNonSendMutState<T>`: you will not be missed.

---

- [x] Fix/resolve the remaining test failure.

## Changelog

* Removed the trait `SystemParamState`, merging its functionality into `SystemParam`.

## Migration Guide

**Note**: this should replace the migration guide for bevyengine#6865.
This is relative to Bevy 0.9, not main.

The traits `SystemParamState` and `SystemParamFetch` have been removed, and their functionality has been transferred to `SystemParam`.


```rust
// Before (0.9)
impl SystemParam for MyParam<'_, '_> {
    type State = MyParamState;
}
unsafe impl SystemParamState for MyParamState {
    fn init(world: &mut World, system_meta: &mut SystemMeta) -> Self { ... }
}
unsafe impl<'w, 's> SystemParamFetch<'w, 's> for MyParamState {
    type Item = MyParam<'w, 's>;
    fn get_param(&mut self, ...) -> Self::Item;
}
unsafe impl ReadOnlySystemParamFetch for MyParamState { }

// After (0.10)
unsafe impl SystemParam for MyParam<'_, '_> {
    type State = MyParamState;
    type Item<'w, 's> = MyParam<'w, 's>;
    fn init_state(world: &mut World, system_meta: &mut SystemMeta) -> Self::State { ... }
    fn get_param<'w, 's>(state: &mut Self::State, ...) -> Self::Item<'w, 's>;
}
unsafe impl ReadOnlySystemParam for MyParam<'_, '_> { }
```

The trait `ReadOnlySystemParamFetch` has been replaced with `ReadOnlySystemParam`.

```rust
// Before
unsafe impl ReadOnlySystemParamFetch for MyParamState {}

// After
unsafe impl ReadOnlySystemParam for MyParam<'_, '_> {}
```
…ems (bevyengine#7127)

# Objective

- Avoid slower than necessary first frame after spawning many entities due to them not having `Aabb`s and so being marked visible
  - Avoids unnecessarily large system and VRAM allocations as a consequence

## Solution

- I noticed when debugging the `many_cubes` stress test in Xcode that the `MeshUniform` binding was much larger than it needed to be. I realised that this was because initially, all mesh entities are marked as being visible because they don't have `Aabb`s because `calculate_bounds` is being run in `PostUpdate` and there are no system commands applications before executing the visibility check systems that need the `Aabb`s. The solution then is to run the `calculate_bounds` system just before the previous system commands are applied which is at the end of the `Update` stage.
# Objective

- Storage buffers are useful and not currently supported by the `AsBindGroup` derive which means you need to expand the macro if you need a storage buffer

## Solution

- Add a new `#[storage]` attribute to the derive `AsBindGroup` macro.
	- Support and optional `read_only` parameter that defaults to false when not present.
	- Support visibility parameters like the texture and sampler attributes.

---

## Changelog

- Add a new `#[storage(index)]` attribute to the derive `AsBindGroup` macro.


Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
…ne#6388)

# Objective
Fix bevyengine#6301 

## Solution
Add new features in `bevy_audio` to use `symphonia` sound format from `rodio` 
Also add in `bevy`
# Objective

- Fixes bevyengine#6338

This PR allows for smooth transitions between different animations.

## Solution

- This PR uses very simple linear blending of animations.
- When starting a new animation, you can give it a duration, and throughout that duration, the previous and the new animation are being linearly blended, until only the new animation is running.
- I'm aware of bevyengine/rfcs#49 and bevyengine/rfcs#51, which are more complete solutions to this problem, but they seem still far from being implemented. Until they're ready, this PR allows for the most basic use case of blending, i.e. smoothly transitioning between different animations.

## Migration Guide

- no bc breaking changes
# Objective 

- Remove redundant gamepad events
- Simplify consuming gamepad events.
- Refactor: Separate handling of gamepad events into multiple systems.

## Solution

- Removed `GamepadEventRaw`, and `GamepadEventType`.
- Added bespoke `GamepadConnectionEvent`, `GamepadAxisChangedEvent`, and `GamepadButtonChangedEvent`. 
- Refactored `gamepad_event_system`.
- Added `gamepad_button_event_system`, `gamepad_axis_event_system`, and `gamepad_connection_system`, which update the `Input` and `Axis` resources using their corresponding event type.

Gamepad events are now handled in their own systems and have their own types. 

This allows for querying for gamepad events without having to match on `GamepadEventType` and makes creating handlers for specific gamepad event types, like a `GamepadConnectionEvent` or `GamepadButtonChangedEvent` possible.

We remove `GamepadEventRaw` by filtering the gamepad events, using `GamepadSettings`, _at the source_, in `bevy_gilrs`. This way we can create `GamepadEvent`s directly and avoid creating `GamepadEventRaw` which do not pass the user defined filters. 

We expose ordered `GamepadEvent`s and we can respond to individual gamepad event types.

## Migration Guide

- Replace `GamepadEvent` and `GamepadEventRaw` types with their specific gamepad event type.
# Objective

- This pulls out some of the changes to Plugin setup and sub apps from bevyengine#6503 to make that PR easier to review.
- Separate the extract stage from running the sub app's schedule to allow for them to be run on separate threads in the future
- Fixes bevyengine#6990

## Solution

- add a run method to `SubApp` that runs the schedule
- change the name of `sub_app_runner` to extract to make it clear that this function is only for extracting data between the main app and the sub app
- remove the extract stage from the sub app schedule so it can be run separately. This is done by adding a `setup` method to the `Plugin` trait that runs after all plugin build methods run. This is required to allow the extract stage to be removed from the schedule after all the plugins have added their systems to the stage. We will also need the setup method for pipelined rendering to setup the render thread. See https://github.com/bevyengine/bevy/blob/e3267965e15f14be18eec942dcaf16807144eb05/crates/bevy_render/src/pipelined_rendering.rs#L57-L98

## Changelog

- Separate SubApp Extract stage from running the sub app schedule.

## Migration Guide

### SubApp `runner` has conceptually been changed to an `extract` function.

The `runner` no longer is in charge of running the sub app schedule. It's only concern is now moving data between the main world and the sub app. The `sub_app.app.schedule` is now run for you after the provided function is called.

```rust
// before
fn main() {
    let sub_app = App::empty();
    sub_app.add_stage(MyStage, SystemStage::parallel());
    
    App::new().add_sub_app(MySubApp, sub_app, move |main_world, sub_app| {
        extract(app_world, render_app);
        render_app.app.schedule.run();
    });
}

// after
fn main() {
        let sub_app = App::empty();
    sub_app.add_stage(MyStage, SystemStage::parallel());
    
    App::new().add_sub_app(MySubApp, sub_app, move |main_world, sub_app| {
        extract(app_world, render_app);
        // schedule is automatically called for you after extract is run
    });
}
```
# Objective
Speed up the render phase for rendering.

## Solution
 - Follow up bevyengine#6988 and make the internals of atomic IDs `NonZeroU32`. This niches the `Option`s of the IDs in draw state, which reduces the size and branching behavior when evaluating for equality.
 - Require `&RenderDevice` to get the device's `Limits` when initializing a `TrackedRenderPass` to preallocate the bind groups and vertex buffer state in `DrawState`, this removes the branch on needing to resize those `Vec`s.

## Performance
This produces a similar speed up akin to that of bevyengine#6885. This shows an approximate 6% speed up in `main_opaque_pass_3d` on `many_foxes` (408.79 us -> 388us). This should be orthogonal to the gains seen there.

![image](https://user-images.githubusercontent.com/3137680/209906239-e430f026-63c2-4b95-957e-a2045b810d79.png)

---

## Changelog
Added: `RenderContext::begin_tracked_render_pass`.
Changed: `TrackedRenderPass` now requires a `&RenderDevice` on construction.
Removed: `bevy_render::render_phase::DrawState`. It was not usable in any form outside of `bevy_render`.

## Migration Guide
TODO
# Objective

- Fixes bevyengine#7061

## Solution

- Add and implement `insert` and `remove` methods for `List`.

---

## Changelog

- Added `insert` and `remove` methods to `List`.
- Changed the `push` and `pop` methods on `List` to have default implementations.

## Migration Guide

- Manual implementors of `List` need to implement the new methods `insert` and `remove` and 
consider whether to use the new default implementation of `push` and `pop`.

Co-authored-by: radiish <thesethskigamer@gmail.com>
# Objective

Fixes bevyengine#3310. Fixes bevyengine#6282. Fixes bevyengine#6278. Fixes bevyengine#3666.

## Solution
Split out `!Send` resources into `NonSendResources`. Add a `origin_thread_id` to all `!Send` Resources, check it on dropping `NonSendResourceData`, if there's a mismatch, panic. Moved all of the checks that `MainThreadValidator` would do into `NonSendResources` instead.

All `!Send` resources now individually track which thread they were inserted from. This is validated against for every access, mutation, and drop that could be done against the value. 

A regression test using an altered version of the example from bevyengine#3310 has been added.

This is a stopgap solution for the current status quo. A full solution may involve fully removing `!Send` resources/components from `World`, which will likely require a much more thorough design on how to handle the existing in-engine and ecosystem use cases.

This PR also introduces another breaking change:

```rust
    use bevy_ecs::prelude::*;

    #[derive(Resource)]
    struct Resource(u32);

    fn main() {
        let mut world = World::new();
        world.insert_resource(Resource(1));
        world.insert_non_send_resource(Resource(2));
        let res = world.get_resource_mut::<Resource>().unwrap();
        assert_eq!(res.0, 2);
    }
```

This code will run correctly on 0.9.1 but not with this PR, since NonSend resources and normal resources have become actual distinct concepts storage wise.

## Changelog
Changed: Fix soundness bug with `World: Send`. Dropping a `World` that contains a `!Send` resource on the wrong thread will now panic.

## Migration Guide
Normal resources and `NonSend` resources no longer share the same backing storage. If `R: Resource`, then `NonSend<R>` and `Res<R>` will return different instances from each other. If you are using both `Res<T>` and `NonSend<T>` (or their mutable variants), to fetch the same resources, it's strongly advised to use `Res<T>`.
…7040)

# Objective

The type `Local<T>` unnecessarily has the bound `T: Sync` when the local is used in an exclusive system.

## Solution

Lift the bound.

---

## Changelog

Removed the bound `T: Sync` from `Local<T>` when used as an `ExclusiveSystemParam`.
# Objective

- Fixes bevyengine#6777, fixes bevyengine#2998, replaces bevyengine#5518
- Help avoid confusing error message when using an older version of Rust

## Solution

- Add the `rust-version` field to `Cargo.toml`
- Add a CI job checking the MSRV
- Add the job to bors
# Objective

- Fixes bevyengine#7124

## Solution

- Add Hash Derive on `WorldId`
- Add `SparseSetIndex` impl
# Objective

- Fix the name of function parameter name in docs

## Solution

- Change `f` to `sub_app_runner`

---

It confused me a bit when I was reading the docs in the autocomplete hint.
Hesitated about filing a PR since it's just a one single word change in the comment.
Is this the right process to change these docs?
# Objective

- Fixes bevyengine#4057 
- Do not multiply position by scale factor
…ne#7094)

# Objective

This a follow-up to bevyengine#6894, see bevyengine#6894 (comment)

The goal is to avoid cloning any string when getting a `&TypeRegistration` corresponding to a string which is being deserialized. As a bonus code duplication is also reduced.

## Solution

The manual deserialization of a string and lookup into the type registry has been moved into a separate `TypeRegistrationDeserializer` type, which implements `DeserializeSeed` with a `Visitor` that accepts any string with `visit_str`, even ones that may not live longer than that function call.
`BorrowedStr` has been removed since it's no longer used.

---

## Changelog

- The type `TypeRegistrationDeserializer` has been added, which simplifies getting a `&TypeRegistration` while deserializing a string.
# Objective

- In some cases, you need a `Mut<T>` pointer, but you only have a mutable reference to one. There is no easy way of converting `&'a mut Mut<'_, T>` -> `Mut<'a, T>` outside of the engine.

### Example (Before)

```rust
fn do_with_mut<T>(val: Mut<T>) { ... }

for x: Mut<T> in &mut query {
    // The function expects a `Mut<T>`, so `x` gets moved here.
    do_with_mut(x);
    // Error: use of moved value.
    do_a_thing(&x);
}
```

## Solution

- Add the function `reborrow`, which performs the mapping. This is analogous to `PtrMut::reborrow`.

### Example (After)

```rust
fn do_with_mut<T>(val: Mut<T>) { ... }

for x: Mut<T> in &mut query {
    // We reborrow `x`, so the original does not get moved.
    do_with_mut(x.reborrow());
    // Works fine.
    do_a_thing(&x);
}
```

---

## Changelog

- Added the method `reborrow` to `Mut`, `ResMut`, `NonSendMut`, and `MutUntyped`.
@james7132
Copy link
Member Author

Ugh yet another bad merge off of main. I think I'll chunk this up into smaller bits and remake this off of main.

@danchia made an interesting observation about this PR: Option<EntityLocation> is 16 bytes, which is returnable via a register instead of the stack. We may be able to get most of the benefits from this PR just by niching ArchetypeId. The other changes can help reduce memory usage, but may not have that much CPU time impact.

@james7132 james7132 closed this Jan 21, 2023
alradish pushed a commit to alradish/bevy that referenced this pull request Jan 22, 2023
# Objective
`Query::get` and other random access methods require looking up `EntityLocation` for every provided entity, then always looking up the `Archetype` to get the table ID and table row. This requires 4 total random fetches from memory: the `Entities` lookup, the `Archetype` lookup, the table row lookup, and the final fetch from table/sparse sets. If `EntityLocation` contains the table ID and table row, only the `Entities` lookup and the final storage fetch are required.

## Solution
Add `TableId` and table row to `EntityLocation`. Ensure it's updated whenever entities are moved around. To ensure `EntityMeta` does not grow bigger, both `TableId` and `ArchetypeId` have been shrunk to u32, and the archetype index and table row are stored as u32s instead of as usizes. This should shrink `EntityMeta` by 4 bytes, from 24 to 20 bytes, as there is no padding anymore due to the change in alignment.

This idea was partially concocted by @BoxyUwU. 

## Performance
This should restore the `Query::get` "gains" lost to bevyengine#6625 that were introduced in bevyengine#4800 without being unsound, and also incorporates some of the memory usage reductions seen in bevyengine#3678.

This also removes the same lookups during add/remove/spawn commands, so there may be a bit of a speedup in commands and `Entity{Ref,Mut}`.

---

## Changelog
Added: `EntityLocation::table_id`
Added: `EntityLocation::table_row`.
Changed: `World`s can now only hold a maximum of 2<sup>32</sup>- 1 archetypes.
Changed: `World`s can now only hold a maximum of 2<sup>32</sup> - 1 tables.

## Migration Guide

A `World` can only hold a maximum of 2<sup>32</sup> - 1 archetypes and tables now. If your use case requires more than this, please file an issue explaining your use case.
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective
`Query::get` and other random access methods require looking up `EntityLocation` for every provided entity, then always looking up the `Archetype` to get the table ID and table row. This requires 4 total random fetches from memory: the `Entities` lookup, the `Archetype` lookup, the table row lookup, and the final fetch from table/sparse sets. If `EntityLocation` contains the table ID and table row, only the `Entities` lookup and the final storage fetch are required.

## Solution
Add `TableId` and table row to `EntityLocation`. Ensure it's updated whenever entities are moved around. To ensure `EntityMeta` does not grow bigger, both `TableId` and `ArchetypeId` have been shrunk to u32, and the archetype index and table row are stored as u32s instead of as usizes. This should shrink `EntityMeta` by 4 bytes, from 24 to 20 bytes, as there is no padding anymore due to the change in alignment.

This idea was partially concocted by @BoxyUwU. 

## Performance
This should restore the `Query::get` "gains" lost to bevyengine#6625 that were introduced in bevyengine#4800 without being unsound, and also incorporates some of the memory usage reductions seen in bevyengine#3678.

This also removes the same lookups during add/remove/spawn commands, so there may be a bit of a speedup in commands and `Entity{Ref,Mut}`.

---

## Changelog
Added: `EntityLocation::table_id`
Added: `EntityLocation::table_row`.
Changed: `World`s can now only hold a maximum of 2<sup>32</sup>- 1 archetypes.
Changed: `World`s can now only hold a maximum of 2<sup>32</sup> - 1 tables.

## Migration Guide

A `World` can only hold a maximum of 2<sup>32</sup> - 1 archetypes and tables now. If your use case requires more than this, please file an issue explaining your use case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider shrinking ECS IDs Replace SparseArray Option<T> with T::MAX to cut down on branching