-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize memory size of SparseArray and integer-based ECS IDs #3678
Conversation
The version for The question which comes to mind is do we want to do an operation to get the id, or is it better just to 'ignore' the In theory the pointer stored could be actually one item before the start of the allocated area; then accessing an index would just be normal indexing, but the 0 index would be invalid. My first impression of that is disgust; but on the other hand it is quite a neat solution :P. |
This isn't too uncommon in other projects. IIRC, the .NET VM's object model relies on a pointer with non-zero base offset for their object model to optimize vtable access. However, if we do this, I'd say we should make SparseArray private. That kind of implementation detail shouldn't be in a public interface. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is just a review from the changes in the latest commit)
This seems to have sizable amount of overlap with #2104. Considering either closing this or having it supercede that implementation. |
I prefer this PR to #2104: the macro approach is cleaner. |
Co-authored-by: Daniel McNab <36049421+DJMcNab@users.noreply.github.com>
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
…#7106) # Objective `SystemParam` `Local`s documentation currently leaves out information that should be documented. - What happens when multiple `SystemParam`s within the same system have the same `Local` type. - What lifetime parameter is expected by `Local`. ## Solution - Added sentences to documentation to communicate this information. - Renamed `Local` lifetimes in code to `'s` where they previously were not. Users can get complicated incorrect suggested fixes if they pass the wrong lifetime. Some instance of the code had `'w` indicating the expected lifetime might not have been known to those that wrote the code either. Co-authored-by: iiYese <83026177+iiYese@users.noreply.github.com>
# Objective Doc comment mentions turbo which is a sensitivity that doesn't exist. ## Solution Change the comment to "Extreme" which does exist
(github made me type out a message for the commit which looked like it was for the pr, sorry) # Objective - Add a way to get all of the input devices of an `Axis`, primarily useful for looping through them ## Solution - Adds `Axis<T>::devices()` which returns a `FixedSizeIterator<Item = &T>` - Adds a (probably unneeded) `test_axis_devices` test because tests are cool. --- ## Changelog - Added `Axis<T>::devices()` method ## Migration Guide Not a breaking change.
# Objective - The doctest for `Mut::map_unchanged` uses a fake function `set_if_not_equal` to demonstrate usage. - Now that bevyengine#6853 has been merged, we can use `Mut::set_if_neq` directly instead of mocking it.
…evyengine#6919) Spiritual successor to bevyengine#5205. Actual successor to bevyengine#6865. # Objective Currently, system params are defined using three traits: `SystemParam`, `ReadOnlySystemParam`, `SystemParamState`. The behavior for each param is specified by the `SystemParamState` trait, while `SystemParam` simply defers to the state. Splitting the traits in this way makes it easier to implement within macros, but it increases the cognitive load. Worst of all, this approach requires each `MySystemParam` to have a public `MySystemParamState` type associated with it. ## Solution * Merge the trait `SystemParamState` into `SystemParam`. * Remove all trivial `SystemParam` state types. * `OptionNonSendMutState<T>`: you will not be missed. --- - [x] Fix/resolve the remaining test failure. ## Changelog * Removed the trait `SystemParamState`, merging its functionality into `SystemParam`. ## Migration Guide **Note**: this should replace the migration guide for bevyengine#6865. This is relative to Bevy 0.9, not main. The traits `SystemParamState` and `SystemParamFetch` have been removed, and their functionality has been transferred to `SystemParam`. ```rust // Before (0.9) impl SystemParam for MyParam<'_, '_> { type State = MyParamState; } unsafe impl SystemParamState for MyParamState { fn init(world: &mut World, system_meta: &mut SystemMeta) -> Self { ... } } unsafe impl<'w, 's> SystemParamFetch<'w, 's> for MyParamState { type Item = MyParam<'w, 's>; fn get_param(&mut self, ...) -> Self::Item; } unsafe impl ReadOnlySystemParamFetch for MyParamState { } // After (0.10) unsafe impl SystemParam for MyParam<'_, '_> { type State = MyParamState; type Item<'w, 's> = MyParam<'w, 's>; fn init_state(world: &mut World, system_meta: &mut SystemMeta) -> Self::State { ... } fn get_param<'w, 's>(state: &mut Self::State, ...) -> Self::Item<'w, 's>; } unsafe impl ReadOnlySystemParam for MyParam<'_, '_> { } ``` The trait `ReadOnlySystemParamFetch` has been replaced with `ReadOnlySystemParam`. ```rust // Before unsafe impl ReadOnlySystemParamFetch for MyParamState {} // After unsafe impl ReadOnlySystemParam for MyParam<'_, '_> {} ```
…ems (bevyengine#7127) # Objective - Avoid slower than necessary first frame after spawning many entities due to them not having `Aabb`s and so being marked visible - Avoids unnecessarily large system and VRAM allocations as a consequence ## Solution - I noticed when debugging the `many_cubes` stress test in Xcode that the `MeshUniform` binding was much larger than it needed to be. I realised that this was because initially, all mesh entities are marked as being visible because they don't have `Aabb`s because `calculate_bounds` is being run in `PostUpdate` and there are no system commands applications before executing the visibility check systems that need the `Aabb`s. The solution then is to run the `calculate_bounds` system just before the previous system commands are applied which is at the end of the `Update` stage.
# Objective - Storage buffers are useful and not currently supported by the `AsBindGroup` derive which means you need to expand the macro if you need a storage buffer ## Solution - Add a new `#[storage]` attribute to the derive `AsBindGroup` macro. - Support and optional `read_only` parameter that defaults to false when not present. - Support visibility parameters like the texture and sampler attributes. --- ## Changelog - Add a new `#[storage(index)]` attribute to the derive `AsBindGroup` macro. Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
…ne#6388) # Objective Fix bevyengine#6301 ## Solution Add new features in `bevy_audio` to use `symphonia` sound format from `rodio` Also add in `bevy`
# Objective - Fixes bevyengine#6338 This PR allows for smooth transitions between different animations. ## Solution - This PR uses very simple linear blending of animations. - When starting a new animation, you can give it a duration, and throughout that duration, the previous and the new animation are being linearly blended, until only the new animation is running. - I'm aware of bevyengine/rfcs#49 and bevyengine/rfcs#51, which are more complete solutions to this problem, but they seem still far from being implemented. Until they're ready, this PR allows for the most basic use case of blending, i.e. smoothly transitioning between different animations. ## Migration Guide - no bc breaking changes
# Objective - Remove redundant gamepad events - Simplify consuming gamepad events. - Refactor: Separate handling of gamepad events into multiple systems. ## Solution - Removed `GamepadEventRaw`, and `GamepadEventType`. - Added bespoke `GamepadConnectionEvent`, `GamepadAxisChangedEvent`, and `GamepadButtonChangedEvent`. - Refactored `gamepad_event_system`. - Added `gamepad_button_event_system`, `gamepad_axis_event_system`, and `gamepad_connection_system`, which update the `Input` and `Axis` resources using their corresponding event type. Gamepad events are now handled in their own systems and have their own types. This allows for querying for gamepad events without having to match on `GamepadEventType` and makes creating handlers for specific gamepad event types, like a `GamepadConnectionEvent` or `GamepadButtonChangedEvent` possible. We remove `GamepadEventRaw` by filtering the gamepad events, using `GamepadSettings`, _at the source_, in `bevy_gilrs`. This way we can create `GamepadEvent`s directly and avoid creating `GamepadEventRaw` which do not pass the user defined filters. We expose ordered `GamepadEvent`s and we can respond to individual gamepad event types. ## Migration Guide - Replace `GamepadEvent` and `GamepadEventRaw` types with their specific gamepad event type.
# Objective - This pulls out some of the changes to Plugin setup and sub apps from bevyengine#6503 to make that PR easier to review. - Separate the extract stage from running the sub app's schedule to allow for them to be run on separate threads in the future - Fixes bevyengine#6990 ## Solution - add a run method to `SubApp` that runs the schedule - change the name of `sub_app_runner` to extract to make it clear that this function is only for extracting data between the main app and the sub app - remove the extract stage from the sub app schedule so it can be run separately. This is done by adding a `setup` method to the `Plugin` trait that runs after all plugin build methods run. This is required to allow the extract stage to be removed from the schedule after all the plugins have added their systems to the stage. We will also need the setup method for pipelined rendering to setup the render thread. See https://github.com/bevyengine/bevy/blob/e3267965e15f14be18eec942dcaf16807144eb05/crates/bevy_render/src/pipelined_rendering.rs#L57-L98 ## Changelog - Separate SubApp Extract stage from running the sub app schedule. ## Migration Guide ### SubApp `runner` has conceptually been changed to an `extract` function. The `runner` no longer is in charge of running the sub app schedule. It's only concern is now moving data between the main world and the sub app. The `sub_app.app.schedule` is now run for you after the provided function is called. ```rust // before fn main() { let sub_app = App::empty(); sub_app.add_stage(MyStage, SystemStage::parallel()); App::new().add_sub_app(MySubApp, sub_app, move |main_world, sub_app| { extract(app_world, render_app); render_app.app.schedule.run(); }); } // after fn main() { let sub_app = App::empty(); sub_app.add_stage(MyStage, SystemStage::parallel()); App::new().add_sub_app(MySubApp, sub_app, move |main_world, sub_app| { extract(app_world, render_app); // schedule is automatically called for you after extract is run }); } ```
# Objective Speed up the render phase for rendering. ## Solution - Follow up bevyengine#6988 and make the internals of atomic IDs `NonZeroU32`. This niches the `Option`s of the IDs in draw state, which reduces the size and branching behavior when evaluating for equality. - Require `&RenderDevice` to get the device's `Limits` when initializing a `TrackedRenderPass` to preallocate the bind groups and vertex buffer state in `DrawState`, this removes the branch on needing to resize those `Vec`s. ## Performance This produces a similar speed up akin to that of bevyengine#6885. This shows an approximate 6% speed up in `main_opaque_pass_3d` on `many_foxes` (408.79 us -> 388us). This should be orthogonal to the gains seen there. ![image](https://user-images.githubusercontent.com/3137680/209906239-e430f026-63c2-4b95-957e-a2045b810d79.png) --- ## Changelog Added: `RenderContext::begin_tracked_render_pass`. Changed: `TrackedRenderPass` now requires a `&RenderDevice` on construction. Removed: `bevy_render::render_phase::DrawState`. It was not usable in any form outside of `bevy_render`. ## Migration Guide TODO
# Objective - Fixes bevyengine#7061 ## Solution - Add and implement `insert` and `remove` methods for `List`. --- ## Changelog - Added `insert` and `remove` methods to `List`. - Changed the `push` and `pop` methods on `List` to have default implementations. ## Migration Guide - Manual implementors of `List` need to implement the new methods `insert` and `remove` and consider whether to use the new default implementation of `push` and `pop`. Co-authored-by: radiish <thesethskigamer@gmail.com>
# Objective Fixes bevyengine#3310. Fixes bevyengine#6282. Fixes bevyengine#6278. Fixes bevyengine#3666. ## Solution Split out `!Send` resources into `NonSendResources`. Add a `origin_thread_id` to all `!Send` Resources, check it on dropping `NonSendResourceData`, if there's a mismatch, panic. Moved all of the checks that `MainThreadValidator` would do into `NonSendResources` instead. All `!Send` resources now individually track which thread they were inserted from. This is validated against for every access, mutation, and drop that could be done against the value. A regression test using an altered version of the example from bevyengine#3310 has been added. This is a stopgap solution for the current status quo. A full solution may involve fully removing `!Send` resources/components from `World`, which will likely require a much more thorough design on how to handle the existing in-engine and ecosystem use cases. This PR also introduces another breaking change: ```rust use bevy_ecs::prelude::*; #[derive(Resource)] struct Resource(u32); fn main() { let mut world = World::new(); world.insert_resource(Resource(1)); world.insert_non_send_resource(Resource(2)); let res = world.get_resource_mut::<Resource>().unwrap(); assert_eq!(res.0, 2); } ``` This code will run correctly on 0.9.1 but not with this PR, since NonSend resources and normal resources have become actual distinct concepts storage wise. ## Changelog Changed: Fix soundness bug with `World: Send`. Dropping a `World` that contains a `!Send` resource on the wrong thread will now panic. ## Migration Guide Normal resources and `NonSend` resources no longer share the same backing storage. If `R: Resource`, then `NonSend<R>` and `Res<R>` will return different instances from each other. If you are using both `Res<T>` and `NonSend<T>` (or their mutable variants), to fetch the same resources, it's strongly advised to use `Res<T>`.
…7040) # Objective The type `Local<T>` unnecessarily has the bound `T: Sync` when the local is used in an exclusive system. ## Solution Lift the bound. --- ## Changelog Removed the bound `T: Sync` from `Local<T>` when used as an `ExclusiveSystemParam`.
# Objective - Fixes bevyengine#6777, fixes bevyengine#2998, replaces bevyengine#5518 - Help avoid confusing error message when using an older version of Rust ## Solution - Add the `rust-version` field to `Cargo.toml` - Add a CI job checking the MSRV - Add the job to bors
# Objective - Fixes bevyengine#7124 ## Solution - Add Hash Derive on `WorldId` - Add `SparseSetIndex` impl
# Objective - Fix the name of function parameter name in docs ## Solution - Change `f` to `sub_app_runner` --- It confused me a bit when I was reading the docs in the autocomplete hint. Hesitated about filing a PR since it's just a one single word change in the comment. Is this the right process to change these docs?
# Objective - Fixes bevyengine#4057 - Do not multiply position by scale factor
…ne#7094) # Objective This a follow-up to bevyengine#6894, see bevyengine#6894 (comment) The goal is to avoid cloning any string when getting a `&TypeRegistration` corresponding to a string which is being deserialized. As a bonus code duplication is also reduced. ## Solution The manual deserialization of a string and lookup into the type registry has been moved into a separate `TypeRegistrationDeserializer` type, which implements `DeserializeSeed` with a `Visitor` that accepts any string with `visit_str`, even ones that may not live longer than that function call. `BorrowedStr` has been removed since it's no longer used. --- ## Changelog - The type `TypeRegistrationDeserializer` has been added, which simplifies getting a `&TypeRegistration` while deserializing a string.
# Objective - In some cases, you need a `Mut<T>` pointer, but you only have a mutable reference to one. There is no easy way of converting `&'a mut Mut<'_, T>` -> `Mut<'a, T>` outside of the engine. ### Example (Before) ```rust fn do_with_mut<T>(val: Mut<T>) { ... } for x: Mut<T> in &mut query { // The function expects a `Mut<T>`, so `x` gets moved here. do_with_mut(x); // Error: use of moved value. do_a_thing(&x); } ``` ## Solution - Add the function `reborrow`, which performs the mapping. This is analogous to `PtrMut::reborrow`. ### Example (After) ```rust fn do_with_mut<T>(val: Mut<T>) { ... } for x: Mut<T> in &mut query { // We reborrow `x`, so the original does not get moved. do_with_mut(x.reborrow()); // Works fine. do_a_thing(&x); } ``` --- ## Changelog - Added the method `reborrow` to `Mut`, `ResMut`, `NonSendMut`, and `MutUntyped`.
Ugh yet another bad merge off of main. I think I'll chunk this up into smaller bits and remake this off of @danchia made an interesting observation about this PR: |
# Objective `Query::get` and other random access methods require looking up `EntityLocation` for every provided entity, then always looking up the `Archetype` to get the table ID and table row. This requires 4 total random fetches from memory: the `Entities` lookup, the `Archetype` lookup, the table row lookup, and the final fetch from table/sparse sets. If `EntityLocation` contains the table ID and table row, only the `Entities` lookup and the final storage fetch are required. ## Solution Add `TableId` and table row to `EntityLocation`. Ensure it's updated whenever entities are moved around. To ensure `EntityMeta` does not grow bigger, both `TableId` and `ArchetypeId` have been shrunk to u32, and the archetype index and table row are stored as u32s instead of as usizes. This should shrink `EntityMeta` by 4 bytes, from 24 to 20 bytes, as there is no padding anymore due to the change in alignment. This idea was partially concocted by @BoxyUwU. ## Performance This should restore the `Query::get` "gains" lost to bevyengine#6625 that were introduced in bevyengine#4800 without being unsound, and also incorporates some of the memory usage reductions seen in bevyengine#3678. This also removes the same lookups during add/remove/spawn commands, so there may be a bit of a speedup in commands and `Entity{Ref,Mut}`. --- ## Changelog Added: `EntityLocation::table_id` Added: `EntityLocation::table_row`. Changed: `World`s can now only hold a maximum of 2<sup>32</sup>- 1 archetypes. Changed: `World`s can now only hold a maximum of 2<sup>32</sup> - 1 tables. ## Migration Guide A `World` can only hold a maximum of 2<sup>32</sup> - 1 archetypes and tables now. If your use case requires more than this, please file an issue explaining your use case.
# Objective `Query::get` and other random access methods require looking up `EntityLocation` for every provided entity, then always looking up the `Archetype` to get the table ID and table row. This requires 4 total random fetches from memory: the `Entities` lookup, the `Archetype` lookup, the table row lookup, and the final fetch from table/sparse sets. If `EntityLocation` contains the table ID and table row, only the `Entities` lookup and the final storage fetch are required. ## Solution Add `TableId` and table row to `EntityLocation`. Ensure it's updated whenever entities are moved around. To ensure `EntityMeta` does not grow bigger, both `TableId` and `ArchetypeId` have been shrunk to u32, and the archetype index and table row are stored as u32s instead of as usizes. This should shrink `EntityMeta` by 4 bytes, from 24 to 20 bytes, as there is no padding anymore due to the change in alignment. This idea was partially concocted by @BoxyUwU. ## Performance This should restore the `Query::get` "gains" lost to bevyengine#6625 that were introduced in bevyengine#4800 without being unsound, and also incorporates some of the memory usage reductions seen in bevyengine#3678. This also removes the same lookups during add/remove/spawn commands, so there may be a bit of a speedup in commands and `Entity{Ref,Mut}`. --- ## Changelog Added: `EntityLocation::table_id` Added: `EntityLocation::table_row`. Changed: `World`s can now only hold a maximum of 2<sup>32</sup>- 1 archetypes. Changed: `World`s can now only hold a maximum of 2<sup>32</sup> - 1 tables. ## Migration Guide A `World` can only hold a maximum of 2<sup>32</sup> - 1 archetypes and tables now. If your use case requires more than this, please file an issue explaining your use case.
Objective
Fixes #1558. Fixes #4820.
SparseArray
is found everywhere in ECS storage, and it's currently backed by a Vec ofOption<usize>
, where 7 of it's 16 bytes goes unused. This can be quite wasteful given the potential size of this Vec.Solution
Use
nonmax
types to utilize rustc'sOption<NonZero*>
optimizations where possible without impacting existing implementations surrounding the changed IDs. Further reduce this by usingNonMaxU32
for another 50% reduction in memory usage on 64-bit platforms. Most of these types shouldn't have more thanu32::MAX
in any given world.Pros
Option<ID>
down to 4 bytes. Saving 75% memory onSparseArray
and other areas where these IDs are optionally available.Cons
NonZero*
types are used.new_unchecked
and the fact that the ID would need to overflow the underlying type (typicallyu32
).Additional Impact
This changes the underlying size and binary representation of many of these types. Any system dependent on the Hash implementation of these types may be impacted.
Open Questions
Is it possible to apply the same optimization toEDIT: This is done in #3029.Entity
?