
[Merged by Bors] - Bevy ECS V2 #1525

Closed · wants to merge 24 commits
Conversation

cart
Member

@cart cart commented Feb 26, 2021

Bevy ECS V2

This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details:

  • Complete World rewrite
  • Multiple component storage types:
    • Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes)
    • Sparse Sets: fast add/remove, slower iteration
  • Stateful Queries (caches query results for faster iteration. fragmented iteration is fast now)
  • Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in [Slightly less WIP] Design SystemParams to hold state #1364)
  • Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work)
  • Archetypes are now "just metadata", component storage is separate
  • Archetype Graph (for faster archetype changes)
  • Component Metadata
    • Configure component storage type
    • Retrieve information about component size/type/name/layout/send-ness/etc
    • Components are uniquely identified by a densely packed ComponentId
    • TypeIds are now totally optional (which should make implementing scripting easier)
  • Super fast "for_each" query iterators
  • Merged Resources into World. Resources are now just a special type of component
  • EntityRef/EntityMut builder apis (more efficient and more ergonomic)
  • Fast bitset-backed Access<T> replaces old hashmap-based approach everywhere
  • Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime)
    • With/Without are still taken into account for conflicts, so this should still be comfy to use
  • Much simpler IntoSystem impl
  • Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId)
  • Safety Improvements
    • Entity reservation uses a normal world reference instead of unsafe transmute
    • QuerySets no longer transmute lifetimes
    • Made traits "unsafe" where relevant
    • More thorough safety docs
  • WorldCell
    • Exposes safe mutable access to multiple resources at a time in a World
  • Replaced "catch all" System::update_archetypes(world: &World) with System::new_archetype(archetype: &Archetype)
  • Simpler Bundle implementation
  • Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection"
  • Removed Mut<T> query impl. It is better to only support one way: &mut T
  • Removed with() from Flags<T> in favor of Option<Flags<T>>, which allows querying for flags to be "filtered" by default
  • Components now have is_send property (currently only resources support non-send)
  • More granular module organization
  • New RemovedComponents<T> SystemParam that replaces query.removed::<T>()
  • world.resource_scope() for mutable access to resources and world at the same time
  • WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. Auto impled for cases that don't need it
  • Significantly slimmed down SystemState in favor of individual SystemParam state
  • System Commands changed from commands: &mut Commands back to mut commands: Commands (to allow Commands to have a World reference)

Fixes #1320

World Rewrite

This is a from-scratch rewrite of World that fills the niche that hecs used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out on our own!

(the only shared code between the projects is the entity id allocator, which is already basically ideal)

A huge shout out to @SanderMertens (author of flecs) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details.

Component Storage (The Problem)

Two ECS storage paradigms have gained a lot of traction over the years:

  • Archetypal ECS:
    • Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity.
    • Each "archetype" has its own table. Adding/removing an entity's component changes the archetype.
    • Enables super-fast Query iteration due to its cache-friendly data layout
    • Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table"
  • Sparse Set ECS:
    • Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids)
    • Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array.
    • Adding/removing components is a cheap, constant time operation
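The sparse-set layout described above can be captured in a minimal, self-contained sketch (this is illustrative only; names like `SparseSet` here are not Bevy's actual implementation, which adds unsafe fast paths and capacity tuning):

```rust
// Minimal sparse set: densely packed component values, sparsely indexed by
// entity id. Iteration walks `dense` (cache friendly within this one set),
// while insert/remove are O(1).
pub struct SparseSet<T> {
    dense: Vec<T>,            // tightly packed component values
    dense_entities: Vec<u32>, // entity id for each dense slot (needed for removal)
    sparse: Vec<Option<u32>>, // entity id -> index into `dense`
}

impl<T> SparseSet<T> {
    pub fn new() -> Self {
        Self { dense: Vec::new(), dense_entities: Vec::new(), sparse: Vec::new() }
    }

    pub fn insert(&mut self, entity: u32, value: T) {
        let e = entity as usize;
        if e >= self.sparse.len() {
            self.sparse.resize(e + 1, None);
        }
        match self.sparse[e] {
            Some(i) => self.dense[i as usize] = value, // overwrite existing value
            None => {
                self.sparse[e] = Some(self.dense.len() as u32);
                self.dense.push(value);
                self.dense_entities.push(entity);
            }
        }
    }

    // The "extra layer of indirection" mentioned above: entity id -> dense index.
    pub fn get(&self, entity: u32) -> Option<&T> {
        let i = (*self.sparse.get(entity as usize)?)?;
        self.dense.get(i as usize)
    }

    // O(1) removal via swap-remove in the dense arrays.
    pub fn remove(&mut self, entity: u32) -> Option<T> {
        let i = self.sparse.get_mut(entity as usize)?.take()? as usize;
        let last = self.dense.len() - 1;
        self.dense_entities.swap_remove(i);
        if i != last {
            let moved = self.dense_entities[i];
            self.sparse[moved as usize] = Some(i as u32); // fix up the moved entity
        }
        Some(self.dense.swap_remove(i))
    }
}
```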

Bevy ECS V1, hecs, legion, flecs, and Unity DOTS are all "archetypal ecs-es". I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate.

Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because:

  1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform.
  2. users need to take manual action to optimize

Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance.

Hybrid Component Storage (The Solution)

In Bevy ECS V2, we get to have our cake and eat it too. It now has both of the component storage types above (and more can be added later if needed):

  • Tables (aka "archetypal" storage)
    • The default storage. If you don't configure anything, this is what you get
    • Fast iteration by default
    • Slower add/remove operations
  • Sparse Sets
    • Opt-in
    • Slower iteration
    • Faster add/remove operations

These storage types complement each other perfectly. By default Query iteration is fast. If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set":

world.register_component(
    ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet)
).unwrap();

Archetypes

Archetypes are now "just metadata" ... they no longer store components directly. They do store:

  • The ComponentIds of each of the Archetype's components (and that component's storage type)
    • Archetypes are uniquely defined by their component layouts
    • For example: entities with "table" components [A, B, C] and "sparse set" components [D, E] will always be in the same archetype.
  • The TableId associated with the archetype
    • For now each archetype has exactly one table (which can have no components).
    • There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it:
      • Ex: an entity with "table storage" components [A, B, C] and "sparse set" components [D, E] will share the same [A, B, C] table as an entity with [A, B, C] table components and [F] sparse set components.
      • This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later)
  • A list of entities that are in the archetype and the row id of the table they are in
  • ArchetypeComponentIds
    • unique densely packed identifiers for (ArchetypeId, ComponentId) pairs
    • used by the schedule executor for cheap system access control
  • "Archetype Graph Edges" (see the next section)

The "Archetype Graph"

Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). And that's all before we get to the already expensive full copy of all components to the new table storage.

The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If ComponentIds are densely packed, you can use sparse sets to cheaply jump between archetypes.

Bevy takes this one step further by using add/remove Bundle edges instead of Component edges. Bevy encourages the use of Bundles to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. Bundles now also have densely-packed BundleIds. This allows us to use a single edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph.

As a result, an operation that used to be heavy (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations.
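The caching idea can be sketched in a few lines of self-contained Rust (the type and method names here are illustrative, not Bevy's actual graph API; the real graph also caches remove edges and per-edge component metadata):

```rust
// Sketch of the bundle-edge cache: the first time a bundle is added to an
// archetype we compute the destination archetype (the slow path: sort ids,
// hash, look up / create the archetype); afterwards it is a single array
// lookup because BundleIds are densely packed.
type ArchetypeId = usize;
type BundleId = usize;

#[derive(Default)]
struct Edges {
    add_bundle: Vec<Option<ArchetypeId>>, // indexed directly by BundleId
}

impl Edges {
    fn get_add_bundle(&self, bundle: BundleId) -> Option<ArchetypeId> {
        self.add_bundle.get(bundle).copied().flatten()
    }
    fn insert_add_bundle(&mut self, bundle: BundleId, target: ArchetypeId) {
        if bundle >= self.add_bundle.len() {
            self.add_bundle.resize(bundle + 1, None);
        }
        self.add_bundle[bundle] = Some(target);
    }
}

struct Archetypes {
    edges: Vec<Edges>, // one Edges value per archetype
}

impl Archetypes {
    fn add_bundle_to_archetype(
        &mut self,
        source: ArchetypeId,
        bundle: BundleId,
        compute_target: impl FnOnce() -> ArchetypeId, // slow path, run at most once
    ) -> ArchetypeId {
        if let Some(target) = self.edges[source].get_add_bundle(bundle) {
            return target; // cached: two cheap array lookups, zero allocations
        }
        let target = compute_target();
        self.edges[source].insert_add_bundle(bundle, target);
        target
    }
}
```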

Stateful Queries

World queries are now stateful. This allows us to:

  1. Cache archetype (and table) matches
    • This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs).
  2. Cache Fetch and Filter state
    • The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed
  3. Incrementally build up state
    • When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes)

As a result, the direct World query api now looks like this:

let mut query = world.query::<(&A, &mut B)>();
for (a, mut b) in query.iter_mut(&mut world) {
}

Requiring World to generate stateful queries (rather than letting the QueryState type be constructed separately) allows us to ensure that all queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world).

In systems, however, this is a non-breaking change. State management is done internally by the relevant SystemParam.
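The incremental matching in point 3 can be sketched in self-contained Rust (the `QueryState` fields and the toy `World` here are illustrative, not the real types):

```rust
// Sketch of incremental archetype matching: the query remembers how many
// archetypes it has already inspected ("generation") and only filters the
// new ones, so old archetypes are never re-processed.
struct QueryState {
    component: usize,            // the single component id this toy query fetches
    matched: Vec<usize>,         // cached ids of matching archetypes
    archetype_generation: usize, // number of archetypes already inspected
}

struct World {
    // each archetype is represented by its component-id list in this sketch
    archetypes: Vec<Vec<usize>>,
}

impl QueryState {
    fn update_archetypes(&mut self, world: &World) {
        for (id, components) in world
            .archetypes
            .iter()
            .enumerate()
            .skip(self.archetype_generation) // only the archetypes added since last time
        {
            if components.contains(&self.component) {
                self.matched.push(id);
            }
        }
        self.archetype_generation = world.archetypes.len();
    }
}
```

Iteration then walks `matched` directly, which is why fragmentation no longer degrades per-iteration query performance.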

Stateful SystemParams

Like Queries, SystemParams now also cache state. For example, Query system params store the "stateful query" state mentioned above. Commands store their internal CommandQueue. This means you can now safely use as many separate Commands parameters in your system as you want. Local<T> system params store their T value in their state (instead of in Resources).

SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now.

Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple Commands system params).

(credit goes to @DJMcNab for the initial idea and draft pr here #1364)

Configurable SystemParams

@DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). Most SystemParams have no config (the config type is ()), but the Local<T> param now supports user-provided parameters:

fn foo(value: Local<usize>) {    
}

app.add_system(foo.system().config(|c| c.0 = Some(10)));

Uber Fast "for_each" Query Iterators

Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration.

fn system(query: Query<(&A, &mut B)>) {
    // you now have the option to do this for a speed boost
    query.for_each_mut(|(a, mut b)| {
    });

    // however normal iterators are still available
    for (a, mut b) in query.iter_mut() {
    }
}

I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use for_each.

We should also consider using for_each for internal bevy systems to give our users a nice speed boost (but that should be a separate pr).

Component Metadata

World now has a Components collection, which is accessible via world.components(). This stores mappings from ComponentId to ComponentInfo, as well as TypeId to ComponentId mappings (where relevant). ComponentInfo stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type.
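The shape of that registry can be sketched in self-contained Rust (field names and the `init`/`get_info` methods are illustrative; the real Components type tracks more, including storage type and change-detection metadata). The key point is that the densely packed id is primary and the TypeId mapping is optional, which is what makes type-id-free (e.g. scripting-defined) components possible:

```rust
use std::any::TypeId;
use std::alloc::Layout;
use std::collections::HashMap;

// Sketch of a component registry keyed by a densely packed ComponentId.
struct ComponentInfo {
    id: usize,
    name: String,
    layout: Layout,
    is_send: bool,
}

#[derive(Default)]
struct Components {
    infos: Vec<ComponentInfo>,          // indexed by ComponentId
    type_to_id: HashMap<TypeId, usize>, // only populated for Rust-typed components
}

impl Components {
    // Registers T (if new) and returns its ComponentId.
    fn init<T: 'static>(&mut self) -> usize {
        let type_id = TypeId::of::<T>();
        if let Some(&id) = self.type_to_id.get(&type_id) {
            return id;
        }
        let id = self.infos.len();
        self.infos.push(ComponentInfo {
            id,
            name: std::any::type_name::<T>().to_string(),
            layout: Layout::new::<T>(),
            is_send: true, // simplification: send-ness detection omitted
        });
        self.type_to_id.insert(type_id, id);
        id
    }

    fn get_info(&self, id: usize) -> Option<&ComponentInfo> {
        self.infos.get(id)
    }
}
```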

Significantly Cheaper Access<T>

We used to use TypeAccess<TypeId> to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the TypeAccess<TypeId> sources every time archetypes changed.

This pr removes TypeAccess in favor of faster bitset access everywhere. We can do this thanks to the move to densely packed ComponentIds and ArchetypeComponentIds.

Merged Resources into World

Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the only major difference between them was that they were unique (and didn't correlate to an entity).

Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state.

I initially got the "separate resources" idea from legion. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use Access<T> internally).

This pr merges Resources into World:

world.insert_resource(1);
world.insert_resource(2.0);
let a = world.get_resource::<i32>().unwrap();
let mut b = world.get_resource_mut::<f64>().unwrap();
*b = 3.0;

Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict with components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new unique_components sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing data structures (namely Column, Archetype, ComponentFlags, and ComponentInfo). It also allows the executor to use a single Access<ArchetypeComponentId> per system, and it should make scripting language integration easier.

But this merge did create problems for people directly interacting with World. What if you need mutable access to multiple resources at the same time? world.get_resource_mut() borrows World mutably!

WorldCell

WorldCell applies the Access<ArchetypeComponentId> concept to direct world access:

let world_cell = world.cell();
let a = world_cell.get_resource_mut::<i32>().unwrap();
let b = world_cell.get_resource_mut::<f64>().unwrap();

This adds cheap runtime checks (a sparse set lookup of ArchetypeComponentId and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a WorldBorrow<'w, T> or WorldBorrowMut<'w, T> wrapper type, which will release the relevant ArchetypeComponentId resources when dropped.

World caches the access sparse set (and only one cell can exist at a time), so world.cell() is a cheap operation.

WorldCell does not use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple Rc<RefCell<ArchetypeComponentAccess>> wrapper in each WorldBorrow pointer.
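The borrow-tracking mechanics can be sketched without any Bevy types (the guard and function names here are illustrative; the real WorldCell hands back WorldBorrow/WorldBorrowMut wrappers around actual resource references):

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Sketch of WorldCell-style runtime access tracking: one shared counter per
// ArchetypeComponentId (here just usize). Positive = active readers,
// -1 = active writer. Guards decrement on drop. No atomics are needed
// because everything is single-threaded by construction.
type Access = Rc<RefCell<HashMap<usize, isize>>>;

struct ReadGuard { access: Access, id: usize }
struct WriteGuard { access: Access, id: usize }

fn try_read(access: &Access, id: usize) -> Option<ReadGuard> {
    let mut map = access.borrow_mut();
    let count = map.entry(id).or_insert(0);
    if *count < 0 {
        return None; // a writer is active: conflict
    }
    *count += 1;
    Some(ReadGuard { access: access.clone(), id })
}

fn try_write(access: &Access, id: usize) -> Option<WriteGuard> {
    let mut map = access.borrow_mut();
    let count = map.entry(id).or_insert(0);
    if *count != 0 {
        return None; // any reader or writer is active: conflict
    }
    *count = -1;
    Some(WriteGuard { access: access.clone(), id })
}

impl Drop for ReadGuard {
    fn drop(&mut self) {
        *self.access.borrow_mut().get_mut(&self.id).unwrap() -= 1;
    }
}

impl Drop for WriteGuard {
    fn drop(&mut self) {
        *self.access.borrow_mut().get_mut(&self.id).unwrap() = 0;
    }
}
```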

The api is currently limited to resource access, but it can and should be extended to queries / entity component access.

Resource Scopes

WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref and a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe world.get_resource_unchecked_mut(), but that is not ideal!

Instead, developers can use a "resource scope":

world.resource_scope(|world: &mut World, a: &mut A| {
})

This temporarily removes the A resource from World, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation.

If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty.
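The take-out/put-back mechanics behind resource_scope can be sketched with a toy HashMap-backed world (this is an illustrative model, not Bevy's implementation, which stores resources in the resource archetype keyed by ComponentId):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Sketch of the resource_scope pattern: temporarily remove the resource, so
// the world and the resource can be borrowed mutably at the same time, then
// put it back when the closure returns.
#[derive(Default)]
struct World {
    resources: HashMap<TypeId, Box<dyn Any>>,
}

impl World {
    fn insert_resource<T: 'static>(&mut self, value: T) {
        self.resources.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get_resource<T: 'static>(&self) -> Option<&T> {
        self.resources.get(&TypeId::of::<T>())?.downcast_ref()
    }

    fn resource_scope<T: 'static, R>(&mut self, f: impl FnOnce(&mut World, &mut T) -> R) -> R {
        // remove the resource: `self` no longer owns it, so both &mut borrows
        // are disjoint for the duration of the closure
        let mut boxed = self
            .resources
            .remove(&TypeId::of::<T>())
            .expect("resource missing");
        let value = boxed.downcast_mut::<T>().expect("type mismatch");
        let result = f(self, value);
        self.resources.insert(TypeId::of::<T>(), boxed); // put it back
        result
    }
}
```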

Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy main, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) {
}

But it also has a significant downside:

// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using ComponentId instead. This is more constraining. maybe_conflicts_system will now always fail, but it will do it consistently at startup. Naively, it would also disallow filter_system, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace.

To resolve this, I added a new FilteredAccess<T> type, which wraps Access<T> and adds with/without filters. If two FilteredAccess have with/without values that prove they are disjoint, they will no longer conflict.
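The disjointness test can be sketched with plain u64 bitsets standing in for component-id sets (the real FilteredAccess uses FixedBitSet and wraps a full Access<T>; this sketch only models reads, writes, and the With/Without sets):

```rust
// Sketch of FilteredAccess compatibility: two accesses with overlapping
// writes still don't conflict if one's `with` set intersects the other's
// `without` set, because no entity can ever match both filters.
#[derive(Default, Clone, Copy)]
struct FilteredAccess {
    reads: u64,   // bit i set => reads component i
    writes: u64,  // bit i set => writes component i
    with: u64,    // bit i set => only matches entities that have component i
    without: u64, // bit i set => only matches entities that lack component i
}

impl FilteredAccess {
    fn is_compatible(&self, other: &FilteredAccess) -> bool {
        // Contradictory With/Without sets prove the matched entity sets are
        // disjoint, so any read/write overlap is safe.
        if self.with & other.without != 0 || self.without & other.with != 0 {
            return true;
        }
        // Otherwise a write on either side must not overlap the other's
        // reads or writes.
        self.writes & (other.reads | other.writes) == 0
            && other.writes & (self.reads | self.writes) == 0
    }
}
```

With component A as bit 0 and B as bit 1, the `filter_system` example above corresponds to two accesses that both write bit 0 but carry contradictory filters on bit 1, so they are compatible; dropping the filters makes them conflict at startup, exactly as described.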

EntityRef / EntityMut

World entity operations on main require that the user passes in an entity id to each operation:

let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).

These operations have been replaced by EntityRef and EntityMut, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());
{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}

This does not affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

Safety Improvements

  • Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
  • QuerySets no longer transmute lifetimes
  • Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
  • More thorough safety docs

RemovedComponents SystemParam

The old approach to querying removed components, query.removed::<T>(), was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
} 

Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo. They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which I might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

Unified WorldQuery and QueryFilter types

(don't worry they are still separate type parameters in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of FetchState and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just F: WorldQuery where F::Fetch: FilterFetch. FilterFetch requires Fetch<Item = bool> and adds new "short circuit" variants of fetch methods. This enables a filter tuple like (With<A>, Without<B>, Changed<C>) to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for Fetch implementations that return bool.

This forces fetch implementations that return things like (bool, bool, bool) (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.
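The short-circuit behavior can be sketched in self-contained Rust (the trait and method names here are illustrative, not Bevy's actual FilterFetch signatures, which operate on storage rows rather than a simple entity parameter):

```rust
// Sketch of short-circuit filter evaluation: a filter tuple evaluates its
// members left to right and stops at the first `false`, instead of building
// a (bool, bool, ...) tuple and combining it afterwards.
trait FilterFetch {
    fn matches(&mut self, entity: usize) -> bool;
}

impl<A: FilterFetch, B: FilterFetch> FilterFetch for (A, B) {
    fn matches(&mut self, entity: usize) -> bool {
        // `&&` short-circuits: if A fails, B is never evaluated
        self.0.matches(entity) && self.1.matches(entity)
    }
}

// Test double that records how often it was evaluated.
struct CountingFilter {
    result: bool,
    calls: usize,
}

impl FilterFetch for CountingFilter {
    fn matches(&mut self, _entity: usize) -> bool {
        self.calls += 1;
        self.result
    }
}
```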

More Granular Modules

World no longer globs all of the internal modules together. It now exports core, system, and schedule separately. I'm also considering exporting core submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

Remaining Draft Work (to be done in this pr)

  • panic on conflicting WorldQuery fetches (&A, &mut A)
    • bevy main and hecs both currently allow this, but we should protect against it if possible
  • batch_iter / par_iter (currently stubbed out)
  • ChangedRes
  • The Archetypes and Tables collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId, which doesn't handle hash collisions properly.
  • It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each query.do_thing(&world) operation. This does add an extra branch to each query operation, so I'm open to other suggestions if people have them.
  • Nested Bundles (if i find time)

Potential Future Work

  • Expand WorldCell to support queries.
  • Consider not allocating in the empty archetype on world.spawn()
    • ex: return something like EntityMutUninit, which turns into EntityMut after an insert or insert_bundle op
    • this actually regressed performance last time I tried it, but in theory it should be faster
  • Optimize SparseSet::insert (see PERF comment on insert)
  • Replace SparseArray Option<T> with T::MAX to cut down on branching
    • would enable cheaper get_unchecked() operations
  • upstream fixedbitset optimizations
    • fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
    • fixedbitset could have a const constructor
  • Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
    • ex: ArchetypeA could have [A, B, C] table components and [D(1)] "tag" component. ArchetypeB could have [A, B, C] table components and a [D(2)] tag component. The archetypes are different, despite both having D tags because the value inside D is different.
    • this could potentially build on top of the archetype.unique_components added in this pr for resource storage.
  • Consider reverting all_tuples proc macro in favor of the old macro_rules implementation
    • all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
    • but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of bevy_ecs (does not affect user code)
  • Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
    • this is basically just "systems" so maybe it's not worth it
  • Add more world ops
    • world.clear()
    • world.reserve<T: Bundle>(count: usize)
  • Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
  • Adapt Commands apis for consistency with new World apis

Benchmarks

key:

  • bevy_old: bevy main branch
  • bevy: this branch
  • _foreach: uses an optimized for_each iterator
  • _sparse: uses sparse set storage (if unspecified assume table storage)
  • _system: runs inside a system (if unspecified assume test happens via direct world ops)

Simple Insert (from ecs_bench_suite)

(benchmark results chart)

Simple Iter (from ecs_bench_suite)

(benchmark results chart)

Fragment Iter (from ecs_bench_suite)

(benchmark results chart)

Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

(benchmark results chart)

Schedule (from ecs_bench_suite)

(benchmark results chart)

Add Remove Component (from ecs_bench_suite)

(benchmark results chart)

Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

(benchmark results chart)

Get Component

Looks up a single component value a large number of times

(benchmark results chart)

@cart cart added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events labels Feb 26, 2021
@cart cart added this to the Bevy 0.5 milestone Feb 26, 2021
@cart cart marked this pull request as draft February 26, 2021 02:25
@alice-i-cecile
Member

A change that you'd mentioned elsewhere, but didn't include in this list: exposing Archetypes etc as new SystemParam. Did this make it in?

@alice-i-cecile
Member

It's awesome to see this all brought together. Outstanding thoughts:

  1. Remaining syntax inconsistencies on resources vs components: why do we need to use Res<T> and ResMut<T> when we can simply use Query<&T> and Query<&mut T>?

  2. The need for resource_scope makes me wonder if we should deprecate Resources as a distinct concept and use an enhanced version of the singleton entity pattern discussed in [Merged by Bors] - Query::get_unique #1263 instead, storing each resource in its own entity with a single component (or maybe two, to mark it as a resource). It has a number of benefits, notably much better integration with the rest of the ECS, but we would need to preserve the current ease of accessing singleton entities with only one component (the current Resource pattern) in order for it to be a truly suitable replacement.

This would also mean adaption of the Local and NonSend patterns, but I'm not too worried about the complexity of that.

  3. [EntityRef/EntityMut changes] do not affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

I much prefer the new syntax: the effects are much clearer and it's more ergonomic for adding entities with 0 or 1 components. I'm strongly in favor of this change being copied over to Commands, especially since that will improve consistency. It should be done in a separate PR though, and listed in "Potential Future Work".

  4. Consider implementing Tags (archetype-specific by-value data that affects archetype identity)

These sound a lot like minimal versions of indexes (#1205), but run into the issues discussed by Sander with the "pluggable storage model" approach: namely, you're blending concerns and that you often want to be able to index the same component in more than one way. I don't think this is a great approach to a very real issue.

@cart
Member Author

cart commented Feb 26, 2021

@alice-i-cecile

A change that you'd mentioned elsewhere, but didn't include in this list: exposing Archetypes etc as new SystemParam. Did this make it in?

It didn't, but that's a trivial change. I see no reason not to include it. Thanks for the reminder! I just forgot.

Remaining syntax inconsistencies on resources vs components: why do we need to use Res<T> and ResMut<T> when we can simply use Query<&T> and Query<&mut T>?

The heart of it is that Queries iterate entities, but resources are not entities. We could implement this trivially by creating one entity for each resource (as you mentioned), but (1) it would perform worse (one extra layer of indirection), (2) it would have implications for things like scene serialization, and (3) it would cause resources to show up in entity queries (which isn't necessarily a good thing). I do think it's worth investigating, but I'm not sure blurring the line even more is a good thing; I'd prefer it if we did it outside of this pr.

This would also mean adaption of the Local and NonSend patterns, but I'm not too worried about the complexity of that.

NonSend components on entities is actually non-trivial because world.despawn(entity) could happen on any thread (and you can't know going into it if the entity has non-send components). We'd need to sort out a good error handling strategy.

I much prefer the new syntax: the effects are much clearer and it's more ergonomic for adding entities with 0 or 1 components. I'm strongly in favor of this change being copied over to Commands, especially since that will improve consistency. It should be done in a separate PR though, and listed in "Potential Future Work".

Yeah, I do think we should revisit the Commands apis, but there's a lot of paths we could take there (ex: the typed commands we've discussed in the past), so it's definitely worth tabling for later. Adding that to "potential future work" is a good idea.

These sound a lot like minimal versions of indexes (#1205), but run into the issues discussed by Sander with the "pluggable storage model" approach: namely, you're blending concerns and that you often want to be able to index the same component in more than one way. I don't think this is a great approach to a very real issue.

I don't think it's a "pluggable storage" (as described in the discord discussion) and it doesn't suffer from the "multiple indices" problem. You can have (and query for) any number of tags. It's just a different form of archetype identity other than "has(component)". I view it as a more limited form of Sander's relationship stuff (which are also typed values that affect archetype identity).

crates/bevy_ecs/Cargo.toml Outdated Show resolved Hide resolved
@bjorn3
Copy link
Contributor

bjorn3 commented Feb 26, 2021

It seems that you accidentally added an empty file out.rs.

@jakobhellermann
Copy link
Contributor

I'm trying to update bevy_egui, and I'm a little stuck with the changes to Resources.
Before, it was possible to do Resources::get_mut(&self) without mutable access to the resources, which is used e.g. in a render node impl in bevy_egui: https://github.com/mvlabat/bevy_egui/blob/271b42bfcae1b62f9b8fc45e8b491e2d2348276e/src/egui_node.rs#L169. Now that the Resources are merged into the World this isn't possible anymore, is that intentional?

Some of these mutable resource uses could be moved into Node::prepare, but others require the &mut dyn RenderContext as well.

@bjorn3
Copy link
Contributor

bjorn3 commented Feb 26, 2021

I think you can use world.cell().

@jakobhellermann
Copy link
Contributor

That requires mutable access to the world as well. If I understand correctly, world.cell() is for getting multiple mutable resources from a mutable world.
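The soundness distinction here can be sketched with plain std types (hypothetical `World`/`WorldCell` names; this is not Bevy's actual API): taking `&mut World` up front proves no other access exists, after which individual resources can be handed out through runtime-checked borrows.

```rust
use std::cell::{Ref, RefCell, RefMut};

// Minimal sketch of the `world.cell()` idea, assuming a toy World with two
// "resources". Requiring `&mut self` in `cell()` is what makes the runtime
// borrow checking sound: no other reference to World can exist concurrently.
struct World {
    score: RefCell<u32>,
    name: RefCell<String>,
}

struct WorldCell<'w> {
    world: &'w World,
}

impl World {
    fn cell(&mut self) -> WorldCell<'_> {
        WorldCell { world: self }
    }
}

impl<'w> WorldCell<'w> {
    // Runtime-checked borrows of individual resources.
    fn name(&self) -> Ref<'w, String> {
        self.world.name.borrow()
    }
    fn score_mut(&self) -> RefMut<'w, u32> {
        self.world.score.borrow_mut()
    }
}

fn main() {
    let mut world = World {
        score: RefCell::new(41),
        name: RefCell::new("player".into()),
    };
    let cell = world.cell();
    *cell.score_mut() += 1;
    // Simultaneous borrows of *different* resources are fine:
    println!("{}: {}", cell.name(), cell.score_mut());
}
```

This also illustrates why bootstrapping a mutable resource borrow from a plain `&World` (the old `Resources::get_mut(&self)` pattern) was unsound: nothing would prevent a schedule from running concurrently.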

Copy link
Contributor

@Ratysz Ratysz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through most of mod schedule and examples, so far only minor nits; I'll go through more later. Do tell if something needs a closer look or needs to be prioritized.

Particularly love the revamped prelude, Mut<T> -> &mut T in queries, and condensed-by-default access info.

crates/bevy_ecs/src/schedule/mod.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/schedule/executor_parallel.rs Outdated Show resolved Hide resolved
examples/scene/scene.rs Outdated Show resolved Hide resolved
crates/bevy_ecs/src/schedule/executor_parallel.rs Outdated Show resolved Hide resolved
@Ratysz
Copy link
Contributor

Ratysz commented Feb 26, 2021

Bikeshedding:

  • .config() doesn't sound very descriptive to me. Something along the lines of .initialize_parameter() or .init_arg() would work better.
  • I feel .insert() and .insert_bundle() should be .with() and .with_bundle() instead, since they are used like builder methods.

I'm also considering exporting core submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

I'm in favor, nesting is deep and descriptive enough as it is with bevy::ecs::module.

The Archetypes and Tables collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId.

What's the issue here?

It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe).

Short of scoping them with upgraded WorldCell, I don't think the branch is avoidable.

Consider implementing Tags

If we do, we really should think up a better name - I've seen people misunderstand what they are and confuse them with marker components.

I remember that when Legion shed its tags (I think that was a thing that happened, it's been a while) folks were talking about there not being a real usecase for them, citing the fate of the same feature in Unity. When this feature was suggested for hecs, @Ralith raised concerns that overreliance on tags could fragment archetypes more than necessary, impacting performance; stateful queries should help here, though.

@Ralith
Copy link

Ralith commented Feb 26, 2021

Thanks for writing up these changes! I'll definitely be cribbing some of the ideas when I can find time...

stateful queries should help [optimize tags].

They don't do anything about the fundamental "problem," which is that you're (intentionally!) fragmenting your archetypes. Since that's really the entire point of the feature, it's up to the user to do so iff it's beneficial, which is a nice option to have in theory but could be a footgun in practice, suggesting that the feature should be justified by a strong use case.

@cart
Copy link
Member Author

cart commented Feb 26, 2021

@Ratysz

.config() doesn't sound very descriptive to me. Something along the lines of .initialize_parameter() or .init_arg() would work better.

I think initialize_parameter implies that you are actually setting the parameter, but that isn't actually true here. You are "configuring" the parameter (ex: setting the value of SystemParamState::Config). I'm happy to revisit the name of the associated type, but I do think the terminology should line up. At the very least we could make it more explicit: config_params(), configure_parameters(), config_param_state(), etc.

I feel .insert() and .insert_bundle() should be .with() and .with_bundle() instead, since they are used like builder methods.

I think I would like to reel in with_x in general as the "builder operation prefix". with_x instead of add_x, insert_x, set_x, etc. erases the "operation type" in favor of pointing out that the thing is a builder. Bevy uses builders in a lot of places, which means we're erasing a lot of useful context. It's worth coming up with a set of rules that strike a good balance between "operation semantics", "ergonomics", "consistency", and "namespacing". "Namespacing" is a bit tricky: some types have both builder and non-builder functions for the same operation. The current with_x vs insert_x approach solves that problem, but it comes at the cost of operation type erasure. And something like with_insert_x is pretty gross to read/grok/type.

Some prior discussion / operation definitions:
#1352 (comment)

What's the issue here? [Archetype/Table hashes]

The issue is that HashMap falls back to Eq to resolve hash collisions. If we use an int hash as the key, the collision is "resolved" by treating the different items as "the same", even if they aren't.

The problem with using something like struct ArchetypeKey { ids: Vec<ComponentId> } as the key is that we need to allocate. My ideal solution would be to use HashMap::raw_entry, but that is unstable. The other solution (Cow everything) has the unfortunate side effect of either requiring double allocations for the component ids (not terrible given that this is a cached operation) or changing the internal component id lists in Archetype to Cow (which is also not the worst thing in the world).
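A small self-contained sketch of the collision hazard (toy ids and hasher; not the actual Bevy code): keying the map by the u64 hash means two different component-id lists that happened to hash equal would be treated as the same archetype, while keying by an owned list resolves collisions with real equality at the cost of an allocation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hash a sorted list of component ids, as described above.
fn hash_ids(ids: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    ids.hash(&mut h);
    h.finish()
}

fn main() {
    // Unsound variant: the u64 hash itself is the key. HashMap's final Eq
    // check now compares hashes, so any two colliding id lists would
    // silently map to the same ArchetypeId.
    let mut by_hash: HashMap<u64, usize> = HashMap::new();
    by_hash.insert(hash_ids(&[1, 2, 3]), 0);
    let maybe_collided = by_hash.get(&hash_ids(&[4, 5, 6]));

    // Allocating variant: the id list itself is the key, so collisions are
    // resolved by comparing the actual lists.
    let mut by_list: HashMap<Vec<u32>, usize> = HashMap::new();
    by_list.insert(vec![1, 2, 3], 0);
    assert!(by_list.get(&vec![4u32, 5, 6]).is_none());
    println!("{:?} {:?}", maybe_collided, by_list.get(&vec![1u32, 2, 3]));
}
```

`HashMap::raw_entry` (unstable at the time of this PR) would allow hashing a borrowed `&[ComponentId]` and comparing with real equality, avoiding both the collision hazard and the allocation.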

If we do [add Tags], we really should think up a better name - I've seen people misunderstand what they are and confuse them with marker components.

I think the primary motivator is to use them as "indices" (ex: give me everything with Parent(x), Handle<T>, etc). We could call them that (and "market" them as such?).

I think one of the big issues with the legion implementation was that tags were required in every spawn api. Users were forced to think about them.

@Ralith raised concerns that overreliance on tags could fragment archetypes more than necessary, impacting performance; stateful queries should help here, though.

Yeah, it's worth measuring. Our sparse iteration performance is pretty great now, but it will still have some indirection in this case because Tags serve as "non-table filters", which means we can't directly iterate tables. We'd instead need to iterate archetypes like we do for sparse sets (which has the potential to jump around positions in tables, although in practice they still tend to glob up in cache friendly ways). I expect perf to still be pretty good, but not as good as direct table iteration. But yeah, maybe separate index storage is the right call.

@alice-i-cecile you volunteered to create issues for the "potential future work". "Tags" is probably a good place to start so we can move this conversation there.

@jakobhellermann

I'm trying to update bevy_egui, and I'm a little stuck with the changes to Resources.
Before, it was possible to do Resources::get_mut(&self) without mutable access to the resources, which is used e.g. in a render node impl in bevy_egui. Now that the Resources are merged into the World this isn't possible anymore, is that intentional?
Some of these mutable resource uses could be moved into Node::prepare, but others require the &mut dyn RenderContext aswell.

Allowing mutable resource access to be "bootstrapped" from an &Resource without unsafe was technically unsound (because a schedule (which didn't use the old atomic access control) could be running while you are getting your mutable reference from &Resource). Node::prepare was added to give nodes a chance to interact with the World mutably before kicking off a (potentially) parallel graph run, which hopefully resolves most problems. The example you linked to should be resolvable that way. Can you send a link to something that can't be resolved with prepare?

@Ralith

Thanks for writing up these changes! I'll definitely be cribbing some of the ideas when I can find time...

Fantastic. Lets keep this cross pollination going ❤️

They don't do anything about the fundamental "problem," which is that you're (intentionally!) fragmenting your archetypes. Since that's really the entire point of the feature, it's up to the user to do so iff it's beneficial, which is a nice option to have in theory but could be a footgun in practice, suggesting that the feature should be justified by a strong use case

Agreed. I think the "strong" use case becomes clear if we rename Tags to Indices (which could be used for basically anything). In this case it wouldn't actually fragment the Table these things are stored in (as tags would live in a separate storage), but it would fragment the access pattern (because we need to iterate Archetypes instead of directly iterating Tables). It's worth measuring the real cost and comparing that to other "index" implementations (such as storing the index->entity table elsewhere in something that doesn't affect archetype identity, which notably would also fragment table access patterns).

@jakobhellermann
Copy link
Contributor

Allowing mutable resource access to be "bootstrapped" from an &Resource without unsafe was technically unsound (because a schedule (which didn't use the old atomic access control) could be running while you are getting your mutable reference from &Resource). Node::prepare was added to give nodes a chance to interact with the World mutably before kicking off a (potentially) parallel graph run, which hopefully resolves most problems. The example you linked to should be resolvable that way. Can you send a link to something that can't be resolved with prepare?

The three functions process_asset_events, remove_unused_textures and init_textures which get run inside the Node::update seem to be harder to do in prepare: https://github.com/mvlabat/bevy_egui/blob/271b42bfcae1b62f9b8fc45e8b491e2d2348276e/src/egui_node.rs#L180

I'll look into it more tomorrow; maybe this could be done as a SystemNode instead.

@TomGillen
Copy link

Can you contribute "Sparse Fragmented Iter" to ecs_bench_suite?

@cart
Copy link
Member Author

cart commented Feb 26, 2021

@TomGillen sure thing! I think get_component is another good one to include.

@cart
Copy link
Member Author

cart commented Feb 26, 2021

@jakobhellermann

Yeah SystemNode seems like an option here. RenderResourcesNode is a SystemNode whose system interacts with the world mutably, creates resources using Res<Box<dyn RenderResourceContext>>, and queues up RenderContext commands using a command queue (whose reference is passed to the Node impl https://github.com/bevyengine/bevy/pull/1525/files#diff-454d965b87cfdc4fb81e3da21b073fe7545caa90469aa65fab5ab221db064025R403).

Alternatively you could do the same thing in Node::prepare without a SystemNode (which might be easier given that you already have non-system code written).

Most of the code that you linked to doesn't actually need the RenderContext. In prepare() I recommend using the RenderResourceContext world resource mentioned above to allocate gpu resources and queue up whatever RenderContext operations are required. Then Node::update() can just run those commands on the RenderContext.

@cart
Copy link
Member Author

cart commented Feb 26, 2021

Loving how the new Archetypes, Components, Entities, and Bundles SystemParams interact with Queries: 32be92b

@alice-i-cecile
Copy link
Member

Loving how the new Archetypes, Components, Entities, and Bundles SystemParams interact with Queries: 32be92b

Those are pretty new SystemParams... Should be nice to reduce the need for exclusive systems :3

@cart
Copy link
Member Author

cart commented Feb 27, 2021

I opted to implement par_for_each / par_for_each_mut instead of par_iter for now, both because it should technically be faster and because the overall complexity cost was lower. We can follow up with par_iter later if needed.
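This is not Bevy's actual signature, but the shape of the trade-off can be sketched with std threads: a `par_for_each` hands batches of items straight to a closure, with no parallel-iterator object to construct and drive.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Sketch of the `par_for_each` idea (illustrative signature): split the data
// into batches and run the closure over each batch on its own scoped thread,
// instead of materializing a parallel iterator.
fn par_for_each(items: &[u64], batch_size: usize, f: impl Fn(&u64) + Sync) {
    let f = &f;
    std::thread::scope(|s| {
        for batch in items.chunks(batch_size) {
            s.spawn(move || batch.iter().for_each(f));
        }
    });
}

fn main() {
    let items: Vec<u64> = (1..=100).collect();
    let sum = AtomicU64::new(0);
    par_for_each(&items, 25, |x| {
        sum.fetch_add(*x, Ordering::Relaxed);
    });
    assert_eq!(sum.load(Ordering::Relaxed), 5050);
    println!("{}", sum.load(Ordering::Relaxed));
}
```

In Bevy the batches come from archetype/table storage and run on the task pool, but the closure-driven structure is the same; a `par_iter` would additionally have to expose per-item state through an iterator object.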

slimsag added a commit to hexops/mach that referenced this pull request Jan 27, 2022
:: Limitations of our ECS

Previously, we had thought about our ECS in terms of archetypes defined at compile time (effectively arrays
of archetype structs with comptime defined fields as components.) I believe that this is likely *the most
efficient* way that one could ever represent entities. However, it comes with many limitations, namely that:

You have to define which components your entity will have _at compile time_: with our implementation,
adding/removing components on an entity at runtime was not possible (although declaring components at comptime
that had optional _values_ at runtime was). This conflicts with some goals that we have:

* The ability to add/remove components at runtime:
    * In an editor for the game engine, e.g. adding a Physics component or similar to see how it behaves.
    * In a code file as part of Zig hot code swapping in the future, adding an arbitrary component to an entity
      while your game is running.
    * In more obscure cases: adding components at runtime as part of loading a config file, in response to network
      operations, etc.

:: Investigating sparse sets

To find the best way to solve this, I did begin to investigate sparse sets which I saw mentioned in various contexts
with ECS implementations. My understanding is that many ECS implementations utilize sparse sets to store a relation
between an entity ID and the dense arrays of components associated with it. My understanding is that sparse sets
often imply storing components as distinct dense arrays (e.g. an array of physics component values, an array of weapon
component values, etc.) and then using the sparse set to map entity IDs -> indexes within those dense component arrays,
`weapon_components[weapons_sparse_set[entityID]]` is effectively used to lookup an entity's weapon component value,
because not every entity is guaranteed to have the same components and so `weapon_components[entityID]` is not possible.

This of course introduces overhead, not only due to two arrays needed to lookup a component's value, but also because
you may now be accessing `weapon_components` values non-sequentially which can easily introduce CPU cache misses. And
so I began to think about how to reconcile the comptime-component-definition archetype approach I had written before
and this sparse set approach that seems to be popular among other ECS implementations.
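A minimal sketch of the two-level lookup described above (illustrative names), written in Rust for consistency with the linked PR:

```rust
// Sparse set: a sparse array maps entity IDs to slots in a dense, tightly
// packed component array. The dense array iterates fast; random lookup pays
// one extra indirection.
struct SparseSet<T> {
    sparse: Vec<Option<usize>>, // entity ID -> index into `dense`
    dense: Vec<T>,              // tightly packed component values
}

impl<T> SparseSet<T> {
    fn insert(&mut self, entity: usize, value: T) {
        if entity >= self.sparse.len() {
            self.sparse.resize(entity + 1, None);
        }
        self.sparse[entity] = Some(self.dense.len());
        self.dense.push(value);
    }

    fn get(&self, entity: usize) -> Option<&T> {
        // Two loads: weapon_components[weapons_sparse_set[entityID]]
        self.sparse.get(entity).copied().flatten().map(|i| &self.dense[i])
    }
}

fn main() {
    let mut weapons = SparseSet { sparse: Vec::new(), dense: Vec::new() };
    weapons.insert(7, "sword");
    assert_eq!(weapons.get(7), Some(&"sword"));
    assert_eq!(weapons.get(3), None); // entity 3 has no weapon component
}
```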

:: Thinking in terms of databases

What helped me was thinking about an ECS in terms of databases, where tables represent a rather arbitrary "type" of
entity, rows represent entities (of that type) themselves, and the columns represent component values. This makes a lot
of sense to me, and can be implemented at runtime easily to allow adding/removing "columns" (components) to an entity.

The drawback of this database model made the benefit of sparse sets obvious: If I have a table representing monster
entities, and add a Weapon component to one monster - every monster must now pay the cost of storing such a component
as we've introduced a column, whether they intend to store a value there or not. In this context, having a way to
separately store components and associate them with an entity via a sparse set is nice: you pay a bit more to iterate
over such components (because they are not stored as dense arrays), but you only pay the cost of storing them for
entities that actually intend to use them. In fact, iteration could be faster due to not having to skip over "empty"
column values.

So this was the approach I implemented here:

* `Entities` is a database of tables.
    * It's a hashmap of table names (entity type names) to tables (`EntityTypeStorage`).
    * An "entity type" is some arbitrary type of entity _likely to have the same components_. It's optimized for that.
      But unlike an "archetype", adding/removing components does not change the type - it just adds/removes a new column
      (array) of data.
    * You would use just one set of these for any entities that would pass through the same system. e.g. one of these
      for all 3D objects, one for all 2D objects, one for UI components. Or one for all three.
* `EntityTypeStorage` is a table, whose rows are entities and columns are components.
    * It's a hashmap of component names -> `ComponentStorage(T)`
    * Adding/removing a component is as simple as adding/removing a hashmap entry.
* `ComponentStorage(T)` is one of two things:
    * (default) a dense array of component values, making it quite optimal for iterating over.
    * (optional) a sparsely stored map of (row ID) -> (component value).
* `EntityID` thus becomes a simple 32-bit row ID + a 16-bit table ID, and it's globally unique within a set of `Entities`.
    * Also enables O(1) entity ID lookups, effectively `entities.tables[tableID].rows[rowID]`
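The ID layout in the last bullet might look like this (the exact bit packing is an assumption; only the 32-bit row / 16-bit table split comes from the text):

```rust
// 32-bit row ID + 16-bit table ID packed into one integer, enabling the
// O(1) lookup `entities.tables[id.table()].rows[id.row()]`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EntityId(u64);

impl EntityId {
    fn new(table: u16, row: u32) -> Self {
        EntityId(((table as u64) << 32) | row as u64)
    }
    fn table(self) -> u16 {
        (self.0 >> 32) as u16
    }
    fn row(self) -> u32 {
        self.0 as u32
    }
}

fn main() {
    let id = EntityId::new(3, 1_000_000);
    assert_eq!(id.table(), 3);
    assert_eq!(id.row(), 1_000_000);
}
```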

:: Benefits

::: Faster "give me all entities with components (T, U, V) queries"

One nice thing about this approach is that to answer a query like "give me all entities with a 'weapon' component", we can
reduce the search space dramatically right off the bat due to the entity types: an `EntityTypeStorage` has fast access to
the set of components all entities within it may have set. Now, not all of them will have such a component, but _most of
them will_. We just "know" that without doing any computations, our data is structured to hint this to us. And this makes
sense logically, because most entities are similar: buttons, ogre monsters, players, etc. are often minor variations of
something, not a truly unique type of entity with 100% random components.

::: Shared component values

In addition to having sparse storage for `entity ID -> component value` relations, we can _also_ offer a third type of
storage: shared storage. Because we allow the user to arbitrarily define entity types, we can offer to store components
at the entity type (table) level: pay to store the component only once, not per-entity. This seems quite useful (and perhaps
even unique to our ECS? I'd be curious to hear if others offer this!)

For example, if you want to have all entities of type "monster" share the same `Renderer` component value for example,
we simply elevate the storage of that component value to the `EntityTypeStorage` / as part of the table itself, not as a column
or sparse relation. This is a mere `component name -> component value` map. There is no `entity ID -> component value`
relationship involved here, we just "know" that every entity of the "monster" entity type has that component value.
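A sketch of this shared-storage idea (hypothetical names): the component value lives on the table itself, so every row resolves to the same value with no `entity ID -> component value` relation.

```rust
use std::collections::HashMap;

// Shared storage: one component value per table, not per entity.
struct Table {
    shared: HashMap<&'static str, String>, // component name -> shared value
    rows: Vec<u32>,                        // entities in this table
}

fn main() {
    let monsters = Table {
        shared: HashMap::from([("Renderer", "monster-pipeline".to_string())]),
        rows: vec![1, 2, 3],
    };
    // Every entity of the "monster" type resolves to the same value:
    for _row in &monsters.rows {
        assert_eq!(
            monsters.shared.get("Renderer").map(String::as_str),
            Some("monster-pipeline")
        );
    }
}
```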

::: Runtime/editor introspection

This is not a benefit of thinking in terms of databases, but this implementation opens the possibility for runtime (future editor)
manipulation & introspection:

* Adding/removing components to an entity at runtime
* Iterating all entity types within a world
    * Iterating all entities of a given type
        * Iterating all possibly-stored components for entities of this type
        * Iterating all entities of this type
            * Iterating all components of this entity (future)
* Converting from sparse -> dense storage at runtime

:: A note about Bevy/EnTT

After writing this, and the above commit message, I got curious how Bevy/EnTT handle this. Do they do something similar?

I found [Bevy has hybrid component storage (pick between dense and sparse)](https://bevyengine.org/news/bevy-0-5/#hybrid-component-storage-the-solution)
which appears to be more clearly specified in [this linked PR](bevyengine/bevy#1525) which also indicates:

> hecs, legion, flecs, and Unity DOTS are all "archetypal ecs-es".
> Shipyard and EnTT are "sparse set ecs-es".

:: Is our archetypal memory layout better than other ECS implementations?

One notable difference is that Bevy states about Archetypal ECS:

> Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need
> to be copied to the new archetype's "table"

I've seen this stated elsewhere, outside of Bevy, too. I've had folks tell me that archetypal ECS implementations
use an AoS memory layout in order to make iteration faster (where `A`, `B`, and `C` are component values):

```
ABCABCABCABC
```

I have no doubt a sparse set is worse for iteration, as it involves accessing non-sequentially into the underlying dense
arrays of the sparse set (from what I understand.) However, I find the archetypal storage pattern most have settled on
(AoS memory layout) to be a strange choice. The other choice is an SoA memory layout:

```
AAAA
BBBB
CCCC
```

My understanding from data oriented design (primarily from Andrew Kelley's talk) is that due to struct padding and alignment
SoA is in fact better as it reduces the size of data (up to nearly half, IIRC) and that ensures more actually ends up in CPU
cache despite accessing distinct arrays (which apparently CPUs are quite efficient at.)
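The padding claim can be checked without a benchmark: with one f64 and one u8 per entity, the interleaved (AoS) element is padded out to its 8-byte alignment, while SoA pays only the raw field sizes.

```rust
use std::mem::size_of;

// AoS: one interleaved element per entity, padded to 8-byte alignment.
struct Aos {
    pos: f64,
    flag: u8,
}

// SoA: one dense array per component, no per-element padding.
struct Soa {
    pos: Vec<f64>,
    flag: Vec<u8>,
}

fn main() {
    let _soa = Soa { pos: Vec::new(), flag: Vec::new() };
    let aos_per_entity = size_of::<Aos>(); // 16: 7 bytes of padding per entity
    let soa_per_entity = size_of::<f64>() + size_of::<u8>(); // 9
    assert_eq!(aos_per_entity, 16);
    assert_eq!(soa_per_entity, 9);
    println!("AoS: {} bytes/entity, SoA: {} bytes/entity", aos_per_entity, soa_per_entity);
}
```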

Obviously, I have no benchmarks, and so making such a claim is super naive. However, if true, it means that our memory layout
is not just more CPU cache efficient but also largely eliminates the typically increased cost of adding/removing components
with archetypal storage: others pay to copy every single entity when adding/removing a component, we don't. We only pay to
allocate space for the new component. We don't pay to copy anything. Of course, in our case adding/removing a component to
sparse storage is still cheaper: effectively a hashmap insert for affected entities only, rather than allocating an entire
array of size `len(entities)`.

An additional advantage of this, is that even when iterating over every entity your intent is often not to access every component.
For example, a physics system may access multiple components but will not be interested in rendering/game-logic components and
those will "push" data we care about out of the limited cache space.

:: Future

Major things still not implemented here include:

* Multi-threading
* Querying, iterating
* "Indexes"
    * Graph relations index: e.g. parent-child entity relations for a DOM / UI / scene graph.
    * Spatial index: "give me all entities within 5 units distance from (x, y, z)"
    * Generic index: "give me all entities where arbitraryFunction(e) returns true"

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
slimsag added a commit to hexops/mach that referenced this pull request Jan 28, 2022
:: Limitations of our ECS

Previously, we had thought about our ECS in terms of archetypes defined at compile time (effectively arrays
of archetype structs with comptime defined fields as components.) I believe that this is likely *the most
efficient* way that one could ever represent entities. However, it comes with many limitations, namely that:

You have to define which components your entity will have _at compile time_: with our implementation,
adding/removing components to an entity at runtime was not possible (although declaring components at comptime
that had optional _values_ at runtime was). This is contradictory with some goals that we have:

* The ability to add/remove components at runtime:
    * In an editor for the game engine, e.g. adding a Physics component or similar to see how it behaves.
    * In a code file as part of Zig hot code swapping in the future, adding an arbitrary component to an entity
      while your game is running.
    * In more obscure cases: adding components at runtime as part of loading a config file, in response to network
      operations, etc.

:: Investigating sparse sets

To find the best way to solve this, I did begin to investigate sparse sets which I saw mentioned in various contexts
with ECS implementations. My understanding is that many ECS implementations utilize sparse sets to store a relation
between an entity ID and the dense arrays of components associated with it. My understanding is that sparse sets
often imply storing components as distinct dense arrays (e.g. an array of physics component values, an array of weapon
component values, etc.) and then using the sparse set to map entity IDs -> indexes within those dense component arrays,
`weapon_components[weapons_sparse_set[entityID]]` is effectively used to lookup an entity's weapon component value,
because not every entity is guaranteed to have the same components and so `weapon_components[entityID]` is not possible.

This of course introduces overhead, not only due to two arrays needed to lookup a component's value, but also because
you may now be accessing `weapon_components` values non-sequentially which can easily introduce CPU cache misses. And
so I began to think about how to reconcile the comptime-component-definition archetype approach I had written before
and this sparse set approach that seems to be popular among other ECS implementations.

:: Thinking in terms of databases

What helped me was thinking about an ECS in terms of databases, where tables represent a rather arbitrary "type" of
entity, rows represent entities (of that type) themselves, and the columns represent component values. This makes a lot
of sense to me, and can be implemented at runtime easily to allow adding/removing "columns" (components) to an entity.

The drawback of this database model made the benefit of sparse sets obvious: If I have a table representing monster
entities, and add a Weapon component to one monster - every monster must now pay the cost of storing such a component
as we've introduced a column, whether they intend to store a value there or not. In this context, having a way to
separately store components and associate them with an entity via a sparse set is nice: you pay a bit more to iterate
over such components (because they are not stored as dense arrays), but you only pay the cost of storing them for
entities that actually intend to use them. In fact, iteration could be faster due to not having to skip over "empty"
column values.

So this was the approach I implemented here:

* `Entities` is a database of tables.
    * It's a hashmap of table names (entity type names) to tables (`EntityTypeStorage`).
    * An "entity type" is some arbitrary type of entity _likely to have the same components_. It's optimized for that.
      But unlike an "archetype", adding/removing ocmponents does not change the type - it just adds/removes a new column
      (array) of data.
    * You would use just one set of these for any entities that would pass through the same system. e.g. one of these
      for all 3D objects, one for all 2D objects, one for UI components. Or one for all three.
* `EntityTypeStorage` is a table, whose rows are entities and columns are components.
    * It's a hashmap of component names -> `ComponentStorage(T)`
    * Adding/removing a component is as simple as adding/removing a hashmap entry.
* `ComponentStorage(T)` is one of two things:
    * (default) a dense array of component values, making it quite optimal for iterating over.
    * (optional) a sparsely stored map of (row ID) -> (component value).
* `EntityID` thus becomes a simple 32-bit row ID + a 16-bit table ID, and it's globally unique within a set of `Entities`.
    * Also enables O(1) entity ID lookups, effectively `entities.tables[tableID].rows[rowID]`

:: Benefits

::: Faster "give me all entities with components (T, U, V) queries"

One nice thing about this approach is that to answer a query like "give me all entities with a 'weapon' component", we can
reduce the search space dramatically right off the bat due to the entity types: an `EntityTypeStorage` has fast access to
the set of components all entities within it may have set. Now, not all of them will have such a component, but _most of
them will_. We just "know" that without doing any computations, our data is structured to hint this to us. And this makes
sense logically, because most entities are similar: buttons, ogre monsters, players, etc. are often minor variations of
something, not a truly unique type of entity with 100% random components.

::: Shared component values

In addition to having sparse storage for `entity ID -> component value` relations, we can _also_ offer a third type of
storage: shared storage. Because we allow the user to arbitrarily define entity types, we can offer to store components
at the entity type (table) level: pay to store the component only once, not per-entity. This seems quite useful (and perhaps
even unique to our ECS? I'd be curious to hear if others offer this!)

For example, if you want to have all entities of type "monster" share the same `Renderer` component value for example,
we simply elevate the storage of that component value to the `EntityTypeStorage` / as part of the table itself, not as a column
or sparse relation. This is a mere `component name -> component value` map. There is no `entity ID -> component value`
relationship involved here, we just "know" that every entity of the "monster" entity type has that component value.

::: Runtime/editor introspection

This is not a benefit of thinking in terms of databases, but this implementation opens the possibility for runtime (future editor)
manipulation & introspection:

* Adding/removing components to an entity at runtime
* Iterating all entity types within a world
    * Iterating all entities of a given type
        * Iterating all possibly-stored components for entities of this type
        * Iterating all entities of this type
            * Iterating all components of this entity (future)
* Converting from sparse -> dense storage at runtime

:: A note about Bevy/EnTT

After writing this, and the above commit message, I got curious how Bevy/EnTT handle this. Do they do something similar?

I found [Bevy has hybrid component storage (pick between dense and sparse)](https://bevyengine.org/news/bevy-0-5/#hybrid-component-storage-the-solution)
which appears to be more clearly specified in [this linked PR](bevyengine/bevy#1525) which also indicates:

> hecs, legion, flecs, and Unity DOTS are all "archetypal ecs-es".
> Shipyard and EnTT are "sparse set ecs-es".

:: Is our archetypal memory layout better than other ECS implementations?

One notable difference is that Bevy states about Archetypal ECS:

> Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need
> to be copied to the new archetype's "table"

I've seen this stated elsewhere, outside of Bevy, too. I've had folks tell me that archetypal ECS implementations
use an AoS memory layout in order to make iteration faster (where `A`, `B`, and `C` are component values):

```
ABCABCABCABC
```

I have no doubt a sparse set is worse for iteration, as it involves non-sequential accesses into the underlying dense
arrays of the sparse set (from what I understand). However, I find the archetypal storage pattern most have settled on
(an AoS memory layout) to be a strange choice. The other choice is an SoA memory layout:

```
AAAA
BBBB
CCCC
```

My understanding from data oriented design (primarily from Andrew Kelley's talk) is that, due to struct padding and
alignment, SoA is in fact better: it reduces the size of the data (by up to nearly half, IIRC), which ensures more of it
actually ends up in CPU cache despite accessing distinct arrays (something CPUs are apparently quite efficient at).
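The padding effect is easy to demonstrate. A sketch (the component fields here are invented; `#[repr(C)]` just fixes the field order so the padding is predictable): an AoS struct pays its alignment padding once per entity, while SoA keeps one tightly packed array per component.

```rust
use std::mem::size_of;

// AoS: each entity is one struct; alignment inserts padding between fields.
#[repr(C)]
struct EntityAoS {
    flag: u8,      // 1 byte, then 7 bytes padding so `position` is 8-aligned
    position: u64, // 8 bytes
    team: u8,      // 1 byte, then 7 bytes tail padding (struct alignment is 8)
}

fn main() {
    // Only 10 bytes of real data, but 24 bytes per entity once padded.
    assert_eq!(size_of::<EntityAoS>(), 24);

    // SoA: one packed array per component, no per-entity padding.
    const N: usize = 1000;
    let soa_bytes = N * size_of::<u8>() + N * size_of::<u64>() + N * size_of::<u8>();
    let aos_bytes = N * size_of::<EntityAoS>();
    assert_eq!(soa_bytes, 10 * N);
    assert_eq!(aos_bytes, 24 * N);
    assert!(soa_bytes * 2 < aos_bytes); // well under half, for this field mix
}
```

How much you save depends entirely on the field mix; structs whose fields share one alignment have no padding to reclaim.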

Obviously, I have no benchmarks, so making such a claim is naive. However, if true, it means that our memory layout
is not just more CPU-cache efficient but also largely eliminates the typically increased cost of adding/removing components
with archetypal storage: others pay to copy every single entity when adding/removing a component; we don't. We only pay to
allocate space for the new component, and we don't pay to copy anything. Of course, in our case adding/removing a component in
sparse storage is still cheaper: effectively a hashmap insert for affected entities only, rather than allocating an entire
array of size `len(entities)`.

An additional advantage of this is that even when iterating over every entity, your intent is often not to access every component.
For example, a physics system may access multiple components but will not be interested in rendering/game-logic components, and
with an AoS layout those would "push" data we care about out of the limited cache space.

:: Future

Major things still not implemented here include:

* Multi-threading
* Querying, iterating
* "Indexes"
    * Graph relations index: e.g. parent-child entity relations for a DOM / UI / scene graph.
    * Spatial index: "give me all entities within 5 units distance from (x, y, z)"
    * Generic index: "give me all entities where arbitraryFunction(e) returns true"

Signed-off-by: Stephen Gutekanst <stephen@hexops.com>
bors bot pushed a commit that referenced this pull request Mar 31, 2022
# Objective

- The perf comments, added (by me) in #1349, became outdated once the initialisation call started to take an exclusive reference, (presumably in #1525).
- They have been naïvely transferred along ever since

## Solution

- Remove them
GrantMoyer added a commit to GrantMoyer/bevy that referenced this pull request Apr 16, 2022
All uses of ParallelIterator in the Query API were removed in bevyengine#1525, and it was replaced with Query::par_for_each(). This change removes the practically dead code related to ParallelIterator.

It also updates the comments in the parallel_query example, which became out of date when ParallelIterator was replaced. It also increases the sprite count in the example, since sprite rendering is no longer a bottleneck. Finally, it fixes a bug in the example which caused sprites to get stuck on the edge of the window when the window's size was reduced.
GrantMoyer added a commit to GrantMoyer/bevy that referenced this pull request Apr 16, 2022
aevyrie pushed a commit to aevyrie/bevy that referenced this pull request Jun 7, 2022
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
slimsag added a commit to hexops-graveyard/mach-ecs that referenced this pull request Apr 5, 2023
Labels
A-ECS Entities, components, systems, and events C-Feature A new feature, making something new possible S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it
Successfully merging this pull request may close these issues.

Less permissive component conflict allowance in system queries