Query by Indexed Component Value #17608
Conversation
Marking as ready-for-review as I have the finalised MVP API I would like for this feature. As mentioned, it would be nice to allow …
crates/bevy_ecs/src/index/mod.rs
Outdated

````rust
///     }
/// }
/// ```
pub fn at(&mut self, value: &C) -> Query<'_, '_, D, (F, With<C>)> {
````
Follow-up: this implies the ability to do `between`, which would be super sweet. Another argument for an `Ord` backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! Generalised storage backends could allow some really nice methods here for sure.
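For illustration, a range query like `between` falls out naturally from an `Ord`-backed store. Below is a minimal standalone sketch (no Bevy types; `OrdIndex`, `Entity`, and `between` are hypothetical names, not this PR's API) of how a `BTreeMap` backend could serve such a method:

```rust
use std::collections::BTreeMap;

// Hypothetical `Ord`-backed index; illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Entity(u64);

struct OrdIndex<V: Ord> {
    // value -> entities currently holding that value, kept sorted by key
    map: BTreeMap<V, Vec<Entity>>,
}

impl<V: Ord + Clone> OrdIndex<V> {
    fn new() -> Self {
        Self { map: BTreeMap::new() }
    }

    fn insert(&mut self, value: V, entity: Entity) {
        self.map.entry(value).or_default().push(entity);
    }

    // Range lookup: every entity whose indexed value lies in [lo, hi].
    // This is the kind of `between` a `BTreeMap` backend enables and a
    // `HashMap` backend cannot offer.
    fn between(&self, lo: &V, hi: &V) -> Vec<Entity> {
        self.map
            .range(lo.clone()..=hi.clone())
            .flat_map(|(_, entities)| entities.iter().copied())
            .collect()
    }
}

fn main() {
    let mut index = OrdIndex::new();
    index.insert(10u32, Entity(1));
    index.insert(20u32, Entity(2));
    index.insert(30u32, Entity(3));
    // Only values in 15..=30 match.
    assert_eq!(index.between(&15, &30), vec![Entity(2), Entity(3)]);
}
```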
Was required in an earlier draft of this PR
Nice cleanup of the issues I raised; thanks!
I've left a few more suggestions on documenting performance drawbacks, capturing the discussion on Discord from earlier today. Once those are in, I'm willing to give this my blessing.
For posterity, this will cause issues with archetype fragmentation that affect various parts of Bevy, including the parallel scheduler (see #16784). These are broadly tracked by #17564.
That said, I think that this is a valuable feature (for some applications) today, and an excellent base to build off of. Indexes are a really critical tool for optimizing the performance characteristics of table-based data stores, and this is a robust design with good places for us to add options for users to test in their specific applications. If this design ends up being slower / worse than relatively simple "hashmap in a resource" designs, or we want to move to using a distinct marker component for each value, we can do that, without breaking end user code.
Finally, the fact that this sort of thing is possible (and the fact that archetypal relations, value queries, or other advanced features might be needed or desired in the future) means that we probably can't just dodge the perf problems that come with huge numbers of archetypes forever. This is a relatively advanced, performance-focused feature, which makes it more likely (but not guaranteed) that users playing with it are a) actually benchmarking the impact of using this feature and of changes we make to it, and b) not going to see it introduced willy-nilly by crates that they rely on. As a result, I think that this offers us a real-world testbed for the optimizations proposed in #17564, which were previously only getting stress-tested on very artificial benchmarks at best.
```rust
/// This [`Resource`] is responsible for managing a value-to-[`ComponentId`] mapping, allowing
/// [`QueryByIndex`] to simply filter by [`ComponentId`] on a standard [`Query`].
#[derive(Resource)]
struct Index<C: Component<Mutability = Immutable>> {
```
First, I do generally like the shape of this PR (relatively standalone observer-driven index resources with query helpers).
My biggest concern with this implementation is that it is very memory hungry.
3d_scene on its own idles at 360 MB in debug mode.
If I spawn an index on a 500x500 grid (250,000 entities) in 3d_scene, it uses 2.8 GB. Additionally, the spawn takes 17 seconds. This behaves the same if I split it into two component types each having 125,000 indices.
(note that in release mode the memory usage stays the same, but the 17 seconds drops to 3)
For comparison, if I just manually create a hashmap of the spawned entities, it uses 400 MB and starts instantly. I suspect the observer execution does play a role in the time expense (it would be interesting to see how much).
But from this perspective, the stated concern about the 2^64 = 18,446,744,073,709,551,616 value limit imposed by using a single `ComponentId` feels a bit silly. We could not possibly store that many values in memory using the system in place currently (we would need a computer with 2 x 10^14 GB of memory).

Probably related to this: each value constituting a new `Archetype`, and each value contributing a very large number (ex: 64) of components to that archetype, means each value creates 64 new `ArchetypeComponentId`s. In a world where you have 250,000 values, that's 16,000,000 new `ArchetypeComponentId`s. Note that this `ArchetypeComponentId` explosion also counteracts the "`ComponentId` identity-space saving" we accomplish with the components-as-bits approach.

Under the current constraints, it feels very hard to endorse this approach. It seems like a simple value -> `ComponentId` map would perform better and use less memory, which actually increases the number of unique indices we can use (even if theoretically we're limiting ourselves more).
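The "simple map" alternative raised in this thread can be sketched outside Bevy. Everything below is hypothetical bookkeeping (`ValueIndex` and its methods are stand-ins for what `OnInsert`/`OnReplace` observers would maintain), not code from this PR:

```rust
use std::collections::HashMap;

// Sketch of a non-fragmenting value -> entities index kept in a resource.
#[derive(Default)]
struct ValueIndex {
    // value -> entities currently holding that value
    map: HashMap<u32, Vec<u64>>,
}

impl ValueIndex {
    // Mirrors an `OnInsert` observer firing for an indexed component.
    fn on_insert(&mut self, entity: u64, value: u32) {
        self.map.entry(value).or_default().push(entity);
    }

    // Mirrors `OnReplace` + `OnInsert` when an immutable component is
    // swapped out: remove from the old bucket, add to the new one.
    fn on_replace(&mut self, entity: u64, old: u32, new: u32) {
        if let Some(entities) = self.map.get_mut(&old) {
            entities.retain(|&e| e != entity);
        }
        self.on_insert(entity, new);
    }

    fn entities_with(&self, value: u32) -> &[u64] {
        self.map.get(&value).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut index = ValueIndex::default();
    index.on_insert(1, 10);
    index.on_insert(2, 10);
    index.on_replace(2, 10, 20); // entity 2 changes value 10 -> 20
    assert_eq!(index.entities_with(10), [1]);
    assert_eq!(index.entities_with(20), [2]);
}
```

The trade-off discussed above applies: this uses far less memory and avoids archetype moves, but lookups return entities in arbitrary order rather than densely packed tables.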
Thanks for taking the time to read through the PR!
That level of memory usage is definitely unacceptable. Before going down this path of using `ComponentId`s to form an address, I was unaware of the `ArchetypeComponentId` system and how poorly it interacts with the kind of abuse the addressing scheme imparts.

I will attempt to refactor this PR to allow choosing between this addressing scheme and a more simplified 1:1 mapping of value to `ComponentId`. Part of my concern with a 1:1 value-to-`ComponentId` mapping is that it complicates the `QueryByIndex` implementation. Right now it can know ahead of time exactly which marker components it needs to track, regardless of how many unique values are spawned.
Regarding spawn times, I definitely think that has to do with the observers and would be largely unavoidable. In the addressing scheme I insert all the required marker components in a single bundle, which should have the same cost as inserting a single marker component (creating the new archetype if required, moving the entity, etc.).
I am glad that you somewhat approve of the overall user-facing elements of the PR, as that should help this PR (or a followup if this goes south) by giving a relatively clear API to strive for.
Straight away, changing the storage type for the marker components to `SparseSet` shaves the memory usage down to 1.5 GB instead of 2.8 GB in my local testing. Since I have exposed this as a user-configurable option, I'll change the default to the more memory-conservative option and let users who need iteration performance change it back.
Initial testing appears to indicate that component-per-value would not solve the memory usage issues identified by Cart. Comparing the addressing technique used in this PR to just spawning a unique dynamic component with an entity:

From this, I'm confident in saying that without substantial changes to Bevy to allow component-per-value to not use so much memory, the only way to do fragmenting indexing is via the addressing technique proposed in this PR. Non-fragmenting indexing, such as with a …

I will also note here that indexing memory usage drops substantially without …
Ok, I have created an as-close-as-possible replication of this PR except using a …

In this benchmark, I create 10,000,000 entities and place them on one of 10 planets randomly (seeded RNG for consistency). I also randomly add a pair of marker components to simulate having different archetypes within the indexed set (e.g., some entities are …).

With this PR, it takes 0.00003 seconds (30us) to complete this lookup. With the non-fragmenting alternative in the Gist it takes 0.17 seconds (about 5,000x slower). This is to be expected, as the non-fragmenting version iterates out of order, potentially going back and forth across archetypes even, while the fragmenting version is able to densely iterate as if the user had inserted an ahead-of-time ZST marker to isolate their particular entities.

Comparing memory usage, non-fragmenting uses 770MB of RAM, while fragmenting uses 710MB. I am unsure why the fragmenting version is more memory efficient. It could be that since there are only 10 unique values, it is more memory efficient to only have 10 entries in a hashmap from …

Regardless, I believe that this technique is an acceptable compromise between the two next-best alternatives:
In my opinion, this PR strikes a careful middle ground that avoids potential footguns, making it suitable as a first-party offering. The memory consumption observed is high and should be improved (startup times as well!), but the scaling this technique offers makes it far better suited as a general-purpose option that a user can reasonably reach for.
Sorry if this has been considered already; it's sometimes hard to follow all the comments. With the first optimisation technique discussed in the OP ("mapping of component value to list of entities with that value"): …

I assume where all these thoughts fall down is that, fundamentally, maintaining the ordering requires adding overhead to all archetype moves, whereas the previously discussed solutions only add overhead to (the rare) mutations of the component being tracked by the index.
Do we need to use component IDs at all? If I've understood things so far, you've tried …

You've noted a huge performance gap between fragmenting the archetypes and just filtering entities using a …

For a fairer comparison, I think you could still filter archetypes in the non-fragmenting case. You might get performance close to the fragmenting strategy without using as much memory.

IIRC a query builds (or could build) its cache by iterating the list of archetypes matching its first term and filtering them using the other terms. If an archetype passes all terms, it's added to the cache and its entities will be iterated. If you're using observers to maintain a …

If a query uses this value index, you'd then have to rebuild its cache whenever an archetype is added to or removed from it.
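To make the suggestion above concrete, here is a toy model (entirely hypothetical names, no Bevy types) of filtering at the archetype level first, so whole archetypes with no matching value are skipped, then row-filtering only within the surviving archetypes:

```rust
use std::collections::{HashMap, HashSet};

type ArchetypeId = usize;

struct MiniWorld {
    // archetype -> rows of (entity, indexed value)
    archetypes: Vec<Vec<(u64, u32)>>,
    // value -> archetypes that currently contain it
    value_to_archetypes: HashMap<u32, HashSet<ArchetypeId>>,
}

impl MiniWorld {
    fn new(archetype_count: usize) -> Self {
        Self {
            archetypes: vec![Vec::new(); archetype_count],
            value_to_archetypes: HashMap::new(),
        }
    }

    fn spawn(&mut self, archetype: ArchetypeId, entity: u64, value: u32) {
        self.archetypes[archetype].push((entity, value));
        self.value_to_archetypes.entry(value).or_default().insert(archetype);
    }

    // Visit only archetypes the index says can match, scanning each densely.
    fn query_by_value(&self, value: u32) -> Vec<u64> {
        let mut out = Vec::new();
        if let Some(archetypes) = self.value_to_archetypes.get(&value) {
            for &a in archetypes {
                out.extend(
                    self.archetypes[a]
                        .iter()
                        .filter(|&&(_, v)| v == value)
                        .map(|&(e, _)| e),
                );
            }
        }
        out.sort_unstable();
        out
    }
}

fn main() {
    let mut world = MiniWorld::new(3);
    world.spawn(0, 1, 7);
    world.spawn(1, 2, 7);
    world.spawn(2, 3, 9); // archetype 2 is never visited for value 7
    assert_eq!(world.query_by_value(7), vec![1, 2]);
}
```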
Ok, based on discussions on Discord and feedback here (both of which I'm incredibly appreciative of!) I have decided to change my focus here. Clearly the question of how to index entities is far from settled, but there is consensus that it should cleanly interact with …

After sleeping on it, I believe …

```rust
let query: Query<Foo, ByIndex<Planet, (With<Bar>, ...)>>;

for foo in query.iter().provide_filter(&Planet("Earth")) {
    // ...
}
```

Since this would be used to reduce the total results from a query (either by archetype or by row, as determined by the filter itself), I don't foresee any issues around access or safety. Once (and if) I have a working prototype I will close this PR in favor of the new proposal.
Yup, I whipped up a quick impl myself and also got these results. A big contributor to the memory usage is …

Also note that I'm pretty interested in making "value components" a first-class internalized concept (ex: generate and store the maps internally instead of using observers). That was one part of my "incremental fragmenting relations" path. Doing that would also require solving this exact problem space (as they're essentially the same implementation).

I like this plan. I think we should come back to built-in fragmenting indexing once we sort out the "high componentids" problem.
From the discussion yesterday, these are the numbers in Flecs when creating a component per value:
Can be reproduced with:

```c
#include "flecs.h"

int main() {
    ecs_world_t *world = ecs_mini();
    for (int i = 0; i < 1000 * 1000; i ++) {
        ecs_entity_t v = ecs_new(world);
        ecs_entity_t e = ecs_new(world);
        ecs_add_id(world, e, v);
    }
    while (true) {}
    return ecs_fini(world);
}
```
I just reviewed the benchmarks, since I have no qualifications to look over the actual ECS implementation. The benchmarks overall look good, though I do have a few questions!
```rust
fn find_planet_zeroes_indexed(query: QueryByIndex<Planet, &Planet>) {
    let mut query = query.at(&Planet(0));
    for planet in query.query().iter() {
        let _ = black_box(planet);
```
Suggested change:

```diff
-        let _ = black_box(planet);
+        black_box(planet);
```

(Nit) The `let _ =` isn't necessary, since `black_box()` forces the expression it is passed to be evaluated.
In practice, black_box serves two purposes:
- It prevents the compiler from making optimizations related to the value returned by black_box
- It forces the value passed to black_box to be calculated, even if the return value of black_box is unused
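Both points can be shown in a self-contained snippet (the workload function here is just a stand-in):

```rust
use std::hint::black_box;

// Stand-in workload; any computation the optimizer could fold away works.
fn triangular(n: u64) -> u64 {
    (0..=n).sum()
}

fn main() {
    // `black_box` forces the call to be evaluated even though the result
    // is otherwise unused, and hides the result from the optimizer, so a
    // `let _ =` binding around it adds nothing.
    black_box(triangular(1_000));
}
```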
```rust
group.warm_up_time(core::time::Duration::from_millis(500));
group.measurement_time(core::time::Duration::from_secs(4));
```
Is there a specific reason you chose these times, which are smaller than the default? `criterion` by default has a warm-up time of 3 seconds and a measurement time of 5 seconds.
```rust
        Self(world, id)
    }

    #[inline(never)]
```
Why did you choose to add `#[inline(never)]`? It seems counter-intuitive to me, since we're just trying to benchmark the contained `run_system()`.
Objective
Solution
Users can now call `app.add_index(...)` for an appropriate component `C` to have Bevy automatically track its value and provide a high-performance archetypical indexing solution. Users can then query, by-value, for all entities with a specific indexed value using the `QueryByIndex` system parameter. The value is provided within the body of a system, meaning `QueryByIndex` can access any entity matching the provided compile-time query filters.

This is achieved by adding `OnInsert` and `OnReplace` observers and requiring that the index component `C` be immutable, ensuring those two hooks are able to capture all mutations. Since this is done with observers, users can index 3rd-party components, provided they are immutable and implement the traits required by the index storage back-end chosen.

By default, indexing uses a `HashMap` as its backing storage, which requires that the indexed component implements `Eq + Hash + Clone`. If these traits aren't available on the component to be indexed, users can either use `BTreeMap` as a back-end, only requiring `Ord + Clone`, or implement the `IndexStorage` trait on their own type. Any storage back-end can be used with an index, but only one index will ever exist for a given component.

Within the observers, a private index resource is managed which tracks the component value against a collection of runtime marker components. These runtime marker components are used to group entities by-value at the archetype level for the index component. By using multiple marker components, their combination can be used to uniquely address a particular component value, allowing a relatively small number of markers to totally service even complex components.
This grouping allows for maximum performance in iteration, since querying is simply filtering for a hidden component by-archetype.
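The addressing idea can be illustrated with a small standalone sketch (the function name and shapes here are hypothetical, not this PR's internals): each unique value is assigned an integer address, and each bit of that address decides whether one of the shared marker components must be present (`With`) or absent (`Without`) in the generated filter.

```rust
// Bit i set   -> filter requires With<Marker_i>
// Bit i clear -> filter requires Without<Marker_i>
// Returns (markers required present, markers required absent).
fn address_markers(address: u64, marker_count: u32) -> (Vec<u32>, Vec<u32>) {
    let mut with = Vec::new();
    let mut without = Vec::new();
    for bit in 0..marker_count {
        if address & (1u64 << bit) != 0 {
            with.push(bit);
        } else {
            without.push(bit);
        }
    }
    (with, without)
}

fn main() {
    // Value #5 = 0b101 over 4 markers: With on markers {0, 2},
    // Without on markers {1, 3} -- a filter unique to this value.
    let (with, without) = address_markers(5, 4);
    assert_eq!(with, vec![0, 2]);
    assert_eq!(without, vec![1, 3]);
}
```

With 64 markers this addresses 2^64 distinct values, which is why only a small fixed pool of marker components is needed.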
To-Do / Follow-up
Testing

`index` example

Performance
I've added some `index` benchmarks to the `ecs` bench which test iteration speed (`index_iter`) and mutation speed (`index_update`). Running these benchmarks locally, I see a large performance improvement (20x) when looking for rare values amongst a large selection of entities (1-in-1,000 amongst 1,000,000), and a notable performance loss (6x) when mutating indexed components. This confirms expectation: indexing is only a performance win in scenarios where a value is rarely updated but frequently searched for.

Indexing Strategy
To find an `Entity` with a particular value of a component `C`, a first choice might just be to use `filter` on a `Query`. No setup required, no cache invalidation, nice and simple.

The big downside to this approach is that you need to iterate all entities matching your `Query`. If most entities match your filter, this may be perfectly fine, but if you have a large number of entities to search and only a small number with the value you're searching for, it quickly becomes too costly.

First optimisation available: store a mapping from `C` to `Entity` and use `Query::get`.

This avoids iterating all entities, but introduces some new problems to solve. First, multiple entities could have the same value for `C`, so we need a mapping from `C` to many entities, such as an `EntityHashSet`. Next, iteration performance suffers, as we're no longer iterating over entities and tables in-order; we're randomly selecting entities based on however they're stored in our index. For the best performance, we want to iterate the `Query` and check against the index, just "somehow" skipping entities that don't match our value.

Second optimisation: use marker components to group entities with like `C` values.

What we do here is instead map values of `C` to `ComponentId`s rather than `Entity`s, and then ensure every entity with a `C` has the required marker component for that value and only that marker component. This fragments the archetypes on the value of `C`, grouping all entities with the `C` component by its value, giving us maximum iteration performance. This isn't free, however. First, changing the value of `C` now requires an archetype move. Secondly, we need 1 `ComponentId` per unique value of `C` currently in the `World`. Since there are only `usize` `ComponentId`s, that limits us to 2^64 unique values across all indexes (on 64-bit platforms, even worse on 32-bit). We could just hope users don't have too many unique values in the index, but this is an easy mistake to make: indexing a component that internally stores just 2 `usize`s would allow the index to totally exhaust all `ComponentId`s.

Final optimisation: use the absence of a marker component to carry information. Instead of marking all entities of a certain value with a single marker component, we could instead encode a binary address using `With` and `Without` of several marker components to build a unique filter. This allows us to address 2^64 unique values with only 64 unique marker components, exponentially fewer components required.

This still gives us fragmentation on `C` values for dense iteration, at the expense of a slightly more complex `QueryFilter`, since we need to exclude and include some combination of markers. This is the technique adopted by this PR.

Release Notes
Users can now query by-value using indexes. To set up an index, first create an immutable component with `Clone`, `Eq`, and `Hash` implementations.

Next, request that an index be managed for this component.

Finally, you can query by-value using the new `QueryByIndex` system parameter. `QueryByIndex` is just like `Query`, except it has a third generic argument, `C`, which denotes the component you'd like to use as the index.

Internally, the index will move entities with differing values into their own archetypes, allowing for fast iteration at the cost of some added overhead when modifying the indexed value. As such, indexes are best used where you have an infrequently changing component that you frequently need to access a subset of.
Additional control around how a component is indexed is provided by the `IndexOptions` argument passed to `add_index`. This allows controlling the storage type used for the marker components (`SparseSet` or `Table`), among other options.