Query by Indexed Component Value #17608
Conversation
Marking as ready-for-review as I have the finalised MVP API I would like for this feature. As mentioned, it would be nice to allow …
crates/bevy_ecs/src/index/mod.rs
Outdated

````rust
///     }
/// }
/// ```
pub fn at(&mut self, value: &C) -> Query<'_, '_, D, (F, With<C>)> {
````
Follow-up: this implies the ability to do `between`, which would be super sweet. Another argument for an `Ord` backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! Generalised storage backends could allow some really nice methods here for sure.
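For illustration, a range query like `between` falls out naturally from an `Ord`-backed store. Below is a minimal standalone sketch (no Bevy types; `OrdIndex`, `Entity`, and `between` are hypothetical names, not this PR's API) of how a `BTreeMap` backend could serve such a method:

```rust
use std::collections::BTreeMap;

// Hypothetical `Ord`-backed index; illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Entity(u64);

struct OrdIndex<V: Ord> {
    // value -> entities currently holding that value, kept sorted by key
    map: BTreeMap<V, Vec<Entity>>,
}

impl<V: Ord + Clone> OrdIndex<V> {
    fn new() -> Self {
        Self { map: BTreeMap::new() }
    }

    fn insert(&mut self, value: V, entity: Entity) {
        self.map.entry(value).or_default().push(entity);
    }

    // Range lookup: every entity whose indexed value lies in [lo, hi].
    // This is the kind of `between` a `BTreeMap` backend enables and a
    // `HashMap` backend cannot offer.
    fn between(&self, lo: &V, hi: &V) -> Vec<Entity> {
        self.map
            .range(lo.clone()..=hi.clone())
            .flat_map(|(_, entities)| entities.iter().copied())
            .collect()
    }
}

fn main() {
    let mut index = OrdIndex::new();
    index.insert(10u32, Entity(1));
    index.insert(20u32, Entity(2));
    index.insert(30u32, Entity(3));
    // Only values in 15..=30 match.
    assert_eq!(index.between(&15, &30), vec![Entity(2), Entity(3)]);
}
```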
Was required in an earlier draft of this PR
Nice cleanup of the issues I raised; thanks!
I've left a few more suggestions on documenting performance drawbacks, capturing the discussion on Discord from earlier today. Once those are in, I'm willing to give this my blessing.
For posterity, this will cause issues with archetype fragmentation that affect various parts of Bevy, including the parallel scheduler (see #16784). These are broadly tracked by #17564.
That said, I think that this is a valuable feature (for some applications) today, and an excellent base to build off of. Indexes are a really critical tool for optimizing the performance characteristics of table-based data stores, and this is a robust design with good places for us to add options for users to test in their specific applications. If this design ends up being slower / worse than relatively simple "hashmap in a resource" designs, or we want to move to using a distinct marker component for each value, we can do that, without breaking end user code.
Finally, the fact that this sort of thing is possible (and the fact that archetypal relations, value queries, or other advanced features might be needed or desired in the future) means that we probably can't just dodge the perf problems that come with huge numbers of archetypes forever. This is a relatively advanced, performance-focused feature, which makes it more likely (but not guaranteed) that users playing with it are a) actually benchmarking the impact of using this feature and of changes we make to it, and b) not going to see it introduced willy-nilly by crates that they rely on. As a result, I think that this offers us a real-world testbed for the optimizations proposed in #17564, which were previously only getting stress-tested on very artificial benchmarks at best.
```rust
/// This [`Resource`] is responsible for managing a value-to-[`ComponentId`] mapping, allowing
/// [`QueryByIndex`] to simply filter by [`ComponentId`] on a standard [`Query`].
#[derive(Resource)]
struct Index<C: Component<Mutability = Immutable>> {
```
First, I do generally like the shape of this PR (relatively standalone observer-driven index resources with query helpers).
My biggest concern with this implementation is that it is very memory hungry.
3d_scene on its own idles at 360 MB in debug mode.
If I spawn an index on a 500x500 grid (250,000 entities) in 3d_scene, it uses 2.8 GB. Additionally, the spawn takes 17 seconds. This behaves the same if I split it into two component types each having 125,000 indices.
(note that in release mode the memory usage stays the same, but the 17 seconds drops to 3)
For comparison, if I just manually create a hashmap of the spawned entities, it uses 400 MB and starts instantly. I suspect the observer execution does play a role in the time expense (it would be interesting to see how much).
But from this perspective, the stated concern about the 2^64 = 18,446,744,073,709,551,616 value limit imposed by using a single `ComponentId` feels a bit silly. We could not possibly store that many values in memory using the system in place currently (we would need a computer with 2 x 10^14 GB of memory).

Probably related to this: each value constituting a new `Archetype`, and each value contributing a very large number (ex: 64) of components to that archetype, means each value creates 64 new `ArchetypeComponentId`s. In a world where you have 250,000 values, that's 16,000,000 new `ArchetypeComponentId`s. Note that this `ArchetypeComponentId` explosion also counteracts the "`ComponentId` identity-space saving" we accomplish with the components-as-bits approach.

Under the current constraints, it feels very hard to endorse this approach. It seems like a simple value -> `ComponentId` map would perform better and use less memory, which actually increases the number of unique indices we can use (even if theoretically we're limiting ourselves more).
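The "simple map" alternative raised in this thread can be sketched outside Bevy. Everything below is hypothetical bookkeeping (`ValueIndex` and its methods are stand-ins for what `OnInsert`/`OnReplace` observers would maintain), not code from this PR:

```rust
use std::collections::HashMap;

// Sketch of a non-fragmenting value -> entities index kept in a resource.
#[derive(Default)]
struct ValueIndex {
    // value -> entities currently holding that value
    map: HashMap<u32, Vec<u64>>,
}

impl ValueIndex {
    // Mirrors an `OnInsert` observer firing for an indexed component.
    fn on_insert(&mut self, entity: u64, value: u32) {
        self.map.entry(value).or_default().push(entity);
    }

    // Mirrors `OnReplace` + `OnInsert` when an immutable component is
    // swapped out: remove from the old bucket, add to the new one.
    fn on_replace(&mut self, entity: u64, old: u32, new: u32) {
        if let Some(entities) = self.map.get_mut(&old) {
            entities.retain(|&e| e != entity);
        }
        self.on_insert(entity, new);
    }

    fn entities_with(&self, value: u32) -> &[u64] {
        self.map.get(&value).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut index = ValueIndex::default();
    index.on_insert(1, 10);
    index.on_insert(2, 10);
    index.on_replace(2, 10, 20); // entity 2 changes value 10 -> 20
    assert_eq!(index.entities_with(10), [1]);
    assert_eq!(index.entities_with(20), [2]);
}
```

The trade-off discussed above applies: this uses far less memory and avoids archetype moves, but lookups return entities in arbitrary order rather than densely packed tables.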
Thanks for taking the time to read through the PR!
That level of memory usage is definitely unacceptable. Before going down this path of using `ComponentId`s to form an address, I was unaware of the `ArchetypeComponentId` system and how poorly it interacts with the kind of abuse the addressing scheme imparts.

I will attempt to refactor this PR to allow choosing between this addressing scheme and a more simplified 1:1 mapping of value to `ComponentId`. Part of my concern with a 1:1 value-to-`ComponentId` mapping is that it complicates the `QueryByIndex` implementation. Right now it can know ahead of time exactly which marker components it needs to track, regardless of how many unique values are spawned.
Regarding spawn times, I definitely think that has to do with the observers and would be largely unavoidable. In the addressing scheme I insert all the required marker components in a single bundle, which should have the same cost as inserting a single marker component (creating the new archetype if required, moving the entity, etc.).
I am glad that you somewhat approve of the overall user-facing elements of the PR, as that should help this PR (or a followup if this goes south) by giving a relatively clear API to strive for.
Straight away, changing the storage type for the marker components to `SparseSet` shaves the memory usage down to 1.5 GB instead of 2.8 GB in my local testing. Since I have exposed this as a user-configurable option, I'll change the default to the more memory-conservative option and let users who need iteration performance change it back.
Initial testing appears to indicate that component-per-value would not solve the memory usage issues identified by Cart. Comparing the addressing technique used in this PR to just spawning a unique dynamic component with an entity:

From this, I'm confident in saying that without substantial changes to Bevy to allow component-per-value to not use so much memory, the only way to do fragmenting indexing is via the addressing technique proposed in this PR. Non-fragmenting indexing, such as with a …

I will also note here that indexing memory usage drops substantially without …
Ok, I have created an as-close-as-possible replication of this PR except using a …

In this benchmark, I create 10,000,000 entities and place them on one of 10 planets randomly (seeded RNG for consistency). I also randomly add a pair of marker components to simulate having different archetypes within the indexed set (e.g., some entities are …).

With this PR, it takes 0.00003 seconds (30us) to complete this lookup. With the non-fragmenting alternative in the Gist it takes 0.17 seconds (about 5,000x slower). This is to be expected, as the non-fragmenting version iterates out of order, potentially going back and forth across archetypes even, while the fragmenting version is able to densely iterate as if the user had inserted an ahead-of-time ZST marker to isolate their particular entities.

Comparing memory usage, non-fragmenting uses 770MB of RAM, while fragmenting uses 710MB. I am unsure why the fragmenting version is more memory efficient. It could be that since there are only 10 unique values, it is more memory efficient to only have 10 entries in a hashmap from …

Regardless, I believe that this technique is an acceptable compromise between the two next-best alternatives:
In my opinion, this PR strikes a careful middle ground that avoids potential footguns, making it suitable as a first-party offering. The memory consumption observed is high and should be improved (startup times as well!), but the scaling this technique offers makes it far better suited as a general-purpose option that a user can reasonably reach for.
Sorry if this has been considered already; it's sometimes hard to follow all the comments. With the first optimisation technique discussed in the OP ("mapping of component value to list of entities with that value"): …

I assume where all these thoughts fall down is that, fundamentally, maintaining the ordering requires adding overhead to all archetype moves, whereas the previously discussed solutions only add overhead to (the rare) mutations of the component being tracked by the index.
Do we need to use component IDs at all? If I've understood things so far, you've tried …

You've noted a huge performance gap between fragmenting the archetypes and just filtering entities using a …

For a fairer comparison, I think you could still filter archetypes in the non-fragmenting case. You might get performance close to the fragmenting strategy without using as much memory.

IIRC a query builds (or could build) its cache by iterating the list of archetypes matching its first term and filtering them using the other terms. If an archetype passes all terms, it's added to the cache and its entities will be iterated. If you're using observers to maintain a …

If a query uses this value index, you'd then have to rebuild its cache whenever an archetype is added to or removed from it.
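To make the suggestion above concrete, here is a toy model (entirely hypothetical names, no Bevy types) of filtering at the archetype level first, so whole archetypes with no matching value are skipped, then row-filtering only within the surviving archetypes:

```rust
use std::collections::{HashMap, HashSet};

type ArchetypeId = usize;

struct MiniWorld {
    // archetype -> rows of (entity, indexed value)
    archetypes: Vec<Vec<(u64, u32)>>,
    // value -> archetypes that currently contain it
    value_to_archetypes: HashMap<u32, HashSet<ArchetypeId>>,
}

impl MiniWorld {
    fn new(archetype_count: usize) -> Self {
        Self {
            archetypes: vec![Vec::new(); archetype_count],
            value_to_archetypes: HashMap::new(),
        }
    }

    fn spawn(&mut self, archetype: ArchetypeId, entity: u64, value: u32) {
        self.archetypes[archetype].push((entity, value));
        self.value_to_archetypes.entry(value).or_default().insert(archetype);
    }

    // Visit only archetypes the index says can match, scanning each densely.
    fn query_by_value(&self, value: u32) -> Vec<u64> {
        let mut out = Vec::new();
        if let Some(archetypes) = self.value_to_archetypes.get(&value) {
            for &a in archetypes {
                out.extend(
                    self.archetypes[a]
                        .iter()
                        .filter(|&&(_, v)| v == value)
                        .map(|&(e, _)| e),
                );
            }
        }
        out.sort_unstable();
        out
    }
}

fn main() {
    let mut world = MiniWorld::new(3);
    world.spawn(0, 1, 7);
    world.spawn(1, 2, 7);
    world.spawn(2, 3, 9); // archetype 2 is never visited for value 7
    assert_eq!(world.query_by_value(7), vec![1, 2]);
}
```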
Ok, based on discussions on Discord and feedback here (both of which I'm incredibly appreciative of!) I have decided to change my focus here. Clearly the question of how to index entities is far from settled, but there is consensus that it should cleanly interact with …

After sleeping on it, I believe …

```rust
let query: Query<Foo, ByIndex<Planet, (With<Bar>, ...)>>;

for foo in query.iter().provide_filter(&Planet("Earth")) {
    // ...
}
```

Since this would be used to reduce the total results from a query (either by archetype or by row, as determined by the filter itself), I don't foresee any issues around access or safety. Once (and if) I have a working prototype I will close this PR in favor of the new proposal.
Yup, I whipped up a quick impl myself and also got these results. A big contributor to the memory usage is …

Also note that I'm pretty interested in making "value components" a first-class internalized concept (ex: generate and store the maps internally instead of using observers). That was one part of my "incremental fragmenting relations" path. Doing that would also require solving this exact problem space (as they're essentially the same implementation).

I like this plan. I think we should come back to built-in fragmenting indexing once we sort out the "high componentids" problem.
From the discussion yesterday, these are the numbers in Flecs when creating a component per value:
Can be reproduced with:

```c
#include "flecs.h"

int main() {
    ecs_world_t *world = ecs_mini();
    for (int i = 0; i < 1000 * 1000; i ++) {
        ecs_entity_t v = ecs_new(world);
        ecs_entity_t e = ecs_new(world);
        ecs_add_id(world, e, v);
    }
    while (true) {}
    return ecs_fini(world);
}
```
I just reviewed the benchmarks, since I have no qualifications to look over the actual ECS implementation. The benchmarks overall look good, though I do have a few questions!
```rust
fn find_planet_zeroes_indexed(query: QueryByIndex<Planet, &Planet>) {
    let mut query = query.at(&Planet(0));
    for planet in query.query().iter() {
        let _ = black_box(planet);
```
Suggested change:

```diff
-        let _ = black_box(planet);
+        black_box(planet);
```

(Nit) The `let _ =` isn't necessary, since `black_box()` forces the expression it is passed to be evaluated.
In practice, black_box serves two purposes:
- It prevents the compiler from making optimizations related to the value returned by black_box
- It forces the value passed to black_box to be calculated, even if the return value of black_box is unused
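Both points can be shown in a self-contained snippet (the workload function here is just a stand-in):

```rust
use std::hint::black_box;

// Stand-in workload; any computation the optimizer could fold away works.
fn triangular(n: u64) -> u64 {
    (0..=n).sum()
}

fn main() {
    // `black_box` forces the call to be evaluated even though the result
    // is otherwise unused, and hides the result from the optimizer, so a
    // `let _ =` binding around it adds nothing.
    black_box(triangular(1_000));
}
```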
```rust
group.warm_up_time(core::time::Duration::from_millis(500));
group.measurement_time(core::time::Duration::from_secs(4));
```
Is there a specific reason you chose these times, which are smaller than the default? `criterion` by default has a warm-up time of 3 seconds and a measurement time of 5 seconds.
```rust
        Self(world, id)
    }

    #[inline(never)]
```
Why did you choose to add `#[inline(never)]`? It seems counter-intuitive to me, since we're just trying to benchmark the contained `run_system()`.
Objective
Solution
Users can now call `app.add_index(...)` for an appropriate component `C` to have Bevy automatically track its value and provide a high-performance archetypical indexing solution. Users can then query, by-value, for all entities with a specific indexed value using the `QueryByIndex` system parameter. The value is provided within the body of a system, meaning `QueryByIndex` can access any entity matching the provided compile-time query filters.

This is achieved by adding `OnInsert` and `OnReplace` observers and requiring that the index component `C` be immutable, ensuring those two hooks are able to capture all mutations. Since this is done with observers, users can index 3rd-party components, provided they are immutable and implement the traits required by the index storage back-end chosen.

By default, indexing uses a `HashMap` as its backing storage, which requires that the indexed component implements `Eq + Hash + Clone`. If these traits aren't available on the component to be indexed, users can either use `BTreeMap` as a back-end, only requiring `Ord + Clone`, or implement the `IndexStorage` trait on their own type. Any storage back-end can be used with an index, but only one index will ever exist for a given component.

Within the observers, a private index resource is managed which tracks the component value against a collection of runtime marker components. These runtime marker components are used to group entities by-value at the archetype level for the index component. By using multiple marker components, their combination can be used to uniquely address a particular component value, allowing a relatively small number of markers to totally service even complex components.
This grouping allows for maximum performance in iteration, since querying is simply filtering for a hidden component by-archetype.
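The addressing idea can be illustrated with a small standalone sketch (the function name and shapes here are hypothetical, not this PR's internals): each unique value is assigned an integer address, and each bit of that address decides whether one of the shared marker components must be present (`With`) or absent (`Without`) in the generated filter.

```rust
// Bit i set   -> filter requires With<Marker_i>
// Bit i clear -> filter requires Without<Marker_i>
// Returns (markers required present, markers required absent).
fn address_markers(address: u64, marker_count: u32) -> (Vec<u32>, Vec<u32>) {
    let mut with = Vec::new();
    let mut without = Vec::new();
    for bit in 0..marker_count {
        if address & (1u64 << bit) != 0 {
            with.push(bit);
        } else {
            without.push(bit);
        }
    }
    (with, without)
}

fn main() {
    // Value #5 = 0b101 over 4 markers: With on markers {0, 2},
    // Without on markers {1, 3} -- a filter unique to this value.
    let (with, without) = address_markers(5, 4);
    assert_eq!(with, vec![0, 2]);
    assert_eq!(without, vec![1, 3]);
}
```

With 64 markers this addresses 2^64 distinct values, which is why only a small fixed pool of marker components is needed.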
To-Do / Follow-up
Testing

`index` example

Performance
I've added some `index` benchmarks to the `ecs` bench which test iteration speed (`index_iter`) and mutation speed (`index_update`). Running these benchmarks locally, I see a large performance improvement (20x) when looking for rare values amongst a large selection of entities (1-in-1,000 amongst 1,000,000), and a notable performance loss (6x) when mutating indexed components. This confirms expectation: indexing is only a performance win in scenarios where a value is rarely updated but frequently searched for.

Indexing Strategy
To find an `Entity` with a particular value of a component `C`, a first choice might just be to use `filter` on a `Query`. No setup required, no cache invalidation, nice and simple.

The big downside to this approach is that you need to iterate all entities matching your `Query`. If most entities match your filter, this may be perfectly fine, but if you have a large number of entities to search and only a small number with the value you're searching for, it quickly becomes too costly.

First optimisation available: store a mapping from `C` to `Entity` and use `Query::get`.

This avoids iterating all entities, but introduces some new problems to solve. First, multiple entities could have the same value for `C`, so we need a mapping from `C` to many entities, such as an `EntityHashSet`. Next, iteration performance suffers, as we're no longer iterating over entities and tables in-order; we're randomly selecting entities based on however they're stored in our index. For the best performance, we want to iterate the `Query` and check against the index, just "somehow" skipping entities that don't match our value.

Second optimisation: use marker components to group entities with like `C` values.

What we do here is instead map values of `C` to `ComponentId`s rather than `Entity`s, and then ensure every entity with a `C` has the required marker component for that value and only that marker component. This fragments the archetypes on the value of `C`, grouping all entities with the `C` component by its value, giving us maximum iteration performance. This isn't free, however. First, changing the value of `C` now requires an archetype move. Secondly, we need 1 `ComponentId` per unique value of `C` currently in the `World`. Since there are only `usize` `ComponentId`s, that limits us to 2^64 unique values across all indexes (on 64-bit platforms, even worse on 32-bit). We could just hope users don't have too many unique values in the index, but this is an easy mistake to make: indexing a component that internally stores just 2 `usize`s would allow the index to totally exhaust all `ComponentId`s.

Final optimisation: use the absence of a marker component to carry information. Instead of marking all entities of a certain value with a single marker component, we could instead encode a binary address using `With` and `Without` of several marker components to build a unique filter. This allows us to address 2^64 unique values with only 64 unique marker components, exponentially fewer components required.

This still gives us fragmentation on `C` values for dense iteration, at the expense of a slightly more complex `QueryFilter`, since we need to exclude and include some combination of markers. This is the technique adopted by this PR.

Release Notes
Users can now query by-value using indexes. To set up an index, first create an immutable component with `Clone`, `Eq`, and `Hash` implementations.

Next, request that an index be managed for this component.

Finally, you can query by-value using the new `QueryByIndex` system parameter. `QueryByIndex` is just like `Query`, except it has a third generic argument, `C`, which denotes the component you'd like to use as the index.

Internally, the index will move entities with differing values into their own archetypes, allowing for fast iteration at the cost of some added overhead when modifying the indexed value. As such, indexes are best used where you have an infrequently changing component that you frequently need to access a subset of.
Additional control around how a component is indexed is provided by the `IndexOptions` argument passed to `add_index`. This allows controlling the storage type used for the marker components (`SparseSet` or `Table`), among other options.