Make state resolution faster #1922
Labels
A-Room-spec
Something to do with the room version specifications
improvement
An idea/future MSC for the spec
room-vNext
An idea which will require a bump in room version
Rationale
State resolution v2 is using some graph algorithms, which can make it work in at least linear time of the number of state events in a room DAG and create a big load on servers.
The goal of this issue is to try to discuss and develop changes to state resolution to make it work in$O(n \log n)$ time total when processing a room with $n$ state events (i.e. $O(\log n)$ average) in realistic scenarios, while keeping good user experience.
What described below is closer to state resolution v1, but trying to fix state resets in a different way.
Use a suitable data structure with key versions
For state maps use the data structure based on 1. Instead of mapping
$$(\mathrm{event\_type}, \mathrm{state\_key}) \to \mathrm{event\_id}$$
$$(\mathrm{event\_type}, \mathrm{state\_key}) \to (\mathrm{version}, \mathrm{event\_id}),$$
we will use
where
version
is incremented every timeevent_id
for the key is changed.When merging two states, a value with higher version takes precedence on
conflict (if both versions are equal, then the bigger value wins). Here is a slightly modified version from 1 (in pseudocode):
Another difference from 1 is caching of
merge
results, it gives significant speed ups in some scenarios in my tests.merge
is commutative and associative, to resolve several states just merge them one by one.State resets
Full state resets are not possible, but they can be done per key. E.g. if a user on a malicious server was banned, then the server can create a chain of join/leave events for that user making the version higher, and then on resolving it will overwrite the ban event.
To avoid that, I suggest to use different event types for events that require privileges and that are not. E.g.
m.room.member
can be used forjoin
andleave
events, andm.room.affiliation
forinvite
andban
.Making state difference smaller
Note, that many algorithms in 1 take$O(d\, \log(n/d))$ time, where $n$ is the size of the bigger map, and $d$ is the size of the difference between maps. But that difference is not the number of different keys, but the number of blocks of consecutive keys with difference if you sort their union.
If key comparator first uses host part of
state_key
(where applicable) when comparing keys, then events from the same server are more likely to be grouped together. I guess it might be helpful in a situation, when one server is disconnected from the rest of the world for a while, but have active users.Another heuristic
If some server creates an event with$O(n)$ time. But on a server with active users in the room it's still possible to do it efficiently:
prev_events
pointing to a state map half the size of the current, then merging it will still takeWhen a new event is created, it points to all known leafs in its DAG, making it an ancestor for all currently known events and the only leaf. Let's call such vertices/events "narrow". So when a server sees only one leaf in a DAG, it marks it as narrow. Note, that different servers can have different sets of narrow vertices. For the rest of events, it stores the latest reachable narrow event (i.e. the one with bigger depth from narrow events of
prev_events
).Suppose we need to merge state maps of events$A$ and $B$ . Let $C$ and $D$ be their latest reachable narrow events, and $C.\mathrm{depth} < D.\mathrm{depth}$ . Then $C$ is a common ancestor of $A$ and $B$ and the difference between $C$ and $A$ is usually small. Then
$$\mathrm{merge}(A, B) = \mathrm{merge}(\mathrm{diff}(C, A), B),$$
where$\mathrm{diff}(C, A)$ returns elements in $A$ not present in $C$ , or having higher version, or same version and bigger value than in $C$ . It can be implemented similar to
difference
in 1 with the same time cost.If the size of$\mathrm{diff}(C, A)$ is $d$ , then it takes $O(d\, \log(n/d))$ time instead of $O(n)$ in the worst case.
Worst case
Now, I believe, this version of state resolution should work in logarithmic time for legitimate cases. To make it linear, malicious servers would need to create a huge chain of state events and then events merging random parts of that chain. With reasonable rate limiting it can be spotted early and administrative actions taken.
Implementation details
The effectiveness of the data structure in 1 relies on uniform distribution of key hashes. To avoid a possibility of an attacker finding a set of keys with bad hashes distribution, the hash function should be seeded with a random value per server or room.
To minimize the risk of hash collisions, at least 128-bit hash function should be used, probably better 256-bit to be on a safe side.
Summary of spec changes
State updates
If there is no
event_type
andstate_key
pair in the current state, then a new entry is inserted with the value(1, event_id)
. Otherwise, the value(version, _)
is replaced with(version + 1, event_id)
.State resolution algorithm
The resolved state is the union of the states to be resolved, on conflict the bigger value wins.
Event authorization
Split the rule for
m.room.member
into two form.room.member
/join
/leave
/knock
andm.room.affiliation
/invite
/ban
.Do you see any problems with that approach?
Footnotes
Olle Liljenzin. Confluently Persistent Sets and Maps ↩ ↩2 ↩3 ↩4 ↩5 ↩6
The text was updated successfully, but these errors were encountered: