Make state resolution faster #1922

alexeyshch · 2024-08-09T20:05:16Z

Rationale

State resolution v2 is using some graph algorithms, which can make it work in at least linear time of the number of state events in a room DAG and create a big load on servers.

The goal of this issue is to try to discuss and develop changes to state resolution to make it work in $O(n \log n)$ time total when processing a room with $n$ state events (i.e. $O(\log n)$ average) in realistic scenarios, while keeping good user experience.

What described below is closer to state resolution v1, but trying to fix state resets in a different way.

Use a suitable data structure with key versions

For state maps use the data structure based on ¹. Instead of mapping
$$(\mathrm{event\_type}, \mathrm{state\_key}) \to \mathrm{event\_id}$$
we will use
$$(\mathrm{event\_type}, \mathrm{state\_key}) \to (\mathrm{version}, \mathrm{event\_id}),$$
where version is incremented every time event_id for the key is changed.

When merging two states, a value with higher version takes precedence on
conflict (if both versions are equal, then the bigger value wins). Here is a slightly modified version from ¹ (in pseudocode):

merge(t1, Leaf) = t1
merge(Leaf, t2) = t2
merge(t1, t2) =
   if t1.hash = t2.hash
   then t1
   else if (t1.hash, t2.hash) in merge_cache
   then merge_cache[(t1.hash, t2.hash)]
   else 
      t =
         if t1.key = t2.key then
            if t1.version < t2.version ||
               t1.version = t2.version && t1.value < t2.value
            then Node(t2, merge(t1.left, t2.left),
                          merge(t1.right, t2.right))
            else Node(t1, merge(t1.left, t2.left)
                          merge(t1.right, t2.right))
         else if t1.priority < t2.priority then
            (l, r) = split(t1, t2.key)
            Node(t2, merge(l, t2.left),
                     merge(r, t2.right))
         else
            (l, r) = split(t2, t1.key)
            Node(t1, merge(t1.left, l),
                     merge(t1.right, r))
      merge_cache[(t1.hash, t2.hash)] = t
      t

Another difference from ¹ is caching of merge results, it gives significant speed ups in some scenarios in my tests.

merge is commutative and associative, to resolve several states just merge them one by one.

State resets

Full state resets are not possible, but they can be done per key. E.g. if a user on a malicious server was banned, then the server can create a chain of join/leave events for that user making the version higher, and then on resolving it will overwrite the ban event.

To avoid that, I suggest to use different event types for events that require privileges and that are not. E.g. m.room.member can be used for join and leave events, and m.room.affiliation for invite and ban.

Making state difference smaller

Note, that many algorithms in ¹ take $O(d\, \log(n/d))$ time, where $n$ is the size of the bigger map, and $d$ is the size of the difference between maps. But that difference is not the number of different keys, but the number of blocks of consecutive keys with difference if you sort their union.

If key comparator first uses host part of state_key (where applicable) when comparing keys, then events from the same server are more likely to be grouped together. I guess it might be helpful in a situation, when one server is disconnected from the rest of the world for a while, but have active users.

Another heuristic

If some server creates an event with prev_events pointing to a state map half the size of the current, then merging it will still take $O(n)$ time. But on a server with active users in the room it's still possible to do it efficiently:

When a new event is created, it points to all known leafs in its DAG, making it an ancestor for all currently known events and the only leaf. Let's call such vertices/events "narrow". So when a server sees only one leaf in a DAG, it marks it as narrow. Note, that different servers can have different sets of narrow vertices. For the rest of events, it stores the latest reachable narrow event (i.e. the one with bigger depth from narrow events of prev_events).

Suppose we need to merge state maps of events $A$ and $B$. Let $C$ and $D$ be their latest reachable narrow events, and $C.\mathrm{depth} < D.\mathrm{depth}$. Then $C$ is a common ancestor of $A$ and $B$ and the difference between $C$ and $A$ is usually small. Then
$$\mathrm{merge}(A, B) = \mathrm{merge}(\mathrm{diff}(C, A), B),$$

where $\mathrm{diff}(C, A)$ returns elements in $A$ not present in $C$, or having higher version, or same version and bigger value than in $C$. It can be implemented similar to difference in ¹ with the same time cost.

If the size of $\mathrm{diff}(C, A)$ is $d$, then it takes $O(d\, \log(n/d))$ time instead of $O(n)$ in the worst case.

Worst case

Now, I believe, this version of state resolution should work in logarithmic time for legitimate cases. To make it linear, malicious servers would need to create a huge chain of state events and then events merging random parts of that chain. With reasonable rate limiting it can be spotted early and administrative actions taken.

Implementation details

The effectiveness of the data structure in ¹ relies on uniform distribution of key hashes. To avoid a possibility of an attacker finding a set of keys with bad hashes distribution, the hash function should be seeded with a random value per server or room.

To minimize the risk of hash collisions, at least 128-bit hash function should be used, probably better 256-bit to be on a safe side.

Summary of spec changes

State updates

If there is no event_type and state_key pair in the current state, then a new entry is inserted with the value (1, event_id). Otherwise, the value (version, _) is replaced with (version + 1, event_id).

State resolution algorithm

The resolved state is the union of the states to be resolved, on conflict the bigger value wins.

Event authorization

Split the rule for m.room.member into two for m.room.member/join/leave/knock and m.room.affiliation/invite/ban.

Do you see any problems with that approach?

Olle Liljenzin. Confluently Persistent Sets and Maps ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶

The text was updated successfully, but these errors were encountered:

ara4n · 2024-08-16T09:57:55Z

@alexeyshch many thanks for thinking through this deeply - it's much appreciated. @erikjohnston - you took a quick look at this and said you were worried about how auth works; can you elaborate please?

neilalexander · 2024-08-16T10:21:03Z

you were worried about how auth works

I can probably answer this — state res relies on iteratively applying events on the basis that the auth rules work on the working set of state events. This doesn't look like it does anything of the sort, it's almost just CRDT-like conflict resolution, which doesn't match what iterative event auth does.

alexeyshch · 2024-08-16T16:06:51Z

Event auth happens when new events are added, this version of state resolution just chooses which of already authed events to use per key. Do you see a scenario where it leads to undesired behavior?

alexeyshch added the improvement An idea/future MSC for the spec label Aug 9, 2024

turt2live added room-vNext An idea which will require a bump in room version A-Room-spec Something to do with the room version specifications labels Aug 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make state resolution faster #1922

Make state resolution faster #1922

alexeyshch commented Aug 9, 2024 •

edited

Loading

ara4n commented Aug 16, 2024

neilalexander commented Aug 16, 2024 •

edited

Loading

alexeyshch commented Aug 16, 2024

Make state resolution faster #1922

Make state resolution faster #1922

Comments

alexeyshch commented Aug 9, 2024 • edited Loading

Rationale

Use a suitable data structure with key versions

State resets

Making state difference smaller

Another heuristic

Worst case

Implementation details

Summary of spec changes

State updates

State resolution algorithm

Event authorization

Footnotes

ara4n commented Aug 16, 2024

neilalexander commented Aug 16, 2024 • edited Loading

alexeyshch commented Aug 16, 2024

alexeyshch commented Aug 9, 2024 •

edited

Loading

neilalexander commented Aug 16, 2024 •

edited

Loading