Element Web's change to sync-based presence breaks the Synapse process #16039
Comments
Hey there, did you happen to catch how long ago this started?
It got especially bad after 25 July 2023, when there was a major overall increase. Interestingly enough, the major CPU usage peaks seem to be on sync rather than presence, though I am not sure whether that's just because the presence work gets attributed to sync, since presence is now set via sync. In terms of request volume, the peaks are presence replication and not sync. Also worth noting is that the federation sender now seems to be spending most of its time on a single DB operation; not sure whether that one is related or not. If needed I could hook up tracing too and export a trace.
Since I also run a sliding sync proxy, #15980 may be related as well.
This might not simply be down to the different API being used; it could also be a conflict between different clients. I've noticed some strange behaviour with Element Desktop Nightly idling on Windows (yellow activity indicator): going online from SchildiChat Android 1.6.5.sc70-test5 (F-Droid) then causes the activity indicator to oscillate between yellow and green.
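For what it's worth, the flip-flopping can also be watched from outside the clients. A minimal sketch that polls the presence status endpoint, assuming a hypothetical homeserver URL, access token and user ID (none of which are from this issue):

```python
import time
import requests

# Hypothetical placeholders, not values from this issue.
HOMESERVER = "https://matrix.example.org"
USER_ID = "@alice:example.org"
HEADERS = {"Authorization": "Bearer syt_example_token"}

for _ in range(30):
    status = requests.get(
        f"{HOMESERVER}/_matrix/client/v3/presence/{USER_ID}/status",
        headers=HEADERS,
    ).json()
    # With two competing clients the reported state tends to flip between
    # "online" and "unavailable" on consecutive polls.
    print(status["presence"], status.get("last_active_ago"))
    time.sleep(2)
```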
We also noticed significantly increased CPU usage of the federation-sender and generic workers, along with about 10x the database load, after matrix-org/matrix-react-sdk#11223 was merged and landed in Element Nightly.
Sounds like Synapse may have a special case for no `set_presence` parameter.
@HarHarLinks that was also reported as element-hq/element-web#25900
This is not the case as far as I can see: see `synapse/rest/client/sync.py`, lines 123 to 128, at d0c4257.
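For reference, the parsing there boils down to the `set_presence` query parameter defaulting to `online`. A self-contained paraphrase (not an exact quote of those lines) looks roughly like this:

```python
# Rough paraphrase of the /sync set_presence handling, not a quote of
# synapse/rest/client/sync.py: an absent parameter behaves like "online".
ALLOWED_PRESENCE = {"online", "offline", "unavailable"}

def parse_set_presence(query: dict) -> str:
    value = query.get("set_presence", "online")
    if value not in ALLOWED_PRESENCE:
        raise ValueError(f"unknown set_presence value: {value!r}")
    return value

print(parse_set_presence({}))                           # -> "online"
print(parse_set_presence({"set_presence": "offline"}))  # -> "offline"
```

In other words, a missing parameter appears to be treated the same as `set_presence=online`, rather than taking a separate code path.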
We also noticed significantly increased CPU usage of the presence and generic workers (serving sync). It looks good after disabling presence (e.g. `presence.enabled: false` in homeserver.yaml).
I've noticed the same thing here. Specifically, when I have one instance of Element idle while I'm using another one, they seem to argue about whether I'm online or away, and I constantly get sync responses containing only presence updates from the other instance. Closing the idle instance stopped the activity indicator flickering.
This isn't an issue with Element Web. Any client sending presence via `/sync` will trigger the same behaviour.
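To make the distinction concrete, here is a minimal sketch of the two ways a client can signal presence, assuming a hypothetical homeserver, access token and user ID (none taken from this issue): the dedicated presence endpoint versus the `set_presence` parameter on `/sync`.

```python
import requests

# Hypothetical placeholders, not values from this issue.
HOMESERVER = "https://matrix.example.org"
USER_ID = "@alice:example.org"
HEADERS = {"Authorization": "Bearer syt_example_token"}

# 1) Dedicated presence API: an explicit, occasional call.
requests.put(
    f"{HOMESERVER}/_matrix/client/v3/presence/{USER_ID}/status",
    headers=HEADERS,
    json={"presence": "online"},
)

# 2) Sync-based presence: the long-poll itself carries set_presence, so every
#    /sync request can feed the presence handler and its replication stream.
requests.get(
    f"{HOMESERVER}/_matrix/client/v3/sync",
    headers=HEADERS,
    params={"set_presence": "online", "timeout": 30000},
)
```

A client that syncs with `set_presence=offline` should not update presence at all, which is why the load tracks the clients' choice of sync parameters rather than anything Element-specific.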
Even after I updated Element to 1.11.38, where this issue should be fixed, I still see a lot of errors in the Synapse log that could be related to presence.
Any ideas how to solve this without disabling presence entirely? Applying the Element update did at least fix the high CPU load for me.
One observation I made since Element Web reverted the change on their side: Synapse does not seem to fully free the resources until you restart it. Memory stayed high and CPU only partially dropped. The federation-sender workers only recovered after a restart (I assume that cleared the pending replication queue).
I believe this is a duplicate of #16057.
Description
Prior issue over at Element Web, which has been closed as being a Synapse issue rather than an Element Web one: element-hq/element-web#25874
Since Element Web switched to setting presence via /sync instead of using the presence API, my Synapse is using extremely high amounts of CPU due to the presence requests. It also seems to kill both federation senders from the amount of work they have to do (I assume it is just the metrics endpoint timing out, since Kubernetes doesn't see them as broken).
The suspicion that it is caused by the Element Web change comes from the fact that the peaks align perfectly with the times I start my Element Web clients.
Steps to reproduce
Homeserver
matrix.midnightthoughts.space
Synapse Version
{"server_version":"1.87.0","python_version":"3.11.4"}
Installation Method
Docker (matrixdotorg/synapse)
Database
PostgreSQL 15 server with active + standby replication nodes
Workers
Single process
Platform
The server is running on a mixed-arch (x86_64 + arm64) Kubernetes cluster.
Configuration
The config is at https://git.nordgedanken.dev/kubernetes/gitops/src/branch/main/apps/production/synapse-midnightthoughts.yaml#L30-L41. Mainly, there are 4 appservices (3 of them not working).
Relevant log output
Attached as a file, as the log is too long for a comment.
Anything else that would be useful to know?
There is an Element Web rageshake available, attached to the other issue.
The attached log covers the last hour. Particularly interesting is the high amount of replication around `presence_set_state`.