Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow/hanging HTTP responses from self-hosted Synapse homeserver. #3894

Closed
shymega opened this issue Aug 26, 2024 · 28 comments
Closed

Slow/hanging HTTP responses from self-hosted Synapse homeserver. #3894

shymega opened this issue Aug 26, 2024 · 28 comments

Comments

@shymega
Copy link

shymega commented Aug 26, 2024

I'm writing a small utility to leave rooms on my Matrix account using the Rust SDK. However, it consistently times out, despite editing SyncSettings, or using a machine closer to my self-hosted Synapse server.

I've also made the utility public: https://github.com/shymega/matrix-leave.

I can see activity in the logs, but it's like the SDK is being impatient.

Logs:

     Running `target/debug/matrix-leave`
2024-08-26T13:35:48.384900Z  INFO matrix_leave: Logging in [email protected]
2024-08-26T13:35:48.384971Z  INFO login{method="identifier and password"}: matrix_sdk::matrix_auth::login_builder: Logging in homeserver="https://mtx.shymega.org.uk/" identifier=Some(UserIdOrLocalpart("@shymega:mtx.shymega.org.uk"))
2024-08-26T13:35:48.834542Z  INFO matrix_leave: Logged in. Now syncing...
Error: error sending request for url (https://mtx.shymega.org.uk/_matrix/client/v3/sync?timeout=30000): operation timed out

Caused by:
    operation timed out

Can someone help? Thanks!

@bnjbvr
Copy link
Member

bnjbvr commented Aug 26, 2024

It's the sending of the request that's timing out, maybe the server takes too long to respond. You could try using a different SyncSettings, with a custom timeout value, something like SyncSettings::default().timeout(Duration::from_secs(60 * 5)) to give it 5 minutes to respond, for instance.

(For support, you might have a better chance using the Matrix room for quicker feedback.)

@shymega
Copy link
Author

shymega commented Aug 26, 2024

Ah, I didn't know there was a Matrix room for this project. Thanks! I'll close this and continue in the Room.

@shymega shymega closed this as completed Aug 26, 2024
@shymega
Copy link
Author

shymega commented Aug 26, 2024

Actually, I'll reopen this issue, because I think something low-level in the HTTP client is going wrong.

@shymega shymega reopened this Aug 26, 2024
@shymega
Copy link
Author

shymega commented Aug 26, 2024

Essentially, with RUST_LOG=trace set, I'm seeing a lot of HTTP requests, which then seem to hang. I've attached a log. It hangs and sometimes picks up again, but I'm sure this behaviour isn't normal. It does get to the sync process, which is where it hangs.

Below are the last few lines at a 'hang point':

2024-08-26T17:09:48.772487Z DEBUG sync_once{sync_settings=SyncSettings { timeout: 900s, full_state: false, set_presence: "online" }}:keys_query{request_id="9e2908872d1349c7956cfbb8dd7bd434" device_keys={"@shymega:mtx.shymega.org.uk": []}}: eyeball::state: No wakers
2024-08-26T17:09:48.772839Z TRACE sync_once{sync_settings=SyncSettings { timeout: 900s, full_state: false, set_presence: "online" }}:send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 930s } request_id="REQ-8"}: matrix_sdk::http_client: Serializing request request_type="ruma_client_api::sync::sync_events::v3::Request"
2024-08-26T17:09:48.772986Z DEBUG sync_once{sync_settings=SyncSettings { timeout: 900s, full_state: false, set_presence: "online" }}:send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 930s } request_id="REQ-8" method=GET uri="https://mtx.shymega.org.uk/_matrix/client/v3/sync"}: matrix_sdk::http_client::native: Sending request num_attempt=1
2024-08-26T17:09:48.773152Z TRACE sync_once{sync_settings=SyncSettings { timeout: 900s, full_state: false, set_presence: "online" }}:send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 930s } request_id="REQ-8" method=GET uri="https://mtx.shymega.org.uk/_matrix/client/v3/sync"}: hyper_util::client::legacy::pool: take? ("https", mtx.shymega.org.uk): expiration = Some(90s)
2024-08-26T17:09:48.773180Z DEBUG sync_once{sync_settings=SyncSettings { timeout: 900s, full_state: false, set_presence: "online" }}:send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 930s } request_id="REQ-8" method=GET uri="https://mtx.shymega.org.uk/_matrix/client/v3/sync"}: hyper_util::client::legacy::pool: reuse idle connection for ("https", mtx.shymega.org.uk)

@shymega
Copy link
Author

shymega commented Aug 26, 2024

I should also mention I have Matrix behind Cloudflare. It's proxied, and doesn't generally cause issues.

@shymega shymega changed the title Timeout during sync_once Slow/hanging HTTP responses from self-hosted Synapse homeserver. Aug 26, 2024
@Hywan
Copy link
Member

Hywan commented Aug 27, 2024

It takes 930s before hitting the timeout, that's quite a lot (15mn). From the server logs, do you see a request coming?

@shymega
Copy link
Author

shymega commented Aug 27, 2024

It takes 930s before hitting the timeout, that's quite a lot (15mn). From the server logs, do you see a request coming?

@Hywan The server has a lot of logs, mostly federation. I see a few hits wth the user ID, rapidly, but the client itself seems to hang. My Nix Synapse config is here.

@Hywan
Copy link
Member

Hywan commented Aug 27, 2024

I can't help you more with the information I've. Check if the server receives the request or takes a while to compute the request (it can happen with sync v3 for super large accounts). If you have a proxy, did you configure the Client to use this proxy (https://matrix-org.github.io/matrix-rust-sdk/matrix_sdk/struct.ClientBuilder.html#method.proxy)? Are you using an SSL/TLS certificate? It's hard to help you without these information. How long did you wait? Are you reaching the timeout?

@shymega
Copy link
Author

shymega commented Aug 27, 2024

The server is behind a Cloudflare Argo Tunnel, over HTTPS. Cloudflare Argo connects to the localhost HTTP (port: 8008) server Synapse exposes. I am not using a proxy.

Yes, I reach the timeout if I use the default timeout int. That's why I boosted it, but it still hangs.

@Hywan
Copy link
Member

Hywan commented Aug 27, 2024

Does another client hang?

@shymega
Copy link
Author

shymega commented Aug 27, 2024

No, that's the strange thing. The JS SDK works, as does the Element app.

@Hywan
Copy link
Member

Hywan commented Aug 27, 2024

Please try to debug from the server-side: do you receive a request? does the server send a response?

@shymega
Copy link
Author

shymega commented Aug 27, 2024

The server seems to receive the sync request. As to sending responses, this I doubt.

The SDK spits this out:

2024-08-27T09:28:52.479853Z TRACE sync_once{sync_settings=SyncSettings { timeout: 30s, full_state: false, set_presence: "online" }}:send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 60s } request_id="REQ-2" method=GET uri="https://mtx.shymega.org.uk/_matrix/client/v3/sync"}: hyper_util::client::legacy::client: http1 handshake complete, spawning background dispatcher task
2024-08-27T09:28:52.480110Z TRACE sync_once{sync_settings=SyncSettings { timeout: 30s, full_state: false, set_presence: "online" }}:send{server_versions=[V1_0, V1_1, V1_2, V1_3, V1_4, V1_5, V1_6, V1_7, V1_8, V1_9, V1_10, V1_11] config=RequestConfig { timeout: 60s } request_id="REQ-2" method=GET uri="https://mtx.shymega.org.uk/_matrix/client/v3/sync"}: hyper_util::client::legacy::pool: checkout dropped for ("https", mtx.shymega.org.uk)

The thing that jumps out at me is the line that says:

checkout dropped for ("https", mtx.shymega.org.uk)

@shymega
Copy link
Author

shymega commented Aug 27, 2024

There is also one other recurring line that makes me think it's something to do with HTTP/2:

hyper_util::client::legacy::pool: reuse idle connection for ("https", mtx.shymega.org.uk)

@shymega
Copy link
Author

shymega commented Aug 27, 2024

I've been trying with varying timeouts, with no luck so far. Reqwest simply hangs mid-connection.

This is bizarre.

@bnjbvr
Copy link
Member

bnjbvr commented Aug 27, 2024

If you try without the Cloudflare proxy, or with the proxy but with HTTP1, does it work? The Cloudflare proxy may tweak the HTTP headers/values in subtle ways, so at the extreme, it might be a bug on their side…

@shymega
Copy link
Author

shymega commented Aug 27, 2024

There's a bit more activity when I disable the proxy. Sync is still stuck, but the logs from the SDK seem more 'busy' - as well as Synapse.

@shymega
Copy link
Author

shymega commented Aug 27, 2024

Actually, re-read the logs - it's just fetching /.well-known/matrix/client over and over. It receives HTTP 200, and then repeats.

@shymega
Copy link
Author

shymega commented Aug 27, 2024

Below are logs from the SDK. It just does this in a loop now. It might point to my .well-known config - does this look sane?

2024-08-27T15:06:59.676881Z TRACE build{homeserver=ServerName { server: "mtx.shymega.org.uk", protocol: Https }}:send{server_versions=[V1_0] config=RequestConfig { timeout: 30s, retry_limit: 3 } request_id="REQ-0" method=GET uri="https://mtx.shymega.org.uk/.well-known/matrix/client" status=200 response_size="180 B"}:Connection{peer=Client}:poll: h2::proto::streams::prioritize: writing frame=Headers { stream_id: StreamId(9), flags: (0x5: END_HEADERS | END_STREAM) }
2024-08-27T15:06:59.676897Z DEBUG build{homeserver=ServerName { server: "mtx.shymega.org.uk", protocol: Https }}:send{server_versions=[V1_0] config=RequestConfig { timeout: 30s, retry_limit: 3 } request_id="REQ-0" method=GET uri="https://mtx.shymega.org.uk/.well-known/matrix/client" status=200 response_size="180 B"}:Connection{peer=Client}:poll:FramedWrite::buffer{frame=Headers { stream_id: StreamId(9), flags: (0x5: END_HEADERS | END_STREAM) }}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(9), flags: (0x5: END_HEADERS | END_STREAM) }
2024-08-27T15:06:59.676953Z TRACE build{homeserver=ServerName { server: "mtx.shymega.org.uk", protocol: Https }}:send{server_versions=[V1_0] config=RequestConfig { timeout: 30s, retry_limit: 3 } request_id="REQ-0" method=GET uri="https://mtx.shymega.org.uk/.well-known/matrix/client" status=200 response_size="180 B"}:Connection{peer=Client}:poll: h2::proto::streams::prioritize: schedule_pending_open
2024-08-27T15:06:59.676970Z TRACE build{homeserver=ServerName { server: "mtx.shymega.org.uk", protocol: Https }}:send{server_versions=[V1_0] config=RequestConfig { timeout: 30s, retry_limit: 3 } request_id="REQ-0" method=GET uri="https://mtx.shymega.org.uk/.well-known/matrix/client" status=200 response_size="180 B"}:Connection{peer=Client}:poll:FramedWrite::flush: h2::codec::framed_write: queued_data_frame=false
2024-08-27T15:06:59.677092Z TRACE build{homeserver=ServerName { server: "mtx.shymega.org.uk", protocol: Https }}:send{server_versions=[V1_0] config=RequestConfig { timeout: 30s, retry_limit: 3 } request_id="REQ-0" method=GET uri="https://mtx.shymega.org.uk/.well-known/matrix/client" status=200 response_size="180 B"}:Connection{peer=Client}:poll:FramedWrite::flush: h2::codec::framed_write: flushing buffer

@shymega
Copy link
Author

shymega commented Aug 27, 2024

I think this is my end. The SDK works fine with Element One.

I've gone back to my thread in the Synapse room - unless there's a better place?

If the team deem it appropriate, I'm happy for this ticket to be closed.

@bnjbvr
Copy link
Member

bnjbvr commented Aug 27, 2024

I've gone back to my thread in the Synapse room

Sounds good. Curious what's causing it. Good luck!

@bnjbvr bnjbvr closed this as completed Aug 27, 2024
@bnjbvr bnjbvr closed this as not planned Won't fix, can't repro, duplicate, stale Aug 27, 2024
@shymega
Copy link
Author

shymega commented Aug 27, 2024

Just a quick update: I found a legacy _matrix SRV record, which was pointing incorrectly. I believe this is why Synapse returns 500. I've adjusted the record, and am now waiting for it to propogate.

@shymega
Copy link
Author

shymega commented Aug 28, 2024

No luck yet.

@bnjbvr Just a quick question - is the SDK meant able to respect the org.matrix.msc3575.proxy. key in /.well-known/matrix/client?

@bnjbvr
Copy link
Member

bnjbvr commented Aug 28, 2024

It's clearer since #3889, which landed yesterday, which allows you to specify a sliding_sync_version() in the ClientBuilder; in particular there are options to force-specify a sliding sync proxy URL, or discover it via the .well-known mechanism.

@shymega
Copy link
Author

shymega commented Aug 28, 2024

I'm not sure how to use #3889. Could you DM me on Matrix? @shymega: ? Thanks.

@bnjbvr
Copy link
Member

bnjbvr commented Aug 28, 2024

We don't provide personal unpaid support for using the SDK; feel free to ask in the Matrix dev room I pointed to previously, though.

@shymega
Copy link
Author

shymega commented Aug 28, 2024

No worries, I understand. It was a big ask. I'll ask in there now.

@shymega
Copy link
Author

shymega commented Aug 28, 2024

Once I figure it out, I've made a note to submit a PR with an example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants