Container restarting exit code: 132 #174
Comments
Got a bit further on this: I can get the container to stay stable by using a non-TLS port/scheme. This wasn't clear from any errors in v2; downgrading to v1 gives the SSL handshake error, which made me try a non-TLS connection. |
It looks like it stops when the first client connects, is this correct? Does it stay alive if you don't start any clients? |
Yes, correct. I'm connecting with AMQPLib for PHP, in case that makes any difference. |
Ok. Can you see in the upstream's logs if amqproxy connects? |
Hey @spuun, sorry for the delay getting back to you. Yes, I can see the connection in the cloudamqp logs, and I can connect and publish fine now 🤷 If the issue comes back, I'll check the upstream logs and re-open this issue. |
So, testing this again in production, we're seeing the issue. The upstream logs just say the client closed the connection. The only difference between working and broken is changing the AMQP_URL env var from
v2 debug logs:
Here's the logs on v1.0.0:
v1 debug logs:
|
Hmm, interesting, so something with the TLS... But somehow the connection is established, and then after some packets are exchanged it crashes? In other cases exit code 132 might indicate a missing CPU feature, like AVX or so. Maybe something with the AES/hardware acceleration that OpenSSL uses? |
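For reference, exit code 132 follows the common shell and container-runtime convention of reporting 128 plus the signal number when a process is killed by a signal, and 132 - 128 = 4 is SIGILL (illegal instruction) on Linux. The following is a minimal Python sketch of that arithmetic; the helper name `describe_exit_code` is hypothetical and is not part of amqproxy or Kubernetes.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit code using the 128 + signal-number convention.

    Hypothetical helper for illustration only.
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name for this signal number on the current platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


print(describe_exit_code(132))
```

On a Linux host this reports signal 4 (SIGILL), which is why a missing CPU instruction is a plausible first suspect for this exit code.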
My theory is that this has happened since the cert on *.rmq.cloudamqp.com changed on More information from Cloudflare here: https://blog.cloudflare.com/upcoming-lets-encrypt-certificate-chain-change-and-impact-for-cloudflare-customers/ It's possibly an error like you say, but I have been comparing the OpenSSL installs on a machine it works on and one it doesn't, and I can't see anything obviously different. Here is the /proc/cpuinfo from a machine this doesn't work on. It is running Ubuntu 22.04 LTS, so it's not out of date in terms of OS, and the hardware is fairly new.
vs one it does work on:
|
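Comparing two `/proc/cpuinfo` dumps by eye is error-prone, so here is a small Python sketch that diffs the CPU feature flags of a working and a failing machine. It assumes x86-style output where the features are listed on a `flags` line (ARM kernels use `Features` instead); the function names and the sample strings are made up for illustration and are not the data from this thread.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(working: str, broken: str) -> set[str]:
    """Flags present on the working machine but absent on the broken one."""
    return cpu_flags(working) - cpu_flags(broken)


# Hypothetical sample data, not taken from the machines discussed here.
working_sample = "model name\t: example cpu\nflags\t\t: fpu sse2 aes avx avx2\n"
broken_sample = "model name\t: example cpu\nflags\t\t: fpu sse2 aes\n"

print(sorted(missing_flags(working_sample, broken_sample)))
```

In practice the two inputs would be the saved contents of `/proc/cpuinfo` from each host. An empty result would suggest the CPU feature theory is less likely.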
I was about to post a comment linking to this issue, kubernetes/kubernetes#94284, specifically thinking of the last comment, kubernetes/kubernetes#94284 (comment). Interesting about the TLS cert... Are you able to try amqproxy connecting against some other host over AMQPS that has a different cert? |
@dentarg We're getting the same issue with a custom cert/domain issued by AlphaSSL with these certs https://help.configuressl.com/alphassl-wildcard-intermediate-root-ca-cross-signed-ca-certificates-r6/ |
We have changed the SSL cert on cloudamqp to our own custom cert issued by AlphaSSL and we're still seeing the same problem. Can confirm through the stack that the SSL is valid by running
It's hitting this in both instances - amqproxy/src/amqproxy/client.cr Line 264 in e9e0162
Client Logs
Server Logs
|
We modified cloudamqp to log the exception stack trace:
|
The stack trace is more useful once amqproxy is built with
|
If the amqproxy upstream is using |
No (actually it's not even possible to connect with amqps to amqproxy; our thinking has been that the proxy is installed in a trusted network and that TLS then doesn't add any meaningful benefit) |
We've realised at least part of the problem here (the socket EOFError) is coming from health check requests opening the socket, but not providing any data. |
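To illustrate the health check behaviour described above, here is a self-contained Python sketch. It is not amqproxy's code (amqproxy is written in Crystal); it only shows that a TCP probe which connects and closes without sending anything makes the listener's first read return EOF, whereas a real AMQP 0-9-1 client starts with an 8-byte protocol header. The function names are hypothetical.

```python
import socket
import threading
from typing import List, Optional

AMQP_HEADER = b"AMQP\x00\x00\x09\x01"  # AMQP 0-9-1 protocol header (8 bytes)


def read_protocol_header(conn: socket.socket) -> Optional[bytes]:
    """Read the 8-byte header, or return None if the peer closed first."""
    buf = b""
    while len(buf) < len(AMQP_HEADER):
        chunk = conn.recv(len(AMQP_HEADER) - len(buf))
        if not chunk:  # an empty read means EOF: the peer sent nothing more
            return None
        buf += chunk
    return buf


def probe_result(payload: bytes) -> Optional[bytes]:
    """Connect to a throwaway local listener, send payload, close, and
    return what the listener managed to read."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    result: List[Optional[bytes]] = []

    def accept_one() -> None:
        conn, _ = server.accept()
        with conn:
            result.append(read_protocol_header(conn))

    worker = threading.Thread(target=accept_one)
    worker.start()
    with socket.create_connection(server.getsockname()) as client:
        if payload:
            client.sendall(payload)
    worker.join()
    server.close()
    return result[0]


print(probe_result(b""))           # behaves like a bare TCP health check
print(probe_result(AMQP_HEADER))   # behaves like the start of a real client
```

A listener that treats the `None` case as an ordinary disconnect rather than an error would not log an EOF exception for each probe.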
On the affected host machine (in a containerised environment, FWIW), we are seeing:
However we don't yet have a core dump, and don't know whether the problem lies in openssl, crystallang or amqproxy. |
Core dump with backtrace for illegal instruction:
|
Ok! In gdb, can you |
@carlhoerberg thanks for your help, unfortunately not yet: I'm presuming I need to be using debug versions of some system libraries to get that. |
So if we have built with
And when running in gdb with
Afaik Crystal uses LLVM, whose trap intrinsic will generate a ud2 instruction. That could mean SIGILL is a red herring and the root cause is an invalid memory access, which makes us suspect the problem is in Crystal. |
I think we've narrowed this down to a mismatch of libssl version between compile and runtime. See issue #178 |
We're fairly sure this should be fixed with #183. @carlhoerberg @viktorerlingsson, please can we get a release based on all of these recent fixes? v2.0.2...main |
Yes @mrmason, a new release will be coming soon! |
@viktorerlingsson Testing this now, so far looks good! |
Hi,
We're running this as a Kubernetes DaemonSet (we've also tried sidecar/extraContainers) and the containers constantly restart.
The logs aren't great, even if we try turning on debug
The container then dies all the time with exit code 132:
Any suggestions?