Xrootd TLS handshake failed: not an SSL/TLS record #7628
Comments
Hi, We discussed this issue today in the dCache Tier-1 Meeting with @lemora and @kofemann. I forgot to mention: we run dCache 9.2.22, OpenJDK 17 and RHEL 8. Cheers, |
Hi Samuel, Good to know. We have also experienced timeouts with the CMS SAM tests for Xrootd since yesterday afternoon, with the same observation: 50% of the SAM tests failed with timeouts.
https://monit-grafana.cern.ch/goto/UgDTDSrIg?orgId=20 But I don't know if it's related to this SSL/TLS issue, because I have found these errors in the Xrootd door log since the end of May... Stephan from CMS just answered that the ETF team may have picked up a newer, broken xrootd client version yesterday. They are trying to downgrade it to fix the SAM tests. Adrien |
Hi Adrien, I am not sure, because we had a Full Site Downtime last week and we reinstalled all dCache servers hosting the doors, and we do not save the door logs, but I think we have also seen these TLS handshake errors in the past on the doors. Cheers, |
Just to add to this discussion: we see the same errors from etf-07.cern.ch hosts. We upgraded to Alma9 and deployed 9.2.22. We had not seen CLOSE_WAIT connections in large numbers before. Another point: the "no SSL/TLS" errors definitely come from the SAM tests and do not seem to correlate with the CLOSE_WAIT situation. I started to monitor the threads used by XrootD, and spikes occur maybe once every 2 days, whereas these We do not restart the door. It recovers by itself. Dmitry |
When a connection to the xrootd door is made, a few things are exchanged. I think all that means is that the protocol call succeeded, but the server then threw an exception, so no proper entry was written to the log. |
This is the time distribution of errors on the xrootd door at the CMS site:
We experienced a spike in the thread count on the Xrootd door at about 2 AM. All errors between 2 AM and 4 AM looked like:
Again, this is NOT related to the SSL/TLS issue, but I think it is related to CLOSE_WAIT. https://github.com/DmitryLitvintsev/scripts/blob/master/bash/dcache/dcache_thread_count.sh I use the above script to monitor thread counts, and I trigger a thread dump of the domain if the thread count is > 900 plus (this is because default I would like you to run this script at your site. |
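For readers who cannot run the linked bash script directly, the check it is described as doing (count threads, compare against roughly 900) can be sketched in Python. This is only an illustration and not a port of that script: the function names are hypothetical, the thread count is taken from the `Threads:` field of Linux `/proc/<pid>/status`, and the actual thread dump step is left out because the exact command depends on the site setup.

```python
THREAD_LIMIT = 900  # threshold quoted in the comment above


def thread_count(status_text: str) -> int:
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


def needs_thread_dump(count: int, limit: int = THREAD_LIMIT) -> bool:
    """True when the thread count exceeds the limit (strictly greater)."""
    return count > limit


if __name__ == "__main__":
    # Sample text standing in for /proc/<pid>/status of the door's JVM;
    # the values here are made up for illustration.
    sample = "Name:\tjava\nState:\tS (sleeping)\nThreads:\t912\n"
    count = thread_count(sample)
    if needs_thread_dump(count):
        print(f"thread count {count} exceeds {THREAD_LIMIT}: take a thread dump")
```

On a real host the sample string would be replaced by the contents of `/proc/<pid>/status` for the domain's Java process.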
Well, I think the errors stopped yesterday afternoon, as soon as CMS rolled back their Xrootd client version.
Now I only get errors on the doors like |
Same behaviour at KIT. The errors with |
@ageorget @samuambroj Thanks for the update. Nonetheless, we (dcache) should find out what's going on there anyway.... |
The SSL errors looked like a non-SSL connection being made to a host expecting SSL. |
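To make that reading concrete, here is a minimal sketch of how a server can tell whether incoming bytes start with a TLS record. It is not the door's actual check; the function name is hypothetical, and the test only looks at the 5-byte TLS record header (content type, version, length). A plain xrootd client opens with a 20-byte handshake of five 32-bit integers (0, 0, 0, 4, 2012), which starts with zero bytes and therefore cannot be a TLS record.

```python
import struct

# Valid TLS record content types: change_cipher_spec, alert,
# handshake, application_data.
TLS_CONTENT_TYPES = {20, 21, 22, 23}
MAX_RECORD_LENGTH = 2**14 + 2048  # upper bound for a TLS 1.2 ciphertext record


def looks_like_tls_record(data: bytes) -> bool:
    """Heuristic check of the 5-byte TLS record header."""
    if len(data) < 5:
        return False
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    return (
        content_type in TLS_CONTENT_TYPES
        and major == 3          # SSL 3.0 and all TLS versions use major version 3
        and minor <= 4
        and length <= MAX_RECORD_LENGTH
    )


# Start of a TLS ClientHello record: handshake (22), version 3.1, length 200.
tls_start = bytes([22, 3, 1, 0, 200])
# Plain (non-TLS) xrootd initial handshake.
xrootd_start = struct.pack("!5i", 0, 0, 0, 4, 2012)

print(looks_like_tls_record(tls_start))
print(looks_like_tls_record(xrootd_start))
```

A client that skips TLS when the door expects it would produce exactly the second kind of input, which is consistent with the "not an SSL/TLS record" wording.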
The issue with multiple CLOSE_WAITs and peak CPU utilization on the xrootd server for CMS AAA is believed to be caused by a bug in OpenSSL 3.0. (This is a C library used to extract VOMS information at voms_api) |
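For sites that want to track the CLOSE_WAIT build-up discussed in this thread, a rough sketch of counting such sockets on Linux follows. It parses text in the `/proc/net/tcp` format, where the `st` column value `08` means CLOSE_WAIT. The function name is hypothetical, and port 1094 is only assumed here as the conventional xrootd port; use whatever port your door listens on.

```python
CLOSE_WAIT = "08"  # TCP state code for CLOSE_WAIT in /proc/net/tcp


def count_close_wait(proc_net_tcp: str, local_port: int) -> int:
    """Count CLOSE_WAIT sockets bound to local_port in /proc/net/tcp text."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address is "<hex address>:<hex port>"
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port == local_port and fields[3] == CLOSE_WAIT:
            count += 1
    return count


# Made-up sample: one LISTEN socket and two CLOSE_WAIT sockets on port
# 1094 (0x0446), plus one ESTABLISHED socket on another port.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:0446 00000000:0000 0A 00000000:00000000
   1: 0100007F:0446 0200007F:C350 08 00000000:00000001
   2: 0100007F:0446 0300007F:C351 08 00000000:00000001
   3: 0100007F:1F90 0400007F:C352 01 00000000:00000000
"""

print(count_close_wait(SAMPLE, 1094))
```

A Java door may listen on an IPv6 socket, in which case the same parsing would need to be applied to `/proc/net/tcp6` as well; the column layout is the same, only the addresses are longer.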
Hi,
Over the last few days, we have observed more and more SSL/TLS errors like this on the CMS XRootD doors, without any changes on the dCache side (9.2.22 on the doors, Java openjdk version "17.0.10"):
The strange thing is that the access log reports the transaction as OK and nothing else:
Adrien