-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-109534: fix reference leak when SSL handshake fails #114074
gh-109534: fix reference leak when SSL handshake fails #114074
Conversation
Lib/asyncio/sslproto.py
Outdated
self._on_handshake_complete(ConnectionResetError) | ||
self._on_handshake_complete(ConnectionResetError()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mjpieters; This fix caused the ssl-over-ssl test from GH-113214 (PR #113334) to fail and want to make sure I did not break anything on your end.
The test will raise ConnectionResetError
(via SSLProtocol.eof_received
) and normally this won't
be caught by the exception handler in SSLProtocol._fatal_error
because it is of instance OSError
. But because we no longer raise the exception in SSLProtocol._on_handshake_complete(exc)
, the exception object is never initialised and the isinstance
doesn't work as intended. Changed SSLProtocol.eof_received
to pass an exception object and not class to fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert in this area, but passing in an instance instead of the class looks fine by me. I did leave a comment on the removal of the try...except
block however as this did more than just handle raise handshake_exc
.
Was not sure how to add tests for this, the current replication from the issue is as follows import asyncio
import ssl
import tracemalloc
async def main(certfile, keyfile):
ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ssl_context.load_cert_chain(certfile, keyfile)
await asyncio.start_server(
lambda _, w: w.write(b"\0"), "127.0.0.1", "4443", ssl=ssl_context
)
current_1, _ = tracemalloc.get_traced_memory()
proc = await asyncio.create_subprocess_shell(
f"curl https://127.0.0.1:4443 2> /dev/null"
)
await proc.communicate()
current_2, _ = tracemalloc.get_traced_memory()
print(f"{(current_2-current_1)/1000:.2f}KB")
if __name__ == "__main__":
tracemalloc.start()
asyncio.run(main(certfile="server.crt", keyfile="server.key")) The delta in |
peercert = sslobj.getpeercert() | ||
except Exception as exc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handler here is intended to (also) catch exceptions raised by sslobj.getpeercert()
. That is probably going to be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words: you can't avoid using a try...except
block here. Is removing this really necessary to break the cycle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! This was my misunderstanding of the problem. It turns out that setting the original handshake_exc
to None also fixed our problem -- which my understanding is that there are two separate instances of the exception, each with a traceback and only one was being handled.
try:
if handshake_exc is None:
self._set_state(SSLProtocolState.WRAPPED)
else:
raise handshake_exc
peercert = sslobj.getpeercert()
except Exception as exc:
handshake_exc = None # <--- fixes the problem
This code snippet should replicate our issue, which I'm currently trying to convert into a unittest for this PR import asyncio
import ssl
import weakref
import tracemalloc
class SSLProtocol:
def __init__(self, future) -> None:
self.buffer = bytearray(256 * 1024)
self.future = future
def _do_handshake(self):
try:
raise ssl.SSLCertVerificationError()
except ssl.SSLError as exc:
self._on_handshake_complete(exc)
def _on_handshake_complete(self, handshake_exc):
try:
if handshake_exc is None:
...
else:
raise handshake_exc
except Exception as exc:
# handshake_exc = None # <-------- Uncomment to break assertion
self.future.set_exception(exc)
self.future = None
async def accept_connection():
"""Mimicking selector_events.BaseSelectorEventLoop._accept_connection2"""
loop = asyncio.get_running_loop()
waiter = loop.create_future()
ssl_proto = SSLProtocol(waiter)
# selector_events._SelectorSocketTransport would schedule the protocols
# connection_made method
loop.call_soon(ssl_proto._do_handshake)
try:
await waiter
except BaseException:
waiter = None # This is needed with `handshake_exc = None`
ref = weakref.ref(ssl_proto) # For testing
# The protocol **should** fall out of scope here...
# In _accept_connection2 the app & socket transports will release their
# references to the protocol via their close methods and there will be no
# references to it in this task (it falls out of scope of _make_ssl_transport)
ssl_proto = None
return ref
if __name__ == "__main__":
tracemalloc.start()
m_start = tracemalloc.get_traced_memory()[0]
ssl_proto_ref = asyncio.run(accept_connection())
m_delta = tracemalloc.get_traced_memory()[0] - m_start
# Uncomment `handshake_exc = None` to break assertions
assert ssl_proto_ref() is not None
assert m_delta > 256_000 # bytes |
Does uvloop show the same issue? The implementation here is a fairly direct port from their Cython implementation, it'd be interesting to see what is different if uvloop doesn't show the same leak. |
Hm. The initial issue ticket says My second snippet (with the mimicked classes) will still show the same issue with And briefly looking over the uvloop implementation of SSLProtocol, I can't see any obvious differences! Let me look into this! |
Sorry for the slow response @mjpieters, it's been a busy week at the office!
uvloop doesn't appear to have the same issue. Despite the (Note, I am quite out-of-depth here, but tried my best navigating through uvloop's code base...) To me, the main difference seems to be that in uvloop the waiter (future) exception is set through a done callback allowing the waiter to fall out of scope and break the reference cycle between Exception (and traceback) and Future. In
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your work debugging this! This change looks good to me. It sounds like we should backport this change to 3.11 as well?
Would you like me to create the PRs for the backports or is this something best left for the core team? |
I'll handle backports! Just wanted to mention and to leave open for a little bit in case other folks have opinions. |
Thanks @hauntsaninja! Just want to mention that the issue has some open threads RE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to say I haven't been able to follow all this (the issue is very noisy) but these changes look good regardless.
Thanks @ordinary-jamie for the PR, and @hauntsaninja for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11, 3.12. |
…H-114074) (cherry picked from commit 80aa7b3) Co-authored-by: Jamie Phan <[email protected]>
GH-114829 is a backport of this pull request to the 3.12 branch. |
…H-114074) (cherry picked from commit 80aa7b3) Co-authored-by: Jamie Phan <[email protected]>
GH-114830 is a backport of this pull request to the 3.11 branch. |
) (#114830) gh-109534: fix reference leak when SSL handshake fails (GH-114074) (cherry picked from commit 80aa7b3) Co-authored-by: Jamie Phan <[email protected]>
) (#114829) gh-109534: fix reference leak when SSL handshake fails (GH-114074) (cherry picked from commit 80aa7b3) Co-authored-by: Jamie Phan <[email protected]>
Thanks @ordinary-jamie for researching this and coming up with the fix! Thanks @hauntsaninja for taking over the review. Really appreciate it! |
Thank you mjpieters as well! :-) |
Thanks everyone for the support and guidance - I've learnt a lot from you all! I will continue to monitor the issue thread to see if other users are observing other related bugs. |
Fixes a reference cycle issue between the
Future
object used inselector_events.BaseSelectorEventLoop._accept_connection2
andSSLProtocol
which prevented theSSLProtocol
from deallocating and causing spikes in memory usage.The root cause of the issue is that, on SSL handshake failure, the
SSLProtocol
sets theFuture
(waiter) exception, and the traceback frames in that exception will hold references to the protocol object and its buffers.RefactoredEdit: see comments belowSSLProtocol._on_handshake_complete
to not usetry/except
as this created a frame that kept the waiter object alive.