TLS/SSL asyncio leaks memory #109534
Comments
Can confirm that the issue exists on Python 3.11 with uvloop 0.17.0, when setting up many client SSL connections that sometimes reconnect. Sometimes it leaked several megabytes per second for me. I was able to track this by attaching to a running Python process.
Also, a |
@1st1, @asvetlov, @graingert, @gvanrossum, @kumaraditya303 (as asyncio module experts) |
Unfortunately I am really bad at this type of low level stuff. :-( If you suspect a leak in the C accelerator, you can disable it and see if the leak goes away. (Or at least becomes less severe.) |
OTOH if the problem is in the SSL code, we need a different kind of expert. |
Finally please figure out if this is in uvloop or not. If it’s tied to uvloop this is the wrong tracker. |
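For anyone trying to narrow that down, a minimal sketch (an illustration, not from the thread; assumes uvloop is installed and repro() stands in for whatever coroutine reproduces the leak) of running the same workload on both event loop implementations:
import asyncio

async def repro():
    ...  # placeholder: the workload that reproduces the leak

# Plain asyncio event loop
asyncio.run(repro())

# uvloop event loop (pip install uvloop)
import uvloop
uvloop.install()  # subsequent asyncio.run() calls use uvloop's event loop
asyncio.run(repro())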
Unfortunately, this bug was reproduced only in our production environment, and we weren't able to trigger it anywhere else. It's already mitigated, as the root cause was poor connection stability, and I don't think I would be able to reproduce it again in production. But I'm ready to assist and provide more details if anyone wants to ask clarifying questions about it. Also, there are several related tickets, and I'm not sure whether they are all about the same underlying problem.
|
I actually was able to reproduce this leak in isolation several days ago, but it leaked only ~5 megabytes over several hours. I adapted a repro script from this comment on the aiohttp bug mentioned above so it was ready to run, and ran it for about 4 hours. The memory consumption (RSS) initially stabilised and then slowly crept up.
#!/usr/bin/env python3
import aiohttp
import tracemalloc
import ssl
import asyncio


def init():
    timeout = aiohttp.ClientTimeout(connect=5, sock_read=5)
    ssl_ctx = ssl._create_unverified_context()
    conn = aiohttp.TCPConnector(ssl=ssl_ctx, enable_cleanup_closed=True)
    session = aiohttp.ClientSession(connector=conn, timeout=timeout, cookie_jar=aiohttp.CookieJar(unsafe=True))
    return session


async def fetch(client):
    try:
        async with client.request('GET', url='https://api.pro.coinbase.com/products/BTC-USD/ticker') as r:
            msg = await r.text()
            print(msg)
    except asyncio.CancelledError:
        raise
    except Exception as err:
        print("error", err)


async def main():
    requests = 600
    clients = [init() for _ in range(requests)]
    tracemalloc.start()
    try:
        while True:
            await asyncio.gather(*[fetch(client) for client in clients])
            await asyncio.sleep(5)
    except asyncio.CancelledError:
        pass  # end and clean things up
    finally:
        memory_used = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
        stats = snapshot.statistics('lineno')
        for stat in stats[:10]:
            print(stat)
        try:
            # close all client sessions
            await asyncio.gather(*[client.close() for client in clients])
        except Exception:
            pass


asyncio.run(main()) |
Could you find a reproducer without aiohttp? |
Unsure if this is helpful or related, but I came across encode/uvicorn#2078, where the thread discusses and concludes that the issue of memory not being released is not isolated to uvicorn but is also seen in granian/gunicorn/hypercorn, and as a result could be at the interpreter level (apologies for butchering the summary). The thread has some great charts and analysis; although it is at the server level, the different implementations (granian vs. uvicorn) can help approximate where the issue might be surfacing, if it's related. Example repo: https://github.com/Besedo/memory_issue/tree/main by @EBazarov. Apologies in advance if it is unrelated; I will remove the comment or create one in the right place. gi0baro:
|
Upstream is pretty hard pressed to debug this (I'm no web developer or admin). Can I ask one of the framework owners or users who are struggling with this to help us reduce their example to the point where we have a small self-contained program that demonstrates the issue? |
I provided a working example of the leak; testing with ab gives a 1+ GB/min leak (OpenSSL version 1.1). IMPORTANT: there is no problem when using Python 3.9 WITHOUT uvloop; with uvloop it leaks, without it it does not. |
shorten the mvp even more
|
Updated first post for clarity. |
example certificates |
I only have a Mac, but I got the example to work and even managed to install the apache2 utils. I modified the example to print the process size once a second (rather than bothering with graphs). Now I've got an idea. Given the complexity of the setup, maybe someone who is interested in getting to the bottom of this issue could rewrite the example without third-party frameworks, so we end up with a small self-contained program. |
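For reference, one possible way to watch the process size once a second alongside the repro (an assumption on my part, the comment above does not say which tool was actually used; psutil is a third-party package):
import asyncio
import os

import psutil  # pip install psutil

async def report_rss(interval: float = 1.0):
    # Periodically print the resident set size (RSS) of the current process.
    proc = psutil.Process(os.getpid())
    while True:
        print(f"RSS: {proc.memory_info().rss / (1024 * 1024):.1f} MiB")
        await asyncio.sleep(interval)
Running this as a background task (e.g. asyncio.get_running_loop().create_task(report_rss())) next to the leaking workload makes the growth easy to see without graphs.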
@gvanrossum The biggest leak I saw is on Python 3.12 (without uvloop). |
This doesn’t really help. We need someone to try and find which objects are being leaked. |
I tried different Python memory profilers with no result at all; I don't know how to create a useful dump, but inside the coredump there are tons of server certificate info. |
Might one of the magical functions in the |
objgraph is great for this https://objgraph.readthedocs.io/en/stable/#memory-leak-example |
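A minimal sketch of using objgraph for that, following the linked docs (the SSLProtocol type name in the last line is a guess at a likely suspect, not a confirmed leaker):
import objgraph  # pip install objgraph

objgraph.show_growth(limit=10)  # establish a baseline of per-type object counts

# ... run the leaking workload here, e.g. a batch of TLS requests ...

objgraph.show_growth(limit=10)  # prints the types whose instance counts grew the most
# For a suspicious type, a backreference graph shows what keeps instances alive
# (requires graphviz):
# objgraph.show_backrefs(objgraph.by_type('SSLProtocol')[:3], filename='backrefs.png')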
Sorry for chiming in without a repro (but I will try to produce one anyway), but I also hit the same issue with Tornado + SSL. And I can confirm such a leak exists even without uvloop, and sometimes (if I respond 404 to the Cloudflare proxy a lot) it becomes significant. |
Summary
Findings:
Minimum replication
This snippet runs the server and then runs two separate pings (one insecure) in a subprocess, capturing the traced memory before and after. The output is as follows:
Δ Memory Allocations = 2727.56kb
Δ Memory Allocations = 6.66kb  # without SSL
Notice the difference between the two runs.
import asyncio
import asyncio.sslproto
import ssl
import tracemalloc


class HTTP(asyncio.Protocol):
    def __init__(self):
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(
            b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: keep-alive\r\n\r\n"
        )
        self.transport.close()


def make_tls_context():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(".jamie/iss109534/server.crt", ".jamie/iss109534/server.key")
    return ctx


async def start_server(loop):
    tls_context = make_tls_context()
    return await loop.create_server(
        HTTP, "127.0.0.1", 4443, backlog=65535, ssl=tls_context, start_serving=True
    )


async def ping(delay: float = 1.0, n_iter: int = 1, insecure: bool = False):
    await asyncio.sleep(delay)
    # ----------------------------------------------------------------------------------
    # Before
    current_1, _ = tracemalloc.get_traced_memory()
    # ----------------------------------------------------------------------------------
    # Run a single request
    if insecure:
        cmd = "curl --insecure"
    else:
        cmd = "curl"
    for _ in range(n_iter):
        proc = await asyncio.create_subprocess_shell(
            f"{cmd} https://127.0.0.1:4443",
            stderr=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )
        await proc.communicate()
    # ----------------------------------------------------------------------------------
    # After
    current_2, _ = tracemalloc.get_traced_memory()
    print(f"Δ Memory Allocations = {(current_2 - current_1)/1000:.2f}kb")


if __name__ == "__main__":
    tracemalloc.start()
    loop = asyncio.new_event_loop()
    loop.run_until_complete(start_server(loop))
    # Run with SSL verification
    loop.run_until_complete(ping(delay=0.5, n_iter=10))
    # Run without SSL verification
    loop.run_until_complete(ping(delay=0.5, insecure=True, n_iter=10))
    loop.close()

Trace malloc snapshot
I updated the ping function to also take tracemalloc snapshots:

async def ping(delay: float = 1.0, n_iter: int = 1, insecure: bool = False):
    # ...
    snapshot_1 = tracemalloc.take_snapshot()
    # ...
    # Same as before
    # ...
    snapshot_2 = tracemalloc.take_snapshot()
    print('-' * 40)
    if insecure:
        print("Insecure")
    print(f"Δ Inner Memory Allocations = {(current_2 - current_1)/1000:.2f}kb")
    top_stats = snapshot_2.compare_to(snapshot_1, 'traceback')
    print("\n[ Top stat ]")
    for stat in top_stats[:1]:
        print(stat)
        for line in stat.traceback.format(limit=25):
            print("\t", line)
    print('-' * 40)

And now we can see where the allocations are happening.
Dev Mode
We can further confirm this when we run the reproducer with Python's development mode enabled.
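A sketch of doing that from a small driver script (the file name repro.py is a placeholder; -X dev enables Python's development mode, which installs debug hooks on the memory allocators and turns on asyncio debug mode):
import subprocess
import sys

# Re-run the reproducer under Python's development mode.
subprocess.run([sys.executable, "-X", "dev", "repro.py"], check=True)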
Edit
Using 

Edit 2, updated minimal replication
A simpler code example for replicating the issue...

import asyncio
import ssl
import tracemalloc


async def main(certfile, keyfile):
    tracemalloc.start()
    # Start server with SSL
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ssl_context.load_cert_chain(certfile, keyfile)
    await asyncio.start_server(
        lambda _, w: w.write(b"\0"), "127.0.0.1", "4443", ssl=ssl_context
    )
    current_1, _ = tracemalloc.get_traced_memory()
    # Ping the server with cURL
    proc = await asyncio.create_subprocess_shell(
        "curl https://127.0.0.1:4443 2> /dev/null"
    )
    await proc.communicate()
    current_2, _ = tracemalloc.get_traced_memory()
    print(f"{(current_2 - current_1)/1000:.2f}KB")


if __name__ == "__main__":
    asyncio.run(
        main(certfile="server.crt", keyfile="server.key")
    ) |
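The repro scripts above assume a self-signed server.crt/server.key pair is present (the "example certificates" mentioned earlier in the thread). If you need to generate your own, a sketch that shells out to the openssl CLI (key size, lifetime and subject are arbitrary choices here):
import subprocess

# Generate a throwaway self-signed certificate and key for the local test server.
# Assumes the openssl command-line tool is installed.
subprocess.run(
    [
        "openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
        "-keyout", "server.key", "-out", "server.crt",
        "-days", "365", "-subj", "/CN=127.0.0.1",
    ],
    check=True,
)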
@mjpieters Just curious -- have you ever experienced anything similar? |
Excellent find!! Can you think of a suitable place to free this memory? I think it must be passed into OpenSSL via a C wrapper, and that C wrapper must somehow forget to free it? So it would have to be in the C code for the ssl module. I don't think the leak is in the Python code, even though that's where it's being allocated (Python objects don't easily leak attribute values). |
I can reproduce it on 3.12 on Gentoo, but I can't see it inside a container based on Alpine or Debian; there the memory is deallocated properly. |
While I do confirm @stalkerg's observation, I think we need to explore that angle perhaps: that when connection_lost() is called there is actually memory cleanup. I'll try to confirm that hypothesis. But @stalkerg, please don't kid yourself that this is "pooling" of some sort. While I did observe, like you, repeated |
Oh, that was a good hint. It can confuse everything even more, but still. I tried to use
BTW, on my Gentoo I use glibc 2.38, on Debian it is 2.36, and Alpine does not use glibc at all.
UPDATE: UPDATE2: |
I confirm @stalkerg's observation that with jemalloc there is deallocation of the buffers after |
I also confirm @stalkerg's observation that issuing |
@graingert @rojamit seems like any numbers for
UPDATE: OK, it happens because such an env var disables the dynamic threshold: https://github.com/lattera/glibc/blob/master/malloc/malloc.c#L5031
UPDATE: these are the actual values during our test: |
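For completeness, glibc's malloc_trim(3) can also be invoked manually to ask the allocator to return freed memory to the OS. A minimal sketch (assumes Linux with glibc; this only mitigates the allocator behaviour discussed above, it is not a fix in asyncio):
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

def release_free_heap_memory() -> int:
    # malloc_trim(0) asks glibc to release as much free heap memory back to the
    # kernel as possible; it returns 1 if memory was released, 0 otherwise.
    return libc.malloc_trim(0)
Calling this periodically (for example from an asyncio task after bursts of TLS traffic) can make RSS drop back down if the memory is indeed sitting in free glibc arenas.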
Confirming that setting |
@geraldog maybe Some tests with Python 3.12 with patch, asyncio loop:
Python 3.9, asyncio loop:
Python 3.12 with patch, uvloop, desired result:
(The patch doesn't affect uvloop; as I remember, it has its own ssl implementation.) Looks like |
@rojamit why do you have many more requests in your first test? Anyway, you should try updating glibc and maybe the Linux kernel.
As I said, it's just a mitigation; it also doesn't solve my issue in the long run. Did you try replacing the allocator? If |
@stalkerg the higher request count is not linked with memory consumption in this test.
uname -r
Debian GNU/Linux 11 (bullseye) |
Sorry @stalkerg, but I think the burden is on you to write a bug report to https://sourceware.org/mailman/listinfo/libc-stable and Cc: the relevant people involved in glibc development. It was you who discovered it. There's a high possibility they won't fix it, however, citing that it's an actual optimization for some cases and that the burden is on the user to tell glibc to deallocate. And then we're basically stuck; all that is left is noting in the documentation that this is a known issue... |
Hello, I experienced the problem with my Django ASGI web app and finally found this topic. I use railway.app as a hosting service, I also described how to reproduce the problem with minimal Django ASGI app on stackoverflow: https://stackoverflow.com/questions/78339166/python-django-asgi-memory-leak-updated-2 |
@fifdee did you try using a different allocator? You can find instructions in this topic. |
@stalkerg I haven't. I'm not sure if it's possible with a PaaS host like Railway; currently I don't have much experience with this kind of stuff. |
I tried setting the environment variable. And for those who use Django ASGI and suffer from the same problem, this is the modified
|
We're experiencing the same problem, and setting the env var works around it. Is there a plan to add a fix in near-future releases? |
Are you absolutely sure it is the same problem, then? Feel free to provide details.
The "bug" discussed here is not a Python bug but rather a glibc quirk or optimization. |
Since there are no plans to fix this, shouldn't Python docs very clearly and explicitly recommend people to not use Python 3.11+ for ASGI applications? Or maybe disclaim that your application will leak memory unless you try one of the fixes in this discussion? This seems too big to not be documented. |
@rodrigomeireles it's not about ASGI apps; we have a much smaller repro. It's more about compatibility with glibc quirks. Unfortunately, I haven't had time over the last few months, and I really don't like mailing lists. |
@stalkerg I understand it's not exactly about ASGI apps but reading the thread (and also why I'm here) it seems a lot of people found memory leaks in their production APIs. This should be highlighted as I (and many others) could've avoided a lot of headache had I not upgraded my applications to the latest Python (or used a different language altogether). If you give me some guidance I could detail the issue to their email list but I believe that would give you more work than messaging them yourself. Nonetheless I could try it if you wish. |
I can absolutely relate to this because we ended up rewriting the whole app in Rust. One of the main reasons was this bug (and async libs being quite buggy in general). It took me days to debug what's going on and finally landed in here. |
Are there any updates on this issue? Did anyone open up a bug report with the glibc maintainers? This issue has made trying to run a long-running Python server nearly impossible. |
@nikumar1206 I did attempt it a few moments ago, https://sourceware.org/pipermail/libc-help/2024-October/006791.html, but I'm not sure. |
This is helpful, @stalkerg. Assuming the glibc maintainers actually get around to reading the thread, they will eventually come up with something, hopefully not a won't-fix. |
Bug report
Bug description:
Python 3.9 without uvloop doesn't leak memory (or leaks noticeably less).
Python 3.11+ (and others?) leaks memory A LOT under load (with or without uvloop): up to +2 GB per test!
test commands:
ab -n50000 -c15000 -r https://127.0.0.1/
(apt install apache2-utils)
CPython versions tested on:
3.9, 3.11, 3.12
Operating systems tested on:
Debian Linux
Linked PRs