TLS/SSL asyncio leaks memory #109534

Open
rojamit opened this issue Sep 18, 2023 · 87 comments
Labels
extension-modules (C modules in the Modules dir), performance (Performance or resource usage), topic-asyncio, type-bug (An unexpected behavior, bug, or error)

Comments

@rojamit

rojamit commented Sep 18, 2023

Bug report

Bug description:

python3.9 without uvloop doesn't leak memory (or leaks noticeably less).
python3.11+ (and others?) leaks memory A LOT under load (with or without uvloop) - up to +2 GB per test!

test commands:
ab -n50000 -c15000 -r https://127.0.0.1/
(apt install apache2-utils)

import asyncio, ssl, uvloop

class HTTP(asyncio.Protocol):
    
    def __init__(self):
        self.transport = None
        
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(
            b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: keep-alive\r\n\r\n'
        )
        self.transport.close()
        
def make_tls_context():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain('cert.crt', 'cert.key')
    return ctx

tls_context = make_tls_context()
loop = uvloop.new_event_loop()

async def start_server(loop):
    return await loop.create_server(
        HTTP, '127.0.0.1', 443, backlog=65535,
        ssl=tls_context)

loop.run_until_complete(start_server(loop))
loop.run_forever()

CPython versions tested on:

3.9, 3.11, 3.12

Operating systems tested on:

Debian Linux

Linked PRs

@rojamit rojamit added the type-bug (An unexpected behavior, bug, or error) label Sep 18, 2023
@selevit

selevit commented Nov 17, 2023

I can confirm the issue exists on Python 3.11 with uvloop 0.17.0 when setting up many client SSL connections that sometimes reconnect - even directly with asyncio.create_connection - and something is not being cleaned up properly somewhere.

Sometimes it leaked several megabytes per second for me.

I was able to track this by attaching to the running Python process with gdb: when it leaked, I set a catchpoint for brk syscalls and examined 40 sequential catchpoint hits, all of which were related to /usr/lib/python3.10/ssl.py.

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.10/ssl.py", line 917, in read
    v = self._sslobj.read(len)
  <built-in method buffer_updated of uvloop.loop.SSLProtocol object at remote 0xffff758f90c0>
  <built-in method run of _contextvars.Context object at remote 0xffff758c7340>
  File "/home/ubuntu/py-core/env/lib/python3.10/site-packages/aiohttp/web.py", line 544, in run_app
    loop.run_until_complete(main_task)
  File "/home/ubuntu/py-core/server_app.py", line 39, in <module>
    web.run_app(app_factory(), loop=LOOP, host=args.host, port=args.port)
(gdb)
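
For anyone who wants to repeat this, the session looked roughly like the sketch below. This is my reconstruction rather than the exact commands, and it assumes gdb's Python helpers (python-gdb.py) are loaded so that py-bt is available.

gdb -p <pid>                 # attach to the running Python process
(gdb) catch syscall brk      # stop whenever the heap is grown via brk
(gdb) continue
... catchpoint hit ...
(gdb) py-bt                  # show the Python-level backtrace at that point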

Also, the gc.get_objects() count was not actually growing, which is why I suppose this is related to the lower-level native implementation.

@arhadthedev arhadthedev added the performance (Performance or resource usage), topic-asyncio, and extension-modules (C modules in the Modules dir) labels Nov 18, 2023
@github-project-automation github-project-automation bot moved this to Todo in asyncio Nov 18, 2023
@arhadthedev
Member

@1st1, @asvetlov, @graingert, @gvanrossum, @kumaraditya303 (as asyncio module experts)

@gvanrossum
Member

Unfortunately I am really bad at this type of low level stuff. :-( If you suspect a leak in the C accelerator, you can disable it and see if the leak goes away. (Or at least becomes less severe.)

@gvanrossum
Member

OTOH if the problem is in the SSL code, we need a different kind of expert.

@gvanrossum
Member

Finally please figure out if this is in uvloop or not. If it’s tied to uvloop this is the wrong tracker.

@selevit

selevit commented Nov 19, 2023

Unfortunately, this bug was reproduced only in our production environment, and we weren't able to reproduce it anywhere else.

It's already mitigated, as the root cause was poor connection stability.

I don't think I would be able to reproduce it again in production without uvloop, because it caused a lot of harm to our system.

But I'm ready to assist and provide more details if anyone has clarifying questions about it.

Also, there are several related tickets, and I'm not sure if they are all about uvloop:

  1. asyncio: potential leak of TLS connections #106684 - a very similar cpython issue, though already closed.
  2. Memory leak with Python 3.11.2 aio-libs/aiohttp#7252 - also an already-closed but recent aiohttp issue; they were able to mitigate it somehow, but I'm not sure it was the same bug.

@selevit

selevit commented Nov 19, 2023

I actually was able to reproduce this leak in isolation several days ago, but it leaked only ~5 megabytes over several hours. I adapted the repro script from this comment on the aiohttp bug mentioned above so it was ready to run, and ran it for about 4 hours.

The memory consumption (RSS) initially stabilised at 258120 bytes, but after several hours increased to ~264000. Hope it's the same bug.

#!/usr/bin/env python3
import aiohttp
import tracemalloc
import ssl
import asyncio

def init():
    timeout = aiohttp.ClientTimeout(connect=5, sock_read=5)
    ssl_ctx = ssl._create_unverified_context()
    conn = aiohttp.TCPConnector(ssl=ssl_ctx, enable_cleanup_closed=True)
    session = aiohttp.ClientSession(connector=conn, timeout=timeout, cookie_jar=aiohttp.CookieJar(unsafe=True))
    return session

async def fetch(client):
    try:
        async with client.request('GET', url=f'https://api.pro.coinbase.com/products/BTC-USD/ticker') as r:
            msg = await r.text()
            print(msg)
    except asyncio.CancelledError:
        raise
    except Exception as err:
        print("error", err)
        pass

async def main(): 
    requests = 600
    clients = [init() for _ in range(requests)]
    tracemalloc.start()
    try:
        while True:
            await asyncio.gather(*[fetch(client) for client in clients])
            await asyncio.sleep(5)
    except asyncio.CancelledError:
        pass  # end and clean things up
    finally:
        memory_used = tracemalloc.get_traced_memory()
        snapshot = tracemalloc.take_snapshot()
        stats = snapshot.statistics('lineno')
        for stat in stats[:10]:
            print(stat)
        try:
            # clients is a list, not a dict, so iterate it directly
            await asyncio.gather(*[client.close() for client in clients])
        except Exception:
            pass

asyncio.run(main())

@gvanrossum
Member

gvanrossum commented Nov 19, 2023

Could you find a reproducer without aiohttp?

@Faolain

Faolain commented Nov 27, 2023

Unsure if this is helpful or related, but I came across encode/uvicorn#2078, where the thread discusses and concludes that the issue of memory not being released is not isolated to uvicorn but is also seen in granian/gunicorn/hypercorn, and as a result could be at the interpreter level (apologies for butchering the summary). The thread has some great charts/analysis. It is at the server level, but the different implementations of granian and uvicorn can help approximate where the issue might be surfacing if it's related. Example repo: https://github.com/Besedo/memory_issue/tree/main by @EBazarov. Apologies in advance if it is unrelated; I will remove the comment and create one in the right place.

gi0baro:

Given everything we stated in the replies above, I suspect there's not much to do about this. Given the number of frameworks and servers tested, it just seems the Python interpreter is the culprit here, as even if the heap memory gets released, the RSS remains high.

@gvanrossum
Member

Upstream is pretty hard pressed to debug this (I'm no web developer or admin). Can I ask one of the framework owners or users who are struggling with this to help us reduce their example to the point where we have a small self-contained program that demonstrates the issue?

@rojamit
Author

rojamit commented Dec 10, 2023

Upstream is pretty hard pressed to debug this (I'm no web developer or admin). Can I ask one of the framework owners or users who are struggling with this to help us reduce their example to the point where we have a small self-contained program that demonstrates the issue?

I provided a working example of the leak; testing with ab gives a 1+ GB/min leak (OpenSSL version 1.1).
I can provide access to a dedicated server where this bug happens (more than one server is affected).

IMPORTANT: there is no problem when using python3.9 WITHOUT uvloop; with uvloop it leaks, without it does not.
BUT python3.12 leaks even more, with or without uvloop...

@rojamit
Author

rojamit commented Dec 10, 2023

Shortened the MVP even more:

import asyncio, ssl, uvloop

class HTTP(asyncio.Protocol):
    
    def __init__(self):
        self.transport = None
        
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(
            b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: keep-alive\r\n\r\n'
        )
        self.transport.close()
        
def make_tls_context():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain('cert.crt', 'cert.key')
    return ctx

tls_context = make_tls_context()
loop = uvloop.new_event_loop()
#loop = asyncio.new_event_loop()

async def start_server(loop):
    return await loop.create_server(
        HTTP, '127.0.0.1', 443, backlog=65535,
        ssl=tls_context)

loop.run_until_complete(start_server(loop))
loop.run_forever()

@rojamit
Author

rojamit commented Dec 10, 2023

Updated the first post for clarity.
The problem is not with OpenSSL itself; it leaks without uvloop, even on python3.12.

@rojamit
Author

rojamit commented Dec 10, 2023

example certificates
certs.zip

@gvanrossum
Member

I only have a Mac, and I got the example to work and even managed to install the apache2 utils. I modified the example to print the process size once a second (rather than bothering with graphs) using psutil. Then it got messy; the ab tool often ends with an error. It does show a huge process size increase when ab sort-of works.
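
For reference, a minimal sketch of that kind of monitoring (my own, assuming psutil is installed; the helper name and interval are arbitrary):

import os
import threading
import time

import psutil


def report_rss(interval: float = 1.0) -> None:
    # Print this process's resident set size once per interval.
    proc = psutil.Process(os.getpid())
    while True:
        rss_mb = proc.memory_info().rss / (1024 * 1024)
        print(f"RSS: {rss_mb:.1f} MB")
        time.sleep(interval)


# Start the reporter in a daemon thread before running the event loop.
threading.Thread(target=report_rss, daemon=True).start()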

Now I've got an idea. Given the complexity of the ssl module (a C extension that wraps OpenSSL) it is much more likely that the problem (if there is really a problem) is due to the code in that extension module than in asyncio, whose TLS support is just a small amount of Python code.

So maybe someone who is interested in getting to the bottom of this issue could rewrite the example without the use of asyncio. Instead you could use the socketserver module. This is ancient tech, but showing the leak when just using this will prove that the problem is in the ssl extension. We can then try to narrow down what happens in the ssl module to find the leak (I'm suspecting there's some path through the code that doesn't release a buffer it owns, but that covers a lot of cases).

(If you can't figure out how to use socketserver with TLS, maybe just a basic multi-threaded hand-written serving loop could work. In either case you have to make sure to follow all the rules for serving TLS, of course. A rough sketch follows below.)
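
For anyone who wants to try that, here is a rough sketch of a threaded TLS server built only on socketserver and ssl (no asyncio). It mirrors the response from the asyncio repro; the cert/key paths are placeholders and the structure is mine, not a reference implementation:

import socketserver
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain('cert.crt', 'cert.key')  # same placeholder paths as the repro


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read the request and answer with an empty 200, as in the asyncio example.
        self.request.recv(65536)
        self.request.sendall(
            b'HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n'
        )


class TLSServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True

    def get_request(self):
        # Wrap each accepted socket so the TLS handshake goes through the ssl module.
        sock, addr = super().get_request()
        return ctx.wrap_socket(sock, server_side=True), addr


with TLSServer(('127.0.0.1', 443), Handler) as server:
    server.serve_forever()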

@rojamit
Author

rojamit commented Dec 14, 2023

@gvanrossum
Important note: Python 3.9.2 without uvloop (pure asyncio) doesn't leak with the same code, but it does leak with uvloop.
As far as I know, after Python 3.9 the SSL code was ported from uvloop to CPython?

The biggest leak I saw was on Python 3.12 (without uvloop).

@gvanrossum
Member

This doesn’t really help. We need someone to try and find which objects are being leaked.

@rojamit
Author

rojamit commented Dec 15, 2023

This doesn’t really help. We need someone to try and find which objects are being leaked.

I tried different Python memory profilers with no result at all. I don't know how to create a useful dump, but inside the core dump there is a ton of server certificate data.

@gvanrossum
Member

Might one of the magical functions in the gc module that give you all the objects be helpful?
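
For example, a small sketch (mine, not from the thread) that uses gc.get_objects() to count live objects by type; taking one snapshot before the ab run and one after and diffing the counters shows which types accumulate:

import gc
from collections import Counter


def count_objects_by_type(limit: int = 20) -> Counter:
    # Count every object the garbage collector tracks, keyed by type name.
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    for name, count in counts.most_common(limit):
        print(f'{name:30s} {count}')
    return counts


before = count_objects_by_type()
# ... run the load test (ab) against the server ...
after = count_objects_by_type()
print((after - before).most_common(10))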

@graingert
Contributor

objgraph is great for this https://objgraph.readthedocs.io/en/stable/#memory-leak-example
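
A rough sketch of how objgraph could be used here (my own example, assuming objgraph is installed):

import objgraph

objgraph.show_growth(limit=10)   # first call records a baseline

# ... run the load test (ab) against the server ...

objgraph.show_growth(limit=10)   # second call prints only the types that grew

# If a suspicious type shows up, inspect what keeps instances alive, e.g.:
# objgraph.show_backrefs(objgraph.by_type('SSLProtocol')[:3], filename='backrefs.png')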

@stalkerg

stalkerg commented Dec 20, 2023

Sorry, this is without a repro (but I will try to produce one anyway), but I also hit the same issue with Tornado + SSL. And I can confirm such a leak exists even without uvloop, and sometimes (if I return 404 to the Cloudflare proxy a lot) it becomes significant.
@rojamit sorry, but in that case you should try valgrind; I will also try on the weekend.
PS @rojamit does your project use Cloudflare? I found a very annoying behavior related to keep-alive that can also produce the illusion of a memory leak.

@ordinary-jamie
Contributor

ordinary-jamie commented Dec 20, 2023

Summary

Findings:

  • This appears to be an issue when the SSL handshake fails here
  • When you turn off SSL verification, the memory leak is not detected.
  • The key memory allocation that is not being cleaned up is at sslproto.py:275

Minimum replication

This snippet runs the server and runs two separate pings (one insecure) in a subprocess, capturing the tracemalloc snapshots before and after each ping.

The output is as follows

Δ Memory Allocations = 2727.56kb
Δ Memory Allocations = 6.66kb # without SSL

Notice the n_iter=10

import asyncio
import asyncio.sslproto
import ssl
import tracemalloc

class HTTP(asyncio.Protocol):
    def __init__(self):
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(
            b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: keep-alive\r\n\r\n"
        )
        self.transport.close()


def make_tls_context():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(".jamie/iss109534/server.crt", ".jamie/iss109534/server.key")
    return ctx


async def start_server(loop):
    tls_context = make_tls_context()

    return await loop.create_server(
        HTTP, "127.0.0.1", 4443, backlog=65535, ssl=tls_context, start_serving=True
    )


async def ping(delay: float = 1.0, n_iter: int = 1, insecure: bool = False):

    await asyncio.sleep(delay)

    # ----------------------------------------------------------------------------------
    # Before
    current_1, _ = tracemalloc.get_traced_memory()

    # ----------------------------------------------------------------------------------
    # Run a single request
    if insecure:
        cmd = "curl --insecure"
    else:
        cmd = "curl"

    for _ in range(n_iter):
        proc = await asyncio.create_subprocess_shell(
            f"{cmd} https://127.0.0.1:4443",
            stderr=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE
        )
        await proc.communicate()

    # ----------------------------------------------------------------------------------
    # After
    current_2, _ = tracemalloc.get_traced_memory()

    print(f"Δ Memory Allocations = {(current_2 - current_1)/1000:.2f}kb")


if __name__ == "__main__":
    tracemalloc.start()

    loop = asyncio.new_event_loop()
    loop.run_until_complete(start_server(loop))
    # Run with SSL verification
    loop.run_until_complete(ping(delay=0.5, n_iter=10))
    # Run without SSL verification
    loop.run_until_complete(ping(delay=0.5, insecure=1, n_iter=10))

    loop.close()

Trace malloc snapshot

I updated ping to also take tracemalloc snapshots before and after, and we can see that the key allocation is here: sslproto.py:275

async def ping(delay: float = 1.0, n_iter: int = 1, insecure: bool = False):

    # ...

    snapshot_1 = tracemalloc.take_snapshot()

    # ...
    # Same as before
    # ....

    snapshot_2 = tracemalloc.take_snapshot()

    print('-'*40)
    if insecure:
        print("Insecure")

    print(f"Δ Inner Memory Allocations = {(current_2 - current_1)/1000:.2f}kb")

    top_stats = snapshot_2.compare_to(snapshot_1, 'traceback')
    print("\n[ Top stat ]")
    for stat in top_stats[:1]:
        print(stat)
        for line in stat.traceback.format(limit=25):
            print("\t", line)

    print('-'*40)

And now we can see where the allocations are happening (with n_iter=1):

Δ Inner Memory Allocations = 283.75kb

[ Top stat ]
/Users/jamphan/git/github.com/ordinary-jamie/cpython/.jamie/iss109534/tmp02.py:91: size=256 KiB (+256 KiB), count=2 (+2), average=128 KiB
           File "/Users/jamphan/git/github.com/ordinary-jamie/cpython/.jamie/iss109534/tmp02.py", line 91
             loop.run_until_complete(ping(delay=0.5))
           File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/base_events.py", line 671
             self.run_forever()
           File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/base_events.py", line 638
             self._run_once()
           File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/base_events.py", line 1971
             handle._run()
           File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/events.py", line 84
             self._context.run(self._callback, *self._args)
           File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 224
             transport = self._make_ssl_transport(
           File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 83
             ssl_protocol = sslproto.SSLProtocol(
           File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 275
             self._ssl_buffer = bytearray(self.max_size)

Dev Mode

We can further confirm this when we run with python -X dev; we see the following warnings:

Error on transport creation for incoming connection
handle_traceback: Handle created at (most recent call last):
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/base_events.py", line 1963, in _run_once
    handle._run()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 966, in _read_ready
    self._read_ready_cb()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 998, in _read_ready__get_buffer
    self._protocol.buffer_updated(nbytes)
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 436, in buffer_updated
    self._do_handshake()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 561, in _do_handshake
    self._on_handshake_complete(exc)
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 586, in _on_handshake_complete
    self._wakeup_waiter(exc)
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 366, in _wakeup_waiter
    self._waiter.set_exception(exc)
protocol: <__main__.HTTP object at 0x1039294b0>
transport: <asyncio.sslproto._SSLProtocolTransport object at 0x103d6a210>
Traceback (most recent call last):
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/selector_events.py", line 235, in _accept_connection2
    await waiter
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 576, in _on_handshake_complete
    raise handshake_exc
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/asyncio/sslproto.py", line 557, in _do_handshake
    self._sslobj.do_handshake()
  File "/Users/jamphan/.pyenv/versions/3.12.1/lib/python3.12/ssl.py", line 917, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: TLSV1_ALERT_UNKNOWN_CA] tlsv1 alert unknown ca (_ssl.c:1000)

Edit

Using curl --cacert /path/to/cert.crt as opposed to curl -k will get the same results; just to confirm the insecure flag isn't an invalid test.

Edit 2, updated minimal replication

A simpler code example for replicating the issue...

import asyncio
import ssl
import tracemalloc


async def main(certfile, keyfile):
    tracemalloc.start()

    # Start server with SSL
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ssl_context.load_cert_chain(certfile, keyfile)
    await asyncio.start_server(
        lambda _, w: w.write(b"\0"), "127.0.0.1", "4443", ssl=ssl_context
    )

    current_1, _ = tracemalloc.get_traced_memory()

    # Ping the server with cURL
    proc = await asyncio.create_subprocess_shell(
        f"curl https://127.0.0.1:4443 2> /dev/null"
    )
    await proc.communicate()

    current_2, _ = tracemalloc.get_traced_memory()
    print(f"{(current_2-current_1)/1000:.2f}KB")


if __name__ == "__main__":

    asyncio.run(
        main(certfile="server.crt", keyfile="server.key")
    )

@gvanrossum
Member

@mjpieters Just curious -- have you ever experienced anything similar?

@gvanrossum
Member

@ordinary-jamie

The key memory allocation that is not being cleaned is at sslproto.py:275

Excellent find!! Can you think of a suitable place to free this memory? I think it must be passed into OpenSSL via a C wrapper and that C wrapper must somehow forget to free it? So it would have to be in the C code for _ssl where it receives that buffer.

I don't think the leak is in the Python code, even though that's where it's being allocated (Python objects don't easily leak attribute values).

@stalkerg

stalkerg commented Feb 19, 2024

Did you also get 3100+ megabytes of RAM that never gets deallocated with default python and ab concurrency of 10,000 clients?

I can reproduce it on 3.12 on Gentoo, but I can't see it inside a container based on Alpine or Debian; there the memory is deallocated properly.
Also, I don't see a constant leak; it just allocates a lot and keeps it without deallocating (more like a slot per TCP connection).
Also, I can't reproduce it with wrk even if I try to use the same number of connections.

@geraldog

While I do confirm @stalkerg's observation that wrk does not leak - it allocates 3000+ megabytes of RAM, but that gets deallocated at the end of the test - it might be wise to note that while ab does a clean SSL shutdown, wrk does not, and that by itself is enough to trigger the memory cleanup.

So I think we should perhaps explore that angle: that when connection_lost() is called, there actually is memory cleanup. I'll try to confirm that hypothesis.

But @stalkerg, please don't kid yourself that this is "pooling" of some sort. While I did observe, like you, that repeated ab tests didn't leak any more memory if - big if there - you kept the number of clients constant, this behavior certainly wasn't present in 3.9, since 3.9 doesn't leak per my and especially @rojamit's observations. This is a real live one, a big leak; it's going to be hard to sell the idea that this is some kind of optimization, that RAM is being gluttoned for some good reason as a trade-off of sorts.

@stalkerg

stalkerg commented Feb 19, 2024

This is a real live one, a big leak; it's going to be hard to sell the idea that this is some kind of optimization, that RAM is being gluttoned for some good reason as a trade-off of sorts.

Oh, that was a good hint. It can confuse everything even more, but still: I tried using jemalloc (jemalloc.sh python3.12) and the results were different:

  1. Peak memory usage is similar (very big).
  2. But memory usage drops by 70% after ab stops working.
  3. If I run wrk after that, it deallocates even more, and my process consumes only 70-80 MB.

btw, on my Gentoo I use glibc 2.38, on Debian it's 2.36, and Alpine doesn't use glibc.
My theory:

  1. We have a leak, but it's not that big by itself. And yes, it seems to be somehow tied to proper SSL close.
  2. At the same time, intarray or something else is making the new glibc allocator a little bit sick. It never returns memory to the OS in such a situation.

UPDATE:
I can now reproduce the issue on Debian bookworm inside a container, but Alpine still works very well.

UPDATE 2:
I attached to the process with gdb and did call malloc_trim(0), and the memory was released on my Gentoo machine.

@geraldog

Oh, that was a good hint. It can confuse everything even more, but still: I tried using jemalloc (jemalloc.sh python3.12) and the results were different:

1. Peak memory usage is similar (very big)

2. But memory usage drops by 70% after `ab` stops working.

I confirm @stalkerg's observation that with jemalloc the buffers are deallocated after ab testing.

UPDATE 2: I attached to the process with gdb and did call malloc_trim(0), and the memory was released on my Gentoo machine.

I also confirm @stalkerg's observation that issuing call malloc_trim(0) from the gdb prompt releases the memory back to the system. So I was wrong, and this was an "optimization" after all.
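
For what it's worth, the same trim can be triggered from inside the process without gdb; a minimal sketch (glibc-only, my own):

import ctypes

libc = ctypes.CDLL('libc.so.6')
# malloc_trim(0) asks glibc's allocator to return free heap pages to the OS;
# it returns 1 if any memory was released, 0 otherwise.
released = libc.malloc_trim(0)
print('memory released:', bool(released))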

@stalkerg

stalkerg commented Feb 19, 2024

@graingert @rojamit it seems like any value for the MALLOC_TRIM_THRESHOLD_ env var can mitigate the situation. It should be MALLOC_TRIM_THRESHOLD_=131072 (128 KB) by default, but for some reason even setting that explicitly changes behavior a little.

UPDATE: Okay, that happens because setting this env var disables the dynamic threshold. https://github.com/lattera/glibc/blob/master/malloc/malloc.c#L5031

UPDATE: these are the actual values during our test: {trim_threshold = 532480, top_pad = 131072, mmap_threshold = 266240, arena_test = 8, arena_max = 0, thp_pagesize = 0, hp_pagesize = 0, hp_flags = 0, n_mmaps = 3, n_mmaps_max = 65536, max_n_mmaps = 5, no_dyn_threshold = 0, mmapped_mem = 610304, max_mmapped_mem = 1142784, sbrk_base = 0x555555558000 "", tcache_bins = 64, tcache_max_bytes = 1032, tcache_count = 7, tcache_unsorted_limit = 0}

@geraldog

Confirming that setting MALLOC_TRIM_THRESHOLD_=131072 (or any other number) releases memory back to the system. Thank you very much @stalkerg, that was impressive!

@rojamit
Author

rojamit commented Feb 26, 2024

@geraldog maybe wrk doesn't leak because it uses Keep-Alive by default?

Some tests with malloc_trim(0) called every second (in another thread) and a huge request count:

Python 3.12 with patch, asyncio loop:

Requests count: 1850000 RAM: 919 MB, runtime: 698 s # ab end
Requests count: 1850000 RAM: 2403 MB, runtime: 699 s
Requests count: 1850000 RAM: 2511 MB, runtime: 700 s
Requests count: 1850000 RAM: 1978 MB, runtime: 701 s
Requests count: 1850000 RAM: 1978 MB, runtime: 702 s
# allocated ram amount shrinks while ab running & increases after ab done, strange

Python 3.9, asyncio loop:

Requests count: 542830 RAM: 832 MB, runtime: 186 s
Requests count: 546091 RAM: 803 MB, runtime: 187 s
Requests count: 548113 RAM: 816 MB, runtime: 188 s
Requests count: 550000 RAM: 750 MB, runtime: 189 s
Requests count: 550000 RAM: 710 MB, runtime: 190 s
Requests count: 550000 RAM: 470 MB, runtime: 191 s
Requests count: 550000 RAM: 166 MB, runtime: 192 s # ab end

Python 3.12 with patch, uvloop, desired result:

Requests count: 550000 RAM: 760 MB, runtime: 146 s # ab end
Requests count: 550000 RAM: 720 MB, runtime: 147 s
Requests count: 550000 RAM: 525 MB, runtime: 148 s
Requests count: 550000 RAM: 73 MB, runtime: 149 s
Requests count: 550000 RAM: 73 MB, runtime: 150 s

(the patch doesn't affect uvloop; as I remember, it has its own SSL implementation)

Looks like export MALLOC_TRIM_THRESHOLD_=131072 has no effect for me, only malloc_trim(0) does.

@stalkerg

@rojamit why does your first test have many more requests? Anyway, you should try updating glibc and maybe the Linux kernel.

Looks like export MALLOC_TRIM_THRESHOLD_=131072 has no effect for me, only malloc_trim(0) does.

As I said, it's just a mitigation; it also doesn't solve my issue in the long run. Did you try replacing the allocator? If malloc_trim(0) works, it means the issue is not in Python but in the glibc allocation algorithm. However, recent changes and memoryview probably aggravate it.
Also, who can write to the glibc mailing list? This case should be helpful for them.

@rojamit
Author

rojamit commented Mar 1, 2024

@stalkerg the higher request count is not linked to memory consumption in this test

uname -r
5.10.0-26-amd64

Debian GNU/Linux 11 (bullseye)

@stalkerg

@rojamit @geraldog sorry for mentioning you, but can somebody find a way to write to the glibc mailing list? And in general, we can close this ticket.

@geraldog

geraldog commented Apr 5, 2024

@rojamit @geraldog sorry for mentioning you, but can somebody find a way to write to the glibc mailing list? And in general, we can close this ticket.

Sorry @stalkerg but I think the burden is on you to write a Bug Report to https://sourceware.org/mailman/listinfo/libc-stable and Cc: the relevant people involved in glibc development. It was you who discovered it.

There's a high possibility they won't fix it, however, citing that it's an actual optimization for some cases and that the burden is on the user to tell glibc to deallocate. And then we're basically stuck; all that is left is noting in the documentation that this is a known issue...

@fifdee

fifdee commented Apr 24, 2024

Hello, I experienced the problem with my Django ASGI web app and finally found this topic. I use railway.app as a hosting service, and I also described how to reproduce the problem with a minimal Django ASGI app on Stack Overflow: https://stackoverflow.com/questions/78339166/python-django-asgi-memory-leak-updated-2
I tried Python 3.9 without the uvloop config, but it also leaks. I can't check Python 3.12 though, as nixpacks doesn't support it yet.

@stalkerg

@fifdee did you try using a different allocator? You can find instructions in this thread.

@fifdee

fifdee commented Apr 24, 2024

@stalkerg I haven't. I'm not sure if it's possible with a PaaS host like Railway; currently I don't have much experience with this kind of stuff.

@fifdee

fifdee commented Apr 25, 2024

I tried setting the environment variable PYTHONMALLOC=malloc but it didn't change anything.
It looks like malloc_trim(0) is working for me; I need to observe how the web server behaves in the long run.
I used a piece of code found here: https://www.softwareatscale.dev/p/run-python-servers-more-efficiently

And for those who use Django ASGI and suffer from the same problem, this is the modified asgi.py:

import ctypes
import random
import time
from threading import Thread
import psutil
import os

import django
from django.core.handlers.asgi import ASGIHandler

MEMORY_THRESHOLD = 1024 * 1024 * 256  # 256MB


def trim_memory() -> int:
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)


def should_trim_memory() -> bool:
    # check if we're close to our OOM limit
    # through psutil
    process = psutil.Process(os.getpid())
    return process.memory_info().rss > MEMORY_THRESHOLD


def trim_loop() -> None:
    while True:
        time.sleep(random.randint(30, 60))  # jitter between 30 and 60s
        if not should_trim_memory():
            continue

        ret = trim_memory()
        print("trim memory result: ", ret)


def get_asgi_application():
    """
    The public interface to Django's ASGI support. Return an ASGI 3 callable.

    Avoids making django.core.handlers.ASGIHandler a public API, in case the
    internal implementation changes or moves in the future.
    """

    django.setup(set_prefix=False)

    thread = Thread(name="TrimThread", target=trim_loop)
    thread.daemon = True
    thread.start()

    return ASGIHandler()


os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'v45.settings')

application = get_asgi_application()

@geraldog

geraldog commented Apr 25, 2024

@fifdee did you try MALLOC_TRIM_THRESHOLD_?

Like this:

Confirming that setting MALLOC_TRIM_THRESHOLD_=131072 or any other number will release memory back to the system. Thank you very much @stalkerg that was impressive!

@fifdee

fifdee commented Apr 25, 2024

@fifdee did you try MALLOC_TRIM_THRESHOLD_?

Like this:

Confirming that setting MALLOC_TRIM_THRESHOLD_=131072 or any other number will release memory back to the system. Thank you very much @stalkerg that was impressive!

Ah, I missed this one. I'll check it also, thanks.

@payam54

payam54 commented May 6, 2024

We're experiencing the same problem, and setting the env var MALLOC_TRIM_THRESHOLD_=131072 doesn't seem to have any effect. The Active File Memory keeps increasing to the point that the kernel intervenes and dumps it ungracefully.

Is there a plan to add a fix in a near-future release?

@geraldog

geraldog commented May 6, 2024

We're experiencing the same problem, and setting the env var MALLOC_TRIM_THRESHOLD_=131072 doesn't seem to have any effect. The Active File Memory keeps increasing to the point that the kernel intervenes and dumps it ungracefully.

Are you absolutely sure it is the same problem, then? Feel free to provide details.

Is there a plan to add a fix in a near-future release?

The "bug" discussed here is not a Python bug but rather a glibc quirk or optimization.

@rodrigomeireles

Since there are no plans to fix this, shouldn't the Python docs very clearly and explicitly recommend that people not use Python 3.11+ for ASGI applications? Or at least warn that your application will leak memory unless you try one of the fixes in this discussion? This seems too big not to be documented.

@stalkerg

stalkerg commented Jul 2, 2024

@rodrigomeireles it's not about ASGI apps; we have a much smaller repro. It's more about compatibility with glibc quirks. Unfortunately, I haven't had time in the last few months, and I really don't like mailing lists.
Ideally it should be solved on the glibc side. If you have time, you can send a message to glibc and point them to this ticket.

@rodrigomeireles

@stalkerg I understand it's not exactly about ASGI apps, but reading the thread (which is also why I'm here), it seems a lot of people found memory leaks in their production APIs.

This should be highlighted, as I (and many others) could've avoided a lot of headaches had I not upgraded my applications to the latest Python (or used a different language altogether).

If you give me some guidance I could detail the issue on their mailing list, but I believe that would give you more work than messaging them yourself. Nonetheless, I could try it if you wish.

@gyula-lakatos

This should be highlighted, as I (and many others) could've avoided a lot of headaches had I not upgraded my applications to the latest Python (or used a different language altogether).

I can absolutely relate to this, because we ended up rewriting the whole app in Rust. One of the main reasons was this bug (and async libs being quite buggy in general). It took me days to debug what was going on before finally landing here.

@fifdee

fifdee commented Jul 3, 2024

In my case MALLOC_TRIM_THRESHOLD_=131072 seems to work. The vertical lines in the chart below are project updates. I'm using Python 3.11 and Django 5.0.
I haven't seen memory consumption higher than 450 MB, but I got rid of media passing through the Django server; it's just JSON requests and responses now.
[screenshot: memory usage chart, "Zrzut ekranu 2024-07-03 123254"]

@nikumar1206

Are there any updates on this issue? Did anyone open up a bug report with the glibc maintainers?

This issue has made trying to run a long-running Python server nearly impossible.

@stalkerg

stalkerg commented Oct 7, 2024

@nikumar1206 I made an attempt a few moments ago (https://sourceware.org/pipermail/libc-help/2024-October/006791.html), but I'm not sure.

@geraldog

geraldog commented Oct 7, 2024

This is helpful, @stalkerg. Assuming the glibc maintainers actually get around to reading the thread, they will eventually come up with something, hopefully not a won't-fix.

@1st1
Member

1st1 commented Oct 8, 2024
