Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible Memory Leak with tokio::TcpStream #2

Open
sampaioletti opened this issue Oct 11, 2023 · 6 comments
Open

Possible Memory Leak with tokio::TcpStream #2

sampaioletti opened this issue Oct 11, 2023 · 6 comments

Comments

@sampaioletti
Copy link

Thanks for the work you've done on bringing tokio to the esp I think its a great scenario, its the closest we have been to successfully sharing our codebase from other platforms.

I'm trying to figure out how to narrow this down but tokio tcp streams seem to cause a memory leak of around 250b that I have been unable to duplicate on other platforms. It appears to be when a connection never happens.

// this section leaks on the heap
loop {
      log::info!(
          "Stack High Water Mark {} Heap {}",
          unsafe { esp_idf_sys::uxTaskGetStackHighWaterMark2(std::ptr::null_mut()) },//no leak
          unsafe {
              esp_idf_sys::heap_caps_get_free_size(esp_idf_sys::MALLOC_CAP_8BIT) //leaks about 250b
          }
      );
      let conn = tokio::net::TcpStream::connect("192.168.1.10:12345").await; //this endpoint doesn't exist so connection will fail
  
      match conn{
          Ok(_) => {}
          Err(_) => {}
      }
      tokio::time::sleep(tokio::time::Duration::from_secs(5)).await;
  }

// this non async version does not
loop {
    log::info!(
        "Stack High Water Mark {} Heap {}",
        unsafe {
            esp_idf_sys::uxTaskGetStackHighWaterMark2(std::ptr::null_mut())
        },
        unsafe {
            esp_idf_sys::heap_caps_get_free_size(esp_idf_sys::MALLOC_CAP_8BIT)
        } //stays consistent
    );

    let conn = std::net::TcpStream::connect("192.168.1.10:12345"); //this endpoint doesn't exist so connection will fail

    match conn {
        Ok(_) => {}
        Err(_) => {}
    }
    std::thread::sleep(std::time::Duration::from_secs(5));
}

I've tried playing with time to see if the connections weren't being released, but it never stabilized it will decrease until it runs out of memory, and the sync version stabilizes after the first allocation.

Any thoughts on what might cause this, or how i could figure out what might be causing it. I'm not a esp expert (yet) so i'm still trying to learn the tooling.

Thanks!

@jasta
Copy link
Owner

jasta commented Oct 11, 2023

Great catch! That's the kind of thing I was really hoping we could find before moving to formal support (i.e. removing the experimental mio flag). That said, I can't think of any reason why the poll implementation I created would do this, there's no unsafe code or anything particular fancy going on. Hmm, a few things come to mind to help narrow it down though:

  1. Check that UdpSocket does the same (i.e. send_to to a non-routable address or closed port).
  2. Confirm that TcpSocket.connect to a working address does NOT leak.
  3. Check that TcpListener.accept might do the same (accept a connection then close it immediately). For this you can test with a host machine running: while :; do echo foo | nc espressif 12345; sleep 1; done to just spam it with connections.

If indeed none of those scenarios produce unexpected results then I'd start looking into the Tokio TcpStream error path to see if there's anything suspicious going on like a mem::forget call that sounds pretty hand waivey or is host specific in a way that wouldn't match on #[cfg(unix)] (esp32 is considered a UNIX implementation). You might start your journey down that path here: https://github.com/tokio-rs/tokio/blob/master/tokio/src/net/tcp/socket.rs#L653

Personally I'd probably get a debugger set up and step through each line in tokio until you get to the error path. I'd have a separate thread calling the print heap function call you're using and just watching it carefully as you go. It's a bit of a pain, you've gotta get a USB cable that has break out jumper cables that you can connect to the USB_D-/D+ pins, then set up openocd (at least that's what I used in the past). You also have to disable the watchdog timer which will likely fire as you're stepping through. See here for more details: https://esp-rs.github.io/book/tooling/debugging/openocd.html. Also note that I find the ESP32-C3 is way easier to use because the RISC-V support is better than Xtensa across various toolchains but you should be able to get any chip working. It's a good investment if you're new to embedded because it can be really tempting to just never use the debugger and then spend days and days tearing your hair out trying to fix simple bugs :)

@jasta
Copy link
Owner

jasta commented Oct 11, 2023

Oh and with verbose logging you should see the console spamming each select fd set so you can check for an easy one like I am not removing the failed fd from the select set. If the set just keeps getting bigger and bigger then it was definitely my fault :P

@sampaioletti
Copy link
Author

sampaioletti commented Oct 12, 2023 via email

@sampaioletti
Copy link
Author

I setup the debugger and have yet to be able to figure out where its happening. I'll have to keep practicing with it until i get better...not quite as easy to debug as i'm used to (:

If the socket connects and is then subsequently dropped it doesn't leak so its something specific to attempting to connect not being freed on failure, just haven't figured out what.

Bind doesn't leak either.

I did try to run the code as identically as possible with valgrind on wsl and had no memory leaks

@sampaioletti
Copy link
Author

tokio::net::UnixStream and tokio::net::UdpSocket dont leak

@sampaioletti
Copy link
Author

sampaioletti commented Jan 15, 2024

I haven't given up on this issue, and I'm still looking into it, but if anyone comes across this I was able to work around it by creating a std::net::TcpStream, setting the non_blocking to true, and using the tokio::net::TcpStream::from_std method upon success.

This really makes me think that it is not necessarily due to the tokio runtime implementation but something about the way it creates/drops the underlaying lwIP connection in an async context...maybe there is some reliance on a drop or something that doesn't have an async equivalent. So not giving up..but the workaround is keeping my project moving until I can circle back and figure this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants