free() invalid pointer #3777
Hey, I see you have passed the
@jbkyang-nvi This is the minimum code needed to reproduce this issue. I removed the body of the execute function to simplify it. Just add the config file. Can you reproduce this issue?
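The stripped-down model itself is not preserved in this thread. As a hedged illustration only (not the reporter's actual file), a minimal Triton Python-backend `model.py` with the execute body removed typically looks like the sketch below; the class and method names follow the Python-backend convention, and the `triton_python_backend_utils` import is omitted so the file can be exercised outside Triton:

```python
# Hypothetical minimal model.py for the Triton Python backend.
# Inside Triton you would `import triton_python_backend_utils as pb_utils`
# and build real InferenceResponse objects; here execute is stubbed out,
# matching the stripped-down reproducer described above.

class TritonPythonModel:
    def initialize(self, args):
        # args carries model_config, model_name, etc. as strings
        self.model_config = args.get("model_config", "{}")

    def execute(self, requests):
        # The reproducer never sends requests, so nothing useful is returned.
        return [None for _ in requests]

    def finalize(self):
        # Called once when the model is unloaded.
        print("Cleaning up...")
```

The point of the thread is that even with execute stubbed out like this, the reported free() appears at server shutdown, which suggests interpreter or module teardown rather than request handling.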
Can you try adding the InferResponse to your minimal example and see if the
Yes, I can see the issue. In fact, the original code is quite long; that's why I deleted most of it.
Can you run Triton with valgrind? That should produce a stack trace showing exactly where the invalid pointer error is occurring.
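A typical invocation would be something like the sketch below; the flags and the model repository path are assumptions for illustration, not taken from this thread:

```shell
# Sketch: run the server under valgrind. --trace-children=yes matters
# here because the Python backend runs each model in a forked stub
# process, and we want those children traced too.
valgrind --tool=memcheck --leak-check=full --trace-children=yes \
         --log-file=valgrind.%p.log \
         tritonserver --model-repository=/path/to/models
```

With `--log-file=valgrind.%p.log`, each process (including the Python stub) writes its own log, which helps separate a stub-process leak from an invalid free in the tritonserver process itself.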
Because you are not even sending any requests to the server, and because your model is stripped down, I would assume that you don't need the Dockerfile build step. Do you see the same failure even if you run with the NGC Tritonserver image? We will try your steps and see if we can reproduce.
@Slyne I was able to reproduce the bug with your steps. The most interesting thing is that if I remove
Maybe. I don't know how to read the valgrind log and didn't find any keywords related to kaldifeat.
I am certain that the issue is specific to kaldifeat and its use within Triton's Python infrastructure. I have tried various other Python modules and they all work perfectly fine. Looking more closely into the kaldifeat module, we can see that it compiles shared object files.
Because the model with kaldifeat is running in a forked process, we cannot see those logs.
So the Python stub process may have some leaks or other issues due to kaldifeat, but that should not cause the tritonserver process to have an invalid free, should it? Perhaps there is something wrong in the cleanup logic of the Python backend?
Opened an issue in the kaldifeat project for tracking: csukuangfj/kaldifeat#26
Thanks @Slyne for filing the issue. We will continue investigating on our side and let you know if we find something strange.
The cleanup logic looks OK to me, but I haven't had a chance to do a deep dive. The respective finalize functions are being called appropriately. Will do a deep dive once I get some cycles to work on it.
PR #3868 fixes
The kaldifeat module ships compiled shared object files. This can be verified using:
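The exact verification command was not preserved above. As a stand-in sketch (standard library only, and using `json` and `_ctypes` as example modules since kaldifeat may not be installed), one way to check whether an import is backed by a compiled shared object is:

```python
# Sketch: report whether a module is implemented as a compiled
# extension (.so/.pyd) rather than pure Python. For the module in
# this thread you would call extension_origin("kaldifeat").
import importlib.util

def extension_origin(module_name):
    """Return the path of the module's compiled binary, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None or spec.origin == "built-in":
        return None
    if spec.origin.endswith((".so", ".pyd", ".dylib")):
        return spec.origin
    return None

print(extension_origin("json"))     # pure-Python package: None
print(extension_origin("_ctypes"))  # usually a compiled .so/.pyd path
```

For a package like kaldifeat, scanning its directory for `*.so` files (e.g. with `pathlib.Path(...).rglob("*.so")`) gives the same answer for extensions bundled inside a pure-Python wrapper.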
However, we do not see the
The trace demonstrates that the free() invalid pointer originates in main.cpp:

```cpp
#include <pybind11/embed.h> // everything needed for embedding
#include <iostream>

namespace py = pybind11;

int main() {
    py::scoped_interpreter guard{}; // start the interpreter and keep it alive
    py::module_ kaldifeat = py::module_::import("kaldifeat");
    std::cerr << "Module Loaded" << std::endl;
}
```

CMakeLists.txt
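The CMakeLists.txt contents did not survive in this thread. A minimal sketch that builds the embedding example above (assuming pybind11 is installed where `find_package` can locate it) might look like:

```cmake
cmake_minimum_required(VERSION 3.12)
project(kaldifeat_repro CXX)

set(CMAKE_CXX_STANDARD 14)

# pybind11 exports the pybind11::embed target for embedding the interpreter
find_package(pybind11 REQUIRED)

add_executable(main main.cpp)
target_link_libraries(main PRIVATE pybind11::embed)
```

Linking against `pybind11::embed` (rather than `pybind11::module`) pulls in libpython, which is required when the executable hosts the interpreter itself.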
In the directory with these files, run the following commands:
When running the example, we see the issue below:
The backtrace for the invalid free, for example:
As you can see, the free() invalid pointer is raised even when running outside the Triton Python backend. It is coming from
Closing the issue, as it is reproducible outside Triton and is shown to manifest when importing kaldifeat within a pybind11 interpreter.
For the given demo:

```cpp
#include <pybind11/embed.h> // everything needed for embedding
#include <iostream>

namespace py = pybind11;

int main() {
    py::scoped_interpreter guard{}; // start the interpreter and keep it alive
    py::module_ kaldifeat = py::module_::import("kaldifeat");
    std::cerr << "Module Loaded" << std::endl;
}
```

If you change

```cpp
py::module_ kaldifeat = py::module_::import("kaldifeat");
```

to

```cpp
py::module_ kaldifeat = py::module_::import("torch");
```

it will still produce as many warnings as
Also,
produces many warnings. I am not an expert about
@csukuangfj Thank you! Did you see the
It's quite strange when we
Yes, it is reproducible.
@csukuangfj @Slyne That is very strange. I changed kaldifeat to torch in the reproducer:
And I don't see the free() invalid pointer any more.
However, for kaldifeat I see:
The kaldifeat version:
The torch version:
I tried numpy too, and it runs smoothly. @Slyne Can you try confirming whether
I just created a GitHub repo to reproduce the
You can see the output from GitHub Actions at
A screenshot of the output is given below:
[edited]: So memory issues with |
@tanmayv25 Are you running in a Docker environment? One of our colleagues also found this issue, and he didn't use kaldifeat. He only imports PyTorch and TensorRT in the Python backend.
Yes. I am compiling and running the reproducer in a Docker container.
Hello, I would like to ask a question. I am using the Triton 22.04-py3 Docker image. When the specified backend is python, the free() problem also occurs when unloading the model. Is it because of kaldifeat?
Description
When I shut down Triton Inference Server, there's one line:
Triton Information
What version of Triton are you using? 21.12
Are you using the Triton container or did you build it yourself?
Here's the dockerfile:
Here's the model.py.
config.pbtxt
To Reproduce
Expected behavior
Expect no such line.
I tested on two different machines. Both give this line (an error? a warning?). One does not generate a core file, while the other does.