Memory leak in client when using python wrapper to send request to service #822

Closed
crazyhank opened this issue Sep 14, 2021 · 16 comments
Labels: more-information-needed (Further information is required)

@crazyhank

crazyhank commented Sep 14, 2021

I found this problem in the latest Galactic release. It is simple to reproduce: write a simple service (C++) and a client (Python), and the memory leak will definitely happen. It does not happen when the client is written in C++.

1. Data struct:
int32 seq
uint64 time
byte[1048576] input_tensor
---
int32 seq
uint64 time
byte[1048576] output_tensor

2. Client side code:
import sys

from my_struct.srv import InferService
import rclpy
from rclpy.node import Node

import time


class MinimalClientAsync(Node):

    def __init__(self):
        super().__init__('minimal_client_async')
        self.cli = self.create_client(InferService, 'infer')
        while not self.cli.wait_for_service(timeout_sec=1.0):
            self.get_logger().info('service not available, waiting again...')
        self.req = InferService.Request()

    def send_request(self):
        self.req.seq = 2000
        self.req.time = 200
        self.future = self.cli.call_async(self.req)


def main(args=None):
    rclpy.init(args=args)

    minimal_client = MinimalClientAsync()
    for _ in range(100000):
        minimal_client.send_request()

        while rclpy.ok():
            rclpy.spin_once(minimal_client)
            if minimal_client.future.done():
                try:
                    response = minimal_client.future.result()
                except Exception as e:
                    minimal_client.get_logger().info(
                        'Service call failed %r' % (e,))
                else:
                    minimal_client.get_logger().info(
                        'Result of inference: resp seq %d' % (response.seq))
                break
            #time.sleep(0.5)


    minimal_client.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

You can write a simple service yourself: just receive the request, do nothing, and send a response back to the client.
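For reference, a minimal do-nothing service can also be sketched in Python. This is only a sketch: it assumes the same my_struct/srv/InferService interface as above and simply echoes the request's seq field back (the original report uses a C++ service, but a Python one should exercise the client the same way):

import rclpy
from rclpy.node import Node

from my_struct.srv import InferService


class MinimalService(Node):

    def __init__(self):
        super().__init__('infer_service')
        # Do-nothing service: ignore the request payload and return a response.
        self.srv = self.create_service(InferService, 'infer', self.infer_callback)

    def infer_callback(self, request, response):
        response.seq = request.seq
        return response


def main(args=None):
    rclpy.init(args=args)
    rclpy.spin(MinimalService())
    rclpy.shutdown()


if __name__ == '__main__':
    main()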

@aprotyas
Member

Do you mind explicating why you think there will be a memory leak?

@fujitatomoya fujitatomoya added the more-information-needed Further information is required label Sep 14, 2021
@crazyhank
Author

This problem was found in our project; I needed time to separate out test code to reproduce it. As you can see, there is a 1 MB data member in the response struct. Each time I receive a response from the server, I see about 1 MB of additional memory used by the client node, so memory usage increases as the test continues.
It seems that the response buffer is not released by the Python wrapper; no problem is found when I switch to C++ code.
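One way to quantify the growth is to log the client process's resident set size after each completed call, for example with a small helper like the sketch below (psutil is a third-party package, not part of rclpy, and the helper name is just for illustration):

import os

import psutil  # third-party: pip install psutil

_process = psutil.Process(os.getpid())


def log_rss(node, request_index):
    # Resident set size in MiB; with a 1 MiB field in the response this grows
    # by roughly 1 MiB per completed call while the leak is present.
    rss_mib = _process.memory_info().rss / (1024 * 1024)
    node.get_logger().info('request %d: RSS = %.1f MiB' % (request_index, rss_mib))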

@aprotyas
Member

Thanks for the explanation! It would be great if you could provide a self-contained example that could be used for stress-testing.

@crazyhank
Author

Hi, I list the key information below; I think it is enough to easily reproduce the problem.

1. Request/Response data struct:

int32 seq
---
int32 seq
byte[1048576] output_tensor

2. Service code (C++):

#include "rclcpp/rclcpp.hpp"
#include "my_struct/srv/infer_service.hpp"
#include <memory>

void do_infer(const std::shared_ptr<my_struct::srv::InferService::Request> request,
              std::shared_ptr<my_struct::srv::InferService::Response> response)
{
        static int index = 0;
        RCLCPP_INFO(rclcpp::get_logger("server"), "Incoming request: seq = %d", request->seq);

        response->seq           = index++;
}

int main(int argc, char **argv)
{
        rclcpp::init(argc, argv);

        std::shared_ptr<rclcpp::Node> node = rclcpp::Node::make_shared("infer_service");

        rclcpp::Service<my_struct::srv::InferService>::SharedPtr service =
                node->create_service<my_struct::srv::InferService>("infer", &do_infer);

        RCLCPP_INFO(rclcpp::get_logger("server"), "Ready to receive infer request ...");

        rclcpp::spin(node);
        rclcpp::shutdown();
}

3. Client code (Python):

import sys

from my_struct.srv import InferService
import rclpy
from rclpy.node import Node

import time


class MinimalClientAsync(Node):

    def __init__(self):
        super().__init__('minimal_client_async')
        self.cli = self.create_client(InferService, 'infer')
        while not self.cli.wait_for_service(timeout_sec=1.0):
            self.get_logger().info('service not available, waiting again...')
        self.req = InferService.Request()
        self.index = 0

    def send_request(self):
        self.index += 1
        self.req.seq = self.index
        self.future = self.cli.call_async(self.req)


def main(args=None):
    rclpy.init(args=args)

    minimal_client = MinimalClientAsync()
    for _ in range(100000):
        minimal_client.send_request()

        while rclpy.ok():
            rclpy.spin_once(minimal_client)
            if minimal_client.future.done():
                try:
                    response = minimal_client.future.result()
                except Exception as e:
                    minimal_client.get_logger().info(
                        'Service call failed %r' % (e,))
                else:
                    minimal_client.get_logger().info(
                        'Result of inference: resp seq %d' % (response.seq))
                break
            time.sleep(0.5)


    minimal_client.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()

BTW, the Python wrapper's performance is very slow when the data struct includes a big byte array, as you can see in the above example, but that is another problem.
Hope to get your feedback and to use ROS 2 in our project successfully!

@aprotyas
Member

aprotyas commented Sep 15, 2021

BTW, the Python wrapper's performance is very slow when the data struct includes a big byte array, as you can see in the above example, but that is another problem.

Yeah, that's a known problem: ros2/rosidl_python#134 (Edit: looks like you're aware of this already)

I will try the example that you provided. Thanks!

@crazyhank
Author

Hi, have you reproduced the problem?

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Sep 21, 2021
@fujitatomoya
Collaborator

Problem confirmed: in the process space there is a lot of heap memory mapped, and as long as the client/service keeps running, virtual/physical memory increases.
I created a reproducible sample program: https://github.com/fujitatomoya/ros2_test_prover/tree/master/prover_rclpy.

Under a colcon environment:

colcon build --symlink-install --packages-select prover_interfaces prover_rclpy
source install/local_setup.bash
ros2 run prover_rclpy rclpy_server_822
ros2 run prover_rclpy rclpy_client_822

CC: @Barry-Xu-2018 @iuhilnehc-ynos could you take a look if you have time? I guess this is a memory leak, if I am not mistaken...

@aprotyas
Member

aprotyas commented Sep 21, 2021

I can confirm the reported issue as well using the sample program linked above. For discussion/debugging convenience, I've produced a couple of charts that show what's happening. Use this script to reproduce said charts when demonstrating a fix.
[Charts: client and service process memory usage over time (test_client, test_service)]


I won't have the bandwidth to return to this issue for a while, but from a cursory overview it does look like a memory leak in the client.

@iuhilnehc-ynos
Contributor

I'd like to share something about this issue.

__convert_to_py(void * raw_ros_message) doesn't own the raw_ros_message.

service server

Service::service_take_request
	auto taken_request = create_from_py(pyrequest_type);   // allocate a buffer
	...
	result_tuple[0] = convert_to_py(taken_request.get(), pyrequest_type);

	taken_request.release();  // Delete this line because this function has the responsibility to deallocate the buffer
	...

convert_to_py (using PyBytes_FromStringAndSize, Py_BuildValue, etc.) copies data from the raw message instead of taking ownership of it.

The same applies to the service client.
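Roughly, the ownership pattern looks like this (a simplified sketch, not the actual rclpy code; the message type and deleter here are stand-ins):

#include <memory>

// Stand-in for the ROS message buffer that create_from_py() allocates.
struct RosMessage
{
  unsigned char data[1048576];
};

// Stand-in for the generated destroy_ros_message_function.
void destroy_ros_message(RosMessage * msg)
{
  delete msg;
}

void service_take_request_sketch()
{
  // create_from_py(): the buffer is owned by a unique_ptr with a custom deleter.
  std::unique_ptr<RosMessage, void (*)(RosMessage *)> taken_request(
    new RosMessage(), destroy_ros_message);

  // convert_to_py(): copies the contents into Python objects
  // (PyBytes_FromStringAndSize, Py_BuildValue, ...) and does NOT take ownership.

  // Bug: release() discards the deleter without freeing the buffer -> leaked per call.
  // taken_request.release();

  // Fix: do nothing here; the unique_ptr destructor frees the buffer when it goes
  // out of scope (or call taken_request.reset() explicitly).
}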

@llapx
Contributor

llapx commented Sep 28, 2021

unique_ptr's release() just releases ownership, but does not free the memory it points to; we can replace it with reset(nullptr).
service,
client.

@aprotyas
Member

unique_ptr's release() just releases ownership, but does not free the memory it points to

@llapx you are right. For reference, per std::unique_ptr<T,Deleter>::release: "The caller is responsible for deleting the object."

I believe replacing with just reset() would suffice too.
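As a plain illustration of the difference (nothing rclpy-specific):

#include <memory>

int main()
{
  auto p = std::make_unique<int>(42);
  // p.release();   // returns the raw pointer and frees nothing -> leak unless the caller deletes it
  p.reset(nullptr);  // frees the int immediately; equivalent to p.reset()
  // Even without an explicit reset(), p's destructor frees the object at end of scope.
}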

@iuhilnehc-ynos
Contributor

You don't need to call reset() manually; just let the unique_ptr, with its destroy_ros_message_function, do the magic.

Refer to

#define TAKE_SERVICE_RESPONSE(Type) \
  /* taken_msg is always destroyed in this function */ \
  auto taken_msg = create_from_py(pymsg_type); \
  rmw_request_id_t header; \
  rcl_ret_t ret = rcl_action_take_ ## Type ## _response( \
    rcl_action_client_.get(), &header, taken_msg.get()); \
  int64_t sequence = header.sequence_number; \
  /* Create the tuple to return */ \
  if (RCL_RET_ACTION_CLIENT_TAKE_FAILED == ret || RCL_RET_ACTION_SERVER_TAKE_FAILED == ret) { \
    return py::make_tuple(py::none(), py::none()); \
  } else if (RCL_RET_OK != ret) { \
    throw rclpy::RCLError("Failed to take " #Type); \
  } \
  return py::make_tuple(sequence, convert_to_py(taken_msg.get(), pymsg_type));

py::tuple
ActionClient::take_goal_response(py::object pymsg_type)
{
  TAKE_SERVICE_RESPONSE(goal)
}

@crazyhank
Author

@iuhilnehc-ynos
After deleting these two release() lines, the problem disappeared.

In my understanding, the release() function transfers ownership of the buffer from rclpy to the caller, right? But the Python wrapper never takes over that unique pointer, so the buffer never has a chance to be freed. Correct me if I am wrong!

@fujitatomoya
Collaborator

@iuhilnehc-ynos @llapx

Either of you, can you make a PR against this issue? Let's review and fix the problem in the mainline.

@fujitatomoya
Collaborator

Okay, I see #828, one step behind... 😢 Sorry!

@fujitatomoya
Collaborator

I will go ahead and close this.
