Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segmentation fault on nodelet close #39

Open
protobits opened this issue Jul 8, 2014 · 23 comments
Open

segmentation fault on nodelet close #39

protobits opened this issue Jul 8, 2014 · 23 comments

Comments

@protobits
Copy link

I have the following launch file:

<launch>
  <node pkg="nodelet" type="nodelet" name="camera_nodelet_manager" args="manager" output="screen">
    <param name="num_worker_threads" value="20"/>
  </node>

  <node pkg="nodelet" type="nodelet" name="left" args="load camera1394/driver camera_nodelet_manager">
    <param name="guid" value="00b09d0100cd6e9e"/>
    <param name="bayer_pattern" value="rggb"/>
    <param name="bayer_method" value="HQ"/> <!-- esto tiene que ser distinto de "" (por ej "HQ") si quiero rectificar por nodelet -->
    <param name="use_ros_time" value="true"/>
    <param name="reset_on_open" value="true"/>
 </node> 

  <!--<node pkg="nodelet" type="nodelet" name="left_debayer" args="load image_proc/debayer /camera_nodelet_manager">
    <remap from="image_raw" to="camera/image_raw"/>
    <remap from="image_color" to="camera/image_color"/>
    <remap from="image_mono" to="camera/image_mono"/>
  </node> -->


  <node pkg="nodelet" type="nodelet" name="left_rectify" args="load image_proc/rectify /camera_nodelet_manager">
    <remap from="image_mono" to="camera/image_raw"/> <!-- remapear image_mono a image_raw, en caso de que el driver este haciendo el debayer solo -->
    <remap from="camera_info" to="camera/camera_info"/>
    <remap from="image_rect" to="camera/image_rect"/>
    <param name="queue_size" value="50"/>
  </node>



</launch>

And, when doing CTRL+C, sometimes I get:

process[left-2]: started with pid [9082]
process[left_rectify-3]: started with pid [9114]
[ WARN] [1404829063.551903876]: Comparing 00b09d0100cd6e9e to 00b09d0100cd6e9e
[ INFO] [1404829063.568476815]: Found camera with GUID 00b09d0100cd6e9e
[ INFO] [1404829063.568623447]: camera model: Point Grey Research Firefly MV FMVU-03MTC
[ WARN] [1404829063.586293968]: [HQ] Bayer decoding in the driver is DEPRECATED; image_proc decoding preferred.
[ INFO] [1404829063.591556576]: [00b09d0100cd6e9e] opened: 640x480_mono8, 30 fps, 400 Mb/s
[ INFO] [1404829063.627000717]: [00b09d0100cd6e9e] has trigger support
[ERROR] [1404829063.628724634]: Unknown trigger source: source_0
[ERROR] [1404829063.629587108]: [00b09d0100cd6e9e] feature initialization failure
[ INFO] [1404829063.629676503]: [00b09d0100cd6e9e] closing device
[ WARN] [1404829063.811004478]: Comparing 00b09d0100cd6e9e to 00b09d0100cd6e9e
[ INFO] [1404829063.828208301]: Found camera with GUID 00b09d0100cd6e9e
[ INFO] [1404829063.828267154]: camera model: Point Grey Research Firefly MV FMVU-03MTC
[ WARN] [1404829063.844065052]: [HQ] Bayer decoding in the driver is DEPRECATED; image_proc decoding preferred.
[ INFO] [1404829063.848904539]: [00b09d0100cd6e9e] opened: 640x480_mono8, 30 fps, 400 Mb/s
[ INFO] [1404829063.889797976]: using default calibration URL
[ INFO] [1404829063.890063598]: camera calibration URL: file:///home/v01d/.ros/camera_info/00b09d0100cd6e9e.yaml
^C[left_rectify-3] killing on exit
[left-2] killing on exit
[camera_nodelet_manager-1] killing on exit
*** Error in `/opt/ros/hydro/lib/nodelet/nodelet': corrupted double-linked list: 0x00007f2394006010 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x73f8e)[0x7f23c5e67f8e]
/usr/lib/libc.so.6(+0x7988e)[0x7f23c5e6d88e]
/usr/lib/libc.so.6(+0x79b5a)[0x7f23c5e6db5a]
/usr/lib/libc.so.6(+0x7a6a8)[0x7f23c5e6e6a8]
/usr/lib/libdc1394.so.22(dc1394_usb_capture_stop+0xf3)[0x7f239fdaa603]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN10camera139410Camera13945closeEv+0x47)[0x7f23a5580dbd]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17camera1394_driver16Camera1394Driver11closeCameraEv+0x1c0)[0x7f23a5537ce8]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17camera1394_driver16Camera1394Driver8shutdownEv+0x18)[0x7f23a553a00a]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17Camera1394NodeletD1Ev+0x365)[0x7f23a55316b1]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17Camera1394NodeletD0Ev+0x18)[0x7f23a55317ea]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN12class_loader11ClassLoader16onPluginDeletionIN7nodelet7NodeletEEEvPT_+0x51)[0x7f23c79025a1]
/opt/ros/hydro/lib/nodelet/nodelet(_ZN5boost6detail12shared_countD1Ev+0x39)[0x406889]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN5boost20ptr_container_detail24reversible_ptr_containerINS0_10map_configIN7nodelet14ManagedNodeletESt3mapISsPvSt4lessISsESaISt4pairIKSsS6_EEELb1EEENS_20heap_clone_allocatorEED2Ev+0x4b)[0x7f23c7902f7b]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN7nodelet6LoaderD1Ev+0x19)[0x7f23c78fbf09]
/opt/ros/hydro/lib/nodelet/nodelet(main+0x17b)[0x404ddb]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f23c5e14000]
/opt/ros/hydro/lib/nodelet/nodelet[0x4060f4]
@jack-oquin
Copy link
Member

Are you using external triggering?

@protobits
Copy link
Author

Not in this case. I am using strobe output, but I don't that should matter.

On Wed, Jul 9, 2014 at 12:58 PM, jack-oquin [email protected]
wrote:

Are you using external triggering?


Reply to this email directly or view it on GitHub
#39 (comment)
.

@jack-oquin
Copy link
Member

Would you mind running the same test with the ros-hydro-camera1394 package?

@protobits
Copy link
Author

I am using hydro actually.
El 10/07/2014 18:17, "jack-oquin" [email protected] escribió:

Would you mind running the same test with the ros-hydro-camera1394
package?


Reply to this email directly or view it on GitHub
#39 (comment)
.

@jack-oquin
Copy link
Member

The backtrace implies you built it from source in the /home/v01d/coding/ros workspace. Not true? The messages look like they come from the unreleased master branch.

I wanted to know if the current released version (camera1394-1.9.5) has the same problem. One easy way to do that is:

sudo apt-get install ros-hydro-camera1394
source /opt/ros/hydro/setup.bash
roslaunch /your/camera/launch_file.launch

Or, you can check out the hydro-devel branch from the source tree and test with that.

@protobits
Copy link
Author

You are right, I didn't realize. I just tried the system installed version (1.9.4) and it also happens:

*** Error in `/opt/ros/hydro/lib/nodelet/nodelet': corrupted double-linked list: 0x00007f4910015570 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x73f8e)[0x7f494c369f8e]
/usr/lib/libc.so.6(+0x7988e)[0x7f494c36f88e]
/usr/lib/libc.so.6(+0x79b32)[0x7f494c36fb32]
/usr/lib/libc.so.6(+0x7a6a8)[0x7f494c3706a8]
/usr/lib/libdc1394.so.22(dc1394_usb_capture_stop+0xf3)[0x7f492e294603]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN10camera139410Camera13945closeEv+0x47)[0x7f4944414dbd]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17camera1394_driver16Camera1394Driver11closeCameraEv+0x1c0)[0x7f49443cbce8]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17camera1394_driver16Camera1394Driver8shutdownEv+0x18)[0x7f49443ce00a]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17Camera1394NodeletD1Ev+0x365)[0x7f49443c56b1]
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so(_ZN17Camera1394NodeletD0Ev+0x18)[0x7f49443c57ea]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN12class_loader11ClassLoader16onPluginDeletionIN7nodelet7NodeletEEEvPT_+0x51)[0x7f494de045a1]
/opt/ros/hydro/lib/nodelet/nodelet(_ZN5boost6detail12shared_countD1Ev+0x39)[0x406889]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN5boost20ptr_container_detail24reversible_ptr_containerINS0_10map_configIN7nodelet14ManagedNodeletESt3mapISsPvSt4lessISsESaISt4pairIKSsS6_EEELb1EEENS_20heap_clone_allocatorEED2Ev+0x4b)[0x7f494de04f7b]
/opt/ros/hydro/lib/libnodeletlib.so(_ZN7nodelet6LoaderD1Ev+0x19)[0x7f494ddfdf09]
/opt/ros/hydro/lib/nodelet/nodelet(main+0x17b)[0x404ddb]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f494c316000]
/opt/ros/hydro/lib/nodelet/nodelet[0x4060f4]
======= Memory map: ========
00400000-0040f000 r-xp 00000000 08:01 134101                             /opt/ros/hydro/lib/nodelet/nodelet
0060e000-0060f000 r--p 0000e000 08:01 134101                             /opt/ros/hydro/lib/nodelet/nodelet
0060f000-00610000 rw-p 0000f000 08:01 134101                             /opt/ros/hydro/lib/nodelet/nodelet
01c03000-01e97000 rw-p 00000000 00:00 0                                  [heap]
7f490b5e1000-7f490b5e6000 r-xp 00000000 08:01 3548859                    /usr/lib/libXdmcp.so.6.0.0
...

@jack-oquin
Copy link
Member

You are still running /home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so from your catkin overlay. It should say /opt/ros/hydro/lib/libcamera1394_nodelet.so.

@protobits
Copy link
Author

Sorry again. I have now removed everything from my workspace and it seems a
bit more robust however, after many starts and stops of the launch file I
get the same:

[left-2] killing on exit
[camera_nodelet_manager-1] killing on exit
*** Error in `/opt/ros/hydro/lib/nodelet/nodelet': corrupted double-linked
list: 0x00007ff52c005850 ***
shutting down processing monitor...

(no backtrace this time)

On Fri, Jul 11, 2014 at 4:06 PM, jack-oquin [email protected]
wrote:

You are still running
/home/v01d/coding/ros/devel/lib//libcamera1394_nodelet.so from your
catkin overlay. It should say /opt/ros/hydro/lib/libcamera1394_nodelet.so.


Reply to this email directly or view it on GitHub
#39 (comment)
.

@jack-oquin
Copy link
Member

This happens intermittently? Not every time?

It's always on exit, right?

@protobits
Copy link
Author

Yes (to all questions)

On Sat, Jul 12, 2014 at 8:38 PM, jack-oquin [email protected]
wrote:

This happens intermittently? Not every time?

It's always on exit, right?


Reply to this email directly or view it on GitHub
#39 (comment)
.

@jack-oquin
Copy link
Member

Hard to solve this without more information.

@protobits
Copy link
Author

Ok, maybe when I compile the driver from the git repo to address the offset
issue I can compile with debug info and see if I can get a backtrace.

On Sat, Jul 12, 2014 at 9:23 PM, jack-oquin [email protected]
wrote:

Hard to solve this without more information.


Reply to this email directly or view it on GitHub
#39 (comment)
.

@jack-oquin
Copy link
Member

Good idea. Thanks for helping.

@protobits
Copy link
Author

Hi,
using hydro-devel the segmentation fault also happens. I was not able to get a better backtrace since it is difficult to reproduce using gdb. However, if you look at the previous backtrace you have a list of symbols and it appears to indicate that is triggered within libdc1394. I suspect either a bug in the library, or a misuse of its functions. I would suspect something regarding deinitialization and a race condition which may trigger a camera de deinitialized twice or something like that.

@protobits
Copy link
Author

Hi, can you verify that the nodelet is correctly calling the destructors as expected?

I been having some difficulties using nodelets and it seems, after recent updates in several ROS core packages, things have improved. I added "std::cout" messages in destructors (ROS_INFO* does not work at such stage) to ensure my destructors are called and it now seems to be working much more robustly.

However, I have a roslaunch file which loads a nodelet manager where both the camera1394 nodelets (left and right) and my nodelets are loaded. My destructors were not being called whenever I CTRL+C'd. I now switched to using camera1394 nodes instead of nodelets and my destructors are being called correctly. It all seems to point that the nodelet manager dies due to camera1394 nodelets.

I see you have some ROS_INFO calls on destructors but as I said these will never be visible. Would you check if switching to std::cout makes the messages appear? Otherwise, I suspect something is not completely correct in how the nodelets are shut-down.

@protobits
Copy link
Author

PS: I still get the double free from time to time

@protobits
Copy link
Author

PS2: it is not just with nodelets. I switched to regular nodes since this crashing made the manager crash and so did all my own nodelets and they were not deinitialized correctly. Now this problem does not affect my nodelets but it is still present. I get the double free message sometimes at shutdown and some times it hangs there after this message and eventally ROS kills when it realizes it is not responding.

@jack-oquin
Copy link
Member

Knowing that it affects nodes, too, is helpful.

Is it possible to get a backtrace of that failure, as well?

@protobits
Copy link
Author

Hi,
I'm back at it. I've got a backtrace now. I don't have line numbers for some reason (it is hard to re-compile the system package with debug information enabled) but I think it should be useful:

#0 0x00007ffff594de94 in free () from /usr/lib/libc.so.6
#1 0x00000000004d2d06 in _gnu_cxx::new_allocator::deallocate(double, unsigned long) ()
#2 0x00000000004cf973 in gnu_cxx::alloc_traitsstd::allocator::deallocate(std::allocator&, double, unsigned long) ()
#3 0x00000000004ca9d4 in std::Vector_base<double, std::allocator >::M_deallocate(double*, unsigned long) ()
#4 0x00000000004c4a22 in std::Vector_base<double, std::allocator >::~Vector_base() ()
#5 0x00000000004bd9a6 in std::vector<double, std::allocator >::~vector() ()
#6 0x00000000004b7fe7 in sensor_msgs::CameraInfo_std::allocator::~CameraInfo
() ()
#7 0x00000000004cf9d7 in void boost::checked_delete<sensor_msgs::CameraInfo
<std::allocator > >(sensor_msgs::CameraInfo_std::allocator
) ()
#8 0x00000000004dcadc in boost::detail::sp_counted_impl_p<sensor_msgs::CameraInfo<std::allocator > >::dispose() ()
#9 0x000000000049895e in boost::detail::sp_counted_base::release() ()
#10 0x00000000004989ed in boost::detail::shared_count::~shared_count() ()
#11 0x00000000004b8058 in boost::shared_ptr<sensor_msgs::CameraInfo<std::allocator > >::~shared_ptr() ()
#12 0x00000000004bdc9a in void boost::shared_ptr<sensor_msgs::CameraInfo<std::allocator > >::reset<sensor_msgs::CameraInfo<std::allocator > >(sensor_msgs::CameraInfo_std::allocator) ()
#13 0x000000000049c2b5 in camera1394_driver::Camera1394Driver::publish(boost::shared_ptr<sensor_msgs::Image
<std::allocator > > const&) ()
#14 0x000000000049bf8f in camera1394_driver::Camera1394Driver::poll() ()
#15 0x0000000000497f56 in main ()

@protobits
Copy link
Author

PS: I would guess that the driver is being polled after deinitialization has happened, since this occurs when CTRL+C-ing the node.

@protobits
Copy link
Author

Maybe this is the offending line?

https://github.com/ros-drivers/camera1394/blob/master/src/nodes/driver1394.cpp#L395

It appears the close is called without locking the mutex and moreover the driver is closed before the CLOSED flag is set, therefore a call to publish() may be issued while the driver is being closed, right?

@jack-oquin
Copy link
Member

Sounds plausible, I'll take look at it.

Thanks for doing all this work!

@protobits
Copy link
Author

any news on this? I'm still suffering this crash very often

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants