Object Detection
Context: As of this year, URC Rules specify two new waypoints during the Autonomy mission that require the Rover to detect and navigate toward two objects placed on the ground.
Each object will have a GNSS coordinate within 10 m of its actual location. Autonomous detection of the objects will be required. The first object will be an orange rubber mallet. The second object will be a standard 1 L wide-mouthed plastic water bottle of unspecified color/markings (approximately 21.5 cm tall by 9 cm diameter).
Currently, the perception system does not support detection of objects besides ARTags, so we must experiment with and implement such a detection system. One way to do this is to use a learning-based instance segmentation model such as YOLO to extract the mallet and water bottle from the ZED camera feed.
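As a first sanity check, a pre-trained model can be run on a saved ZED frame entirely outside of ROS. Below is a minimal sketch using OpenCV's DNN module; the ONNX file name, test image, and 640x640 input size are placeholders and depend on which model we actually export.

```cpp
// Standalone check of a pre-trained detector outside of ROS.
// Model path, image path, and input size are assumptions, not final choices.
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <iostream>
#include <vector>

int main() {
    cv::dnn::Net net = cv::dnn::readNetFromONNX("yolov8s-seg.onnx"); // placeholder model
    cv::Mat image = cv::imread("test_frame.png");                    // placeholder ZED frame
    if (image.empty()) return 1;

    // YOLO-family models typically expect a square, normalized RGB blob.
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0 / 255.0, cv::Size(640, 640), cv::Scalar(), true, false);
    net.setInput(blob);

    // Forward pass; the exact output layout depends on the export, so just inspect it here.
    std::vector<cv::Mat> outputs;
    net.forward(outputs, net.getUnconnectedOutLayersNames());
    for (cv::Mat const& out : outputs) {
        std::cout << "output shape:";
        for (int i = 0; i < out.dims; ++i) std::cout << " " << out.size[i];
        std::cout << std::endl;
    }
    return 0;
}
```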
Interface: (Subject to change)
Node: detect_objects
Subscribes: sensor_msgs/Image
Publishes: Object.msg
- string object_type
- float32 detection_confidence
- float32 distance
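A rough skeleton of what this node could look like, assuming Object.msg is added to the mrover package (so it generates mrover/Object.h); topic names here are placeholders:

```cpp
// Sketch of the detect_objects node: subscribe to an image topic, publish Object.
#include <ros/ros.h>
#include <sensor_msgs/Image.h>
#include <mrover/Object.h>

ros::Publisher objectPub;

void imageCallback(sensor_msgs::ImageConstPtr const& msg) {
    // TODO: run the segmentation model on the image and fill these in from real detections.
    mrover::Object object;
    object.object_type = "mallet";
    object.detection_confidence = 0.0f;
    object.distance = 0.0f;
    objectPub.publish(object);
}

int main(int argc, char** argv) {
    ros::init(argc, argv, "detect_objects");
    ros::NodeHandle nh;
    objectPub = nh.advertise<mrover::Object>("objects", 1);
    ros::Subscriber imageSub = nh.subscribe("camera/left/image", 1, imageCallback);
    ros::spin();
    return 0;
}
```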
Rough Steps:
- Run a pre-trained object detection model outside of ROS to confirm performance
- Create a subscriber to the ZED point cloud topic
- Write a function that takes the point cloud, compresses it into a 2D cv::Mat, and passes it into the model to detect the objects (see tag_detector.processing.cpp for an example of this)
- Find the point in the Point Cloud that corresponds to the center of the object's bounding box and use its range as the Object's distance from the rover
- Create a publisher for the Object topic
- Write a function that publishes the detected Objects from the Point Cloud message to the Object topic
- Determine a more robust way of estimating the Object's distance from the ZED: the point at the center of the bounding box is often NaN or falls in open space beside the object. One option, shown in the sketch after this list, is to fall back to the median range of the valid points inside the bounding box
- Revisit the image segmentation model and fine-tune with our own images if necessary
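A sketch of the point cloud handling described in the steps above. It assumes the ZED wrapper publishes an organized cloud whose points start with x, y, z as float32 followed by a packed color; the field offsets would need to be checked against the actual layout. It also includes the median-over-bounding-box fallback mentioned above.

```cpp
// Organized point cloud -> BGRA image for the model, plus distance lookup per detection.
#include <opencv2/opencv.hpp>
#include <sensor_msgs/PointCloud2.h>
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

struct Point3f { float x, y, z; };

// Read the xyz of the point at (row, col) in an organized cloud; nullopt if it is NaN.
std::optional<Point3f> pointAt(sensor_msgs::PointCloud2 const& cloud, int row, int col) {
    auto const* p = reinterpret_cast<float const*>(
            cloud.data.data() + row * cloud.row_step + col * cloud.point_step);
    if (std::isnan(p[0]) || std::isnan(p[1]) || std::isnan(p[2])) return std::nullopt;
    return Point3f{p[0], p[1], p[2]};
}

// Collapse the organized cloud into a 2D BGRA cv::Mat the model can consume.
cv::Mat cloudToImage(sensor_msgs::PointCloud2 const& cloud) {
    cv::Mat image(static_cast<int>(cloud.height), static_cast<int>(cloud.width), CV_8UC4);
    for (int r = 0; r < image.rows; ++r) {
        for (int c = 0; c < image.cols; ++c) {
            auto const* point = cloud.data.data() + r * cloud.row_step + c * cloud.point_step;
            // Assumes the packed color bytes sit right after the three xyz floats.
            auto const* color = point + 3 * sizeof(float);
            image.at<cv::Vec4b>(r, c) = {color[0], color[1], color[2], color[3]};
        }
    }
    return image;
}

// Distance for a detection: try the bounding-box center, fall back to the
// median range of the valid points inside the box if the center is NaN.
float detectionDistance(sensor_msgs::PointCloud2 const& cloud, cv::Rect const& bbox) {
    int centerRow = bbox.y + bbox.height / 2, centerCol = bbox.x + bbox.width / 2;
    if (auto p = pointAt(cloud, centerRow, centerCol)) return std::hypot(p->x, p->y, p->z);

    std::vector<float> ranges;
    for (int r = bbox.y; r < bbox.y + bbox.height; ++r)
        for (int c = bbox.x; c < bbox.x + bbox.width; ++c)
            if (auto p = pointAt(cloud, r, c)) ranges.push_back(std::hypot(p->x, p->y, p->z));
    if (ranges.empty()) return std::numeric_limits<float>::quiet_NaN();
    std::nth_element(ranges.begin(), ranges.begin() + ranges.size() / 2, ranges.end());
    return ranges[ranges.size() / 2];
}
```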
See: https://github.com/umrover/mrover-ros/tree/percep/obj-detect/src/perception/object_detector
We also need a node to put the detected object(s) into the TF tree so the rover can navigate toward them. This involves translating the detection from image_x, image_y pixel space into an SE3 pose relative to the rover. One possible way of doing this is to take the depth reading at image_x, image_y and navigate toward that point, much like we do with ARTags. Another possible solution is to project the object onto the ground plane and use the ground plane estimate and the relative location of the rover to determine the pose of the object of interest.
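For the depth-based option, a minimal sketch of broadcasting the resulting camera-frame point to the TF tree. The frame and child names are placeholders, and the orientation is left as identity since navigation only needs a position.

```cpp
// Publish an object detection's 3D point (already in the camera frame) to TF.
#include <ros/ros.h>
#include <geometry_msgs/TransformStamped.h>
#include <tf2_ros/transform_broadcaster.h>
#include <string>

void publishObjectPose(tf2_ros::TransformBroadcaster& broadcaster,
                       float x, float y, float z, std::string const& objectType) {
    geometry_msgs::TransformStamped tf;
    tf.header.stamp = ros::Time::now();
    tf.header.frame_id = "zed_left_camera_frame"; // assumed camera frame name
    tf.child_frame_id = "detected_" + objectType; // e.g. detected_mallet
    tf.transform.translation.x = x;
    tf.transform.translation.y = y;
    tf.transform.translation.z = z;
    tf.transform.rotation.w = 1.0; // identity orientation; heading is not needed for a mallet/bottle
    broadcaster.sendTransform(tf);
}
```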
Node: get_object_pose
Subscribes:
- Object.msg (the topic published by detect_objects)
- Either ZED depth or ground plane estimation
Publishes: SE3 pose to the TF tree
Note: This might have to run as a nodelet inside the ZED wrapper's nodelet manager, because passing full images between separate ROS nodes requires serialization and is costly.
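If we go that route, the detector would take roughly this shape; this is only a sketch, and the class, namespace, and topic names are placeholders.

```cpp
// Detector as a nodelet so it can be loaded into the same nodelet manager as
// the ZED wrapper and receive point clouds without inter-process serialization.
#include <nodelet/nodelet.h>
#include <pluginlib/class_list_macros.hpp>
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>

namespace mrover {

    class ObjectDetectorNodelet : public nodelet::Nodelet {
        ros::Subscriber mCloudSub;

        void onInit() override {
            ros::NodeHandle nh = getMTNodeHandle();
            mCloudSub = nh.subscribe("camera/left/points", 1, &ObjectDetectorNodelet::cloudCallback, this);
        }

        void cloudCallback(sensor_msgs::PointCloud2ConstPtr const& cloud) {
            // Run detection here; within one manager this pointer is shared, not copied.
        }
    };

} // namespace mrover

PLUGINLIB_EXPORT_CLASS(mrover::ObjectDetectorNodelet, nodelet::Nodelet)
```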