
Detect a known object #5

Open
nickswalker opened this issue Feb 4, 2020 · 14 comments · May be fixed by #17

@nickswalker
Member

Input: camera image
Output: bbox detection, or sufficient information such that object centroid can be estimated

For our pick and place milestone, it doesn't matter what object we can detect (preferably it's a YCB object in the set of RoboCup items). The goal is to have a working detection pipeline that we can evaluate end-to-end with manipulation.
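For reference, a minimal sketch of how an object centroid could be estimated from a bbox detection plus an aligned depth image, assuming pinhole intrinsics (fx, fy, cx, cy). The helper name and parameters are illustrative, not part of any existing pipeline here.

```python
# Hypothetical sketch: estimate an object centroid from a 2D bbox detection
# plus a depth image aligned to the RGB frame, using a pinhole camera model.
import numpy as np

def bbox_to_centroid(bbox, depth, fx, fy, cx, cy):
    """bbox = (x_min, y_min, x_max, y_max) in pixels; depth in meters, aligned to RGB."""
    x_min, y_min, x_max, y_max = [int(round(v)) for v in bbox]
    roi = depth[y_min:y_max, x_min:x_max]
    z = np.nanmedian(roi[roi > 0])        # median depth is robust to background/invalid pixels
    u = (x_min + x_max) / 2.0             # bbox center in pixel coordinates
    v = (y_min + y_max) / 2.0
    # Back-project the bbox center through the pinhole model
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])            # centroid estimate in the camera frame
```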

@JHLee0513

Update (Feb 5, 2020):

  • The model runs relatively fast on the Jetson off the shelf (~20 FPS; it only looks slow when viewed over X forwarding on a remote connection). A rough sketch of this kind of off-the-shelf inference follows below.
  • Objects are classified as generic household items, e.g. "book" or "bottle", rather than with their YCB labels.
  • Detection is relatively unstable in the tested setup: some of the known objects are detected, but whether they are often depends on the overall pose of the object as seen by the camera.
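For context, a rough sketch of the off-the-shelf inference loop described in the first bullet, using a COCO-pretrained torchvision detector as a stand-in (the thread doesn't name the exact model run on the Jetson). Because the weights are COCO-pretrained, the labels come out as generic classes like "book" and "bottle" rather than YCB labels.

```python
# Sketch only: off-the-shelf detection with a COCO-pretrained torchvision model.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval().to("cuda" if torch.cuda.is_available() else "cpu")

@torch.no_grad()
def detect(image_rgb, score_thresh=0.5):
    """image_rgb: HxWx3 uint8 numpy array from the camera."""
    device = next(model.parameters()).device
    tensor = to_tensor(image_rgb).to(device)          # HWC uint8 -> CHW float in [0, 1]
    out = model([tensor])[0]                          # dict with boxes, labels, scores
    keep = out["scores"] > score_thresh
    return out["boxes"][keep].cpu(), out["labels"][keep].cpu(), out["scores"][keep].cpu()
```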

@nickswalker
Member Author

From @JHLee0513's shared results:

Potential future work:
  • Fine-tune on the YCB Video dataset (256G???)
  • Direct regression of pose (PoseCNN, DeepIM → seems to work well; the issue is that it's a different framework)
  • Investigate segmentation networks (take fast models trained on Cityscapes etc. and fine-tune them on YCB Video data as well)
  • Keypoint matching, since all objects are known?? (or a similar traditional CV approach)

[Screenshot from 2020-02-06 12-17-24]

This is better than I expected given that these objects weren't explicitly trained for. The bounding boxes do look funky though, like something is wrong with NMS or they're being drawn with a dimension skewed.
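As a debugging aid for the funky boxes: a common cause of skewed boxes is mixing the (x, y, w, h) and (x1, y1, x2, y2) conventions before drawing or NMS. The sketch below is purely illustrative and not the project's actual code; it converts formats explicitly and re-runs NMS as a sanity check.

```python
# Sanity-check sketch for box-format mixups and NMS.
import torch
from torchvision.ops import nms

def xywh_to_xyxy(boxes):
    """boxes: Nx4 tensor of (x, y, w, h) -> (x1, y1, x2, y2)."""
    x, y, w, h = boxes.unbind(dim=1)
    return torch.stack([x, y, x + w, y + h], dim=1)

def filter_detections(boxes_xywh, scores, iou_thresh=0.5):
    boxes = xywh_to_xyxy(boxes_xywh)
    keep = nms(boxes, scores, iou_thresh)   # indices of the boxes to keep
    return boxes[keep], scores[keep]
```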

  • YCB Video data looks promising. If it's easy to work with, go for it. Otherwise, let's work on getting a labeling pipeline and work with fewer images, but taken from the robot's camera, in the lighting conditions we care about, etc. It's okay to overfit as long as we can overfit with a fast turnaround time on site.

  • If possible, we want a single pipeline for both standard objects (we know about them today) and "known" objects (we learn about them during setup days). It's not clear that something like PoseCNN could reasonably be fine-tuned to handle a new object class on a short timescale (even with more time, it seems like it'd be crazy expensive). The hardware requirements are also a barrier.

  • Would be interested to see how segmentation performs out of the box.

  • "Keypoint matching won't work as well as fine-tuning YOLO" seems to be the consensus among RoboCup teams; a rough sketch of the keypoint approach follows below for reference.

@JHLee0513

@nickswalker Do we have any storage/GPU solution for handling the YCB Video dataset? I could try using one of the RSE lab machines, though I'd have to confirm availability (since the dataset is 265G...).

@nickswalker
Member Author

I think the update from @csemecu is that she'll check whether we can use one of the VR capstone's machines as a short-term solution. We should discuss more during Monday's meeting.

@JHLee0513

@nickswalker As a follow-up on categories: should the final perception system include categories from both COCO and YCB? To quickly test the whole pipeline, I will fine-tune on YCB only for now, so we can inspect results on objects we actually have.

@nickswalker
Member Author

Most of the COCO classes are irrelevant for us, so no need to include them.
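For context, roughly what fine-tuning on YCB-only classes looks like, using torchvision's Faster R-CNN as a stand-in (the thread doesn't pin down the actual model or framework used for this run). `ycb_loader` and the class count are hypothetical placeholders.

```python
# Sketch: swap the COCO head for a YCB-sized head and run a basic training loop.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_YCB_CLASSES = 21  # placeholder count of YCB object classes

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_YCB_CLASSES + 1)  # +1 background

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

for images, targets in ycb_loader:  # hypothetical DataLoader over YCB-labeled frames
    images = [img.to(device) for img in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    losses = model(images, targets)          # dict of detection losses in train mode
    loss = sum(losses.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```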

@JHLee0513

Update (Feb 18): got the model to start training; progress was delayed due to midterms :/ I will keep posting updates on training speed, and on inference once it's trained, ASAP.

@nickswalker
Member Author

Based on what @JHLee0513 has shown, we seem to be well above this bar now. Future work is in making sure we can quickly train in additional classes (labeling pipeline #7) and in connecting 2D and 3D perception (like what's happening for pick and place, and eventually for receptionist #13).

@nickswalker
Member Author

Ah, but there's no code tracked for this anywhere. @JHLee0513, please open a branch.

@nickswalker reopened this Mar 1, 2020
@JHLee0513

Branch opened here. The code is currently under heavy modification (and FYI, I'm not too familiar with integrating another repo as a submodule).

@nickswalker
Member Author

Let's discuss how to handle packaging tomorrow.

@nickswalker
Member Author

We've put the detection Python blob in as a git submodule and set up a catkin package around it. The code isn't really in a usable state yet because it's unclear how to get any data out over ROS; the model is built in PyTorch and requires Python 3, but rospy is Python 2 only, so we can't just open up publishers.

@JHLee0513

@nickswalker rospy in Melodic seems to support Python 3 (not tested personally, though there are many straightforward blogs/tutorials about it online). If that's the case, would it be possible to set up a publisher as normal?

@nickswalker
Member Author

Yes, as long as rospy works under Python 3 we should be good. Let's test that as soon as we can. We should also check that roslaunch and rosrun respect Python 3 shebangs and run the code as expected.
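A minimal sketch of the kind of Python 3 rospy node we'd want to test, assuming Melodic's rospy does run under Python 3 as hoped. The node name, topic, and message type are placeholders (vision_msgs/Detection2DArray would be one option for publishing real detections).

```python
#!/usr/bin/env python3
# Minimal Python 3 rospy publisher to verify the shebang/interpreter handling.
import rospy
from std_msgs.msg import String

def main():
    rospy.init_node("ycb_detector")                          # hypothetical node name
    pub = rospy.Publisher("detections", String, queue_size=1)  # placeholder topic/message
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        # In the real node this would publish detector output for each camera frame
        pub.publish(String(data="bottle 0.91 120 80 260 300"))
        rate.sleep()

if __name__ == "__main__":
    main()
```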
