Paper list for Embodied AI

HCPLab
Pengcheng Laboratory & SYSU HCP Lab

We welcome suggestions from peers for improving this paper list and survey. Please open an issue or send an email to [email protected] and [email protected]. Pull requests are also welcome. Thanks for your cooperation!

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Yang Liu, Weixing Chen, Yongjie Bai, Xiaodan Liang, Guanbin Li, Wen Gao, Liang Lin

🏠 About

Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) has attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey of Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis first reviews representative works on embodied robots and simulators to understand the current research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of embodied AI and discuss potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation.

💥 Update Log

[2024.09.08] We are constantly updating the Dataset section!
[2024.08.31] We added the Datasets section and classified the useful projects!
[2024.08.19] To help readers focus on the newest works, we have arranged the papers in chronological order!
[2024.08.02] We now update the project weekly!
[2024.07.29] We have updated the project!
[2024.07.22] We have updated the paper list and other useful embodied projects!
[2024.07.10] We release the first version of the survey on Embodied AI PDF!
[2024.07.10] We release the first version of the paper list for Embodied AI. This page is continually updating!

Books & Surveys 🔝

Multimodal Large Models: The New Paradigm of Artificial General Intelligence, Publishing House of Electronics Industry (PHE), 2024
Yang Liu, Liang Lin
[Page]
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI, arXiv:2407.06886, 2024
Yang Liu, Weixing Chen, Yongjie Bai, Guanbin Li, Wen Gao, Liang Lin.
[Paper]
All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents, arXiv:2408.10899, 2024
Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang
[Paper][Project]
A Survey of Embodied Learning for Object-Centric Robotic Manipulation, arXiv:2408.11537, 2024
Ying Zheng, Lei Yao, Yuejiao Su, Yi Zhang, Yi Wang, Sicheng Zhao, Yiyi Zhang, Lap-Pui Chau
[Paper]
Teleoperation of Humanoid Robots: A Survey, IEEE Transactions on Robotics, 2024
Kourosh Darvish, Luigi Penco, Joao Ramos, Rafael Cisneros, Jerry Pratt, Eiichi Yoshida, Serena Ivaldi, Daniele Pucci.
[Paper]
A Survey on Vision-Language-Action Models for Embodied AI, arXiv:2405.14093, 2024
Yueen Ma, Zixing Song, Yuzheng Zhuang, Jianye Hao, Irwin King
[Paper]
Towards Generalist Robot Learning from Internet Video: A Survey, arXiv:2404.19664, 2024
McCarthy, Robert, Daniel CH Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, and Zhibin Li.
[Paper]
A Survey on Robotics with Foundation Models: toward Embodied AI, arXiv:2402.02385, 2024
Zhiyuan Xu, Kun Wu, Junjie Wen, Jinming Li, Ning Liu, Zhengping Che, and Jian Tang.
[Paper]
Toward general-purpose robots via foundation models: A survey and meta-analysis, Machines, 2023
Yafei Hu, Quanting Xie, Vidhi Jain, Jonathan Francis, Jay Patrikar, Nikhil Keetha, Seungchan Kim, Yaqi Xie, Tianyi Zhang, Shibo Zhao, Yu Quan Chong, Chen Wang, Katia Sycara, Matthew Johnson-Roberson, Dhruv Batra, Xiaolong Wang, Sebastian Scherer, Zsolt Kira, Fei Xia, Yonatan Bisk.
[Paper]
Deformable Object Manipulation in Caregiving Scenarios: A Review, Machines, 2023
Liman Wang, Jihong Zhu.
[Paper](https://www.mdpi.com/2075-1702/11/11/1013)
A survey of embodied ai: From simulators to research tasks, IEEE Transactions on Emerging Topics in Computational Intelligence, 2022
Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan
[Paper]
The development of embodied cognition: Six lessons from babies, Artificial life, 2005
Linda Smith, Michael Gasser
[Paper]
Embodied artificial intelligence: Trends and challenges, Lecture notes in computer science, 2004
Rolf Pfeifer, Fumiya Iida
[Paper]

Embodied Simulators 🔝

General Simulator

Design and use paradigms for gazebo, an open-source multi-robot simulator, IROS, 2004
Nathan Koenig, Andrew Howard.
[page]
Nvidia isaac sim: Robotics simulation and synthetic data, NVIDIA, 2023
[page]
Aerial Gym -- Isaac Gym Simulator for Aerial Robots, ArXiv, 2023
Mihir Kulkarni and Theodor J. L. Forgaard and Kostas Alexis.
[paper]
Webots: open-source robot simulator, 2018
Cyberbotics
[page, code]
Unity: A general platform for intelligent agents, ArXiv, 2020
Arthur Juliani, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy, Yuan Gao, Hunter Henry, Marwan Mattar, Danny Lange.
[page]
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles, Field and Service Robotics, 2017
Shital Shah, Debadeepta Dey, Chris Lovett, Ashish Kapoor.
[page]
Pybullet, a python module for physics simulation for games, robotics and machine learning, 2016
Erwin Coumans, Yunfei Bai.
[page]
V-REP: A versatile and scalable robot simulation framework, IROS, 2013
Eric Rohmer, Surya PN Singh, Marc Freese.
[page]
MuJoCo: A physics engine for model-based control, IROS, 2012
Emanuel Todorov, Tom Erez, Yuval Tassa.
[page, code]
Modular open robots simulation engine: Morse, ICRA, 2011
Echeverria, Gilberto and Lassabe, Nicolas and Degroote, Arnaud and Lemaignan, Séverin
[page]

Real-Scene Based Simulators

ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI, arxiv, 2024
Stone Tao, Fanbo Xiang, Arth Shukla, Yuzhe Qin, Xander Hinrichsen, Xiaodi Yuan, Chen Bao, Xinsong Lin, Yulin Liu, Tse-kai Chan, Yuan Gao, Xuanlin Li, Tongzhou Mu, Nan Xiao, Arnav Gurha, Zhiao Huang, Roberto Calandra, Rui Chen, Shan Luo, Hao Su.
[page]
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI, arxiv, 2024
Yandan Yang, Baoxiong Jia, Peiyuan Zhi, Siyuan Huang.
[page]
Holodeck: Language Guided Generation of 3D Embodied AI Environments, CVPR, 2024
Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark.
[page]
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation, arXiv, 2023
Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan.
[page]
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation, NeurIPS, 2022
Deitke, VanderBilt, Herrasti, Weihs, Salvador, Ehsani, Han, Kolve, Farhadi, Kembhavi, Mottaghi
[page]
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation, NeurIPS, 2021
Chuang Gan, J. Schwartz, Seth Alter, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Damian Mrowca, Michael Lingelbach, Aidan Curtis, Kevin T. Feigelis, David M. Bear, Dan Gutfreund, David D. Cox, James J. DiCarlo, Josh H. McDermott, Joshua B. Tenenbaum, Daniel Yamins.
[page]
iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes, IROS, 2021
Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi Fan, Guanzhi Wang, Claudia Pérez-D’Arpino, Shyamal Buch, Sanjana Srivastava, Lyne Tchapmi, Micael Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese.
[page]
SAPIEN: A SimulAted Part-Based Interactive ENvironment, CVPR, 2020
Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel X. Chang, Leonidas J. Guibas, Hao Su.
[page]
Habitat: A Platform for Embodied AI Research, ICCV, 2019
Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra.
[page]
VirtualHome: Simulating Household Activities Via Programs, CVPR, 2018
Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba.
[page]
Matterport3D: Learning from RGB-D Data in Indoor Environments, 3DV, 2017
Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang.
[page]
AI2-THOR: An Interactive 3D Environment for Visual AI, arXiv, 2017
Eric Kolve, Roozbeh Mottaghi, Daniel Gordon, Yuke Zhu, Abhinav Gupta, Ali Farhadi.
[page]

Embodied Perception 🔝

Active Visual Exploration

AIR-Embodied: An Efficient Active 3DGS-based Interaction and Reconstruction Framework with Embodied Large Language Model, arxiv, 2024.
Zhenghao Qi, Shenghai Yuan, Fen Liu, Haozhi Cao, Tianchen Deng, Jianfei Yang, Lihua Xie.
[page]
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation, CVPR, 2024.
Yunhao Ge, Yihe Tang, Jiashu Xu, Cem Gokmen, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu.
[page]
Coarse-to-Fine Detection of Multiple Seams for Robotic Welding, arxiv, 2024.
Pengkun Wei, Shuo Cheng, Dayou Li, Ran Song, Yipeng Zhang, Wei Zhang.
[page]
Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception, CVPR, 2024.
Lei Fan, Mingfu Liang, Yunxuan Li, Gang Hua, Ying Wu.
[page]
SpatialBot: Precise Spatial Understanding with Vision Language Models, arxiv, 2024.
Wenxiao Cai, Yaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao.
[page]
Embodied Uncertainty-Aware Object Segmentations, IROS, 2024.
Xiaolin Fang, Leslie Pack Kaelbling, Tomás Lozano-Pérez.
[page]
Point Transformer V3: Simpler Faster Stronger, CVPR, 2024.
Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, Hengshuang Zhao.
[page]
PointMamba: A Simple State Space Model for Point Cloud Analysis, arXiv, 2024.
Dingkang Liang, Xin Zhou, Xinyu Wang, Xingkui Zhu, Wei Xu, Zhikang Zou, Xiaoqing Ye, Xiang Bai.
[page]
Point Cloud Mamba: Point Cloud Learning via State Space Model, arXiv, 2024.
Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, Shuicheng Yan.
[page]
Mamba3d: Enhancing local features for 3d point cloud analysis via state space model, arXiv, 2024.
Xu Han, Yuan Tang, Zhaoxuan Wang, Xianzhi Li.
[page]
Gs-slam: Dense visual slam with 3d gaussian splatting, CVPR, 2024.
Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li.
[page]
GOReloc: Graph-based Object-Level Relocalization for Visual SLAM, IEEE RAL, 2024.
Yutong Wang, Chaoyang Jiang, Xieyuanli Chen.
[page]
Embodiedscan: A holistic multi-modal 3d perception suite towards embodied ai, CVPR, 2024.
Tai Wang, Xiaohan Mao, Chenming Zhu, Runsen Xu, Ruiyuan Lyu, Peisen Li, Xiao Chen, Wenwei Zhang, Kai Chen, Tianfan Xue, et al.
[page]
Neu-nbv: Next best view planning using uncertainty estimation in image-based neural rendering, IROS, 2023.
Liren Jin, Xieyuanli Chen, Julius Rückin, Marija Popović.
[page]
Off-policy evaluation with online adaptation for robot exploration in challenging environments, IEEE Robotics and Automation Letters, 2023.
Yafei Hu, Junyi Geng, Chen Wang, John Keller, Sebastian Scherer.
[page]
OVD-SLAM: An online visual SLAM for dynamic environments, IEEE Sensors Journal, 2023.
Jiaming He, Mingrui Li, Yangyang Wang, Hongyu Wang.
[page]
Transferring implicit knowledge of non-visual object properties across heterogeneous robot morphologies, ICRA, 2023.
Gyan Tatiya, Jonathan Francis, Jivko Sinapov.
[page]
Swin3d: A pretrained transformer backbone for 3d indoor scene understanding, arXiv, 2023.
Yu-Qi Yang, Yu-Xiao Guo, Jian-Yu Xiong, Yang Liu, Hao Pan, Peng-Shuai Wang, Xin Tong, Baining Guo.
[page]
Point transformer v2: Grouped vector attention and partition-based pooling, NeurIPS, 2022.
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao.
[page]
Rethinking network design and local geometry in point cloud: A simple residual MLP framework, arXiv, 2022.
Xu Ma, Can Qin, Haoxuan You, Haoxi Ran, Yun Fu.
[page]
So-slam: Semantic object slam with scale proportional and symmetrical texture constraints, IEEE Robotics and Automation Letters, 7(2): 4008–4015, 2022.
Ziwei Liao, Yutong Hu, Jiadong Zhang, Xianyu Qi, Xiaoyu Zhang, Wei Wang.
[page]
SG-SLAM: A real-time RGB-D visual SLAM toward dynamic scenes with semantic and geometric information, IEEE Transactions on Instrumentation and Measurement, 72: 1–12, 2022.
Shuhong Cheng, Changhe Sun, Shijun Zhang, Dianfan Zhang.
[page]
Point transformer, ICCV, 2021.
Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip HS Torr, Vladlen Koltun.
[page]
Pointpillars: Fast encoders for object detection from point clouds, CVPR, 2019.
Alex H Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, Oscar Beijbom.
[page]
4d spatio-temporal convnets: Minkowski convolutional neural networks, CVPR, 2019.
Christopher Choy, JunYoung Gwak, Silvio Savarese.
[page]
Cubeslam: Monocular 3-d object slam, IEEE T-RO, 35(4): 925–938, 2019.
Shichao Yang, Sebastian Scherer.
[page]
Hierarchical topic model based object association for semantic SLAM, IEEE T-VCG, 25(11): 3052–3062, 2019.
Jianhua Zhang, Mengping Gui, Qichao Wang, Ruyu Liu, Junzhe Xu, Shengyong Chen.
[page]
DS-SLAM: A semantic visual SLAM towards dynamic environments, IROS, 2018
Chao Yu, Zuxin Liu, Xin-Jun Liu, Fugui Xie, Yi Yang, Qi Wei, Qiao Fei.
[page]
DynaSLAM: Tracking, mapping, and inpainting in dynamic scenes, IEEE Robotics and Automation Letters, 3(4): 4076–4083, 2018.
Berta Bescos, José M Facil, Javier Civera, José Neira.
[page]
Quadricslam: Dual quadrics from object detections as landmarks in object-oriented slam, IEEE Robotics and Automation Letters, 4(1): 1–8, 2018.
Lachlan Nicholson, Michael Milford, Niko Sünderhauf.
[page]
3d semantic segmentation with submanifold sparse convolutional networks, CVPR, 2018.
Benjamin Graham, Martin Engelcke, Laurens Van Der Maaten.
[page]
Learning to look around: Intelligently exploring unseen environments for unknown tasks, CVPR, 2018.
Dinesh Jayaraman, Kristen Grauman.
[page]
Multi-view 3d object detection network for autonomous driving, CVPR, 2017.
Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, Tian Xia.
[page]
Semantic scene completion from a single depth image, CVPR, 2017.
Shuran Song, Fisher Yu, Andy Zeng, Angel X Chang, Manolis Savva, Thomas Funkhouser.
[page]
Pointnet: Deep learning on point sets for 3d classification and segmentation, CVPR, 2017.
Charles R Qi, Hao Su, Kaichun Mo, Leonidas J Guibas.
[page]
Pointnet++: Deep hierarchical feature learning on point sets in a metric space, NeurIPS, 2017.
Charles Ruizhongtai Qi, Li Yi, Hao Su, Leonidas J Guibas.
[page]
The curious robot: Learning visual representations via physical interactions, ECCV, 2016.
Lerrel Pinto, Dhiraj Gandhi, Yuanfeng Han, Yong-Lae Park, Abhinav Gupta.
[page]
Multi-view convolutional neural networks for 3d shape recognition, ICCV, 2015.
Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller.
[page]
Voxnet: A 3d convolutional neural network for real-time object recognition, IROS, 2015.
Daniel Maturana, Sebastian Scherer.
[page]
ORB-SLAM: a versatile and accurate monocular SLAM system, IEEE T-RO, 31(5): 1147–1163, 2015.
Raul Mur-Artal, Jose Maria Martinez Montiel, Juan D Tardos.
[page]
LSD-SLAM: Large-scale direct monocular SLAM, ECCV, 2014
Jakob Engel, Thomas Schöps, Daniel Cremers.
[page]
Slam++: Simultaneous localisation and mapping at the level of objects, CVPR, 2013
Renato F Salas-Moreno, Richard A Newcombe, Hauke Strasdat, Paul HJ Kelly, Andrew J Davison.
[page]
DTAM: Dense tracking and mapping in real-time, ICCV, 2011
Richard A Newcombe, Steven J Lovegrove, Andrew J Davison.
[page]
MonoSLAM: Real-time single camera SLAM, IEEE T-PAMI, 2007.
Andrew J Davison, Ian D Reid, Nicholas D Molton, Olivier Stasse.
[page]
A multi-state constraint Kalman filter for vision-aided inertial navigation, IROS, 2007
Anastasios I Mourikis, Stergios I Roumeliotis.
[page]
Parallel tracking and mapping for small AR workspaces, ISMAR, 2007
Georg Klein, David Murray.
[page]

3D Visual Perception and Grounding

Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding, arxiv, 2024
Xianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao, Xuelong Li
[page]
EmbodiedSAM: Online Segment Any 3D Thing in Real Time, arxiv, 2024
Xiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu
[page]
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding, arxiv, 2024
Youjun Zhao, Jiaying Lin, Shuquan Ye, Qianshi Pang, Rynson W.H. Lau
[page]
LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image, arxiv, 2024
Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding
[page]
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations, arxiv, 2024
Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang
[page]
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction, arxiv, 2024
Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, He Wang, Li Yi, Kaisheng Ma
[page]
LEO: An Embodied Generalist Agent in 3D World, ICML, 2024
Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, and Siyuan Huang
[page]
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding, ECCV, 2024
Baoxiong Jia, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, and Siyuan Huang
[page]
PQ3D: Unifying 3D Vision-Language Understanding via Promptable Queries, ECCV, 2024
Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, and Qing Li
[page]
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World, CVPR, 2024
Yining Hong, Zishuo Zheng, Peihao Chen, Yian Wang, Junyan Li, Chuang Gan
[page]
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception, CVPR, 2024
Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao
[page]
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation, CVPR, 2024
Mi Yan, Jiazhao Zhang, Yan Zhu, He Wang
[page]
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding, CVPR, 2024
Yun Liu, Haolin Yang, Xu Si, Ling Liu, Zipeng Li, Yuxiang Zhang, Yebin Liu, Li Yi
[page]
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding, CVPR, 2023
Wu, Yanmin and Cheng, Xinhua and Zhang, Renrui and Cheng, Zesen and Zhang, Jian
[page]
3d-vista: Pre-trained transformer for 3d vision and text alignment, ICCV, 2023
Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, and Qing Li
[page]
LeaF: Learning Frames for 4D Point Cloud Sequence Understanding, ICCV, 2023
Yunze Liu, Junyu Chen, Zekai Zhang, Li Yi
[page]
SQA3D: Situated Question Answering in 3D Scenes, ICLR, 2023
Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, and Siyuan Huang
[page]
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent, arXiv, 2023
Yang, Jianing and Chen, Xuweiyi and Qian, Shengyi and Madaan, Nikhil and Iyengar, Madhavan and Fouhey, David F and Chai, Joyce
[page]
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding, arXiv, 2023
Yuan, Zhihao and Ren, Jinke and Feng, Chun-Mei and Zhao, Hengshuang and Cui, Shuguang and Li, Zhen
[page]
Multi-view transformer for 3D visual grounding, CVPR, 2022
Huang, Shijia and Chen, Yilun and Jia, Jiaya and Wang, Liwei
[page]
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding, CVPR, 2022
Bakr, Eslam and Alsaedy, Yasmeen and Elhoseiny, Mohamed
[page]
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection, CVPR, 2022
Luo, Junyu and Fu, Jiahui and Kong, Xianghao and Gao, Chen and Ren, Haibing and Shen, Hao and Xia, Huaxia and Liu, Si
[page]
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds, ECCV, 2022
Jain, Ayush and Gkanatsios, Nikolaos and Mediratta, Ishita and Fragkiadaki, Katerina
[page]
Text-guided graph neural networks for referring 3D instance segmentation, AAAI, 2021
Huang, Pin-Hao and Lee, Han-Hung and Chen, Hwann-Tzong and Liu, Tyng-Luh
[page]
InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring, ICCV, 2021
Yuan, Zhihao and Yan, Xu and Liao, Yinghong and Zhang, Ruimao and Wang, Sheng and Li, Zhen and Cui, Shuguang
[page]
Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud, CVPR, 2021
Feng, Mingtao and Li, Zhen and Li, Qi and Zhang, Liang and Zhang, XiangDong and Zhu, Guangming and Zhang, Hui and Wang, Yaonan and Mian, Ajmal
[page]
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, CVPR, 2021
Yang, Zhengyuan and Zhang, Songyang and Wang, Liwei and Luo, Jiebo
[page]
LanguageRefer: Spatial-language model for 3D visual grounding, CVPR, 2021
Roh, Junha and Desingh, Karthik and Farhadi, Ali and Fox, Dieter
[page]
3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds, ICCV, 2021
Zhao, Lichen and Cai, Daigang and Sheng, Lu and Xu, Dong
[page]
TransRefer3D: Entity-and-relation aware transformer for fine-grained 3D visual grounding, CVPR, 2021
He, Dailan and Zhao, Yusheng and Luo, Junyu and Hui, Tianrui and Huang, Shaofei and Zhang, Aixi and Liu, Si
[page]
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language, ECCV, 2020
Chen, Dave Zhenyu and Chang, Angel X and Nießner, Matthias
[page]
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes, ECCV, 2020
Achlioptas, Panos and Abdelreheem, Ahmed and Xia, Fei and Elhoseiny, Mohamed and Guibas, Leonidas
[page]

Visual Language Navigation

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation, ACL, 2024.
Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong.
[page]
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning, ArXiv, 2024.
Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang.
[page]
OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robot in Dynamic Environments via State Space Model, ArXiv, 2024.
Junming Wang, Dong Huang, Xiuxian Guan, Zekai Sun, Tianxiang Shen, Fangming Liu, Heming Cui.
[page]
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving, ArXiv, 2024.
Hidehisa Arai, Keita Miwa, Kento Sasaki, Yu Yamaguchi, Kohei Watanabe, Shunsuke Aoki, Issei Yamamoto.
[page]
FLAME: Learning to Navigate with Multimodal LLM in Urban Environments, ArXiv, 2024.
Yunzhe Xu, Yiyuan Pan, Zhe Liu, Hesheng Wang.
[page]
Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation, ArXiv, 2024.
Jiaqi Chen, Bingqian Lin, Xinmin Liu, Xiaodan Liang, Kwan-Yee K Wong.
[page]
Embodied Instruction Following in Unknown Environments, ArXiv, 2024.
Wu, Wang, Xu, Lu, Yan.
[page]
DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control, arxiv, 2024.
Xinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li, Cewu Lu.
[page]
NOLO: Navigate Only Look Once, arxiv, 2024.
Bohan Zhou, Jiangxing Wang, Zongqing Lu.
[page]
Towards Learning a Generalist Model for Embodied Navigation, CVPR, 2024.
Duo Zheng, Shijia Huang, Lin Zhao, Yiwu Zhong, Liwei Wang.
[page]
Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation, ICML, 2024.
Junyu Gao, Xuan Yao, Changsheng Xu.
[page]
Discuss before moving: Visual language navigation via multi-expert discussions, ICRA, 2024.
Yuxing Long, Xiaoqi Li, Wenzhe Cai, Hao Dong.
[page]
Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill ICRA, 2024.
Wenzhe Cai, Siyuan Huang, Guangran Cheng, Yuxing Long, Peng Gao, Changyin Sun, and Hao Dong.
[page]
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation, CVPR, 2024.
Ganlong Zhao, Guanbin Li, Weikai Chen, Yizhou Yu.
[page]
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation, CVPR, 2024.
Zeyuan Yang, Jiageng Liu, Peihao Chen, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan.
[page]
Vision-and-Language Navigation via Causal Learning, CVPR, 2024.
Liuyi Wang, Zongtao He, Ronghao Dang, Mengjiao Shen, Chengju Liu, Qijun Chen.
[page]
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation, CVPR, 2024.
Xiaohan Lei, Min Wang, Wengang Zhou, Li Li, Houqiang Li.
[page]
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation, CVPR, 2024.
Mukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, Manolis Savva.
[page]
SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System, CVPR, 2024.
Yunfei Fan, Tianyu Zhao, Guidong Wang.
[page]
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World, CVPR, 2024.
Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi.
[page]
Volumetric Environment Representation for Vision-Language Navigation, CVPR, 2024.
Rui Liu, Wenguan Wang, Yi Yang.
[page]
GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation, CVPR, 2024.
Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi.
[page]
An Interactive Navigation Method with Effect-oriented Affordance, CVPR, 2024.
Xiaohan Wang, Yuehu Liu, Xinhang Song, Yuyi Liu, Sixian Zhang, Shuqiang Jiang.
[page]
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation, CVPR, 2024.
Sixian Zhang, Xinyao Yu, Xinhang Song, Xiaohan Wang, Shuqiang Jiang.
[page]
MemoNav: Working Memory Model for Visual Navigation, CVPR, 2024.
Hongxin Li, Zeyu Wang, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang.
[page]
Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy, CVPR, 2024.
Gengyu Zhang, Hao Tang, Yan Yan.
[page]
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation, CVPR, 2024.
Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang.
[page]
SPIN: Simultaneous Perception Interaction and Navigation, CVPR, 2024.
Shagun Uppal, Ananye Agarwal, Haoyu Xiong, Kenneth Shaw, Deepak Pathak.
[page]
Correctable Landmark Discovery via Large Models for Vision-Language Navigation, TPAMI, 2024.
Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang.
[page]
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments, IEEE T-PAMI, 2024.
Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang.
[page]
NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation, RSS, 2024.
Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang.
[page]
March in Chat: Interactive Prompting for Remote Embodied Referring Expression, ICCV, 2023.
Yanyuan Qiao, Yuankai Qi, Zheng Yu, Jing Liu, Qi Wu.
[page]
Multi-level compositional reasoning for interactive instruction following, AAAI, 2023.
Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi.
[page]
Vision and Language Navigation in the Real World via Online Visual Language Mapping, ArXiv, 2023.
Chengguang Xu, Hieu T. Nguyen, Christopher Amato, Lawson L.S. Wong.
[page]
Towards Deviation-robust Agent Navigation via Perturbation-aware Contrastive Learning, TPAMI, 2023.
Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin.
[page]
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation, NeurIPS, 2023.
Wang, Chen, Li, Wu, Dong.
[page]
HomeRobot: Open-Vocabulary Mobile Manipulation, NeurIPS, 2023.
Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton.
[page]
Behavior-1k: A benchmark for embodied ai with 1,000 everyday activities and realistic simulation, Conference on Robot Learning, 2023.
Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Michael Lingelbach, Jiankai Sun, et al.
[page]
DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following, arXiv, 2022.
Xiaofeng Gao, Qiaozi Gao, Ran Gong, Kaixiang Lin, Govind Thattai, Gaurav S. Sukhatme.
[page]
HOP: History-and-Order Aware Pretraining for Vision-and-Language Navigation, CVPR, 2022.
Qiao, Yanyuan, Yuankai, Qi, Yicong, Hong, Zheng, Yu, Peng, Wang, Qi, Wu.
[page]
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation, CVPR, 2022.
Hong, Yicong, Zun, Wang, Qi, Wu, Stephen, Gould.
[page]
FILM: Following Instructions in Language with Modular Methods, ICLR, 2022.
So Yeon Min, , Devendra Singh Chaplot, Pradeep Kumar Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov.
[page]
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action, Conference on Robot Learning, 2022.
Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine.
[page]
SOON: Scenario Oriented Object Navigation with Graph-based Exploration, CVPR, 2021.
Fengda Zhu, Xiwen Liang, Yi Zhu, Qizhi Yu, Xiaojun Chang, Xiaodan Liang.
[page]
Vision-Language Navigation Policy Learning and Adaptation, IEEE T-PAMI 43(12): 4205-4216, 2021.
Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang.
[page]
Neighbor-view enhanced model for vision and language navigation, MM, 2021.
Dong An, Yuankai Qi, Yan Huang, Qi Wu, Liang Wang, Tieniu Tan.
[page]
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments, ECCV, 2020.
Krantz, Jacob and Wijmans, Erik and Majumdar, Arjun and Batra, Dhruv and Lee, Stefan.
[page]
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments, CVPR, 2020.
Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton van den Hengel.
[page]
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks, CVPR, 2020.
Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox.
[page]
Vision-and-dialog navigation, Conference on Robot Learning, 2020.
Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer.
[page]
Language and visual entity relationship graph for agent navigation, NeurIPS, 2020.
Yicong Hong, Cristian Rodriguez, Yuankai Qi, Qi Wu, Stephen Gould.
[page]
Language-Guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning, IEEE T-CSVT 31: 3469-3481, 2020.
Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang.
[page]
Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation, ACL, 2019.
Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge.
[page]
TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments, CVPR, 2019.
Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi.
[page]
Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, CVPR, 2018.
Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel.
[page]
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation, ECCV, 2018.
Xin Eric Wang, Wenhan Xiong, Hongmin Wang, William Yang Wang.
[page]

Non-Visual Perception: Tactile

When Vision Meets Touch: A Contemporary Review for Visuotactile Sensors from the Signal Processing Perspective, arXiv, 2024.
Li, Shoujie and Wang, Zihan and Wu, Changsheng and Li, Xiang and Luo, Shan and Fang, Bin and Sun, Fuchun and Zhang, Xiao-Ping and Ding, Wenbo.
[page]
Enhancing Generalizable 6D Pose Tracking of an In-Hand Object with Tactile Sensing, RA-L, 2024.
Yun Liu, Xiaomeng Xu, Weihang Chen, Haocheng Yuan, He Wang, Jing Xu, Rui Chen, Li Yi.
[page]
Learning visuotactile skills with two multifingered hands, ArXiv, 2024.
Lin, Toru and Zhang, Yu and Li, Qiyang and Qi, Haozhi and Yi, Brent and Levine, Sergey and Malik, Jitendra.
[page]
Binding touch to everything: Learning unified multimodal tactile representations, CVPR, 2024.
Yang, Fengyu and Feng, Chao and Chen, Ziyang and Park, Hyoungseob and Wang, Daniel and Dou, Yiming and Zeng, Ziyao and Chen, Xien and Gangopadhyay, Rit and Owens, Andrew and others.
[page]
Bioinspired sensors and applications in intelligent robots: a review, Robotic Intelligence and Automation, 2024.
Zhou, Yanmin and Yan, Zheng and Yang, Ye and Wang, Zhipeng and Lu, Ping and Yuan, Philip F and He, Bin.
[page]
Give Me a Sign: Using Data Gloves for Static Hand-Shape Recognition, Sensors, 2023.
Achenbach, Philipp and Laux, Sebastian and Purdack, Dennis and Müller, Philipp Niklas and Göbel, Stefan.
[page]
Semantics-aware adaptive knowledge distillation for sensor-to-vision action recognition, IEEE Transactions on Image Processing, 2021.
Liu, Yang and Wang, Keze and Li, Guanbin and Lin, Liang.
[page]
Hand movements: A window into haptic object recognition, Cognitive Psychology, 1987.
Lederman, Susan J and Klatzky, Roberta L.
[page]
Force and tactile sensing, Springer Handbook of Robotics, 2016.
Cutkosky, Mark R and Howe, Robert D and Provancher, William R.
[page]
Haptic perception: A tutorial, Attention, Perception, & Psychophysics, 2009.
Lederman, Susan J and Klatzky, Roberta L.
[page]
Flexible tactile sensing based on piezoresistive composites: A review, Sensors, 2014.
Stassi, Stefano and Cauda, Valentina and Canavese, Giancarlo and Pirri, Candido Fabrizio.
[page]
Tactile sensing in dexterous robot hands, Robotics and Autonomous Systems, 2015.
Kappassov, Zhanat and Corrales, Juan-Antonio and Perdereau, Véronique.
[page]
GelLink: A Compact Multi-phalanx Finger with Vision-based Tactile Sensing and Proprioception, arXiv, 2024.
Ma, Yuxiang and Adelson, Edward.
[page]
A Touch, Vision, and Language Dataset for Multimodal Alignment, ArXiv, 2024.
Fu, Letian and Datta, Gaurav and Huang, Huang and Panitch, William Chung-Ho and Drake, Jaimyn and Ortiz, Joseph and Mukadam, Mustafa and Lambeta, Mike and Calandra, Roberto and Goldberg, Ken.
[page]
Large-scale actionless video pre-training via discrete diffusion for efficient policy learning, ArXiv, 2024.
He, Haoran and Bai, Chenjia and Pan, Ling and Zhang, Weinan and Zhao, Bin and Li, Xuelong.
[page]
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces, ArXiv, 2024.
Comi, Mauro and Tonioni, Alessio and Yang, Max and Tremblay, Jonathan and Blukis, Valts and Lin, Yijiong and Lepora, Nathan F and Aitchison, Laurence.
[page]
Tactile-augmented radiance fields, CVPR, 2024.
Dou, Yiming and Yang, Fengyu and Liu, Yi and Loquercio, Antonio and Owens, Andrew.
[page]
AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch, ArXiv, 2024.
Yang, Max and Lu, Chenghua and Church, Alex and Lin, Yijiong and Ford, Chris and Li, Haoran and Psomopoulou, Efi and Barton, David AW and Lepora, Nathan F.
[page]
Feature-level Sim2Real Regression of Tactile Images for Robot Manipulation, ICRA ViTac, 2024.
Duan, Boyi and Qian, Kun and Zhao, Yongqiang and Zhang, Dongyuan and Luo, Shan.
[page]
MAE4GM: Visuo-Tactile Learning for Property Estimation of Granular Material using Multimodal Autoencoder, ICRA ViTac, 2024.
Zhang, Zeqing and Zheng, Guangze and Ji, Xuebo and Chen, Guanqi and Jia, Ruixing and Chen, Wentao and Chen, Guanhua and Zhang, Liangjun and Pan, Jia.
[page]
Octopi: Object Property Reasoning with Large Tactile-Language Models, arXiv preprint arXiv:2405.02794, 2024.
Yu, Samson and Lin, Kelvin and Xiao, Anxing and Duan, Jiafei and Soh, Harold.
[page]
9DTact: A compact vision-based tactile sensor for accurate 3D shape reconstruction and generalizable 6D force estimation, IEEE Robotics and Automation Letters, 2023.
Lin, Changyi and Zhang, Han and Xu, Jikai and Wu, Lei and Xu, Huazhe.
[page]
AllSight: A low-cost and high-resolution round tactile sensor with zero-shot learning capability, IEEE Robotics and Automation Letters, 2023.
Azulay, Osher and Curtis, Nimrod and Sokolovsky, Rotem and Levitski, Guy and Slomovik, Daniel and Lilling, Guy and Sintov, Avishai.
[page]
VisTac: Towards a unified multi-modal sensing finger for robotic manipulation, IEEE Sensors Journal, 2023.
Athar, Sheeraz and Patel, Gaurav and Xu, Zhengtong and Qiu, Qiang and She, Yu.
[page]
MidasTouch: Monte-Carlo inference over distributions across sliding touch, CoRL, 2023.
Suresh, Sudharshan and Si, Zilin and Anderson, Stuart and Kaess, Michael and Mukadam, Mustafa.
[page]
The ObjectFolder benchmark: Multisensory learning with neural and real objects, CVPR, 2023.
Gao, Ruohan and Dou, Yiming and Li, Hao and Agarwal, Tanmay and Bohg, Jeannette and Li, Yunzhu and Fei-Fei, Li and Wu, Jiajun.
[page]
ImageBind: One embedding space to bind them all, CVPR, 2023.
Girdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan.
[page]
Touching a NeRF: Leveraging neural radiance fields for tactile sensory data generation, Conference on Robot Learning, pp. 1618-1628, 2023.
Zhong, Shaohong and Albini, Alessandro and Jones, Oiwi Parker and Maiolino, Perla and Posner, Ingmar.
[page]
Learning to read braille: Bridging the tactile reality gap with diffusion models, ArXiv, 2023.
Higuera, Carolina and Boots, Byron and Mukadam, Mustafa.
[page]
Generating visual scenes from touch, CVPR, 2023.
Yang, Fengyu and Zhang, Jiacheng and Owens, Andrew.
[page]
DTact: A vision-based tactile sensor that measures high-resolution 3D geometry directly from darkness, ICRA, 2023.
Lin, Changyi and Lin, Ziqi and Wang, Shaoxiong and Xu, Huazhe.
[page]
In-hand pose estimation using hand-mounted RGB cameras and visuotactile sensors, IEEE Access, 2023.
Gao, Yuan and Matsuoka, Shogo and Wan, Weiwei and Kiyokawa, Takuya and Koyama, Keisuke and Harada, Kensuke.
[page]
Collision-aware in-hand 6D object pose estimation using multiple vision-based tactile sensors, ICRA, 2023.
Caddeo, Gabriele M and Piga, Nicola A and Bottarel, Fabrizio and Natale, Lorenzo.
[page]
Implicit neural representation for 3D shape reconstruction using vision-based tactile sensing, ArXiv, 2023.
Comi, Mauro and Church, Alex and Li, Kejie and Aitchison, Laurence and Lepora, Nathan F.
[page]
Sliding touch-based exploration for modeling unknown object shape with multi-fingered hands, IROS, 2023.
Chen, Yiting and Tekden, Ahmet Ercan and Deisenroth, Marc Peter and Bekiroglu, Yasemin.
[page]
General In-hand Object Rotation with Vision and Touch, CoRL, 2023.
Qi, Haozhi and Yi, Brent and Suresh, Sudharshan and Lambeta, Mike and Ma, Yi and Calandra, Roberto and Malik, Jitendra.
[page]
Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing, IEEE Robotics and Automation Letters, 2023.
Yang, Max and Lin, Yijiong and Church, Alex and Lloyd, John and Zhang, Dandan and Barton, David AW and Lepora, Nathan F.
[page]
Unsupervised adversarial domain adaptation for sim-to-real transfer of tactile images, IEEE Transactions on Instrumentation and Measurement, 2023.
Jing, Xingshuo and Qian, Kun and Jianu, Tudor and Luo, Shan.
[page]
Learn from incomplete tactile data: Tactile representation learning with masked autoencoders, IROS, 2023.
Cao, Guanqun and Jiang, Jiaqi and Bollegala, Danushka and Luo, Shan.
[page]
Dexterity from touch: Self-supervised pre-training of tactile representations with robotic play, ArXiv, 2023.
Guzey, Irmak and Evans, Ben and Chintala, Soumith and Pinto, Lerrel.
[page]
GelSlim 3.0: High-resolution measurement of shape, force and slip in a compact tactile-sensing finger, ICRA, 2022.
Taylor, Ian H and Dong, Siyuan and Rodriguez, Alberto.
[page]
TACTO: A fast, flexible, and open-source simulator for high-resolution vision-based tactile sensors, IEEE Robotics and Automation Letters, 2022.
Wang, Shaoxiong and Lambeta, Mike and Chou, Po-Wei and Calandra, Roberto.
[page]
Taxim: An example-based simulation model for GelSight tactile sensors, IEEE Robotics and Automation Letters, 2022.
Si, Zilin and Yuan, Wenzhen.
[page]
ObjectFolder 2.0: A multisensory object dataset for sim2real transfer, CVPR, 2022.
Gao, Ruohan and Si, Zilin and Chang, Yen-Yu and Clarke, Samuel and Bohg, Jeannette and Fei-Fei, Li and Yuan, Wenzhen and Wu, Jiajun.
[page]
Self-supervised visuo-tactile pretraining to locate and follow garment features, ArXiv, 2022.
Kerr, Justin and Huang, Huang and Wilcox, Albert and Hoque, Ryan and Ichnowski, Jeffrey and Calandra, Roberto and Goldberg, Ken.
[page]
Visuotactile 6D pose estimation of an in-hand object using vision and tactile sensor data, IEEE Robotics and Automation Letters, 2022.
Dikhale, Snehal and Patel, Karankumar and Dhingra, Daksh and Naramura, Itoshi and Hayashi, Akinobu and Iba, Soshi and Jamali, Nawid.
[page]
ShapeMap 3-D: Efficient shape mapping through dense touch and vision, ICRA, 2022.
Suresh, Sudharshan and Si, Zilin and Mangelson, Joshua G and Yuan, Wenzhen and Kaess, Michael.
[page]
Visuotactile-RL: Learning multimodal manipulation policies with deep reinforcement learning, ICRA, 2022.
Hansen, Johanna and Hogan, Francois and Rivkin, Dmitriy and Meger, David and Jenkin, Michael and Dudek, Gregory.
[page]
Tactile Gym 2.0: Sim-to-real deep reinforcement learning for comparing low-cost high-resolution robot touch, IEEE Robotics and Automation Letters, 2022.
Lin, Yijiong and Lloyd, John and Church, Alex and Lepora, Nathan F.
[page]
Touch and go: Learning from human-collected vision and touch, ArXiv, 2022.
Yang, Fengyu and Ma, Chenyang and Zhang, Jiacheng and Zhu, Jing and Yuan, Wenzhen and Owens, Andrew.
[page]
ObjectFolder: A dataset of objects with implicit visual, auditory, and tactile representations, arXiv, 2021.
Gao, Ruohan and Chang, Yen-Yu and Mall, Shivani and Fei-Fei, Li and Wu, Jiajun.
[page]
Learning transferable visual models from natural language supervision, International Conference on Machine Learning, 2021.
Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others.
[page]
GelSight wedge: Measuring high-resolution 3D contact geometry with a compact robot finger, ICRA, 2021.
Wang, Shaoxiong and She, Yu and Romero, Branden and Adelson, Edward.
[page]
Tactile object pose estimation from the first touch with geometric contact rendering, CoRL, 2021.
Villalonga, Maria Bauza and Rodriguez, Alberto and Lim, Bryan and Valls, Eric and Sechopoulos, Theo.
[page]
Active 3D shape reconstruction from vision and touch, NeurIPS, 2021.
Smith, Edward and Meger, David and Pineda, Luis and Calandra, Roberto and Malik, Jitendra and Romero Soriano, Adriana and Drozdzal, Michal.
[page]
Interpreting and predicting tactile signals for the SynTouch BioTac, The International Journal of Robotics Research, 2021.
Narang, Yashraj S and Sundaralingam, Balakumar and Van Wyk, Karl and Mousavian, Arsalan and Fox, Dieter.
[page]
GelTip: A finger-shaped optical tactile sensor for robotic manipulation, IROS, 2020.
Gomes, Daniel Fernandes and Lin, Zhonglin and Luo, Shan.
[page]
DIGIT: A Novel Design for a Low-Cost Compact High-Resolution Tactile Sensor With Application to In-Hand Manipulation, IEEE Robotics and Automation Letters, 2020.
Lambeta, Mike and Chou, Po-Wei and Tian, Stephen and Yang, Brian and Maloon, Benjamin and Most, Victoria Rose and Stroud, Dave and Santos, Raymond and Byagowi, Ahmad and Kammerer, Gregg and Jayaraman, Dinesh and Calandra, Roberto.
[page]
Deep tactile experience: Estimating tactile sensor output from depth sensor data, IROS, 2020.
Patel, Karankumar and Iba, Soshi and Jamali, Nawid.
[page]
3D shape reconstruction from vision and touch, NeurIPS, 2020.
Smith, Edward and Calandra, Roberto and Romero, Adriana and Gkioxari, Georgia and Meger, David and Malik, Jitendra and Drozdzal, Michal.
[page]
Supervised autoencoder joint learning on heterogeneous tactile sensory data: Improving material classification performance, IROS, 2020.
Gao, Ruihan and Taunyazov, Tasbolat and Lin, Zhiping and Wu, Yan.
[page]
Making sense of vision and touch: Learning multimodal representations for contact-rich tasks, IEEE Transactions on Robotics, 2020.
Lee, Michelle A and Zhu, Yuke and Zachares, Peter and Tan, Matthew and Srinivasan, Krishnan and Savarese, Silvio and Fei-Fei, Li and Garg, Animesh and Bohg, Jeannette.
[page]
Learning efficient haptic shape exploration with a rigid tactile sensor array, PLoS ONE, 2020.
Fleer, Sascha and Moringen, Alexandra and Klatzky, Roberta L and Ritter, Helge.
[page]
Interpreting and predicting tactile signals via a physics-based and data-driven framework, ArXiv, 2020.
Narang, Yashraj S and Van Wyk, Karl and Mousavian, Arsalan and Fox, Dieter.
[page]
Fast texture classification using tactile neural coding and spiking neural network, IROS, 2020.
Taunyazov, Tasbolat and Chua, Yansong and Gao, Ruihan and Soh, Harold and Wu, Yan.
[page]
Simulation of the SynTouch BioTac sensor, Intelligent Autonomous Systems 15: Proceedings of the 15th International Conference IAS-15, 2019.
Ruppel, Philipp and Jonetzko, Yannick and Görner, Michael and Hendrich, Norman and Zhang, Jianwei.
[page]
Robust learning of tactile force estimation through robot interaction, ICRA, 2019.
Sundaralingam, Balakumar and Lambert, Alexander Sasha and Handa, Ankur and Boots, Byron and Hermans, Tucker and Birchfield, Stan and Ratliff, Nathan and Fox, Dieter.
[page]
From pixels to percepts: Highly robust edge perception and contour following using deep learning and an optical biomimetic tactile sensor, IEEE Robotics and Automation Letters, 2019.
Lepora, Nathan F and Church, Alex and De Kerckhove, Conrad and Hadsell, Raia and Lloyd, John.
[page]
Tactile mapping and localization from high-resolution tactile imprints, ICRA, 2019.
Bauza, Maria and Canal, Oleguer and Rodriguez, Alberto.
[page]
Convolutional autoencoder for feature extraction in tactile sensing, IEEE Robotics and Automation Letters, 2019.
Polic, Marsela and Krajacic, Ivona and Lepora, Nathan and Orsag, Matko.
[page]
Learning to identify object instances by touch: Tactile recognition via multimodal matching, ICRA, 2019.
Lin, Justin and Calandra, Roberto and Levine, Sergey.
[page]
The TacTip family: Soft optical tactile sensors with 3D-printed biomimetic morphologies, Soft Robotics, 2018.
Ward-Cherrier, Benjamin and Pestell, Nicholas and Cramphorn, Luke and Winstone, Benjamin and Giannaccini, Maria Elena and Rossiter, Jonathan and Lepora, Nathan F.
[page]
3D shape perception from monocular vision, touch, and shape priors, IROS, 2018.
Wang, Shaoxiong and Wu, Jiajun and Sun, Xingyuan and Yuan, Wenzhen and Freeman, William T and Tenenbaum, Joshua B and Adelson, Edward H.
[page]
The feeling of success: Does touch sensing help predict grasp outcomes?, arXiv, 2017.
Calandra, Roberto and Owens, Andrew and Upadhyaya, Manu and Yuan, Wenzhen and Lin, Justin and Adelson, Edward H and Levine, Sergey.
[page]
Improved GelSight tactile sensor for measuring geometry and slip, IROS, 2017.
Dong, Siyuan and Yuan, Wenzhen and Adelson, Edward H.
[page]
GelSight: High-resolution robot tactile sensors for estimating geometry and force, Sensors, vol. 17, no. 12, pp. 2762, 2017.
Yuan, Wenzhen and Dong, Siyuan and Adelson, Edward H.
[page]
Connecting look and feel: Associating the visual and tactile properties of physical materials, CVPR, 2017.
Yuan, Wenzhen and Wang, Shaoxiong and Dong, Siyuan and Adelson, Edward.
[page]
Stable reinforcement learning with autoencoders for tactile and visual data, IROS, 2016.
Van Hoof, Herke and Chen, Nutan and Karl, Maximilian and van der Smagt, Patrick and Peters, Jan.
[page]
Sensing tactile microvibrations with the BioTac—Comparison with human sensitivity, BioRob, 2012.
Fishel, Jeremy A and Loeb, Gerald E.
[page]

Embodied Interaction 🔝

Cross-Embodiment Dexterous Grasping with Reinforcement Learning, arXiv, 2024
Haoqi Yuan, Bohan Zhou, Yuhui Fu, Zongqing Lu.
[page]
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation, arXiv, 2024
Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang.
[page]
MANUS: Markerless Grasp Capture using Articulated 3D Gaussians, CVPR, 2024
Chandradeep Pokhariya, Ishaan Nikhil Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar.
[page]
Language-driven Grasp Detection, CVPR, 2024
An Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen.
[page]
Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge, CVPR, 2024
Haoxiang Ma, Modi Shi, Boyang Gao, Di Huang.
[page]
Dexterous Grasp Transformer, CVPR, 2024
Guo-Hao Xu, Yi-Lin Wei, Dian Zheng, Xiao-Ming Wu, Wei-Shi Zheng.
[page]
Single-View Scene Point Cloud Human Grasp Generation, CVPR, 2024
Yan-Kang Wang, Chengyi Xing, Yi-Lin Wei, Xiao-Ming Wu, Wei-Shi Zheng.
[page]
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis, CVPR, 2024
Yufei Ye, Abhinav Gupta, Kris Kitani, Shubham Tulsiani.
[page]
Grasping Diverse Objects with Simulated Humanoids, ArXiv, 2024.
Zhengyi Luo, Jinkun Cao, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu
[page]
Task-Oriented Dexterous Grasp Synthesis via Differentiable Grasp Wrench Boundary Estimator, IROS, 2024.
Jiayi Chen, Yuxing Chen, Jialiang Zhang, He Wang
[page]
Open6DOR: Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach, IROS, 2024.
Yufei Ding, Haoran Geng, Chaoyi Xu, Xiaomeng Fang, Jiazhao Zhang, Songlin Wei, Qiyu Dai, Zhizheng Zhang, He Wang
[page]
ASGrasp: Generalizable Transparent Object Reconstruction and 6-DoF Grasp Detection from RGB-D Active Stereo Camera, ICRA, 2024.
Jun Shi, Yong A, Yixiang Jin, Dingzhe Li, Haoyu Niu, Zhezhu Jin, He Wang
[page]
OpenEQA: Embodied Question Answering in the Era of Foundation Models, CVPR, 2024
Majumdar, Arjun and Ajay, Anurag and Zhang, Xiaohan and Putta, Pranav and Yenamandra, Sriram and Henaff, Mikael and Silwal, Sneha and Mcvay, Paul and Maksymets, Oleksandr and Arnaud, Sergio and others
[page]
Explore until Confident: Efficient Exploration for Embodied Question Answering, ICRA Workshop VLMNM, 2024
Ren, Allen Z and Clark, Jaden and Dixit, Anushri and Itkina, Masha and Majumdar, Anirudha and Sadigh, Dorsa
[page]
S-EQA: Tackling Situational Queries in Embodied Question Answering, arXiv, 2024
Dorbala, Vishnu Sashank and Goyal, Prasoon and Piramuthu, Robinson and Johnston, Michael and Manocha, Dinesh and Ghanadhan, Reza
[page]
Map-based Modular Approach for Zero-shot Embodied Question Answering, arXiv, 2024
Sakamoto, Koya and Azuma, Daichi and Miyanishi, Taiki and Kurita, Shuhei and Kawanabe, Motoaki
[page]
Embodied Question Answering via Multi-LLM Systems, arXiv, 2024
Bhrij Patel and Vishnu Sashank Dorbala and Amrit Singh Bedi
[page]
MultiGripperGrasp: A Dataset for Robotic Grasping from Parallel Jaw Grippers to Dexterous Hands, arXiv, 2024
Murrilo, Luis Felipe Casas and Khargonkar, Ninad and Prabhakaran, Balakrishnan and Xiang, Yu
[page]
Reasoning Grasping via Multimodal Large Language Model, arXiv, 2024
Jin, Shiyu and Xu, Jinxuan and Lei, Yutian and Zhang, Liangjun
[page]
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization, CoRR, 2024
Li, Kailin and Wang, Jingbo and Yang, Lixin and Lu, Cewu and Dai, Bo
[page]
GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping, arXiv, 2024
Zheng, Yuhang and Chen, Xiangyu and Zheng, Yupeng and Gu, Songen and Yang, Runyi and Jin, Bu and Li, Pengfei and Zhong, Chengliang and Wang, Zengmao and Liu, Lina and others
[page]
Knowledge-based Embodied Question Answering, TPAMI, 2023
Tan, Sinan and Ge, Mengmeng and Guo, Di and Liu, Huaping and Sun, Fuchun
[page]
Deep Learning Approaches to Grasp Synthesis: A Review, IEEE Transactions on Robotics, 2023
Newbury, Rhys and Gu, Morris and Chumbley, Lachlan and Mousavian, Arsalan and Eppner, Clemens and Leitner, Jürgen and Bohg, Jeannette and Morales, Antonio and Asfour, Tamim and Kragic, Danica and others
[page]
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter, CoRL, 2023
Tziafas, Georgios and Xu, Yucheng and Goel, Arushi and Kasaei, Mohammadreza and Li, Zhibin and Kasaei, Hamidreza
[page]
Reasoning Tuning Grasp: Adapting Multi-Modal Large Language Models for Robotic Grasping, CoRL, 2023
Xu, Jinxuan and Jin, Shiyu and Lei, Yutian and Zhang, Yuqian and Zhang, Liangjun
[page]
Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation, CoRL, 2023
Shen, William and Yang, Ge and Yu, Alan and Wong, Jansen and Kaelbling, Leslie Pack and Isola, Phillip
[page]
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains, IEEE Transactions on Robotics, 2023
Fang, Hao-Shu and Wang, Chenxi and Fang, Hongjie and Gou, Minghao and Liu, Jirong and Yan, Hengxu and Liu, Wenhai and Xie, Yichen and Lu, Cewu
[page]
DexGraspNet: A Large-Scale Robotic Dexterous Grasp Dataset for General Objects Based on Simulation, ICRA, 2023.
Ruicheng Wang, Jialiang Zhang, Jiayi Chen, Yinzhen Xu, Puhao Li, Tengyu Liu, He Wang
[page]
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy, CVPR, 2023.
Yinzhen Xu, Weikang Wan, Jialiang Zhang, Haoran Liu, Zikang Shan, Hao Shen, Ruicheng Wang, Haoran Geng, Yijia Weng, Jiayi Chen, Tengyu Liu, Li Yi, He Wang
[page]
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning, ICCV, 2023.
Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang
[page]
CLIPort: What and Where Pathways for Robotic Manipulation, CoRL, 2022
Shridhar, Mohit and Manuelli, Lucas and Fox, Dieter
[page]
ACRONYM: A Large-Scale Grasp Dataset Based on Simulation, ICRA, 2021
Eppner, Clemens and Mousavian, Arsalan and Fox, Dieter
[page]
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI, NeurIPS, 2021
Ramakrishnan, Santhosh K and Gokaslan, Aaron and Wijmans, Erik and Maksymets, Oleksandr and Clegg, Alex and Turner, John and Undersander, Eric and Galuba, Wojciech and Westbury, Andrew and Chang, Angel X and others
[page]
End-to-end Trainable Deep Neural Network for Robotic Grasp Detection and Semantic Segmentation from RGB, ICRA, 2021
Ainetter, Stefan and Fraundorfer, Friedrich
[page]
Revisiting EmbodiedQA: A Simple Baseline and Beyond, IEEE Transactions on Image Processing, 2020
Wu, Yu and Jiang, Lu and Yang, Yi
[page]
Multi-agent Embodied Question Answering in Interactive Environments, ECCV, 2020
Tan, Sinan and Xiang, Weilai and Liu, Huaping and Guo, Di and Sun, Fuchun
[page]
Language Models are Few-Shot Learners, NeurIPS, 2020
Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others
[page]
GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping, CVPR, 2020
Fang, Hao-Shu and Wang, Chenxi and Gou, Minghao and Lu, Cewu
[page]
Multi-Target Embodied Question Answering, CVPR, 2019
Yu, Licheng and Chen, Xinlei and Gkioxari, Georgia and Bansal, Mohit and Berg, Tamara L and Batra, Dhruv
[page]
Embodied Question Answering in Photorealistic Environments with Point Cloud Perception, CVPR, 2019
Wijmans, Erik and Datta, Samyak and Maksymets, Oleksandr and Das, Abhishek and Gkioxari, Georgia and Lee, Stefan and Essa, Irfan and Parikh, Devi and Batra, Dhruv
[page]
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering, BMVC, 2019
Cangea, Cătălina and Belilovsky, Eugene and Liò, Pietro and Courville, Aaron
[page]
6-DOF GraspNet: Variational Grasp Generation for Object Manipulation, ICCV, 2019
Mousavian, Arsalan and Eppner, Clemens and Fox, Dieter
[page]
Embodied Question Answering, CVPR, 2018
Das, Abhishek and Datta, Samyak and Gkioxari, Georgia and Lee, Stefan and Parikh, Devi and Batra, Dhruv
[page]
IQA: Visual Question Answering in Interactive Environments, CVPR, 2018
Gordon, Daniel and Kembhavi, Aniruddha and Rastegari, Mohammad and Redmon, Joseph and Fox, Dieter and Farhadi, Ali
[page]
Building Generalizable Agents with a Realistic and Rich 3D Environment, ECCV, 2018
Wu, Yi and Wu, Yuxin and Gkioxari, Georgia and Tian, Yuandong
[page]
MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments, ECCV, 2018
Savva, Manolis and Chang, Angel X and Dosovitskiy, Alexey and Funkhouser, Thomas and Koltun, Vladlen
[page]
Neural Modular Control for Embodied Question Answering, ECCV, 2018
Das, Abhishek and Gkioxari, Georgia and Lee, Stefan and Parikh, Devi and Batra, Dhruv
[page]
Jacquard: A Large Scale Dataset for Robotic Grasp Detection, IROS, 2018
Depierre, Amaury and Dellandréa, Emmanuel and Chen, Liming
[page]
Matterport3D: Learning from RGB-D data in indoor environments, IEEE International Conference on 3D Vision, 2017
Chang, Angel and Dai, Angela and Funkhouser, Thomas and Halber, Maciej and Niessner, Matthias and Savva, Manolis and Song, Shuran and Zeng, Andy and Zhang, Yinda
[page]
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, CVPR, 2017
Dai, Angela and Chang, Angel X and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Nießner, Matthias
[page]
Shape Completion Enabled Robotic Grasping, IROS, 2017
Varley, Jacob and DeChant, Chad and Richardson, Adam and Ruales, Joaquín and Allen, Peter
[page]
Efficient grasping from RGBD images: Learning using a new rectangle representation, IEEE International Conference on Robotics and Automation, 2011
Jiang, Yun and Moseson, Stephen and Saxena, Ashutosh
[page]
A frontier-based approach for autonomous exploration, CIRA, 1997
Yamauchi, Brian
[page]

Embodied Agent 🔝

Embodied Multimodal Foundation Models

GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation, arXiv, 2024.
Chi-Lam Cheang, Guangzeng Chen, Ya Jing, Tao Kong, Hang Li, Yifeng Li, Yuxiao Liu, Hongtao Wu, Jiafeng Xu, Yichu Yang, Hanbo Zhang, Minzhao Zhu.
[page]
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers, arXiv, 2024.
Lirui Wang, Xinlei Chen, Jialiang Zhao, Kaiming He.
[page]
Spatial Reasoning and Planning for Deep Embodied Agents, arXiv, 2024.
Shu Ishida.
[page]
Grounding Large Language Models In Embodied Environment With Imperfect World Models, arXiv, 2024.
Haolan Liu, Jishen Zhao.
[page]
SELU: Self-Learning Embodied MLLMs in Unknown Environments, arXiv, 2024.
Boyu Li, Haobin Jiang, Ziluo Ding, Xinrun Xu, Haoran Li, Dongbin Zhao, Zongqing Lu.
[page]
AutoRT: Embodied foundation models for large scale orchestration of robotic agents, arXiv, 2024.
Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, and others.
[page]
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning, arXiv, 2024.
Norman Di Palo, Leonard Hasenclever, Jan Humplik, Arunkumar Byravan.
[page]
RT-H: Action hierarchies using language, ArXiv, 2024.
Suneel Belkhale, Tianli Ding, Ted Xiao, Pierre Sermanet, Quon Vuong, Jonathan Tompson, Yevgen Chebotar, Debidatta Dwibedi, Dorsa Sadigh.
[page]
Do as I can, not as I say: Grounding language in robotic affordances, Conference on Robot Learning, 2023.
Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, Ryan Julian.
[page]
EmbodiedGPT: Vision-language pre-training via embodied chain of thought, NeurIPS, 2024.
Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo.
[page]
Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions, Conference on Robot Learning, 2023.
Yevgen Chebotar, Quan Vuong, Karol Hausman, Fei Xia, Yao Lu, Alex Irpan, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, and others.
[page]
SARA-RT: Scaling up robotics transformers with self-adaptive robust attention, arXiv, 2023.
Isabel Leal, Krzysztof Choromanski, Deepali Jain, Avinava Dubey, Jake Varley, Michael Ryoo, Yao Lu, Frederick Liu, Vikas Sindhwani, Quan Vuong, and others.
[page]
PaLM-E: An Embodied Multimodal Language Model, arXiv, 2023.
Danny Driess, Fei Xia, Mehdi SM Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al.
[page]
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control, Conference on Robot Learning, 2023.
Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, et al.
[page]
Open X-Embodiment: Robotic Learning Datasets and RT-X Models, arXiv, 2023.
Padalkar et al.
[page]
Vision-Language Foundation Models as Effective Robot Imitators, arXiv, 2023.
Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, et al.
[page]
RT-1: Robotics Transformer for Real-World Control at Scale, arXiv, 2022.
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, et al.
[page]

Embodied Manipulation & Control

SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation, ArXiv, 2024.
Zihan Zhou, Animesh Garg, Dieter Fox, Caelan Garrett, Ajay Mandlekar.
[page]
Diffusion Transformer Policy, ArXiv, 2024.
Zhi Hou, Tianyi Zhang, Yuwen Xiong, Hengjun Pu, Chengyang Zhao, Ronglei Tong, Yu Qiao, Jifeng Dai, Yuntao Chen.
[page]
DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation, arXiv, 2024.
Chen Wang, Haochen Shi, Weizhuo Wang, Ruohan Zhang, Li Fei-Fei, C Karen Liu.
[page]
LoTa-Bench: Benchmarking Language-Oriented Task Planners for Embodied Agents, arXiv, 2024.
Jae-Woo Choi, Youngwoo Yoon, Hyobin Ong, Jaehong Kim, Minsu Jang.
[page]
Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following, arXiv, 2024.
Suyeon Shin, Sujin Jeon, Junghyun Kim, Gi-Cheon Kang, Byoung-Tak Zhang.
[page]
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning, NeurIPS, 2024.
Zirui Zhao, Wee Sun Lee, David Hsu.
[page]
Generalized Planning in PDDL Domains with Pretrained Large Language Models, AAAI, 2024.
Tom Silver, Soham Dan, Kavitha Srinivas, Joshua B. Tenenbaum, Leslie Pack Kaelbling, Michael Katz.
[page]
Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration, arXiv, 2024.
Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Xuelong Li, Zhen Wang.
[page]
Embodied Instruction Following in Unknown Environments, arXiv, 2024.
Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan.
[page]
A Backbone for Long-Horizon Robot Task Understanding, arxiv, 2024.
Xiaoshuai Chen, Wei Chen, Dongmyoung Lee, Yukun Ge, Nicolas Rojas, and Petar Kormushev.
[page]
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation, arXiv, 2024.
Jiaming Liu, Mengzhen Liu, Zhenyu Wang, Lily Lee, Kaichen Zhou, Pengju An, Senqiao Yang, Renrui Zhang, Yandong Guo, Shanghang Zhang.
[page]
Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation, arxiv, 2024.
Ruoxuan Feng, Di Hu, Wenke Ma, Xuelong Li.
[page]
Egocentric Vision Language Planning, arxiv, 2024.
Zhirui Fang, Ming Yang, Weishuai Zeng, Boyu Li, Junpeng Yue, Ziluo Ding, Xiu Li, Zongqing Lu.
[page]
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models, IROS, 2024.
Tianyu Wang, Haitao Lin, Junqiu Yu, Yanwei Fu.
[page]
LLM-SAP: Large Language Model Situational Awareness Based Planning, ICME 2024 Workshop MML4SG.
Liman Wang, Hanyang Zhong.
[page]
FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning, ArXiv, 2024.
Jianlan Luo, Charles Xu, Fangchen Liu, Liam Tan, Zipeng Lin, Jeffrey Wu, Pieter Abbeel, and Sergey Levine.
[page]
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models, IROS, 2024.
Siyuan Huang, Iaroslav Ponomarenko, Zhengkai Jiang, Xiaoqi Li, Xiaobin Hu, Peng Gao, Hongsheng Li, and Hao Dong.
[page]
A3VLM: Actionable Articulation-Aware Vision Language Model, arXiv, 2024.
Siyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu, Hao Dong, Peng Gao, Abdeslam Boularias, and Hongsheng Li.
[page]
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld, CVPR, 2024.
Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, Yuhui Shi.
[page]
Retrieval-Augmented Embodied Agents, CVPR, 2024.
Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang.
[page]
Multi-agent Collaborative Perception via Motion-aware Robust Communication Network, CVPR, 2024.
Shixin Hong, Yu Liu, Zhi Li, Shaohui Li, You He.
[page]
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models, ICCV, 2023.
Chan Hee Song, Jiaman Wu, Clay Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su.
[page]
Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models, EMNLP, 2023.
Gabriel Sarch, Yue Wu, Michael J. Tarr, Katerina Fragkiadaki.
[page]
Voyager: An Open-Ended Embodied Agent with Large Language Models, TMLR, 2023.
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, Anima Anandkumar.
[page]
ReAct: Synergizing Reasoning and Acting in Language Models, ICLR, 2023.
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao.
[page]
ProgPrompt: Generating Situated Robot Task Plans Using Large Language Models, ICRA, 2023.
Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, Animesh Garg.
[page]
ChatGPT for Robotics: Design Principles and Model Abilities, IEEE Access 12. (2023): 55682-55696.
Sai Vemprala, Rogerio Bonatti, Arthur Fender C. Bucker, Ashish Kapoor.
[page]
Code as Policies: Language Model Programs for Embodied Control, ICRA, 2023.
Jacky Liang, Wenlong Huang, F. Xia, Peng Xu, Karol Hausman, Brian Ichter, Peter R. Florence, Andy Zeng.
[page]
Reasoning with Language Model Is Planning with World Model, Arxiv, 2023.
Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu.
[page]
LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement, arXiv, 2023.
Haonan Chang, Kai Gao, Kowndinya Boyalakuntla, Alex Lee, Baichuan Huang, Harish Udhaya Kumar, Jinjin Yu, Abdeslam Boularias.
[page]
Translating Natural Language to Planning Goals with Large-Language Models, arXiv, 2023.
Yaqi Xie, Chen Yu, Tongyao Zhu, Jinbin Bai, Ze Gong, Harold Soh.
[page]
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency, arXiv, 2023.
Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, Peter Stone.
[page]
Dynamic Planning with a LLM, arXiv, 2023.
Gautier Dagan, Frank Keller, Alex Lascarides.
[page]
Embodied Task Planning with Large Language Models, arXiv, 2023.
Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan.
[page]
SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning, Conference on Robot Learning. 2023.
Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian D. Reid, Niko Sunderhauf.
[page]
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning, ArXiv, 2023.
Qiao Gu, Ali Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Ramalingam Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull.
[page]
RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks, arXiv, 2023.
Yaran Chen, Wenbo Cui, Yuanwen Chen, Mining Tan, Xinyao Zhang, Dong Zhao, He Wang.
[page]
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models, IROS, 2023.
Xufeng Zhao, Mengdi Li, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter.
[page]
Video Language Planning, arxiv, 2023.
Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson.
[page]
Reflexion: an autonomous agent with dynamic memory and self-reflection, ArXiv, 2023.
Noah Shinn, Beck Labash, A. Gopinath.
[page]
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents, NeurIPS, 2023.
Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, Yitao Liang.
[page]
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model, ArXiv, 2023.
Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, and Hongsheng Li.
[page]
CLIPort: What and Where Pathways for Robotic Manipulation, Conference on Robot Learning, 2022.
Mohit Shridhar, Lucas Manuelli, Dieter Fox.
[page]
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents, ICML, 2022.
Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch.
[page]
Inner Monologue: Embodied Reasoning through Planning with Language Models, Conference on Robot Learning, 2022.
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter.
[page]
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, ICLR, 2022.
Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence.
[page]
Skill Induction and Planning with Latent Language, ACL, 2021.
Pratyusha Sharma, Antonio Torralba, Jacob Andreas.
[page]
PDDL-the planning domain definition language, Technical Report. 1998.
Drew McDermott, Malik Ghallab, Adele E. Howe, Craig A. Knoblock, Ashwin Ram, Manuela M. Veloso, Daniel S. Weld, David E. Wilkins.
[page]
STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving, Artificial Intelligence 2. 3(1971): 189-208.
Richard E. Fikes, Nils J. Nilsson.
[page]
A Formal Basis for the Heuristic Determination of Minimum Cost Paths, IEEE Trans. Syst. Sci. Cybern. 4. (1968): 100-107.
Peter E. Hart, Nils J. Nilsson, Bertram Raphael.
[page]
The Monte Carlo method, Journal of the American Statistical Association 44 247. (1949): 335-41.
Nicholas C. Metropolis, S. M. Ulam.
[page]

Sim-to-Real Adaptation 🔝

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation, NeurIPS, 2024
Kaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang
[page]
Data Scaling Laws in Imitation Learning for Robotic Manipulation, arxiv, 2024
Fanqi Lin, Yingdong Hu, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao
[page]
Evaluating Real-World Robot Manipulation Policies in Simulation, arxiv, 2024
Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao
[page]
Body Transformer: Leveraging Robot Embodiment for Policy Learning, arxiv, 2024
Carmelo Sferrazza, Dun-Ming Huang, Fangchen Liu, Jongmin Lee, Pieter Abbeel
[page]
Autonomous Behavior Planning For Humanoid Loco-manipulation Through Grounded Language Model, arxiv, 2024
Jin Wang, Arturo Laurenzi, Nikos Tsagarakis
[page]
Robust agents learn causal world models, ICLR, 2024
Richens, Jonathan, and Tom Everitt
[page]
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots, arXiv, 2024
Chi, Cheng and Xu, Zhenjia and Pan, Chuer and Cousineau, Eric and Burchfiel, Benjamin and Feng, Siyuan and Tedrake, Russ and Song, Shuran
[page]
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation, arXiv, 2024
Fu, Zipeng and Zhao, Tony Z and Finn, Chelsea
[page]
Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition, arXiv, 2024
Luo, Shengcheng and Peng, Quanquan and Lv, Jun and Hong, Kaiwen and Driggs-Campbell, Katherine Rose and Lu, Cewu and Li, Yong-Lu
[page]
Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation, arXiv, 2024
Torne, Marcel and Simeonov, Anthony and Li, Zechu and Chan, April and Chen, Tao and Gupta, Abhishek and Agrawal, Pulkit
[page]
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction, arXiv, 2024
Jiang, Yunfan and Wang, Chen and Zhang, Ruohan and Wu, Jiajun and Fei-Fei, Li
[page]
Natural Language Can Help Bridge the Sim2Real Gap, arXiv, 2024
Yu, Albert and Foote, Adeline and Mooney, Raymond and Martín-Martín, Roberto
[page]
Visual Whole-Body Control for Legged Loco-Manipulation, arXiv, 2024
Liu, Minghuan and Chen, Zixuan and Cheng, Xuxin and Ji, Yandong and Yang, Ruihan and Wang, Xiaolong
[page]
Expressive Whole-Body Control for Humanoid Robots, arXiv, 2024
Cheng, Xuxin and Ji, Yandong and Chen, Junming and Yang, Ruihan and Yang, Ge and Wang, Xiaolong
[page]
Pandora: Towards General World Model with Natural Language Actions and Video States, arXiv, 2024
Xiang, Jiannan and Liu, Guangyi and Gu, Yi and Gao, Qiyue and Ning, Yuting and Zha, Yuheng and Feng, Zeyu and Tao, Tianhua and Hao, Shibo and Shi, Yemin and others
[page]
3D-VLA: A 3D Vision-Language-Action Generative World Model, ICML, 2024
Zhen, Haoyu and Qiu, Xiaowen and Chen, Peihao and Yang, Jincheng and Yan, Xin and Du, Yilun and Hong, Yining and Gan, Chuang
[page]
Diffusion World Model: Future Modeling Beyond Step-by-Step Rollout for Offline Reinforcement Learning, arXiv, 2024
Ding, Zihan and Zhang, Amy and Tian, Yuandong and Zheng, Qinqing
[page]
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features, ICLR, 2024
Bardes, Adrien and Ponce, Jean and LeCun, Yann
[page]
Learning and Leveraging World Models in Visual Representation Learning, arXiv, 2024
Garrido, Quentin and Assran, Mahmoud and Ballas, Nicolas and Bardes, Adrien and Najman, Laurent and LeCun, Yann
[page]
iVideoGPT: Interactive VideoGPTs are Scalable World Models, arXiv, 2024
Wu, Jialong and Yin, Shaofeng and Feng, Ningya and He, Xu and Li, Dong and Hao, Jianye and Long, Mingsheng
[page]
Spatiotemporal Predictive Pre-training for Robotic Motor Control, arXiv, 2024
Yang, Jiange and Liu, Bei and Fu, Jianlong and Pan, Bocheng and Wu, Gangshan and Wang, Limin
[page]
LEGENT: Open Platform for Embodied Agents, arXiv, 2024
Cheng, Zhili and Wang, Zhitong and Hu, Jinyi and Hu, Shengding and Liu, An and Tu, Yuge and Li, Pengkai and Shi, Lei and Liu, Zhiyuan and Sun, Maosong
[page]
Point-JEPA: A Joint Embedding Predictive Architecture for Self-Supervised Learning on Point Cloud, arXiv, 2024
Saito, Ayumu and Poovvancheri, Jiju
[page]
MuDreamer: Learning Predictive World Models without Reconstruction, ICLR, 2024
Burchi, Maxime and Timofte, Radu
[page]
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought, arXiv, 2024
Wong, Lionel and Grand, Gabriel and Lew, Alexander K and Goodman, Noah D and Mansinghka, Vikash K and Andreas, Jacob and Tenenbaum, Joshua B
[page]
ElastoGen: 4D Generative Elastodynamics, arXiv, 2024
Feng, Yutao and Shang, Yintong and Feng, Xiang and Lan, Lei and Zhe, Shandian and Shao, Tianjia and Wu, Hongzhi and Zhou, Kun and Su, Hao and Jiang, Chenfanfu and others
[page]
Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models, Nature Machine Intelligence, 2024.
Lei Han, Qingxu Zhu, Jiapeng Sheng, Chong Zhang, Tingguang Li, Yizheng Zhang, He Zhang et al.
[page]
Model Adaptation for Time Constrained Embodied Control, CVPR, 2024.
Jaehyun Song, Minjong Yoo, Honguk Woo.
[page]
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation, CVPR, 2024.
Xiaoqi Li, Mingxu Zhang, Yiran Geng, Haoran Geng, Yuxing Long, Yan Shen, Renrui Zhang, Jiaming Liu, Hao Dong.
[page]
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation, CVPR, 2024.
Zifan Wang, Junyu Chen, Ziqing Chen, Pengwei Xie, Rui Chen, Li Yi.
[page]
SAGE: Bridging Semantic and Actionable Parts for Generalizable Manipulation of Articulated Objects, RSS, 2024.
Haoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang, Leonidas Guibas.
[page]
GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion, ICRA, 2024.
Jiazhao Zhang, Nandiraju Gireesh, Jilong Wang, Xiaomeng Fang, Chaoyi Xu, Weiguang Chen, Liu Dai, He Wang.
[page]
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments, ECCV, 2024.
Taewoong Kim, Cheolhong Min, Byeonghwi Kim, Jinyeon Kim, Wonje Jeung, Jonghyun Choi.
[page]
DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control, ECCV, 2024.
Xinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li, Cewu Lu.
[page]
DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems, ICML, 2024.
Kaibo He, Chenhui Zuo, Chengtian Ma, Yanan Sui.
[page]
A-JEPA: Joint-Embedding Predictive Architecture Can Listen, arXiv, 2023
Fei, Zhengcong and Fan, Mingyuan and Huang, Junshi
[page]
One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization, NeurIPS, 2023
Liu, Minghua and Xu, Chao and Jin, Haian and Chen, Linghao and Varma T, Mukund and Xu, Zexiang and Su, Hao
[page]
Introduction to Latent Variable Energy-Based Models: A Path Towards Autonomous Machine Intelligence, arXiv, 2023
Dawid, Anna and LeCun, Yann
[page]
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts, CVPR, 2023
Geng, Haoran and Xu, Helin and Zhao, Chengyang and Xu, Chao and Yi, Li and Huang, Siyuan and Wang, He
[page]
Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion, IEEE TPAMI, 2023
Huang, Changxin and Wang, Guangrun and Zhou, Zhibo and Zhang, Ronghui and Lin, Liang
[page]
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware, ICML, 2023
Zhao, Tony Z and Kumar, Vikash and Levine, Sergey and Finn, Chelsea
[page]
Surfer: Progressive Reasoning with World Models for Robotic Manipulation, arxiv, 2023.
Pengzhen Ren, Kaidong Zhang, Hetao Zheng, Zixuan Li, Yuhang Wen, Fengda Zhu, Mas Ma, Xiaodan Liang.
[page]
PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations, CVPR, 2023.
Haoran Geng, Ziming Li, Yiran Geng, Jiayi Chen, Hao Dong, He Wang.
[page]
A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27, Open Review, 2022
Yann LeCun
[page]
Real2Sim2Real: Self-Supervised Learning of Physical Single-Step Dynamic Actions for Planar Robot Casting, ICRA, 2022
Lim, Vincent and Huang, Huang and Chen, Lawrence Yunliang and Wang, Jonathan and Ichnowski, Jeffrey and Seita, Daniel and Laskey, Michael and Goldberg, Ken
[page]
Continuous Jumping for Legged Robots on Stepping Stones via Trajectory Optimization and Model Predictive Control, IEEE CDC, 2022
Nguyen, Chuong and Bao, Lingfan and Nguyen, Quan
[page]
Transporter Networks: Rearranging the Visual World for Robotic Manipulation, CoRL, 2021
Zeng, Andy and Florence, Pete and Tompson, Jonathan and Welker, Stefan and Chien, Jonathan and Attarian, Maria and Armstrong, Travis and Krasin, Ivan and Duong, Dan and Sindhwani, Vikas and others
[page]
The MIT Humanoid Robot: Design, Motion Planning, and Control for Acrobatic Behaviors, IEEE-RAS 20th International Conference on Humanoid Robots (Humanoids), 2021
Chignoli, Matthew and Kim, Donghyun and Stanger-Jones, Elijah and Kim, Sangbae
[page]
Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization, IROS, 2020
Kaspar, Manuel and Osorio, Juan D Muñoz and Bock, Jurgen
[page]
Learning Dexterous In-Hand Manipulation, The International Journal of Robotics Research, 2020
OpenAI: Andrychowicz, Marcin and Baker, Bowen and Chociej, Maciek and Jozefowicz, Rafal and McGrew, Bob and Pachocki, Jakub and Petron, Arthur and Plappert, Matthias and Powell, Glenn and Ray, Alex and others
[page]
DeepGait: Planning and Control of Quadrupedal Gaits using Deep Reinforcement Learning, IEEE Robotics and Automation Letters, 2020
Tsounis, Vassilios and Alge, Mitja and Lee, Joonho and Farshidian, Farbod and Hutter, Marco
[page]
Optimized Jumping on the MIT Cheetah 3 Robot, ICRA, 2019
Nguyen, Quan and Powell, Matthew J and Katz, Benjamin and Di Carlo, Jared and Kim, Sangbae
[page]
World Models, NIPS, 2018
Ha, David and Schmidhuber, Jurgen
[page]
MIT Cheetah 3: Design and Control of a Robust, Dynamic Quadruped Robot, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018
Bledt, Gerardo and Powell, Matthew J and Katz, Benjamin and Di Carlo, Jared and Wensing, Patrick M and Kim, Sangbae
[page]
Sim-to-Real Reinforcement Learning for Deformable Object Manipulation, CoRL, 2018
Matas, Jan and James, Stephen and Davison, Andrew J
[page]
Dynamic Walking on Randomly-Varying Discrete Terrain With One-Step Preview, Robotics: Science and Systems, 2017
Nguyen, Quan and Agrawal, Ayush and Da, Xingye and Martin, William C and Geyer, Hartmut and Grizzle, Jessy W and Sreenath, Koushil
[page]
Deep Kernels for Optimizing Locomotion Controllers, CoRL, 2017
Antonova, Rika and Rai, Akshara and Atkeson, Christopher G
[page]
Preparing for the Unknown: Learning a Universal Policy with Online System Identification, RSS, 2017
Yu, Wenhao and Tan, Jie and Liu, C Karen and Turk, Greg
[page]
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World, IROS, 2017
Tobin, Josh and Fong, Rachel and Ray, Alex and Schneider, Jonas and Zaremba, Wojciech and Abbeel, Pieter
[page]
Practice Makes Perfect: An Optimization-Based Approach to Controlling Agile Motions for a Quadruped Robot, IEEE Robotics & Automation Magazine, 2016
Gehring, Christian and Coros, Stelian and Hutter, Marco and Bellicoso, Carmine Dario and Heijnen, Huub and Diethelm, Remo and Bloesch, Michael and Fankhauser, Péter and Hwangbo, Jemin and Hoepflinger, Mark and others
[page]
ANYmal - a highly mobile and dynamic quadrupedal robot, IEEE/RSJ international conference on intelligent robots and systems (IROS), 2016
Hutter, Marco and Gehring, Christian and Jud, Dominic and Lauber, Andreas and Bellicoso, C Dario and Tsounis, Vassilios and Hwangbo, Jemin and Bodie, Karen and Fankhauser, Peter and Bloesch, Michael and others
[page]
Optimization Based Full Body Control for the Atlas Robot, IEEE-RAS International Conference on Humanoid Robots, 2014
Feng, Siyuan and Whitman, Eric and Xinjilefu, X and Atkeson, Christopher G
[page]
A Compliant Hybrid Zero Dynamics Controller for Stable, Efficient and Fast Bipedal Walking on MABEL, The International Journal of Robotics Research, 2011
Sreenath, Koushil and Park, Hae-Won and Poulakakis, Ioannis and Grizzle, Jessy W
[page]
Dynamic walk of a biped, The International Journal of Robotics Research, 1984
Miura, Hirofumi and Shimoyama, Isao
[page]

Datasets 🔝

To be updated...

VisualAgentBench, 2023. [link]
Open X-Embodiment, 2023. [link]
RH20T-P, 2024. [link]
ALOHA 2, 2024. [link]
GRUtopia, 2024. [link]
ARIO (All Robots In One), 2024. [link]
Matterport3D, 2017. [link]

Embodied Perception

Vision

BEHAVIOR Vision Suite, 2024. [link]
SpatialQA, 2024. [link]
SpatialBench, 2024. [link]
Uni3DScenes, 2024. [link]
Active Recognition Dataset, 2023. [link]
Baxter_UR5_95_Objects_Dataset, 2023. [link]
Caltech-256, 2022. [link]
DIDI Dataset, 2020. [link]
Replica, 2019. [link]
ScanObjectNN, 2019. [link]
OCID Dataset, 2019. [link]
3RScan, 2019. [link]
EmbodiedScan, 2019. [link]
UZH-FPV Dataset, 2019. [link]
LM Data, 2019. [link]
TUM Visual-Inertial Dataset, 2018. [link]
ScanNet, 2017. [link]
SUNCG, 2017. [link]
Semantic 3D, 2017. [link]
ScanNet v2, 2017. [link]
S3DIS, 2016. [link]
Synthia, 2016. [link]
ModelNet, 2015. [link]
ORBvoc, 2015. [link]
Sketch dataset, 2015. [link]
SUN RGBD, 2015. [link]
ShapeNet, 2015. [link]
MVS Dataset, 2014. [link]
SUOD, 2013. [link]
SUN360, 2012. [link]
NYU Depth Dataset V2, 2012. [link]
TUM-RGBD, 2012. [link]
EuRoC MAV Dataset, 2012. [link]
Semantic KITTI, 2012. [link]
KITTI Object Recognition, 2012. [link]
Stanford Track Collection, 2011. [link]

Tactile

Touch100k, 2024. [link]
ARIO (All Robots In One), 2024. [link]
TaRF, 2024. [link]
TVL, 2024. [link]
YCB-Slide, 2022. [link]
Touch and Go, 2022. [link]
SSVTP, 2022. [link]
ObjectFolder, 2021-2023. [link]
Decoding the BioTac, 2020. [link]
SynTouch, 2019. [link]
The Feeling of Success, 2017. [link]

Embodied Navigation

ALFRED, 2020. [link]
REVERIE, 2020. [link]
CVDN, 2019. [link]
Room to Room (R2R), 2017. [link]

Embodied Question Answering

SpatialQA, 2024. [link]
S-EQA, 2024. [link]
HM-EQA, 2024. [link]
K-EQA, 2023. [link]
SQA3D, 2023. [link]
VideoNavQA, 2019. [link]
MP3D-EQA, 2019. [link]
MT-EQA, 2019. [link]
IQUAD V1, 2018. [link]
EQA, 2018. [link]

Embodied Manipulation

OAKINK2, 2024. [link]

Other Useful Embodied Projects & Tools

Resources

Awesome-Embodied-Agent-with-LLMs
Awesome Embodied Vision
Awesome Touch

Simulation Platforms & Environments

Habitat-Lab
Habitat-Sim
GibsonEnv
LEGENT
MetaUrban
GRUtopia
GenH2R
Demonstrating HumanTHOR
BestMan

Projects

Manipulation

RoboMamba
MANIPULATE-ANYTHING
DexGraspNet
UniDexGrasp
UniDexGrasp++
OAKINK2

Embodied Interaction

EmbodiedQA

Embodied Perception

EmbodiedScan

Models & Tools

Octopus
Holodeck
AllenAct

Agents

LEO
Voyager

📰 Citation

If you think this survey is helpful, please feel free to leave a star ⭐️ and cite our paper:

@article{liu2024aligning,
  title={Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI},
  author={Liu, Yang and Chen, Weixing and Bai, Yongjie and Liang, Xiaodan and Li, Guanbin and Gao, Wen and Lin, Liang},
  journal={arXiv preprint arXiv:2407.06886},
  year={2024}
}

👏 Acknowledgements

We sincerely thank Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Junyi Lin, Zhida Li, and Ganlong Zhao for their contributions.
