HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting

3D Gaussian Splatting offers expressive scene reconstruction, modeling a broad range of visual, geometric, and semantic information. However, efficient real-time map reconstruction with data streamed from multiple robots and devices remains a challenge. To that end, we propose HAMMER, a server-based collaborative Gaussian Splatting method that leverages widely available ROS communication infrastructure to generate 3D, metric-semantic maps from asynchronous robot data-streams with no prior knowledge of initial robot positions and varying on-device pose estimators. HAMMER consists of (i) a frame alignment module that transforms local SLAM poses and image data into a global frame and requires no prior relative pose knowledge, and (ii) an online module for training semantic 3DGS maps from streaming data. HAMMER handles mixed perception modes, adjusts automatically for variations in image pre-processing among different devices, and distills CLIP semantic codes into the 3D scene for open-vocabulary language queries. In our real-world experiments, HAMMER creates higher-fidelity maps (2x) compared to competing baselines and is useful for downstream tasks, such as semantic goal-conditioned navigation (e.g., "go to the couch").
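As a rough illustration of the frame-alignment step (module i), the sketch below composes a robot's local SLAM camera pose with a robot-to-global transform; the matrix names and the assumption that each robot's alignment transform has already been estimated are illustrative, not HAMMER's actual interface.

```python
# Minimal sketch of re-expressing a local SLAM pose in a shared global frame.
# T_global_from_local is a hypothetical, already-estimated alignment transform.
import numpy as np

def align_pose(T_global_from_local: np.ndarray, T_local_cam: np.ndarray) -> np.ndarray:
    """Compose a local SLAM camera pose into the shared global frame."""
    # Both inputs are 4x4 homogeneous SE(3) matrices.
    return T_global_from_local @ T_local_cam

# Example: robot "A" streams a local pose; the server re-expresses it globally.
T_global_from_A = np.eye(4)           # placeholder alignment for robot A
T_A_cam = np.eye(4)
T_A_cam[:3, 3] = [1.0, 0.0, 0.5]      # camera 1 m forward, 0.5 m up in A's local frame
T_global_cam = align_pose(T_global_from_A, T_A_cam)
```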

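The open-vocabulary language query mentioned in the abstract can be pictured as a similarity lookup between a CLIP text embedding and the CLIP codes distilled into the Gaussians. The sketch below is a hedged approximation with random placeholder arrays and a hypothetical similarity threshold; the paper's actual query pipeline is not spelled out here.

```python
# Hedged sketch of an open-vocabulary lookup over distilled per-Gaussian features,
# assuming the map stores one CLIP-aligned feature vector per Gaussian.
import numpy as np

def query_gaussians(gaussian_feats: np.ndarray, text_feat: np.ndarray, thresh: float = 0.25):
    """Return indices of Gaussians whose features are similar to the text query."""
    g = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sims = g @ t                       # cosine similarity per Gaussian
    return np.where(sims > thresh)[0]

# Usage: select Gaussians matching a "couch" query to derive a navigation goal region.
gaussian_feats = np.random.randn(10_000, 512).astype(np.float32)  # placeholder map features
text_feat = np.random.randn(512).astype(np.float32)               # placeholder CLIP text embedding
couch_ids = query_gaussians(gaussian_feats, text_feat)
```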