Endo-4DGS: Distilling Depth Ranking for Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting
In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes, but they are hampered by slow inference, prolonged training, and substantial computational demands. Additionally, some rely on stereo depth estimation, which is often infeasible due to the high cost and logistical challenges of stereo cameras, and monocular reconstruction quality for deformable scenes remains inadequate. To overcome these obstacles, we present Endo-4DGS, a real-time endoscopic dynamic reconstruction approach that builds on 4D Gaussian Splatting (GS) and requires no ground-truth depth data. The method extends 3D GS with a temporal component and uses a lightweight MLP to capture temporal Gaussian deformations, enabling reconstruction of dynamic surgical scenes under variable conditions. We further integrate Depth-Anything to generate pseudo-depth maps from monocular views, guiding the depth-supervised reconstruction process. Validated on two surgical datasets, our approach renders in real time, computes efficiently, and reconstructs with remarkable accuracy. These results underline the vast potential of Endo-4DGS to improve surgical assistance.
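To make the temporal modeling concrete, below is a minimal PyTorch sketch of a lightweight deformation MLP of the kind the abstract describes: given a canonical Gaussian center and a timestamp, it predicts offsets to position, rotation, and scale. The class name, layer widths, and the sinusoidal encoding are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a lightweight per-Gaussian deformation MLP.
# Layer sizes and encodings are assumptions; the paper may differ.
import torch
import torch.nn as nn

class GaussianDeformationMLP(nn.Module):
    """Predicts per-Gaussian offsets (position, rotation, scale) at time t."""

    def __init__(self, pos_freqs: int = 10, time_freqs: int = 6, hidden: int = 64):
        super().__init__()
        in_dim = 3 * 2 * pos_freqs + 1 * 2 * time_freqs
        self.pos_freqs = pos_freqs
        self.time_freqs = time_freqs
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Separate heads: position offset (3), rotation quaternion (4), scale (3).
        self.d_pos = nn.Linear(hidden, 3)
        self.d_rot = nn.Linear(hidden, 4)
        self.d_scale = nn.Linear(hidden, 3)

    @staticmethod
    def _encode(x: torch.Tensor, n_freqs: int) -> torch.Tensor:
        # Standard NeRF-style sinusoidal positional encoding.
        bands = 2.0 ** torch.arange(n_freqs, device=x.device) * torch.pi
        ang = x.unsqueeze(-1) * bands               # (..., D, F)
        enc = torch.cat([ang.sin(), ang.cos()], dim=-1)
        return enc.flatten(start_dim=-2)            # (..., D * 2F)

    def forward(self, xyz: torch.Tensor, t: torch.Tensor):
        # xyz: (N, 3) canonical Gaussian centers; t: (N, 1) normalized time.
        h = self.net(torch.cat([self._encode(xyz, self.pos_freqs),
                                self._encode(t, self.time_freqs)], dim=-1))
        return self.d_pos(h), self.d_rot(h), self.d_scale(h)


# Usage: deform canonical Gaussians to their configuration at time t.
mlp = GaussianDeformationMLP()
xyz = torch.randn(1024, 3)                          # canonical centers
t = torch.full((1024, 1), 0.35)                     # one timestamp per point
d_pos, d_rot, d_scale = mlp(xyz, t)
xyz_t = xyz + d_pos                                 # deformed centers at time t
```

Keeping the MLP small is what preserves the real-time property of Gaussian Splatting: the deformation network is queried once per Gaussian per frame, not per ray sample as in NeRF-style methods.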
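The title's "distilling depth ranking" suggests supervising only the relative ordering of rendered depths against the Depth-Anything pseudo-depth, since monocular estimators predict reliable relative (not metric) depth. The following is a speculative sketch of such a pairwise ranking loss; the pair-sampling scheme, margin, and function name are our assumptions, not the paper's specification.

```python
# Hypothetical pairwise depth-ranking loss distilled from a monocular
# depth estimator (e.g., Depth-Anything). Details are assumptions.
import torch

def depth_ranking_loss(rendered: torch.Tensor,
                       pseudo: torch.Tensor,
                       n_pairs: int = 4096,
                       margin: float = 1e-4) -> torch.Tensor:
    """rendered, pseudo: (H, W) depth maps; their scales need not agree.

    Samples random pixel pairs and penalizes the rendered depth whenever
    its ordering contradicts the pseudo-depth ordering, so only relative
    depth is supervised and scale ambiguity is tolerated.
    """
    flat_r, flat_p = rendered.flatten(), pseudo.flatten()
    idx_a = torch.randint(flat_r.numel(), (n_pairs,), device=rendered.device)
    idx_b = torch.randint(flat_r.numel(), (n_pairs,), device=rendered.device)
    # +1 where pseudo-depth says pixel a is farther than pixel b, -1 otherwise.
    sign = torch.sign(flat_p[idx_a] - flat_p[idx_b])
    # Hinge: the rendered difference must match the pseudo ordering by `margin`.
    diff = flat_r[idx_a] - flat_r[idx_b]
    return torch.clamp(margin - sign * diff, min=0.0).mean()
```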