
# MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views

We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to the minimal overlap among input views and the insufficient visual information they provide, making it challenging for conventional methods to achieve high-quality results. MVSplat360 addresses this by effectively combining geometry-aware 3D reconstruction with temporally consistent video generation. Specifically, it refactors a feed-forward 3D Gaussian Splatting (3DGS) model to render features directly into the latent space of a pre-trained Stable Video Diffusion (SVD) model, where these features then act as pose and visual cues to guide the denoising process and produce photorealistic, 3D-consistent views. Our model is end-to-end trainable and supports rendering arbitrary views from as few as 5 sparse input views. To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping and even 360° NVS tasks. Experiments on the existing RealEstate10K benchmark also confirm the effectiveness of our model.
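To make the conditioning idea concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: a feed-forward reconstruction branch renders per-view feature maps into a video diffusion model's latent space, and those features guide the denoising network. Everything here is a hypothetical stand-in, not the actual MVSplat360 or SVD code: the module names (`FeatureRenderer`, `LatentDenoiser`), the 4-channel / 1/8-resolution latent shape, the simple channel-concatenation conditioning, the averaging over input views in place of pose-aware 3DGS rendering, and the schedule-free noising are all illustrative assumptions.

```python
# Minimal sketch of "render 3DGS features into a diffusion latent space as
# denoising guidance". All modules are stand-ins, NOT MVSplat360 / SVD code.

import torch
import torch.nn as nn

class FeatureRenderer(nn.Module):
    """Stand-in for the feed-forward 3DGS branch: maps sparse input views to
    feature maps at the diffusion model's latent resolution (assumed 4 channels,
    1/8 spatial downsampling, as in typical latent diffusion setups)."""
    def __init__(self, in_ch=3, latent_ch=4):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_ch, 3, stride=2, padding=1),
        )

    def forward(self, views):                              # (B, V, 3, H, W)
        b, v, c, h, w = views.shape
        feats = self.encode(views.flatten(0, 1))           # (B*V, 4, H/8, W/8)
        return feats.unflatten(0, (b, v))                  # (B, V, 4, H/8, W/8)

class LatentDenoiser(nn.Module):
    """Stand-in for the video diffusion UNet: predicts noise for a latent video,
    conditioned by concatenating the rendered features channel-wise."""
    def __init__(self, latent_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch * 2, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, latent_ch, 3, padding=1),
        )

    def forward(self, noisy_latents, cond_feats):          # both (B, T, 4, h, w)
        x = torch.cat([noisy_latents, cond_feats], dim=2)  # channel concat
        out = self.net(x.flatten(0, 1))
        return out.unflatten(0, noisy_latents.shape[:2])

# One training step under these assumptions: render conditioning features from
# 5 sparse input views, noise the ground-truth target latents, regress the noise.
renderer, denoiser = FeatureRenderer(), LatentDenoiser()
views = torch.randn(1, 5, 3, 256, 256)       # 5 sparse input views
gt_latents = torch.randn(1, 14, 4, 32, 32)   # latents of 14 target frames

# Placeholder for pose-aware rendering: average input-view features and
# broadcast them to every target frame of the trajectory.
cond = renderer(views).mean(dim=1, keepdim=True).expand(-1, 14, -1, -1, -1)
noise = torch.randn_like(gt_latents)
pred = denoiser(gt_latents + noise, cond)    # simplified: no noise schedule
loss = nn.functional.mse_loss(pred, noise)
loss.backward()                              # gradients flow into the renderer
print(loss.item())                           # => end-to-end trainable
```

Because the loss backpropagates through the denoiser into the feature renderer, the reconstruction branch and the generative branch train jointly, which is the "end-to-end trainable" property the abstract emphasizes.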
