State-of-the-art novel view synthesis methods achieve impressive results for multi-view captures of static 3D scenes. However, the reconstructed scenes still lack "liveliness," a key component for creating engaging 3D experiences. Recent video diffusion models generate realistic videos with complex motion and enable animation of 2D images; however, they cannot naively be used to animate 3D scenes, as they lack multi-view consistency. To breathe life into the static world, we propose Gaussians2Life, a method for animating parts of high-quality 3D scenes represented by Gaussian Splatting. Our key idea is to leverage powerful video diffusion models as the generative component of our model and to combine these with a robust technique for lifting 2D videos into meaningful 3D motion. We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes and supports the animation of a large variety of object classes, whereas related work mostly focuses on prior-based character animation or single 3D objects. Our model enables the creation of consistent, immersive 3D experiences for arbitrary scenes.
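To make the lifting idea concrete, below is a minimal sketch of how 2D motion from a generated video could be lifted into 3D displacements, assuming per-pixel optical flow, depth maps, and known camera intrinsics are available. This is an illustration only, not the paper's actual implementation: the function names and the specific use of optical flow and depth unprojection are assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation): lift 2D video
# motion into 3D displacement vectors via depth-based unprojection.
import numpy as np

def backproject(pix: np.ndarray, depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Unproject pixel coordinates (N, 2) with z-depths (N,) to camera-space 3D points."""
    ones = np.ones((pix.shape[0], 1))
    rays = np.concatenate([pix, ones], axis=1) @ np.linalg.inv(K).T  # (N, 3), z = 1
    return rays * depth[:, None]

def lift_flow_to_3d(pix: np.ndarray, flow: np.ndarray,
                    depth_t: np.ndarray, depth_t1: np.ndarray,
                    K: np.ndarray) -> np.ndarray:
    """Turn 2D optical flow between frames t and t+1 into per-pixel 3D displacements."""
    p_t = backproject(pix, depth_t, K)            # 3D positions at frame t
    p_t1 = backproject(pix + flow, depth_t1, K)   # 3D positions at frame t+1
    return p_t1 - p_t                             # 3D motion to drive nearby Gaussians
```

In a full pipeline, such per-pixel displacements would still have to be propagated to the 3D Gaussians and kept consistent across viewpoints, which is where the multi-view consistency challenge mentioned above arises.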