Direct and Explicit 3D Generation from a Single Image (arXiv:2411.10947)

Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework that directly generates explicit surface geometry and texture, producing multi-view 2D depth and RGB images along with 3D Gaussian features via a repurposed Stable Diffusion model. We add a depth branch to the U-Net for efficient, high-quality multi-view cross-domain generation and incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency. By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation that can either be rendered via Gaussian splatting or extracted into high-quality meshes, which lets us apply an additional novel-view-synthesis loss to further improve performance. Extensive experiments demonstrate that our method surpasses existing baselines in geometry and texture quality while achieving significantly faster generation times.
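The back-projection step mentioned above is a standard pinhole unprojection of per-view depth maps into world space. The sketch below is a minimal NumPy illustration of that geometric step, not the authors' implementation: it assumes known per-view intrinsics `K` and camera-to-world poses (the function and parameter names are hypothetical), and fuses the lifted pixels into a colored point cloud of the kind that could seed Gaussian splats or mesh extraction.

```python
# Minimal sketch (assumed setup, not the paper's code): back-project multi-view
# depth/RGB images into a fused, colored 3D point cloud with a pinhole model.
import numpy as np

def backproject_view(depth, rgb, K, cam_to_world):
    """Lift one H x W depth map (with per-pixel colors) into world-space points."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))        # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # homogeneous pixel coords, (H, W, 3)
    rays = pix @ np.linalg.inv(K).T                       # camera-space directions (z = 1)
    pts_cam = rays * depth[..., None]                     # scale by depth -> camera coordinates
    pts_h = np.concatenate([pts_cam, np.ones((H, W, 1))], axis=-1)
    pts_world = (pts_h.reshape(-1, 4) @ cam_to_world.T)[:, :3]   # transform to world frame
    return pts_world, rgb.reshape(-1, 3)

def fuse_views(depths, rgbs, Ks, poses, near=0.05, far=10.0):
    """Merge all generated views into one colored point cloud, dropping invalid depths."""
    all_pts, all_cols = [], []
    for d, c, K, T in zip(depths, rgbs, Ks, poses):
        pts, cols = backproject_view(d, c, K, T)
        valid = (d.reshape(-1) > near) & (d.reshape(-1) < far)
        all_pts.append(pts[valid])
        all_cols.append(cols[valid])
    return np.concatenate(all_pts), np.concatenate(all_cols)
```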
