Recently, the field of text-guided 3D scene generation has garnered significant attention. High-quality generation that aligns with physical realism and offers high controllability is crucial for practical 3D scene applications. However, existing methods face fundamental limitations: (i) difficulty capturing complex relationships between multiple objects described in the text, (ii) inability to generate physically plausible scene layouts, and (iii) lack of controllability and extensibility in compositional scenes. In this paper, we introduce LayoutDreamer, a framework that leverages 3D Gaussian Splatting (3DGS) to facilitate high-quality, physically consistent compositional scene generation guided by text. Specifically, given a text prompt, we convert it into a directed scene graph and adaptively adjust the density and layout of the initial compositional 3D Gaussians. Subsequently, dynamic camera adjustments are made based on the training focal point to ensure entity-level generation quality. Finally, by extracting directed dependencies from the scene graph, we tailor physical and layout energy terms to ensure both realism and flexibility. Comprehensive experiments demonstrate that LayoutDreamer outperforms other methods in compositional scene generation quality and semantic alignment. In particular, it achieves state-of-the-art (SOTA) performance on the multiple-object generation metric of T3Bench.