We introduce GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians. Our approach allows users to select the objects to be segmented by interacting with a single view. It accepts intuitive user input, such as point clicks, coarse scribbles, or text. Using 3D Gaussian Splatting (3DGS) as the underlying scene representation simplifies the extraction of objects of interest, which are treated as a subset of the scene's Gaussians. Our key idea is to represent the scene as a graph and use the graph-cut algorithm to minimize an energy function that partitions the Gaussians into foreground and background. To achieve this, we construct a graph based on scene Gaussians and devise a segmentation-aligned energy function on the graph to combine user inputs with scene properties. To obtain an initial coarse segmentation, we leverage 2D image/video segmentation models and further refine these coarse estimates using our graph construction. Our empirical evaluations show the adaptability of GaussianCut across a diverse set of scenes. GaussianCut achieves performance competitive with state-of-the-art approaches for 3D segmentation without requiring any additional segmentation-aware training.
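The graph-cut idea described above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the abstract does not specify the energy function, so the unary term (a per-Gaussian foreground likelihood `fg_prob`, imagined here as lifted from a coarse 2D mask), the pairwise term (a distance-based weight between k nearest neighbours), and the parameters `k`, `lam`, and `sigma` are all assumptions made for illustration. The minimum cut is computed with a plain Edmonds-Karp max-flow rather than any particular library.

```python
import math
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the nodes on the source side of a minimum cut."""
    # Residual capacities, with missing reverse edges initialised to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    while True:
        # Breadth-first search for a shortest augmenting path.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if sink not in parents:
            # No augmenting path left: reachable nodes form the source side.
            return set(parents) - {source}
        # Walk back from the sink and push the bottleneck flow along the path.
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        flow = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= flow
            residual[v][u] += flow


def segment_gaussians(positions, fg_prob, k=2, lam=1.0, sigma=1.0):
    """Label Gaussians as foreground by minimising a hypothetical unary + pairwise energy."""
    n = len(positions)
    src, snk = "source", "sink"
    capacity = {src: {}, snk: {}, **{i: {} for i in range(n)}}
    for i in range(n):
        # Unary terms (assumed): labelling i background costs fg_prob[i],
        # labelling it foreground costs 1 - fg_prob[i].
        capacity[src][i] = fg_prob[i]
        capacity[i][snk] = 1.0 - fg_prob[i]
        # Pairwise terms (assumed): nearby Gaussians prefer the same label.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(positions[i], positions[j]),
        )
        for j in others[:k]:
            d = math.dist(positions[i], positions[j])
            w = lam * math.exp(-(d ** 2) / sigma ** 2)
            capacity[i][j] = w
            capacity[j][i] = w
    return min_cut_source_side(capacity, src, snk)


# Toy scene: two tight clusters of Gaussian centres with noisy coarse likelihoods.
positions = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (10, 0, 0), (10.1, 0, 0), (10, 0.1, 0)]
fg_prob = [0.9, 0.8, 0.3, 0.1, 0.2, 0.7]

print(sorted(segment_gaussians(positions, fg_prob)))           # [0, 1, 2]
print(sorted(segment_gaussians(positions, fg_prob, lam=0.0)))  # [0, 1, 5]
```

With the pairwise term switched off (`lam=0.0`), each Gaussian simply follows its own noisy likelihood, so Gaussians 2 and 5 end up mislabelled relative to their clusters. With the pairwise term on, the cut keeps each spatial cluster together, which is the kind of refinement of a coarse estimate that the abstract attributes to the graph construction.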