3D Gaussian Splatting has advanced radiance field reconstruction, enabling high-quality view synthesis and fast rendering in 3D modeling. While adversarial attacks on object detection models are well studied for 2D images, their impact on 3D models remains underexplored. This work introduces the Masked Iterative Fast Gradient Sign Method (M-IFGSM), designed to generate adversarial noise targeting the CLIP vision-language model. M-IFGSM specifically alters the object of interest by focusing perturbations on masked regions, degrading CLIP's zero-shot object detection capability when applied to 3D models. Using eight objects from the Common Objects 3D (CO3D) dataset, we demonstrate that our method effectively reduces the accuracy and confidence of the model, while the adversarial noise remains nearly imperceptible to human observers. On renders of the attacked models, top-1 accuracy drops from 95.4% to 12.5% for train images and from 91.2% to 35.4% for test images, with confidence levels shifting correspondingly from correct classification to misclassification, underscoring the risks adversarial attacks on 3D models pose in applications such as autonomous driving, robotics, and surveillance. The significance of this research lies in its potential to expose vulnerabilities in modern 3D vision models, including radiance fields, prompting the development of more robust defenses and security measures in critical real-world applications.
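The core update behind a masked iterative FGSM attack can be summarized in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it uses a plain NumPy loop and a caller-supplied `grad_fn` standing in for the gradient of CLIP's classification loss with respect to the rendered image; the step size `alpha`, budget `eps`, and step count are illustrative defaults.

```python
import numpy as np

def masked_ifgsm(x, mask, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10):
    """Sketch of a masked iterative FGSM (M-IFGSM-style) attack.

    x       : input image with pixel values in [0, 1]
    mask    : binary mask, 1 on the object of interest, 0 elsewhere
    grad_fn : returns dLoss/dx for the victim model (hypothetical stand-in
              here for the gradient of CLIP's zero-shot classification loss)
    """
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        # Ascend the loss, but only inside the masked object region,
        # so the background stays untouched.
        x_adv = x_adv + alpha * np.sign(g) * mask
        # Project back into the eps-ball around x (keeps the noise
        # nearly imperceptible) and into the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

In practice `grad_fn` would backpropagate through the CLIP image encoder; masking the sign update is what confines the perturbation to the target object rather than the whole render.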