EnvDistraction

Paper: Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions [PDF].

Sept 2024

This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment, aiming to address the research question of whether multimodal GUI agents can be distracted by environmental context. A general setting is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content. A wide range of MLLMs are evaluated as GUI agents using our simulated dataset, following three working patterns with different levels of perception. Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions. While recent studies predominantly focus on the helpfulness (i.e., action accuracy) of multimodal agents, our findings indicate that these agents are prone to environmental distractions, resulting in unfaithful behaviors. Furthermore, we switch to the adversarial perspective and implement environment injection, demonstrating that such unfaithfulness can be exploited, leading to unexpected risks.
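One way to quantify the distraction described above is to label each predicted action by what it targets and report the share of each label. The following is a minimal sketch, not the code in evaluation.py: the label names (`gold`, `distracted`, `invalid`), the function names, and the sample format are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical labels; this README does not define an evaluation schema.
LABELS = ("gold", "distracted", "invalid")

# One sample: (predicted target, gold target, distractor targets on the page).
Sample = Tuple[str, str, Iterable[str]]


def label_action(predicted: str, gold: str, distractors: Iterable[str]) -> str:
    """Label a single predicted action by the element it targets."""
    if predicted == gold:
        return "gold"          # the agent followed the user's goal
    if predicted in set(distractors):
        return "distracted"    # the agent followed unrelated page content
    return "invalid"           # the agent targeted neither


def action_rates(samples: Iterable[Sample]) -> Dict[str, float]:
    """Return the fraction of samples that fall under each label."""
    counts = Counter(label_action(p, g, d) for p, g, d in samples)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

Under these assumptions, a high `distracted` rate alongside a reasonable `gold` rate would indicate an agent that is helpful on clean pages but unfaithful when unrelated content is present.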

Acknowledgement

Many thanks to phone_website, restaurant_website, Serper, amazon-reviews.

April 2024

Cases: cases_images. HTML code of the cases: web_data/phone_website/index_changed.html
Baseline for annotation: annotation.py
1. Output directory for annotated samples: web_data/output_data
2. Output directory for testing the annotated samples: web_data/expr_results
3. HTML template examples (used for rewriting)
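The two output directories listed above can be prepared before a run with a small helper. This is a sketch under assumptions: the helper name `ensure_output_dirs` is hypothetical, and whether annotation.py creates these directories itself is not stated here.

```python
from pathlib import Path
from typing import List

# Directory names taken from the list above, relative to the repository root.
OUTPUT_SUBDIRS = ("web_data/output_data", "web_data/expr_results")


def ensure_output_dirs(repo_root: str = ".") -> List[Path]:
    """Create the output directories if missing and return their paths."""
    root = Path(repo_root)
    created = []
    for sub in OUTPUT_SUBDIRS:
        path = root / sub
        # parents=True also creates web_data; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_output_dirs()` from the repository root leaves existing files in place, so it is safe to run before each annotation pass.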

Folders and files
data
figures
utils
.DS_Store
README.md
agent_prompts.py
amazon_products.py
annotation.py
api_setting.py
attack_both.py
autorepyly.py
call_agents_0711cot.py
call_agents_api.py
evaluation.py
format_tokens.py
google_api.py
prompts.py
retrieval.py
targets.py

Repository: xbmxb/EnvDistraction
