This repository reproduces the results of the paper *Ranking Manipulation for Conversational Search Engines* by Samuel Pfrommer, Yatong Bai, Tanmay Gautam, and Somayeh Sojoudi.
The contents include the code implementation of the proposed ranking manipulation algorithm, the pickled outputs (`out`), raw-text outputs (`out_text`), the dataset (`dataset`), and plots (`plots`). Unzip the respective `.zip` files to recover these directories.
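For example, the pickled outputs can be inspected with standard Python tooling. A minimal sketch follows; the file name under `out/` is a placeholder, so list the directory for the actual files:

```python
import pickle
from pathlib import Path

# Placeholder file name; substitute an actual file found under out/.
result_path = Path("out") / "example_result.pkl"
with result_path.open("rb") as f:
    result = pickle.load(f)

print(type(result))
print(result)
```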
We additionally propose RAGDOLL, a real-world e-commerce website dataset used for evaluation, which is available on Hugging Face.
The dataset collection pipeline is open-sourced at this companion GitHub repo.
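Since the dataset is hosted on Hugging Face, it should be loadable with the `datasets` library. This is a sketch: the repository ID below is a placeholder, so substitute the actual ID from the dataset card:

```python
from datasets import load_dataset

# Placeholder repository ID; replace with the actual RAGDOLL ID
# from the Hugging Face dataset card.
ragdoll = load_dataset("<hf-org>/RAGDOLL")
print(ragdoll)
```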
- 09/2024: 🎉 Our paper has been accepted to the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Major search engine providers are rapidly incorporating Large Language Model (LLM)-generated content in response to user queries. These conversational search engines operate by loading retrieved website text into the LLM context for summarization and interpretation.
Recent research demonstrates that LLMs are highly vulnerable to jailbreaking and prompt injection attacks, which disrupt the safety and quality goals of LLMs using adversarial strings. This work investigates the impact of prompt injections on the ranking order of sources referenced by conversational search engines.
To this end, we introduce a focused dataset of real-world consumer product websites and formalize conversational search ranking as an adversarial problem. Experimentally, we analyze conversational search rankings in the absence of adversarial injections and show that different LLMs vary significantly in prioritizing product name, document content, and context position.
We then present a tree-of-attacks-based jailbreaking technique which reliably promotes low-ranked products. Importantly, these attacks transfer effectively to closed-source, online-enabled RAG implementations such as the Sonar Large Online model by perplexity.ai.
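To make the threat model concrete, here is a hypothetical sketch of the adversarial ranking setup, not the paper's tree-of-attacks implementation. All names here (`build_context`, `rank_products`, `inject`, and the `query_llm` callable standing in for any chat-completion API) are illustrative assumptions:

```python
# Hypothetical illustration of the adversarial ranking setup; a sketch
# of the threat model, not the paper's tree-of-attacks implementation.

def build_context(documents: dict[str, str]) -> str:
    """Concatenate retrieved product pages into a single LLM context."""
    return "\n\n".join(f"[{name}]\n{text}" for name, text in documents.items())

def rank_products(query_llm, documents: dict[str, str], query: str) -> list[str]:
    """Ask an LLM to rank products for a user query.

    query_llm is a stand-in for any chat-completion call that takes a
    prompt string and returns the model's text reply.
    """
    prompt = (
        build_context(documents)
        + f"\n\nUser query: {query}\n"
        + "Rank the products above from most to least recommended, one name per line."
    )
    return [line.strip() for line in query_llm(prompt).splitlines() if line.strip()]

def inject(documents: dict[str, str], target: str, suffix: str) -> dict[str, str]:
    """Attacker-controlled edit: append an adversarial string to the
    target product's page text only; all other pages are untouched."""
    attacked = dict(documents)
    attacked[target] += " " + suffix
    return attacked
```

Attack success is then measured by comparing the target product's position in `rank_products(...)` before and after `inject(...)`.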
Please find a more detailed description of our dataset in the associated dataset card.
- Clone this repository and unzip the relevant `.zip` files. Note that since a copy of the dataset and experimental results is included in this repo, the repository is rather large (~100 MB as zip files, ~700 MB after unzipping).
- Install dependencies (inside a virtualenv):
```bash
pip install -e .
```
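If a fresh virtualenv is needed first, the standard tooling suffices. This is a sketch using Python's built-in `venv` module; any environment manager works:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```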
- Configure any required API keys:

```bash
OPENAI_API_KEY='...'
TOGETHER_API_KEY='...'
PERPLEXITY_API_KEY='...'
```
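These keys are presumably read from the environment, so one way to provide them is to export them in the shell before running (a `.env` file or a secrets manager works equally well):

```bash
export OPENAI_API_KEY='...'
export TOGETHER_API_KEY='...'
export PERPLEXITY_API_KEY='...'
```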
- (Optional) Set up a web server for the `perplexity.ai` attack (commented out in `run.sh`). This involves purchasing a domain, setting up HTTPS using certbot, selecting a password for `app.py` and `app_interface.py`, and running `run_server.sh` in `helpers` on the web server.
- Reproduce the results (if attacking `perplexity.ai`, uncomment the relevant lines):
```bash
bash scripts/run.sh
```
This repository is based on the minimal implementation of the "Tree of Attacks (TAP): Jailbreaking Black-Box LLMs Automatically" research by Robust Intelligence.
Using AI to Automatically Jailbreak GPT-4 and Other LLMs in Under a Minute
```bibtex
@article{pfrommer2024ranking,
  title={Ranking Manipulation for Conversational Search Engines},
  author={Pfrommer, Samuel and Bai, Yatong and Gautam, Tanmay and Sojoudi, Somayeh},
  journal={arXiv preprint arXiv:2406.03589},
  year={2024}
}
```