
Code and data for paper "Can Watermarked LLMs be Identified by Users via Crafted Prompts?" Accepted by ICLR 2025 (Spotlight)


THU-BPM/Watermarked_LLM_Identification


Can Watermarked LLMs Be Identified by Users via Crafted Prompts?


Environment

  • Python 3.12
  • PyTorch
  • pip install -r requirements.txt

Repo Structure

├── config/                     # Configuration files
│   └── sampler_config.json     # Configuration file for sampling
│
├── data/
│   ├── prompts/                # Prompts for generation
│   │   ├── ngram-p1.txt        # Prompts for ngram-based watermarking
│   │   ├── ngram-p2.txt
│   │   ├── fixkey-p1.txt       # Prompts for fixkey-based watermarking (deprecated)
│   │   ├── fixkey-p2.txt
│   │   ├── 5gram-p1.txt        # Prompts for 5gram-prefix watermarking
│   │   ├── 5gram-p2.txt
│   │   ├── api-p1.txt          # Prompts for API tests
│   │   └── api-p2.txt
│   │
│   ├── logits/                 # Generated logits
│   │   └── {condition}-{p1 / p2}-logits-{model_name}.pickle
│   │
│   └── results/                # Results of sampling and experiments
│       ├── prob1/              # Sampling results for Water-Prob-V1
│       │   ├── {Watermarking_Algorithm}-prob1-p1-*.json
│       │   ├── {Watermarking_Algorithm}-prob1-p2-*.json
│       │   └── ...
│       ├── prob2/              # Sampling results for Water-Prob-V2
│       │   ├── {Watermarking_Algorithm}-prob2-p1-*.json
│       │   ├── {Watermarking_Algorithm}-prob2-p2-*.json
│       │   └── ...
│       └── csv/                # Experiment results
│           └── {Watermarking_Algorithm}-{WaterProb Method}-results/
│               └── ...
│
├── scripts/
│   ├── api_test/               # API test scripts
│   │   └── api_test_prob2.py   # API test for Water-Prob-V2 (for OpenAI & Gemini)
│   ├── samplers/               # Sampling implementations
│   ├── utils/                  # Utility functions
│   │   ├── prompt_manager.py   # Prompt manager
│   │   └── generation_utils.py # Generation utilities
│   ├── gen_logits.py           # Generate logits for each model
│   ├── sampling.py             # Main sampling script
│   ├── experiment.py           # Main experiment script
│   ├── Unigram_experiments_prob.py  # Unigram-specific experiments
│   ├── generate_logits.sh      # Shell script for generating logits
│   ├── sampling_pipeline.sh    # Shell script for running all sampling tasks
│   └── experiment_pipeline.sh  # Shell script for running all experiments
│
├── README.md                   # Project description
└── requirements.txt            # Python dependencies

Running the Pipeline

Step 1: Generate Logits

First, generate the logits for each model to simulate real-world generation and accelerate sampling.

cd scripts
bash generate_logits.sh

Details:

  • Input prompts are located in data/prompts/
  • Generated logits will be saved as .pickle files in data/logits/
  • This step pre-computes logits for all watermarking algorithms
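As a quick sanity check, a saved logits file can be loaded directly. The filename pattern below follows the repo structure above; the internal layout of the pickled object is an assumption here (the real schema is defined by scripts/gen_logits.py), so this sketch treats it as an opaque pickled Python object.

```python
import pickle

def load_logits(path):
    """Load a pre-computed logits pickle written by generate_logits.sh.

    Assumption: the file is a standard pickled Python object (e.g. a
    dict mapping prompts to logit arrays); the exact schema is defined
    by scripts/gen_logits.py, not here.
    """
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical example following the naming pattern above:
# logits = load_logits("data/logits/ngram-p1-logits-llama3.pickle")
```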

Step 2: Sampling

Run sampling for all watermarking algorithms and unwatermarked conditions:

cd scripts
bash sampling_pipeline.sh

Important:

  • Set alpha = 0.5 in config/sampler_config.json for unbiased watermarking sampling
  • Results will be saved in:
    • Water-Prob-V1: data/results/prob1/
    • Water-Prob-V2: data/results/prob2/
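If you prefer to change the setting programmatically rather than by hand, a small stdlib-only helper can rewrite the config. Only the "alpha" key is assumed here (it is the one this README mentions); any other keys in sampler_config.json are left untouched.

```python
import json

def set_alpha(config_path, alpha=0.5):
    """Set the alpha value in sampler_config.json.

    Assumption: the config stores alpha under a top-level "alpha" key;
    all other keys are preserved as-is.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["alpha"] = alpha  # 0.5 -> unbiased watermarking sampling
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)

# set_alpha("config/sampler_config.json", 0.5)
```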

Step 3: Run Experiments

Execute experiments across all conditions:

cd scripts
bash experiment_pipeline.sh

Output:

  • Results are saved as CSV files in data/results/csv/[algorithm]/
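The per-algorithm CSV outputs can be gathered for analysis with the stdlib csv module. The column names are whatever experiment.py writes; this sketch assumes no particular schema and returns each row as a plain dict.

```python
import csv
from pathlib import Path

def collect_results(csv_dir):
    """Gather all rows from the CSV files under one results directory.

    No column schema is assumed; csv.DictReader keys each row by the
    header line of the file it came from.
    """
    rows = []
    for path in sorted(Path(csv_dir).rglob("*.csv")):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows

# rows = collect_results("data/results/csv")
```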

Reference

If you find this repository useful, please cite our paper:

@article{liu2024can,
  title={Can Watermarked LLMs be Identified by Users via Crafted Prompts?},
  author={Liu, Aiwei and Guan, Sheng and Liu, Yiming and Pan, Leyi and Zhang, Yifei and Fang, Liancheng and Wen, Lijie and Yu, Philip S and Hu, Xuming},
  journal={arXiv preprint arXiv:2410.03168},
  year={2024}
}
