This repository contains the implementation of Stochastic Regularized Best-of-N Sampling (SRBoN).
The code has been tested with Python 3.8 and CUDA 11.0.
Create a virtual environment, then install the required dependencies:
# Create virtual environment
python3 -m venv env
source env/bin/activate
# Install dependencies
pip install -r requirements.txt
To collect samples from the model, use the following command, specifying the dataset, model, and number of samples:
bash ./experiments/sample.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]
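For reference, here is a minimal sketch of what candidate collection looks like conceptually, using the standard Hugging Face transformers API. The script itself handles datasets, batching, and output files; the prompt below is a hypothetical placeholder:

# Sketch: draw N candidate responses per prompt by ancestral sampling.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceH4/mistral-7b-sft-beta"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Describe the water cycle in two sentences."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,          # plain sampling, as in best-of-N
    max_new_tokens=256,
    num_return_sequences=8,  # N candidates; the scripts take this via -s
)
# Strip the prompt tokens and keep only the generated continuations.
candidates = tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)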
You can compute various utility metrics such as log probability, Wasserstein distance, and token length using the following scripts:
# Compute log probability
bash ./experiments/compute_logprob.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]
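For intuition, here is a sketch of the underlying quantity, the log probability log p(candidate | prompt) under a causal LM; the script's exact definition (e.g. any length normalization) is authoritative:

# Sketch: sum of token log-probabilities of a candidate given its prompt.
import torch
import torch.nn.functional as F

def sequence_logprob(model, tokenizer, prompt, candidate):
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(prompt + candidate, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each realized token (position t predicts token t+1).
    logprobs = F.log_softmax(logits[:, :-1], dim=-1)
    token_lp = logprobs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the candidate's tokens, dropping the prompt portion.
    return token_lp[:, prompt_ids.shape[1] - 1:].sum().item()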
# Compute Wasserstein distance
bash ./experiments/compute_wd.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]
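The precise notion of Wasserstein distance between candidate outputs is defined by the script itself. One plausible formulation, shown purely as a sketch, treats each candidate as a uniform distribution over its token embeddings and solves the resulting optimal-transport problem; the use of the POT library here is an assumption of this sketch:

# Sketch: Wasserstein distance between two candidates viewed as uniform
# distributions over their token embeddings (a Word Mover's-style distance).
# Assumption: the POT library (pip install pot); compute_wd.sh may differ.
import numpy as np
import ot

def wd(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """emb_a: (n, d) token embeddings of one candidate; emb_b: (m, d) of another."""
    a = np.full(emb_a.shape[0], 1.0 / emb_a.shape[0])  # uniform mass over tokens
    b = np.full(emb_b.shape[0], 1.0 / emb_b.shape[0])
    cost = ot.dist(emb_a, emb_b)   # pairwise squared-Euclidean cost matrix
    return ot.emd2(a, b, cost)     # exact optimal-transport cost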
# Compute token length
bash ./experiments/compute_length.sh -d [DATASETS] -m [MODEL] -s [NUMBER_OF_SAMPLES]
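Token length is simply the number of tokens a candidate occupies under the model's tokenizer, e.g.:

# Sketch: token length of a candidate (placeholder text shown).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/mistral-7b-sft-beta")
candidate = "A placeholder candidate response."
length = len(tokenizer(candidate, add_special_tokens=False).input_ids)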
To compute reward values for a dataset, use the following command, specifying the dataset, the number of samples, and the reward model (-i takes a Hugging Face model identifier, as in the examples below):
bash ./experiments/compute_reward.sh -d [DATASETS] -s [NUMBER_OF_SAMPLES] -i [REWARD_TYPE]
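As an illustration of what reward computation involves, here is a sketch of scoring a prompt-response pair with the OpenAssistant reward model, following its Hugging Face model-card usage; the script presumably wraps something similar (Eurus-RM-7b has its own interface, not shown):

# Sketch: score a (prompt, response) pair with a sequence-classification
# reward model, per the OpenAssistant model card.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
rm_tokenizer = AutoTokenizer.from_pretrained(reward_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_name)

def reward(prompt: str, response: str) -> float:
    inputs = rm_tokenizer(prompt, response, return_tensors="pt")
    with torch.no_grad():
        return reward_model(**inputs).logits[0].item()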
Finally, to compute the SRBoN values, run the following script:
python3 stochastic_rbon/stochastic_rbon.py --dataset [DATASETS] --ncandidates [NUMBER_OF_SAMPLES]
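For intuition about the selection step: rather than deterministically returning the highest-reward candidate, a stochastic regularized best-of-N rule samples from a reward-maximizing distribution over the N candidates that is regularized toward the sampling policy. The sketch below shows one common closed form (a softmax arising from a KL-regularized objective); the actual objective implemented in stochastic_rbon.py is authoritative:

# Sketch of a stochastic, regularized best-of-N selection rule.
# Assumption: the KL-regularized objective max_pi E_pi[R] - beta * KL(pi || p_ref),
# whose solution over a finite candidate set is
# pi_i proportional to p_ref(y_i) * exp(R_i / beta).
# stochastic_rbon.py may use a different regularizer or solver.
import numpy as np

def srbon_select(rewards, ref_logprobs, beta=1.0, rng=None):
    """rewards: (N,) reward scores; ref_logprobs: (N,) log p_ref(y|x)."""
    rng = rng or np.random.default_rng()
    logits = rewards / beta + ref_logprobs
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax over candidates
    return rng.choice(len(rewards), p=probs)  # sampled candidate index

# Example with placeholder scores for 100 candidates:
rng = np.random.default_rng(0)
idx = srbon_select(rng.normal(size=100), rng.normal(-80, 10, size=100), beta=0.5, rng=rng)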
Below is an example of running the full SRBoN pipeline using the alpaca dataset, the HuggingFaceH4/mistral-7b-sft-beta model, and 100 samples.
# Collect 100 samples from the model
bash ./experiments/sample.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100
# Compute log probabilities for the collected samples
bash ./experiments/compute_logprob.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100
# Compute Wasserstein distance for the samples
bash ./experiments/compute_wd.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100
# Compute the token length of the samples
bash ./experiments/compute_length.sh -d alpaca -m HuggingFaceH4/mistral-7b-sft-beta -s 100
To compute reward values using different reward models:
# Compute reward values using the OpenAssistant reward model
bash ./experiments/compute_reward.sh -d alpaca -s 100 -i OpenAssistant/reward-model-deberta-v3-large-v2
# Compute reward values using the openbmb reward model
bash ./experiments/compute_reward.sh -d alpaca -s 100 -i openbmb/Eurus-RM-7b
Finally, run the SRBoN computation with 100 candidates:
python3 stochastic_rbon/stochastic_rbon.py --dataset alpaca --ncandidates 100