Reward Evolution with Large Language Models using Human Feedback

RishiHazra/Revolve

REvolve: Reward Evolution with Large Language Models using Human Feedback


Official code release of our ICLR 2025 paper.

Documentation

egoTV

Setup

# clone the repository
git clone https://github.com/RishiHazra/Revolve.git
cd Revolve
# create and activate a conda environment
conda create -n "revolve" python=3.10
conda activate revolve
# install the package in editable mode
pip install -e .

Run

export ROOT_PATH='Revolve'
export OPENAI_API_KEY='<your openai key>'
# evolution.num_generations: number of generations
# evolution.individuals_per_generation: number of individuals in each generation
# database.num_islands: number of groups/populations to start with
# database.max_island_size: max number of samples in each group/population
# data_paths.run: run_id
# environment.name: choose between "HumanoidEnv" and "AdroitHandDoorEnv"
python main.py \
        evolution.num_generations=5 \
        evolution.individuals_per_generation=15 \
        database.num_islands=5 \
        database.max_island_size=8 \
        data_paths.run=10 \
        environment.name="HumanoidEnv"
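
To make the `database.num_islands` and `database.max_island_size` settings in the command above more concrete, here is a minimal Python sketch of an island-style population store. It is not taken from the repository's code: the `IslandDatabase` class, its methods, and the policy of evicting the lowest-fitness individual when an island is full are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class IslandDatabase:
    """Hypothetical island store; names and eviction policy are assumptions."""

    num_islands: int = 5        # mirrors database.num_islands
    max_island_size: int = 8    # mirrors database.max_island_size
    islands: list = field(init=False)

    def __post_init__(self):
        # Each island is a list of (fitness, individual) pairs.
        self.islands = [[] for _ in range(self.num_islands)]

    def add(self, island_id: int, individual: str, fitness: float) -> None:
        """Insert an individual, then keep only the fittest max_island_size."""
        island = self.islands[island_id]
        island.append((fitness, individual))
        island.sort(key=lambda pair: pair[0], reverse=True)
        del island[self.max_island_size:]

    def sample_parent(self, island_id: int, rng=random) -> str:
        """Pick a parent uniformly at random from one island."""
        return rng.choice(self.islands[island_id])[1]


db = IslandDatabase(num_islands=2, max_island_size=3)
for i in range(5):
    db.add(0, f"reward_fn_{i}", fitness=float(i))
print([name for _, name in db.islands[0]])
```

With this eviction policy each island stays at or below its size cap, and the islands evolve independently of one another.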

Note: we will soon release the AirSim environment setup script.

For AirSim, follow the build instructions at https://microsoft.github.io/AirSim/build_linux/ and then export the following paths:

export AIRSIM_PATH='AirSim'
export AIRSIMNH_PATH='AirSimNH/AirSimNH/LinuxNoEditor/AirSimNH.sh'
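
Before launching a run, it can help to confirm that the exported variables are actually visible to Python. The helper below is a small sketch for that purpose; `missing_env_vars` is a hypothetical function, not part of the repository, and it only checks that each variable is set and non-empty, not that the paths exist.

```python
import os

# Variable names taken from the export commands in this README.
CORE_VARS = ("ROOT_PATH", "OPENAI_API_KEY")
AIRSIM_VARS = ("AIRSIM_PATH", "AIRSIMNH_PATH")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(CORE_VARS + AIRSIM_VARS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```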

Other Utilities

  • The prompts are listed in the prompts folder.
  • The Elo scoring code is in the human_feedback folder.
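
For readers unfamiliar with Elo scoring, the sketch below shows the standard Elo update for a single pairwise comparison. It illustrates the general formula only and is not the code in the human_feedback folder; the function names, the K-factor of 32, and the starting rating of 1500 are assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    The K-factor default is an assumed value, not taken from the repository.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated candidates; a human prefers the first one.
new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)  # 1516.0 1484.0
```

The update is zero-sum: whatever one candidate gains the other loses, so the mean rating of the population stays constant.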

Citation

To cite our paper:

@misc{hazra2024revolverewardevolutionlarge,
      title={REvolve: Reward Evolution with Large Language Models using Human Feedback}, 
      author={Rishi Hazra and Alkis Sygkounas and Andreas Persson and Amy Loutfi and Pedro Zuidberg Dos Martires},
      year={2024},
      eprint={2406.01309},
      archivePrefix={arXiv},
      primaryClass={cs.NE},
      url={https://arxiv.org/abs/2406.01309}, 
}
