evaluation

ncoop57

Oct 9, 2021

d2ba68d · Oct 9, 2021

Name	Last commit message	Last commit date
apps_utils	Fix up APPs eval script and add some metrics for different models	Jul 19, 2021
evaluation	Reorg evaluation and add initial code to apps.py dataloader to fix in…	Jul 17, 2021
metrics	More reorganization	Jul 15, 2021
model_results	Add new eval results and update eval script to just be for human eval	Jul 25, 2021
README.md	Add example for evaluation	Oct 9, 2021
apps_eval_util.py	Merge with main	Jul 16, 2021
data_processing.ipynb	More reorganization	Jul 15, 2021
eval_apps.py	Merge with main	Jul 16, 2021
evaluate.py	Remove prompt from completion	Aug 22, 2021
get-pip.py	update eval script to clean up text like open ai paper and update wit…	Jul 20, 2021
gh-data-exploration.ipynb	SOme initial clean up	Jul 15, 2021
requirements.txt	Add steps and requirements for running evaluation script	Oct 9, 2021

README.md

Complete the following steps before running the HumanEval evaluation:

conda create -n human-eval python=3.7

conda activate human-eval

pip install -r requirements.txt

With these requirements installed, you can run the evaluate.py script:

python evaluate.py --model_name_or_path=<model_name_or_path> --human_eval_path=<path/to/human-eval/data/HumanEval.jsonl.gz> --out_path=./model_results
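Before starting a long run, it can help to confirm that the HumanEval path points at a readable file. The following is a minimal sketch, not part of this repository: `read_problems` is a hypothetical helper that uses only the standard library and assumes the file is gzipped JSON Lines with one problem per line, each carrying a `task_id` field as in the published HumanEval data.

```python
import gzip
import json
from pathlib import Path


def read_problems(path):
    """Return the problems in a gzipped JSON Lines file as a list of dicts.

    Hypothetical helper for sanity-checking the data file; evaluate.py
    may load the data differently.
    """
    problems = []
    with gzip.open(Path(path), "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                problems.append(json.loads(line))
    return problems
```

Calling `read_problems("<path/to/human-eval/data/HumanEval.jsonl.gz>")` and printing the length of the result and the first `task_id` is a quick way to check that the download is intact.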

For example, to evaluate the EleutherAI GPT-Neo 125M model:

python evaluate.py EleutherAI/gpt-neo-125M ../dependency_repos/human-eval/data/HumanEval.jsonl.gz model_results/
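To evaluate several models in one go, the example command can be wrapped in a small driver. This is a sketch under stated assumptions: `build_command` and `evaluate_models` are hypothetical names, the arguments are passed positionally exactly as in the example above, and the script is assumed to be run from the directory that contains evaluate.py. With `dry_run=True` (the default here) the commands are only printed, not executed.

```python
import subprocess
import sys


def build_command(model, human_eval_path, out_path, script="evaluate.py"):
    """Build the argument list for one run, mirroring the example command."""
    return [sys.executable, script, model, human_eval_path, out_path]


def evaluate_models(models, human_eval_path, out_path, dry_run=True):
    """Run (or, with dry_run=True, just print) one evaluation per model."""
    commands = [build_command(m, human_eval_path, out_path) for m in models]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop if an evaluation exits with an error
            subprocess.run(cmd, check=True)
    return commands
```

For instance, `evaluate_models(["EleutherAI/gpt-neo-125M"], "../dependency_repos/human-eval/data/HumanEval.jsonl.gz", "model_results/")` prints the command from the example; pass `dry_run=False` to actually launch it.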