Commit

add human evaluation results
Baolin Peng committed Apr 7, 2023
1 parent cec5f70 commit ced9ed0
Showing 3 changed files with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -49,6 +49,16 @@ This JSON file has the same format as Alpaca data, except the output is generate

[`unnatural_instruction_gpt4_data.json`](./data/unnatural_instruction_gpt4_data.json) contains 9K instruction-following data generated by GPT-4 with prompts in Unnatural Instruction. This JSON file has the same format as Alpaca data.
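
As a quick sanity check on the "same format as Alpaca data" claim, the sketch below loads a JSON file and reports records that do not look Alpaca-style. It assumes an Alpaca-style record is a dict with string fields `instruction`, `input`, and `output`; the helper names `load_records` and `find_malformed` are hypothetical and not part of this repository.

```python
import json

# Assumption: Alpaca-style records carry exactly these string fields.
ALPACA_KEYS = ("instruction", "input", "output")


def load_records(path):
    """Load a JSON file that is expected to hold a list of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_malformed(records):
    """Return the indices of records that are not Alpaca-style dicts."""
    bad = []
    for i, rec in enumerate(records):
        # A record passes only if it is a dict and every expected key maps to a string.
        ok = isinstance(rec, dict) and all(
            isinstance(rec.get(key), str) for key in ALPACA_KEYS
        )
        if not ok:
            bad.append(i)
    return bad
```

For example, `find_malformed(load_records("./data/unnatural_instruction_gpt4_data.json"))` should return an empty list if every record matches the assumed layout.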

## How Good is the Data?

Human evaluation was performed on model generation results using Amazon Mechanical Turk, following the Helpfulness, Honesty, and Harmlessness criteria proposed by [Anthropic AI](https://arxiv.org/abs/2112.00861). The results are summarized as follows:
- Two instruction-tuned LLaMA models were compared, fine-tuned on data generated by GPT-4 and GPT-3 respectively.
- LLaMA-GPT-4 performs substantially better than LLaMA-GPT-3 in the "Helpfulness" criterion.
- LLaMA-GPT-4 performs similarly to the original GPT-4 in all three criteria, suggesting a promising direction for developing state-of-the-art instruction-following LLMs.

![LLaMA-GPT4 vs Alpaca (i.e., LLaMA-GPT3)](static/pie_llama_gpt3_vs_llam_gpt4.png )
![LLaMA-GPT4 vs GPT-4](static/pie_llama_gpt4_vs_gpt4.png )

## Fine-tuning with the data
We follow the same recipe as Alpaca to fine-tune LLaMA, using standard Hugging Face training code.
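
One piece of the Alpaca recipe is turning each record into a training prompt. The sketch below reproduces the prompt templates as we understand them from the Stanford Alpaca training script; treat the exact wording as an assumption and verify it against that repository before use. The function name `build_prompt` is hypothetical.

```python
# Assumption: templates follow the Stanford Alpaca training script.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(record):
    """Format one Alpaca-style record into the text the model is conditioned on."""
    # Records with an empty or missing "input" use the shorter template.
    if record.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])
```

During fine-tuning, the record's `output` field would be appended after `### Response:` as the target text.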

Binary file added static/pie_llama_gpt3_vs_llam_gpt4.png
Binary file added static/pie_llama_gpt4_vs_gpt4.png
