- Prepare your model and the baseline model.
  - your model: Huggingface and Megatron-LM format models are supported; other models will be supported in future releases
  - baseline model: a Huggingface, Megatron-LM, or OpenAI model

  Evaluating Megatron-LM models requires a customized Megatron-LM, which is provided in `thirdparty`.
- Generate answers using `answer_generator.py` for both your model and the baseline model.
  - Prepare the benchmark dataset. The toolkit provides Vicuna Bench (`config/question.jsonl`), and you can also create a custom dataset to generate answers from. A custom dataset must be a single file in jsonl format, and each JSON object in it contains 3 attributes:
    - question_id: int type
    - text: the specific content of the question, string type
    - category: the type of the question, string type
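
    For example, a custom dataset entry could look like the following (the question texts and categories here are illustrative placeholders, not entries from the provided benchmark):

    ```json
    {"question_id": 1, "text": "How can I improve my time management skills?", "category": "generic"}
    {"question_id": 2, "text": "Implement a binary search function in Python.", "category": "coding"}
    ```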
  - Build the config file (`config.yaml`). The format of the file is as follows:

    ```yaml
    answer_generation:
      model_name: <str>
      question_file: <str>      # path of the benchmark dataset file
      answer_file: <str>        # path of the answer file generated by the model
      batch_size: <int>         # batch size when generating answers
      max_tokens: <int>         # maximum token size for each generated answer
      temperature: <float>
      # Choose one of the following configurations according to your model type
      # Config for huggingface
      huggingface:
        model_path: <str>       # path of your model
        tokenizer_path: <str>   # path of your tokenizer
      # Config for megatron-lm
      megatron:
        megatron_home: <str>    # root dir of Megatron-LM code
        process_num: <int>      # number of processes to run megatron
        checkpoint_path: <str>  # megatron checkpoint dir path
        tokenizer_type: <str>   # only 'gpt2' and 'sentencepiece' are supported for now
        vocab_path: <str>       # path to the vocab file for the gpt2 tokenizer
        merge_path: <str>       # path to the merge file for the gpt2 tokenizer
        tokenizer_path: <str>   # path to the tokenizer model for the sentencepiece tokenizer
        iteration: <int>        # iteration of the checkpoint to load
      # Config for openai
      openai:
        openai_organization: <str>
        openai_api_key: <str>
        model: <str>            # the type of model, e.g., gpt-3.5-turbo
        max_retry: <int>        # the maximum number of retries when API access fails
    ```
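
    For instance, a minimal filled-in config for a Huggingface model could look like this (the model name, paths, and generation parameters below are illustrative placeholders):

    ```yaml
    answer_generation:
      model_name: my-model                       # placeholder name used to label the answers
      question_file: config/question.jsonl       # the provided Vicuna Bench
      answer_file: answers/my-model.jsonl        # hypothetical output path
      batch_size: 8
      max_tokens: 512
      temperature: 0.7
      huggingface:
        model_path: /path/to/your/model          # placeholder
        tokenizer_path: /path/to/your/tokenizer  # placeholder
    ```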
  - Run the script.

    ```shell
    python answer_generator.py --config <path to config.yaml>
    ```
- Get OpenAI API evaluation results via `gpt_evaluator.py`.
  - Prepare dependencies. Make sure the following files are ready:
    - question_file: the benchmark dataset file from the previous step
    - answer_file: the answer file of your model from the previous step
    - baseline_file: the answer file of the baseline model from the previous step
    - prompt_file: a file containing multiple prompt templates; the toolkit provides a sample file (`config/prompt.json`)
    - reviewer_file: a file containing multiple reviewer templates (including the model type and other parameters used in the OpenAI API request); the toolkit provides a sample file (`config/reviewer.json`)
  - Build the config file (`config.yaml`). The format of the file is as follows:

    ```yaml
    gpt_evaluation:
      openai_organization: <str>
      openai_api_key: <str>
      question_file: <str>
      answer_file: <str>
      baseline_file: <str>
      prompt_file: <str>
      reviewer_file: <str>
      result_file: <str>  # path of the evaluation result
    ```
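
    For example, a filled-in config might look like this (the organization, API key, and paths below are illustrative placeholders that match the hypothetical paths used in the previous step):

    ```yaml
    gpt_evaluation:
      openai_organization: org-xxxxxxxx           # placeholder
      openai_api_key: sk-xxxxxxxx                 # placeholder
      question_file: config/question.jsonl
      answer_file: answers/my-model.jsonl         # hypothetical answer file of your model
      baseline_file: answers/baseline.jsonl       # hypothetical answer file of the baseline model
      prompt_file: config/prompt.json
      reviewer_file: config/reviewer.json
      result_file: results/my-model-review.jsonl  # hypothetical output path
    ```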
  - Run the script.

    ```shell
    python gpt_evaluator.py --config <path to config.yaml>
    ```