# GPT EVAL: Evaluate Your Model with the OpenAI API

## Quick Start

1. Prepare your model and the baseline model.

    - your model: models in Hugging Face or Megatron-LM format are supported; other formats will be supported in future releases
    - baseline model: a Hugging Face, Megatron-LM, or OpenAI model

    Evaluating Megatron-LM models requires the customized Megatron-LM provided in `thirdparty`.

2. Generate answers using `answer_generator.py` for both your model and the baseline model.

    1. Prepare the benchmark dataset. The toolkit provides Vicuna Bench (`config/question.jsonl`), and you can also create a custom dataset to generate answers from. A custom dataset must be a single file in JSONL format, where each JSON object contains three attributes (see the sample line after this list):

        - `question_id`: int type
        - `text`: the specific content of the question, string type
        - `category`: the type of the question, string type
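
        For illustration, one line of such a dataset could look like the following (the question text and category are hypothetical values, not taken from the provided benchmark):

        ```json
        {"question_id": 1, "text": "How can I improve my time management skills?", "category": "generic"}
        ```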
    2. Build the config file (`config.yaml`). The format of the file is as follows (a filled-in example follows the template):

        ```yaml
        answer_generation:
          model_name: <str>
          question_file: <str>  # path of the benchmark dataset file
          answer_file: <str>    # path of the answer file generated by the model
          batch_size: <int>     # batch size when generating answers
          max_tokens: <int>     # maximum number of tokens for each generated answer
          temperature: <float>
          # Choose one of the following configurations according to your model type
          # Config for Hugging Face
          huggingface:
            model_path: <str>     # path of your model
            tokenizer_path: <str> # path of your tokenizer
          # Config for Megatron-LM
          megatron:
            megatron_home: <str>    # root dir of the Megatron-LM code
            process_num: <int>      # number of processes used to run Megatron
            checkpoint_path: <str>  # path of the Megatron checkpoint dir
            tokenizer_type: <str>   # only 'gpt2' and 'sentencepiece' are supported for now
            vocab_path: <str>       # path to the vocab file for the gpt2 tokenizer
            merge_path: <str>       # path to the merge file for the gpt2 tokenizer
            tokenizer_path: <str>   # path to the tokenizer model for the sentencepiece tokenizer
            iteration: <int>        # iteration of the checkpoint to load
          # Config for OpenAI
          openai:
            openai_organization: <str>
            openai_api_key: <str>
            model: <str>     # model type, e.g., gpt-3.5-turbo
            max_retry: <int> # maximum number of retries when an API call fails
        ```
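
        For example, a minimal config for evaluating a Hugging Face model might look like the sketch below (the model name, paths, and parameter values are illustrative placeholders, not recommended defaults):

        ```yaml
        answer_generation:
          model_name: my-model
          question_file: config/question.jsonl
          answer_file: answers/my-model.jsonl
          batch_size: 8
          max_tokens: 512
          temperature: 0.7
          huggingface:
            model_path: /path/to/my-model
            tokenizer_path: /path/to/my-model
        ```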
    3. Run the script:

        ```shell
        python answer_generator.py --config <path to config.yaml>
        ```
3. Get OpenAI API evaluation results via `gpt_evaluator.py`.

    1. Prepare the dependencies. Make sure the following files are ready:

        - `question_file`: the benchmark dataset file from the previous step
        - `answer_file`: the answer file of your model from the previous step
        - `baseline_file`: the answer file of the baseline model from the previous step
        - `prompt_file`: a file containing multiple prompt templates; the toolkit provides a sample file (`config/prompt.json`)
        - `reviewer_file`: a file containing multiple reviewer templates (including the model type and other parameters used in the OpenAI API requests); the toolkit provides a sample file (`config/reviewer.json`)
    2. Build the config file (`config.yaml`). The format of the file is as follows (a filled-in example follows the template):

        ```yaml
        gpt_evaluation:
          openai_organization: <str>
          openai_api_key: <str>
          question_file: <str>
          answer_file: <str>
          baseline_file: <str>
          prompt_file: <str>
          reviewer_file: <str>
          result_file: <str>  # path of the evaluation result
        ```
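
        For example, assuming the files generated in the previous steps, the config might look like the sketch below (the organization, API key, and paths are illustrative placeholders):

        ```yaml
        gpt_evaluation:
          openai_organization: org-xxxx
          openai_api_key: sk-xxxx
          question_file: config/question.jsonl
          answer_file: answers/my-model.jsonl
          baseline_file: answers/baseline.jsonl
          prompt_file: config/prompt.json
          reviewer_file: config/reviewer.json
          result_file: results/my-model-vs-baseline.jsonl
        ```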
    3. Run the script:

        ```shell
        python gpt_evaluator.py --config <path to config.yaml>
        ```