HumanEval evaluation results mismatch for Codegemma-2b #261

Open
berserank opened this issue Jul 26, 2024 · 1 comment

Comments

@berserank
I am getting a score of 7.8 when evaluating codegemma-2b on HumanEval, while the authors report a score of 31.1. Here is the command I used.

accelerate launch  main.py \
  --model google/codegemma-2b \
  --tasks humaneval \
  --temperature 0.2 \
  --n_samples 20 \
  --batch_size 4 \
  --allow_code_execution \
  --use_auth_token

Below is the output of the code.

{
  "humaneval": {
    "pass@1": 0.07804878048780488,
    "pass@10": 0.14644529185229932
  },
  "config": {
    "prefix": "",
    "do_sample": true,
    "temperature": 0.2,
    "top_k": 0,
    "top_p": 0.95,
    "n_samples": 20,
    "eos": "<|endoftext|>",
    "seed": 0,
    "model": "google/codegemma-2b",
    "modeltype": "causal",
    "peft_model": null,
    "revision": null,
    "use_auth_token": true,
    "trust_remote_code": false,
    "tasks": "humaneval",
    "instruction_tokens": null,
    "batch_size": 4,
    "max_length_generation": 512,
    "precision": "fp32",
    "load_in_8bit": false,
    "load_in_4bit": false,
    "left_padding": false,
    "limit": null,
    "limit_start": 0,
    "save_every_k_tasks": -1,
    "postprocess": true,
    "allow_code_execution": true,
    "generation_only": false,
    "load_generations_path": null,
    "load_data_path": null,
    "metric_output_path": "evaluation_results.json",
    "save_generations": false,
    "load_generations_intermediate_paths": null,
    "save_generations_path": "generations.json",
    "save_references": false,
    "save_references_path": "references.json",
    "prompt": "prompt",
    "max_memory_per_gpu": null,
    "check_references": false
  }
}
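
For reference, the pass@k numbers above are, as far as I understand, computed with the standard unbiased estimator from the HumanEval paper and averaged over the 164 problems. A minimal sketch of that estimator, assuming n samples per problem of which c pass the tests:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator 1 - C(n-c, k) / C(n, k), evaluated in a
    # numerically stable way (as in the HumanEval/Codex paper).
    if n - c < k:
        return 1.0  # fewer than k failing samples, so any draw of k contains a pass
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# With n_samples=20, a problem where 2 of the 20 samples pass contributes:
print(pass_at_k(20, 2, 1))   # 0.1
print(pass_at_k(20, 2, 10))  # ~0.76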
@shwinshaker
shwinshaker commented Aug 4, 2024

Same here.

I was trying to reproduce the leaderboard score here for CodeGemma-2b (~27). I followed the exact same procedure documented here but got a much lower score (~7.5) than the one reported on the leaderboard.
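
One way to narrow down where the gap comes from (just a debugging sketch, not a documented step) is to rerun with --save_generations so the post-processed completions land in generations.json (the path shown in the config above), then eyeball a few of them for truncation or leftover prompt text:

import json

# Assumes generations.json is a list with one entry per HumanEval task,
# each entry a list of n_samples post-processed completion strings.
with open("generations.json") as f:
    generations = json.load(f)

for task_idx, samples in enumerate(generations[:5]):
    print(f"--- task {task_idx} ---")
    # Truncated function bodies or unstripped markdown/chat artifacts are
    # common causes of unexpectedly low pass@1 scores.
    print(samples[0][-300:])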
