HumanEval evaluation results mismatch for Codegemma-2b #261

Open
berserank opened this issue Jul 26, 2024 · 1 comment

Comments

@berserank
I am getting a score of 7.8 when evaluating codegemma-2b on HumanEval, while the authors report a score of 31.1. Here is the command I used.

accelerate launch  main.py \
  --model google/codegemma-2b \
  --tasks humaneval \
  --temperature 0.2 \
  --n_samples 20 \
  --batch_size 4 \
  --allow_code_execution \
  --use_auth_token

Below is the output of the code.

{
  "humaneval": {
    "pass@1": 0.07804878048780488,
    "pass@10": 0.14644529185229932
  },
  "config": {
    "prefix": "",
    "do_sample": true,
    "temperature": 0.2,
    "top_k": 0,
    "top_p": 0.95,
    "n_samples": 20,
    "eos": "<|endoftext|>",
    "seed": 0,
    "model": "google/codegemma-2b",
    "modeltype": "causal",
    "peft_model": null,
    "revision": null,
    "use_auth_token": true,
    "trust_remote_code": false,
    "tasks": "humaneval",
    "instruction_tokens": null,
    "batch_size": 4,
    "max_length_generation": 512,
    "precision": "fp32",
    "load_in_8bit": false,
    "load_in_4bit": false,
    "left_padding": false,
    "limit": null,
    "limit_start": 0,
    "save_every_k_tasks": -1,
    "postprocess": true,
    "allow_code_execution": true,
    "generation_only": false,
    "load_generations_path": null,
    "load_data_path": null,
    "metric_output_path": "evaluation_results.json",
    "save_generations": false,
    "load_generations_intermediate_paths": null,
    "save_generations_path": "generations.json",
    "save_references": false,
    "save_references_path": "references.json",
    "prompt": "prompt",
    "max_memory_per_gpu": null,
    "check_references": false
  }
}
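
For reference, the pass@k numbers above are, as far as I understand, computed with the standard unbiased estimator from the HumanEval paper and averaged over the 164 problems. A minimal sketch of that estimator, assuming n samples per problem of which c pass the tests:

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator 1 - C(n-c, k) / C(n, k), evaluated in a
    # numerically stable way (as in the HumanEval/Codex paper).
    if n - c < k:
        return 1.0  # fewer than k failing samples, so any draw of k contains a pass
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# With n_samples=20, a problem where 2 of the 20 samples pass contributes:
print(pass_at_k(20, 2, 1))   # 0.1
print(pass_at_k(20, 2, 10))  # ~0.76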
@shwinshaker
shwinshaker commented Aug 4, 2024

Same here.

I was trying to reproduce the leaderboard score here for CodeGemma-2b (~27). I followed the exact same procedure documented here but got a much lower score (~7.5) than the one reported on the leaderboard.
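
One way to narrow down where the gap comes from (just a debugging sketch, not a documented step) is to rerun with --save_generations so the post-processed completions land in generations.json (the path shown in the config above), then eyeball a few of them for truncation or leftover prompt text:

import json

# Assumes generations.json is a list with one entry per HumanEval task,
# each entry a list of n_samples post-processed completion strings.
with open("generations.json") as f:
    generations = json.load(f)

for task_idx, samples in enumerate(generations[:5]):
    print(f"--- task {task_idx} ---")
    # Truncated function bodies or unstripped markdown/chat artifacts are
    # common causes of unexpectedly low pass@1 scores.
    print(samples[0][-300:])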
