Artifacts for "SemCoder: Training Code Language Models with Comprehensive Semantics"

ARiSE-Lab/SemCoder

🗣️ SemCoder: Training Code Language Models with Comprehensive Semantics

🤖 Models | 🛠️ Get Started | 🕹️ Demo | 📝 Citation | 🙏 Acknowledgements

Note

Work in Progress: This repository is still a work in progress. We aim to finalize the release by the end of October 2024. Stay tuned!

🤖 Models

| Model | Checkpoint | Size | License |
| --- | --- | --- | --- |
| SemCoder | 🤗 HF Link | 6.7B | DeepSeek |
| SemCoder-S | 🤗 HF Link | 6.7B | DeepSeek |

🛠️ Get Started

Install Environment

git clone https://github.com/ARiSE-Lab/SemCoder.git;
cd SemCoder;
conda env create --name semcoder --file=environment.yml;
conda activate semcoder;
export PYTHONPATH=$(pwd);
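
As a quick sanity check after the setup above, the following sketch tests whether the repository root ended up on `PYTHONPATH`. The helper name `repo_on_pythonpath` is hypothetical and not part of the SemCoder repository; it only inspects the environment variable set by the last command.

```python
import os
from pathlib import Path


def repo_on_pythonpath(repo_root, pythonpath=None):
    """Return True if repo_root is one of the entries in PYTHONPATH.

    Hypothetical helper, not part of the SemCoder repository.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = Path(repo_root).resolve()
    # PYTHONPATH entries are separated by os.pathsep (":" on Linux and macOS).
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return any(Path(p).resolve() == target for p in entries)
```

Calling `repo_on_pythonpath(os.getcwd())` from inside the SemCoder checkout should return `True` once `export PYTHONPATH=$(pwd)` has been run in the same shell.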

🕹️ Demo

We follow the Magicoder script to launch a Gradio server for the local demo. You can launch your local Gradio demo as follows:

CUDA_VISIBLE_DEVICES=0 python semcoder_demo.py \
   --base_model "semcoder/semcoder_s" \
   --device "cuda:0" \
   --port 8080
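
In the command above, `CUDA_VISIBLE_DEVICES=0` exposes a single physical GPU to the process, and that GPU is then addressed as `cuda:0` inside the process regardless of its physical index. The sketch below builds the same launch command for a different GPU or port. The helper `build_demo_launch` is hypothetical; it uses only the flags shown in the command above and assumes it is run from the repository root.

```python
import os
import sys


def build_demo_launch(gpu_index=0, port=8080, base_model="semcoder/semcoder_s"):
    """Build the argv list and environment for launching semcoder_demo.py.

    Hypothetical helper; it only uses the flags shown in the README command.
    """
    env = dict(os.environ)
    # Expose exactly one physical GPU to the child process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    cmd = [
        sys.executable, "semcoder_demo.py",
        "--base_model", base_model,
        # With a single visible GPU, the process always sees it as cuda:0.
        "--device", "cuda:0",
        "--port", str(port),
    ]
    return cmd, env
```

Passing the result to `subprocess.run(cmd, env=env, check=True)` would then start the demo, for example with `build_demo_launch(gpu_index=1, port=9000)` to use the second GPU.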

Evaluation

  • To evaluate SemCoder on EvalPlus, run:
bash scripts/eval/eval_evalplus.sh
  • To evaluate SemCoder on CRUXEval, first clone the official release:
git clone https://github.com/facebookresearch/cruxeval.git

Set $CRUXEVAL_HOME in this script to the absolute path of the cloned repository, then run:

bash scripts/eval/eval_cruxeval.sh
  • To fine-tune SemCoder for debugging and self-refinement, please refer to this script.

  • To evaluate SemCoder for iterative self-refinement on EvalPlus, run:

bash scripts/eval/eval_finetune_refine.sh
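
For the CRUXEval step above, `$CRUXEVAL_HOME` has to be an absolute path. The following sketch computes that path from the clone location and fails early if the directory is missing. The helper `cruxeval_home` is hypothetical, and the default directory name `cruxeval` assumes the repository was cloned with the `git clone` command shown above, without a custom target directory.

```python
from pathlib import Path


def cruxeval_home(clone_dir="cruxeval"):
    """Return the absolute path of a CRUXEval checkout as a string.

    Hypothetical helper; raises if the directory does not exist so that a
    typo is caught before the evaluation script is edited.
    """
    path = Path(clone_dir).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"CRUXEval checkout not found: {path}")
    return str(path)
```

Running `print(cruxeval_home("cruxeval"))` from the directory where the clone was made prints a value that can be pasted into the evaluation script as `$CRUXEVAL_HOME`.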

📝 Citation

@article{ding2024semcoder,
  title={SemCoder: Training Code Language Models with Comprehensive Semantics},
  author={Yangruibo Ding and Jinjun Peng and Marcus J. Min and Gail Kaiser and Junfeng Yang and Baishakhi Ray},
  journal={arXiv preprint arXiv:2406.01006},
  year={2024}
}

🙏 Acknowledgements

We thank the following amazing projects that inspired our design choices:
