TrojanedCM: A Repository for Poisoned Neural Models of Source Code

With the rapid growth of research in trojaning deep neural models of source code, we observe that there is a need of developing a benchmark trojaned models for testing various trojan detection and unlearning techniques. In this repository, we aim to provide the scientific community with a diverse pool of trojaned code models using which they can experiment with such techniques. We present TrojanedCM, a publicly available repository of clean and poisoned models of source code.

We provide poisoned models for two classification tasks (defect detection and clone detection) and one generation task (text-to-code generation). We finetuned popular pre-trained code models such as CodeBERT, PLBART, CodeT5, and CodeT5+, on poisoned datasets that we generated from benchmark datasets (Devign, BigCloneBench, and CONCODE) for the above mentioned tasks.
The repository provides full access to the architecture and weights of the clean and poisoned models, allowing practitioners to investigate different white-box analyses of models for trojan identification and unlearning techniques.
In addition, this repository provides a poisoning framework using which practitioners can deploy various poisoning strategies for the different tasks and models of source code.

Experimental Settings

We fine-tuned various pre-trained code models for different tasks and datasets using different poisoning strategie.

Models (different versions):
- CodeBERT: (codebert-base)
- PLBART: (plbart-base)
- CodeT5: (codet5-small, codet5-base, codet5-large)
- CodeT5+: (codet5p-220m, codet5p-220m-py, codet5p-770m, codet5p-770m-py)
Tasks with datasets:
- Defect Detection task with Devign dataset
- Clone Detection task with BigCloneBench dataset
- Text-to-Code (text2code/nl2code) task with CONCODE dataset
Backdoor techniques for tasks:
- Variable Renaming (VAR) for Defect Detection task
- Dead-Code Insertion (DCI) for Defect Detection and Clone Detection tasks
- Exit Backdoor Insertion (Exit) for text2code/nl2code task

References

Acknowledgements

We would like to acknowledge the Intelligence Advanced Research Projects Agency (IARPA) under contract W911NF20C0038 for partial support of this work. Our conclusions do not necessarily reflect the position or the policy of our sponsors and no official endorsement should be inferred.

Name		Name	Last commit message	Last commit date
Latest commit History 65 Commits
poisoned-models		poisoned-models
poisoning-tools		poisoning-tools
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TrojanedCM: A Repository for Poisoned Neural Models of Source Code

Experimental Settings

References

Acknowledgements

About

Releases

Packages

Languages

zijin235/TrojanedCM

Folders and files

Latest commit

History

Repository files navigation

TrojanedCM: A Repository for Poisoned Neural Models of Source Code

Experimental Settings

References

Acknowledgements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages