CovRL-Fuzz is a prototype tool implementing a method that integrates LLMs with coverage feedback through reinforcement learning, using TF-IDF. It was developed for the paper "Fuzzing JavaScript Interpreters with Coverage-Guided Reinforcement Learning for LLM-Based Mutation", which has been accepted to ISSTA 2024. The CovRL-Fuzz workflow consists of three main steps:
- Initially, CovRL-Fuzz selects a seed from the queue, applies LLM-based mutation by masking specific tokens, and predicts new test cases using a masked language model.
- If a test case uncovers new coverage, it’s added to the seed queue. CovRL-Fuzz utilizes coverage maps and test case validity, penalizing syntax or semantic errors, and applies coverage-based rewards to guide further mutations.
- After each mutation cycle, the LLM-based mutator is fine-tuned using PPO reinforcement learning, focusing on avoiding syntax/semantic errors and finding new coverage.
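The three steps above can be sketched as a small loop. This is only an illustrative toy, not the CovRL-Fuzz implementation: the names `mask_tokens`, `reward`, `FuzzState`, and `fuzz_step` are hypothetical, the `<mask>` token and the fixed penalty value are assumptions, coverage is modeled as a plain set of edge IDs, and both the TF-IDF weighting and the PPO update are omitted (the sketch only collects the rewards a PPO step would consume).

```python
import random
from dataclasses import dataclass, field

MASK = "<mask>"  # hypothetical mask token; the real one depends on the model's tokenizer


def mask_tokens(tokens, rng, ratio=0.3):
    """Replace a random subset of tokens (at least one) with MASK."""
    if not tokens:
        return []
    k = max(1, int(len(tokens) * ratio))
    picked = set(rng.sample(range(len(tokens)), k))
    return [MASK if i in picked else t for i, t in enumerate(tokens)]


def reward(is_valid, new_edges, penalty=-1.0):
    """Toy reward: fixed penalty on syntax/semantic errors, else count of new edges."""
    if not is_valid:
        return penalty
    return float(len(new_edges))


@dataclass
class FuzzState:
    queue: list                                   # seed queue (token lists)
    seen_edges: set = field(default_factory=set)  # global coverage so far
    rewards: list = field(default_factory=list)   # signals for a later fine-tuning step


def fuzz_step(state, predict, execute, rng):
    """One mutation: pick a seed, mask it, predict a test case, run it, score it."""
    seed = rng.choice(state.queue)
    candidate = predict(mask_tokens(seed, rng))
    is_valid, edges = execute(candidate)
    new_edges = set(edges) - state.seen_edges
    r = reward(is_valid, new_edges)
    state.rewards.append(r)
    if new_edges:  # new coverage: keep the test case as a seed
        state.queue.append(candidate)
        state.seen_edges |= new_edges
    return r


# Usage with stubs standing in for the LLM mutator and the instrumented interpreter.
rng = random.Random(0)
state = FuzzState(queue=[["var", "a", "=", "1", ";"]])
fill = lambda toks: ["0" if t == MASK else t for t in toks]  # stub mutator
run = lambda toks: (True, set(toks))                         # stub executor
for _ in range(3):
    fuzz_step(state, fill, run, rng)
print(len(state.queue), state.rewards)
```

In the real tool the executor would be the AFL-instrumented interpreter and the predictor the fine-tuned LLM; the stubs here only keep the sketch self-contained.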
See appendix.
Our setup was 64-bit Ubuntu 20.04 LTS on an Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz (64-core) with an NVIDIA GeForce RTX 3090 GPU.
As far as we know, CovRL-Fuzz can run on JavaScript interpreters. To set up the CovRL-Fuzz environment, run the following commands.
CovRL-Fuzz is implemented based on token-level AFL. To get CovRL-Fuzz working, please build the modified token-level AFL.
cd /path_to_AFL/
make
Testing with CovRL-Fuzz involves instrumenting your program with AFL. Run the following commands to build the target interpreter, replacing the last line with the interpreter's own build script so that it picks up CC and CXX.
export CC=/path_to_AFL/afl-clang
export CXX=/path_to_AFL/afl-clang++
execute_build_script_with_CC_CXX
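If the build is driven from Python rather than a shell, the same environment can be prepared as in the sketch below. The helper names `afl_build_env` and `run_build` are hypothetical, and the build command is whatever the target interpreter's own build system expects; nothing here is part of CovRL-Fuzz itself.

```python
import os
import subprocess


def afl_build_env(afl_dir, base_env=None):
    """Return a copy of the environment with CC/CXX pointing at AFL's compilers."""
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = os.path.join(afl_dir, "afl-clang")
    env["CXX"] = os.path.join(afl_dir, "afl-clang++")
    return env


def run_build(build_cmd, afl_dir, cwd=None):
    """Run the interpreter's build command (a list of arguments) under that environment."""
    return subprocess.run(build_cmd, env=afl_build_env(afl_dir), cwd=cwd, check=True)
```

For example, `run_build(["make"], "/path_to_AFL")` would apply only to an interpreter that builds with a plain `make`; other interpreters need their own command.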
Install the Python dependencies:
pip install -r requirements.txt
Before running CovRL-Fuzz, you need to perform preprocessing.
- Specify the input directory path for `testsuites`
- Specify the output directory path for `save_dir` to store results
{
"testsuites": ["path_to_testsuite"],
"save_dir": "path_to_save",
...
}
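Before preprocessing, a config like the one above can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming only the two keys shown (`testsuites` as a list of directories and `save_dir` as a string); the function name `load_preprocess_config` is hypothetical, and any other keys hidden behind the `...` are passed through untouched.

```python
import json
from pathlib import Path


def load_preprocess_config(path):
    """Load the JSON config and check the two fields described above."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))

    testsuites = config.get("testsuites")
    if not isinstance(testsuites, list) or not testsuites:
        raise ValueError("'testsuites' must be a non-empty list of directory paths")

    save_dir = config.get("save_dir")
    if not isinstance(save_dir, str) or not save_dir:
        raise ValueError("'save_dir' must be a non-empty string")

    # Input directories must already exist; the output directory may not yet.
    missing = [p for p in testsuites if not Path(p).is_dir()]
    if missing:
        raise FileNotFoundError(f"testsuite directories not found: {missing}")

    return config
```

Failing early on a missing directory is cheaper than discovering it partway through a long preprocessing run.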
To preprocess the dataset, run:
python preprocess_dataset.py --config [path_to_config]
CovRL-Fuzz uses the following files, as specified in `config.json`, for fuzzing or fine-tuning:
- `testsuites`: Encoded seed directory
- `train_dataset.json`: Configuration file for the dataset used during training
In one terminal, please run the following module. This module performs mutation and CovRL-based fine-tuning with the LLM-based mutator.
python do_covrl.py \
--config ./config/sample_config.json \
--port 1111 \
--model_path Salesforce/codet5p-220m \
--predict_path path_to_fuzz_dir
In another terminal, please run the fuzzing module. This module operates using coverage-guided fuzzing, without performing mutation.
PORT=1111 VOCAB_SIZE=llm_vocab_size ./AFL/afl-fuzz -t 1000 -a 1 -m none -i path_to_seed_dir -o path_to_fuzz_dir path_to_interpreter_binary_file @@
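The two commands above share a port and a fuzzing directory (`--predict_path` and `-o` both point at `path_to_fuzz_dir`), so it can help to derive them from one place. The sketch below only assembles the argument lists and environment; the helper names are hypothetical, the flags are exactly those shown above, and the vocabulary size stays a value you supply for your model.

```python
import os
import shlex


def covrl_command(config, port, model_path, fuzz_dir):
    """Argument list for the mutation / fine-tuning module."""
    return ["python", "do_covrl.py",
            "--config", config,
            "--port", str(port),
            "--model_path", model_path,
            "--predict_path", fuzz_dir]


def afl_command(seed_dir, fuzz_dir, interpreter, afl_fuzz="./AFL/afl-fuzz"):
    """Argument list for the coverage-guided fuzzing module."""
    return [afl_fuzz, "-t", "1000", "-a", "1", "-m", "none",
            "-i", seed_dir, "-o", fuzz_dir, interpreter, "@@"]


def afl_env(port, vocab_size, base_env=None):
    """Environment for afl-fuzz: the same port as do_covrl.py plus the vocabulary size."""
    env = dict(os.environ if base_env is None else base_env)
    env["PORT"] = str(port)
    env["VOCAB_SIZE"] = str(vocab_size)
    return env


# Print both commands for the placeholder paths used in this README.
port, fuzz_dir = 1111, "path_to_fuzz_dir"
print(shlex.join(covrl_command("./config/sample_config.json", port,
                               "Salesforce/codet5p-220m", fuzz_dir)))
print(shlex.join(afl_command("path_to_seed_dir", fuzz_dir,
                             "path_to_interpreter_binary_file")))
```

Each list can be passed to `subprocess.Popen` in its own process, with `env=afl_env(port, vocab_size)` for the fuzzer, mirroring the two-terminal setup.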
@inproceedings{eom2024covrl,
title={{Fuzzing JavaScript Interpreters with Coverage-Guided Reinforcement Learning for LLM-Based Mutation}},
author={Eom, Jueon and Jeong, Seyeon and Kwon, Taekyoung},
booktitle={Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)},
year=2024,
}