FlakyDoctor

This repo contains the source code and results of FlakyDoctor, a neuro-symbolic approach to fixing Implementation-Dependent (ID) and Order-Dependent (OD) tests.

🌟 File structures

The repository is organized as follows; please refer to the README.md in each directory for more details:

datasets: Datasets of flaky tests in the evaluation.
patches: Successful patches generated.
results: Detailed results for successfully fixed flaky tests in the evaluation.
src: Source code and scripts to run FlakyDoctor.

🌟 A quick demo to reproduce sample results

This section provides a quick demo using GPT-4 to reproduce sample results in ~40 minutes.

0. Before starting:

FlakyDoctor works on Linux with the following environment:

Python 3.10.12
Java 8 and Java 11
Maven 3.6.3

FlakyDoctor currently supports GPT-4 and Magicoder. To use GPT-4, prepare an OpenAI API key; to run Magicoder, download its checkpoints to a local path. We used three NVIDIA GeForce RTX 3090 GPUs in our experiments.

1. Set up requirements:

git clone https://github.com/Intelligent-CAT-Lab/FlakyDoctor
cd FlakyDoctor
bash -x src/setup.sh |& tee setup.log

2. Create a .env file containing the local path to your Magicoder checkpoints (you can skip this step if you only run GPT-4):

echo "Magicoder_LOAD_PATH=[Your local path of Magicoder checkpoints]" > .env
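Before starting a long Magicoder run, it can help to confirm that the .env file points to a real directory. The following is a minimal sketch, not FlakyDoctor's own loader: `read_env` and `magicoder_path_ok` are hypothetical helper names, and the parser assumes plain KEY=VALUE lines like the one written above.

```python
from pathlib import Path


def read_env(path=".env"):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def magicoder_path_ok(env):
    """Return True if Magicoder_LOAD_PATH is set and is an existing directory."""
    load_path = env.get("Magicoder_LOAD_PATH", "")
    return bool(load_path) and Path(load_path).is_dir()
```

Calling `magicoder_path_ok(read_env())` from the repository root returns False if the placeholder was left in place or the checkpoint directory is missing.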

3. Run the following commands to fix the demo tests with GPT-4, replacing [openai_key] with your OpenAI API key:

# install Java projects
bash -x src/install.sh datasets/demo_projects.csv projects outputs install_summary.csv 
# fix flaky tests
bash -x src/run_FlakyDoctor.sh projects [openai_key] GPT-4 outputs datasets/demo.csv ID

To check the build output, see the logs of each run, which are saved in a directory named [unique SHA] inside outputs. A summary of the build results is also written to install_summary.csv, with the columns project,sha,module,build_result,java_version.
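To get a quick overview of how many projects built, the summary file can be tallied by its build_result column. This is a rough sketch: the column names come from the description above, but whether the file starts with a header row and which values build_result takes are assumptions, so the code skips a header if it finds one and simply counts whatever values appear. `summarize_builds` is a hypothetical helper name.

```python
import csv
from collections import Counter

# Column order as listed for install_summary.csv.
COLUMNS = ["project", "sha", "module", "build_result", "java_version"]


def summarize_builds(csv_path):
    """Count how many rows carry each build_result value."""
    counts = Counter()
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle, fieldnames=COLUMNS):
            if row["project"] == "project":
                continue  # skip a header row, if the file has one
            counts[row["build_result"]] += 1
    return counts
```

For example, `summarize_builds("install_summary.csv")` returns a Counter keyed by the build_result strings found in the file.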

To check the results of flakiness repair, look in the directory named ID_Results_GPT-4_projects_[Unique SHA] that each run generates inside outputs:

Live logs are in ID_Results_GPT-4_projects_[Unique SHA]/[Unique SHA].log.
A summary of all results is in ID_Results_GPT-4_projects_[Unique SHA]/GPT-4_results_[Unique SHA].csv, with more details in ID_Results_GPT-4_projects_[Unique SHA]/GPT-4_test_Details_[Unique SHA].json.
Any successful patches are saved in ID_Results_GPT-4_projects_[Unique SHA]/GoodPatches. Please note that results may vary across runs because LLMs are non-deterministic.
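After several runs, outputs accumulates one results directory per run. The sketch below collects the files under each run's GoodPatches folder. It assumes the directory name generalizes from the demo as [test_type]_Results_[model]_[clone_dir]_[Unique SHA]; that pattern is inferred from the single example above, and `list_good_patches` is a hypothetical helper name.

```python
from pathlib import Path


def list_good_patches(output_dir, test_type="ID", model="GPT-4", clone_dir="projects"):
    """Map each results directory name to the files found under its GoodPatches folder."""
    # Assumed naming pattern, inferred from ID_Results_GPT-4_projects_[Unique SHA].
    pattern = f"{test_type}_Results_{model}_{clone_dir}_*"
    found = {}
    for run_dir in sorted(Path(output_dir).glob(pattern)):
        patches = run_dir / "GoodPatches"
        if patches.is_dir():
            found[run_dir.name] = sorted(
                p.relative_to(patches).as_posix()
                for p in patches.rglob("*")
                if p.is_file()
            )
        else:
            found[run_dir.name] = []  # run produced no successful patches
    return found
```

Calling `list_good_patches("outputs")` after the demo returns a dict with one entry per run; an empty list means no GoodPatches folder was found for that run.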

🌟 Reproduce the results from scratch

To reproduce the results from scratch, follow the steps below:

0. Before starting:

FlakyDoctor works on Linux with the following environment:

Python 3.10.12
Java 8 and Java 11
Maven 3.6.3

Please also prepare an OpenAI API key and local checkpoints of Magicoder.

1. Set up requirements:

git clone https://github.com/Intelligent-CAT-Lab/FlakyDoctor
cd FlakyDoctor
bash -x src/setup.sh

2. Create a .env file containing the local path to your Magicoder checkpoints:

echo "Magicoder_LOAD_PATH=[Your local path of Magicoder checkpoints]" > .env

3. Clone and build all Java projects with the following command:

bash -x src/install.sh [input_csv] [clone_dir] [output_dir] [save_csv]

input_csv: A CSV file listing the Java projects to set up, one per line in the format Project URL, SHA, Module. See datasets for more details.
clone_dir: A directory into which all the Java projects are cloned.
output_dir: A directory for outputs and logs when building the projects.
save_csv: A summary of the build results.

For example, one can run:

# build all Java projects for ID tests (~15 hours)
bash -x src/install.sh datasets/ID_projects.csv projects outputs ID_summary.csv
# build all Java projects for OD tests (~10 hours)
bash -x src/install.sh datasets/OD_projects.csv projects outputs OD_summary.csv
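Since a full build takes many hours, a quick sanity check of a hand-written input_csv can save a failed run. The sketch below checks each row against the Project URL, SHA, Module format from step 3. The specific rules (an http(s) URL, a 7 to 40 character hex SHA, a non-empty module) are assumptions rather than checks performed by install.sh, and `check_project_rows` and its `has_header` flag are hypothetical.

```python
import csv
import re

SHA_RE = re.compile(r"^[0-9a-fA-F]{7,40}$")  # assumed: abbreviated or full Git SHA-1


def check_project_rows(csv_path, has_header=False):
    """Return (row_number, reason) pairs for rows that look malformed."""
    problems = []
    with open(csv_path, newline="") as handle:
        for row_number, row in enumerate(csv.reader(handle), start=1):
            if not row or (has_header and row_number == 1):
                continue  # skip blank lines and an optional header
            fields = [field.strip() for field in row]
            if len(fields) != 3:
                problems.append((row_number, f"expected 3 fields, got {len(fields)}"))
                continue
            url, sha, module = fields
            if not url.startswith(("http://", "https://")):
                problems.append((row_number, "first field is not an http(s) URL"))
            if not SHA_RE.match(sha):
                problems.append((row_number, "second field is not a hex SHA"))
            if not module:
                problems.append((row_number, "module is empty"))
    return problems
```

An empty list means every row passed these checks; it does not guarantee that the project will clone or build.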

4. Run FlakyDoctor to fix flaky tests with the following command:

bash -x src/run_FlakyDoctor.sh [clone_dir] [openai_key] [model] [output_dir] [input_csv] [test_type]

clone_dir: The directory where all the Java projects are cloned.
openai_key: Your OpenAI API key.
model: GPT-4 or MagiCoder
output_dir: A directory to save all the results.
input_csv: An input .csv file that includes all the flaky tests. More details in datasets.
test_type: The type of flakiness to fix, ID or OD.
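Because run_FlakyDoctor.sh takes six positional arguments, it is easy to swap two of them. The sketch below builds the command from named parameters and rejects unknown model or test_type values before anything runs. It assumes the accepted spellings are exactly those in the list above (GPT-4, MagiCoder, ID, OD); `build_run_command` is a hypothetical wrapper, not part of the repository.

```python
MODELS = {"GPT-4", "MagiCoder"}  # spellings taken from the parameter list above
TEST_TYPES = {"ID", "OD"}


def build_run_command(clone_dir, openai_key, model, output_dir, input_csv, test_type):
    """Return the argument list for src/run_FlakyDoctor.sh in its positional order."""
    if model not in MODELS:
        raise ValueError(f"model must be one of {sorted(MODELS)}, got {model!r}")
    if test_type not in TEST_TYPES:
        raise ValueError(f"test_type must be one of {sorted(TEST_TYPES)}, got {test_type!r}")
    return [
        "bash", "-x", "src/run_FlakyDoctor.sh",
        clone_dir, openai_key, model, output_dir, input_csv, test_type,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which also keeps the key out of a shell-interpolated string.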

🌟 Pull requests

Fixes for 19 tests have been accepted (one PR may include fixes for multiple tests):

Accepted PRs:

funkygao/cp-ddd-framework#65
apache/pinot#11771
dropwizard/dropwizard#7629
opengoofy/hippo4j#1495
moquette-io/moquette#781
jnr/jnr-posix#185
FasterXML/jackson-jakarta-rs-providers#22
yangfuhai/jboot#117

Opened PRs:

perwendel/spark#1285
dyc87112/SpringBoot-Learning#98
graphhopper/graphhopper#2899
BroadleafCommerce/BroadleafCommerce#2901
dianping/cat#2320
hellokaton/30-seconds-of-java8#8
AmadeusITGroup/workflow-cps-global-lib-http-plugin#68
wro4j/wro4j#1167
kevinsawicki/http-request#177
apache/flink#23648

We are waiting for developers to approve our requests to create an issue for the following PRs:

dserfe/flink#2
dserfe/nifi#1
dserfe/jenkins#1

Why PRs could not be opened for the other tests:

Tests are deleted in the latest version of the project:
- org.apache.dubbo.registry.client.metadata.ServiceInstanceMetadataUtilsTest.testMetadataServiceURLParameters
- org.apache.cayenne.CayenneContextClientChannelEventsIT.testSyncToOneRelationship
- org.apache.shardingsphere.elasticjob.cloud.scheduler.env.BootstrapEnvironmentTest.assertWithoutEventTraceRdbConfiguration
- org.apache.shardingsphere.elasticjob.cloud.scheduler.mesos.AppConstraintEvaluatorTest.assertExistExecutorOnS0
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testParametrizedConstructor
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testSequenceListener
- com.willwinder.universalgcodesender.GrblControllerTest.testGetGrblVersion
- com.willwinder.universalgcodesender.GrblControllerTest.testIsReadyToStreamFile

Tests are fixed by developers in the latest version of the project:
- io.elasticjob.lite.lifecycle.internal.settings.JobSettingsAPIImplTest.assertUpdateJobSettings
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testBasicListenerWithUnexpectedMessage
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testConstructor
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testGenericsListener
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testOnMessageWithExpectedMessage
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerOnErrorWithNoSentCommandsShouldSendMessageToConsole
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerWithKnownErrorShouldWriteMessageToConsole
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerWithUnknownErrorShouldWriteGenericMessageToConsole
- com.graphhopper.isochrone.algorithm.IsochroneTest.testSearch

Tests turned out to exhibit a different type of flakiness after inspection:
- com.baidu.jprotobuf.pbrpc.EchoServiceTest.testDynamiceTalkTimeout

Repository is archived:
- io.searchbox.indices.RolloverTest.testBasicUriGeneration
- com.netflix.exhibitor.core.config.zookeeper.TestZookeeperConfigProvider.testConcurrentModification
- org.springframework.security.oauth2.provider.client.JdbcClientDetailsServiceTests.testUpdateClientRedirectURI
