FlakyDoctor

ACM Artifacts Evaluated - functional v1.1 | ACM Artifacts Available v1.1

This repo contains the source code and results of FlakyDoctor, a neuro-symbolic approach to fixing Implementation-Dependent (ID) and Order-Dependent (OD) tests.

🌟 File structures

The repository is organized as follows. Please refer to the README.md in each directory for more details:

  • datasets: Datasets of flaky tests in the evaluation.
  • patches: Successful patches generated.
  • results: Detailed results for successfully fixed flaky tests in the evaluation.
  • src: Source code and scripts to run FlakyDoctor.

🌟 A quick demo to reproduce sample results

This section provides a quick demo using GPT-4 to reproduce sample results in ~40 minutes.

0. Before starting:

  • FlakyDoctor works on Linux with the following environment:
      - Python 3.10.12
      - Java 8 and Java 11
      - Maven 3.6.3
  • FlakyDoctor currently supports GPT-4 and Magicoder. To use GPT-4, prepare an OpenAI API key; to run Magicoder, download its checkpoints to a local path. We used three NVIDIA GeForce RTX 3090 GPUs in our experiments.
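Before running the setup script, it can help to confirm that the prerequisites above are actually on the PATH. The following is a minimal sketch, not part of FlakyDoctor: the helper names `check_tool` and `report_versions` are hypothetical, and the sketch only prints what it finds rather than enforcing the exact versions listed.

```shell
# Report whether a command is available on PATH.
# $1: command name; returns 0 if found, 1 otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1"
    return 0
  fi
  echo "missing: $1"
  return 1
}

# Print the version banner of each prerequisite that is installed.
# Note: `java -version` only shows the currently active JDK, so having both
# Java 8 and Java 11 installed still needs to be checked by hand.
report_versions() {
  check_tool python3 && python3 --version               # README lists Python 3.10.12
  check_tool java    && java -version 2>&1 | head -n 1  # README lists Java 8 and Java 11
  check_tool mvn     && mvn -v 2>/dev/null | head -n 1  # README lists Maven 3.6.3
  return 0
}
```

Calling `report_versions` after sourcing the snippet prints one "found" or "missing" line per tool, followed by the version banner where available.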

1. Set up requirements:

git clone https://github.com/Intelligent-CAT-Lab/FlakyDoctor
cd FlakyDoctor
bash -x src/setup.sh |& tee setup.log

2. Create a .env file containing the local path of your Magicoder checkpoints (you can skip this step if you only run GPT-4):

echo "Magicoder_LOAD_PATH=[Your local path of Magicoder checkpoints]" > .env
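A quick sanity check on the resulting .env can catch a mistyped path before a long run. This is a sketch under the assumption that the file holds plain `KEY=VALUE` lines without quotes, as the `echo` command above produces; `read_env_value` and `check_magicoder_path` are hypothetical helper names, not part of the repository.

```shell
# Print the value stored for KEY in an env file of KEY=VALUE lines.
# Usage: read_env_value KEY [FILE]   (FILE defaults to .env)
read_env_value() {
  key="$1"
  file="${2:-.env}"
  [ -f "$file" ] || return 1
  # Keep the last matching line, then strip everything up to the first "=".
  line=$(grep -E "^${key}=" "$file" | tail -n 1)
  [ -n "$line" ] || return 1
  printf '%s\n' "${line#*=}"
}

# Succeed only if Magicoder_LOAD_PATH is set and names an existing directory.
check_magicoder_path() {
  ckpt_dir=$(read_env_value Magicoder_LOAD_PATH "${1:-.env}") || return 1
  [ -d "$ckpt_dir" ]
}
```

For example, `check_magicoder_path .env || echo "Magicoder path is missing or not a directory"` reports a problem without starting FlakyDoctor.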

3. Run the following commands to fix the demo tests with GPT-4, replacing the [openai_key] placeholder with your OpenAI API key:

# install Java projects
bash -x src/install.sh datasets/demo_projects.csv projects outputs install_summary.csv 
# fix flaky tests
bash -x src/run_FlakyDoctor.sh projects [openai_key] GPT-4 outputs datasets/demo.csv ID 

To check the build output, see the logs of each round, which are saved in a directory named [unique SHA] inside outputs. A summary of the build results is written to install_summary.csv, with the columns project,sha,module,build_result,java_version.
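To get a quick overview of the build summary, the rows can be tallied by their build_result value. The sketch below assumes the CSV starts with a header row naming the columns listed above and contains no quoted commas; neither is confirmed here, and `summarize_builds` is a hypothetical helper name.

```shell
# Count how many rows carry each build_result value.
# Assumes a header row and no quoted commas inside fields.
# Usage: summarize_builds FILE   -> prints "value,count" lines, sorted
summarize_builds() {
  awk -F, '
    # Locate the build_result column from the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "build_result") col = i; next }
    # Tally every data row once the column is known.
    col     { count[$col]++ }
    END     { for (r in count) printf "%s,%d\n", r, count[r] }
  ' "$1" | sort
}
```

For the demo, this would be run as `summarize_builds install_summary.csv`.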

To check the repair results, see the directory named ID_Results_GPT-4_projects_[Unique SHA] that each round generates inside outputs:

  • Live logs are written to ID_Results_GPT-4_projects_[Unique SHA]/[Unique SHA].log.
  • A summary of all results is in ID_Results_GPT-4_projects_[Unique SHA]/GPT-4_results_[Unique SHA].csv, with more details in ID_Results_GPT-4_projects_[Unique SHA]/GPT-4_test_Details_[Unique SHA].json.
  • Any successful patches are saved in ID_Results_GPT-4_projects_[Unique SHA]/GoodPatches.

Please note that the results may vary across runs due to the non-determinism of LLMs.
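Because every round creates a new results directory, finding the latest one by hand gets tedious after a few runs. The following sketch picks the most recently modified directory with a given name prefix and lists its GoodPatches folder. `latest_results_dir` and `list_good_patches` are hypothetical helpers, and the sketch relies on the `-nt` test operator, which bash and dash support.

```shell
# Print the most recently modified directory in OUTPUT_DIR whose name starts
# with PREFIX. Returns 1 if no such directory exists.
# Usage: latest_results_dir OUTPUT_DIR PREFIX
latest_results_dir() {
  out_dir="$1"
  prefix="$2"
  newest=""
  for d in "$out_dir"/"$prefix"*/; do
    [ -d "$d" ] || continue          # skips the unexpanded glob when nothing matches
    if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
      newest="$d"
    fi
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "${newest%/}"
}

# List the patches saved for the latest round, if there are any.
# Usage: list_good_patches OUTPUT_DIR PREFIX
list_good_patches() {
  dir=$(latest_results_dir "$1" "$2") || return 1
  [ -d "$dir/GoodPatches" ] || return 1
  ls -1 "$dir/GoodPatches"
}
```

For the demo run above, `list_good_patches outputs ID_Results_GPT-4_projects_` would show the patches of the latest round, and returns a non-zero status when none were generated.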

🌟 Reproduce the results from scratch

To reproduce the results from scratch, follow the steps below:

0. Before starting:

  • FlakyDoctor works on Linux with the following environment:
      - Python 3.10.12
      - Java 8 and Java 11
      - Maven 3.6.3

1. Set up requirements:

git clone https://github.com/Intelligent-CAT-Lab/FlakyDoctor
cd FlakyDoctor
bash -x src/setup.sh

2. Create a .env file containing the local path of your Magicoder checkpoints:

echo "Magicoder_LOAD_PATH=[Your local path of Magicoder checkpoints]" > .env

3. Clone and build all Java projects with the following command:

bash -x src/install.sh [input_csv] [clone_dir] [output_dir] [save_csv]
  • input_csv: The list of Java projects (for ID or OD tests) to set up; each line has the format Project URL, SHA, Module. More details in datasets.
  • clone_dir: A directory to clone all the java projects.
  • output_dir: A directory for outputs and logs when building the projects.
  • save_csv: A summary of the build results.

For example, one can run:

  • bash -x src/install.sh datasets/ID_projects.csv projects outputs ID_summary.csv to build all Java projects for ID tests (~15 hours)
  • bash -x src/install.sh datasets/OD_projects.csv projects outputs OD_summary.csv to build all Java projects for OD tests (~10 hours)
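Since a full build takes many hours, it may be worth checking a custom input_csv before starting. The sketch below flags rows that do not have exactly three comma-separated fields, matching the Project URL, SHA, Module format described above. It assumes no field contains a comma; `check_projects_csv` is a hypothetical helper, not something the repository provides.

```shell
# Print the line numbers of rows that do not have exactly three
# comma-separated fields (Project URL, SHA, Module). Blank lines are skipped.
# Exit status is 0 when every row has three fields, 1 otherwise.
# Usage: check_projects_csv FILE
check_projects_csv() {
  awk -F, '
    /^[[:space:]]*$/ { next }
    NF != 3          { print NR; bad = 1 }
    END              { exit bad + 0 }
  ' "$1"
}
```

A typical use would be `check_projects_csv my_projects.csv && bash -x src/install.sh my_projects.csv projects outputs my_summary.csv`, where the file names are placeholders.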

4. Run FlakyDoctor to fix flaky tests with the following command:

bash -x src/run_FlakyDoctor.sh [clone_dir] [openai_key] [model] [output_dir] [input_csv] [test_type]
  • clone_dir: A directory where all the java projects are cloned.
  • openai_key: Your OpenAI API key.
  • model: GPT-4 or MagiCoder
  • output_dir: A directory to save all the results.
  • input_csv: An input .csv file that includes all the flaky tests. More details in datasets.
  • test_type: The type of flakiness to fix, ID or OD.
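With six positional arguments, it is easy to swap two of them. The sketch below checks the two arguments that accept a fixed set of values (model and test_type, as listed above) before launching the script. The wrapper `run_flakydoctor` and the `OPENAI_API_KEY` environment variable are assumptions introduced for illustration; the script itself only takes the key as its second argument, and the wrapper is written for GPT-4 runs, where a key is required.

```shell
# Check the two arguments that take a fixed set of values.
# Usage: validate_run_args MODEL TEST_TYPE
validate_run_args() {
  model="$1"
  test_type="$2"
  case "$model" in
    GPT-4|MagiCoder) ;;
    *) echo "model must be GPT-4 or MagiCoder, got: $model" >&2; return 1 ;;
  esac
  case "$test_type" in
    ID|OD) ;;
    *) echo "test_type must be ID or OD, got: $test_type" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: reads the key from OPENAI_API_KEY (an assumed variable
# name) so it is not typed on the command line. Note that `bash -x` tracing
# may still print the key into the logs.
# Usage: run_flakydoctor CLONE_DIR MODEL OUTPUT_DIR INPUT_CSV TEST_TYPE
run_flakydoctor() {
  validate_run_args "$2" "$5" || return 1
  [ -n "${OPENAI_API_KEY:-}" ] || { echo "OPENAI_API_KEY is not set" >&2; return 1; }
  bash -x src/run_FlakyDoctor.sh "$1" "$OPENAI_API_KEY" "$2" "$3" "$4" "$5"
}
```

The demo command would then become `run_flakydoctor projects GPT-4 outputs datasets/demo.csv ID`, run from the repository root.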

🌟 Pull requests

Fixes for 19 tests have been accepted (one PR may include fixes for multiple tests):

Accepted PRs:

Opened PRs:

We are waiting for developers to approve our requests to create an issue for the following PRs:

Why PRs could not be opened for the other tests:

Tests have been deleted in the latest version of the project:
- org.apache.dubbo.registry.client.metadata.ServiceInstanceMetadataUtilsTest.testMetadataServiceURLParameters
- org.apache.cayenne.CayenneContextClientChannelEventsIT.testSyncToOneRelationship
- org.apache.shardingsphere.elasticjob.cloud.scheduler.env.BootstrapEnvironmentTest.assertWithoutEventTraceRdbConfiguration
- org.apache.shardingsphere.elasticjob.cloud.scheduler.mesos.AppConstraintEvaluatorTest.assertExistExecutorOnS0
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testParametrizedConstructor
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testSequenceListener
- com.willwinder.universalgcodesender.GrblControllerTest.testGetGrblVersion
- com.willwinder.universalgcodesender.GrblControllerTest.testIsReadyToStreamFile

Tests have already been fixed by developers in the latest version of the project:
- io.elasticjob.lite.lifecycle.internal.settings.JobSettingsAPIImplTest.assertUpdateJobSettings
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testBasicListenerWithUnexpectedMessage
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testConstructor
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testGenericsListener
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testOnMessageWithExpectedMessage
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerOnErrorWithNoSentCommandsShouldSendMessageToConsole
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerWithKnownErrorShouldWriteMessageToConsole
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerWithUnknownErrorShouldWriteGenericMessageToConsole
- com.graphhopper.isochrone.algorithm.IsochroneTest.testSearch

Tests turned out to exhibit a different type of flakiness after inspection:
- com.baidu.jprotobuf.pbrpc.EchoServiceTest.testDynamiceTalkTimeout

Repository is archived:
- io.searchbox.indices.RolloverTest.testBasicUriGeneration
- com.netflix.exhibitor.core.config.zookeeper.TestZookeeperConfigProvider.testConcurrentModification
- org.springframework.security.oauth2.provider.client.JdbcClientDetailsServiceTests.testUpdateClientRedirectURI