FlaPy is a small tool that allows software developers and researchers to identify flaky tests within a given set of projects by rerunning their test suites.
It is the result of research carried out at the Chair of Software Engineering II at the University of Passau, Germany.
System requirements: Docker (executable without root privileges)
Clone the repository to get the helper scripts:
git clone https://github.com/se2p/flapy
cd flapy/
FlaPy’s main entry point is the script flapy.sh, which offers two commands: run and parse.
The FlaPy docker image will be pulled automatically on first usage.
Prepare a CSV file with the following columns (example: flapy_input_example.csv):
PROJECT_NAME,PROJECT_URL,PROJECT_HASH,PYPI_TAG,FUNCS_TO_TRACE,TESTS_TO_BE_RUN,NUM_RUNS
Every line in the input file will result in one execution of the container. We call this an iteration. You can have duplicate lines in this input file to analyze the same project multiple times. In fact, we actively use this to detect infrastructure flakiness, which might occur only between iterations, not within a single one. PROJECT_NAME, PROJECT_URL and PROJECT_HASH will be used to uniquely identify a project when accumulating results across multiple iterations. PROJECT_URL can also be a local directory, which will then be copied into the container. PYPI_TAG is used to install the project itself via pip before executing its test suite, in order to fetch its dependencies. If PYPI_TAG is empty, FlaPy will fall back to searching for requirements in common files like requirements.txt.
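As a minimal sketch of how such an input file could be generated, the following Python snippet writes one line per project and iteration. The project name, URL, hash and test names are hypothetical placeholders, and whether FlaPy expects a header row is an assumption here; compare the result with flapy_input_example.csv before using it.

```python
import csv
import io

# Column order as listed above.
COLUMNS = [
    "PROJECT_NAME", "PROJECT_URL", "PROJECT_HASH", "PYPI_TAG",
    "FUNCS_TO_TRACE", "TESTS_TO_BE_RUN", "NUM_RUNS",
]


def build_input_csv(projects, iterations=1, include_header=True):
    """Return CSV text with one line per (project, iteration) pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    if include_header:  # assumption: the input file starts with a header row
        writer.writerow(COLUMNS)
    for project in projects:
        # Missing columns are written as empty fields.
        row = [project.get(col, "") for col in COLUMNS]
        # Duplicate the line once per desired iteration.
        for _ in range(iterations):
            writer.writerow(row)
    return buf.getvalue()


example = build_input_csv(
    [{
        "PROJECT_NAME": "my_project",                         # hypothetical
        "PROJECT_URL": "https://example.org/my_project.git",  # hypothetical
        "PROJECT_HASH": "0123abc",                            # hypothetical
        "PYPI_TAG": "",  # empty: fall back to requirements files
        # Space-separated list of functions to trace (fifth column).
        "FUNCS_TO_TRACE": "test_a.py::test_one test_a.py::test_two",
        "TESTS_TO_BE_RUN": "",
        "NUM_RUNS": 5,
    }],
    iterations=3,
)
print(example)
```

Repeating the same line three times requests three iterations of the same project, which is the mechanism described above for spotting flakiness that only shows up between iterations.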
Example (takes ~1h):
# [OPTIONS...] INPUT_CSV
./flapy.sh run --out-dir example_results --plus-random-runs flapy_input_example.csv 5
Example (takes ~30s):
./flapy.sh run --out-dir example_results flapy_input_example_tiny.csv 1
./flapy.sh run --out-dir example_results \
--plus-random-runs \
--run-on cluster --constraint CONSTRAINT \
flapy_input_example.csv
where CONSTRAINT is forwarded to sbatch --constraint.
./flapy.sh parse ResultsDirCollection \
--path example_results \
get_tests_overview _df \
to_csv --index=False example_results_to.csv
Note: the directory specified after --path needs to be accessible from the current working directory, since only the current working directory is mounted into the container that is started in the background.
FlaPy offers an option to trace the execution of a function, i.e., to log all function and method calls made in the course of its execution.
The functions to be traced must be specified as a space-separated list in the fifth column (FUNCS_TO_TRACE) of the input CSV.
For example, test_flaky.py::test_network_remote_connection_failure test_flaky.py::test_concurrency in flapy_input_example_tiny_trace.csv.
Example (takes ~30s):
./flapy.sh run --out-dir example_results flapy_input_example_tiny_trace.csv
Within the resulting results.tar.xz archive, we can now find two extra files:
workdir/sameOrder/tmp/flapy_example_trace0test_flaky.py._('test_flaky.py', 'test_concurrency').txt
workdir/sameOrder/tmp/flapy_example_trace0test_flaky.py._('test_flaky.py', 'test_network_remote_connection_failure').txt
containing the traces:
--> ('test_flaky', '', 'test_network_remote_connection_failure')
----> ('requests.api', '', 'get')
------> ('requests.api', '', 'request')
--------> ('requests.sessions', 'Session', '__init__')
----------> ('requests.utils', '', 'default_headers')
------------> ('requests.utils', '', 'default_user_agent')
<------------ ('requests.utils', '', 'default_user_agent')
------------> ('requests.structures', 'CaseInsensitiveDict', '__init__')
...
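To post-process such a trace, a small parser can turn each line into a structured event. The sketch below is not part of FlaPy; it assumes that every nesting level adds two dashes, that `-->` marks a call and `<--` a return, and that each line ends in a Python-style tuple of (module, class, function), as in the excerpt above. The function name parse_trace is hypothetical.

```python
import ast
import re

# "-->" style lines are calls, "<--" style lines are returns.
TRACE_LINE = re.compile(
    r"^(?:(?P<call>-+)>|<(?P<ret>-+))\s+(?P<frame>\(.*\))$"
)


def parse_trace(lines):
    """Parse trace lines into (depth, kind, module, class, function) tuples."""
    events = []
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if match is None:
            continue  # skip blank lines and elisions such as "..."
        dashes = match.group("call") or match.group("ret")
        kind = "call" if match.group("call") else "return"
        # The frame is written as a tuple literal of three strings.
        module, cls, func = ast.literal_eval(match.group("frame"))
        # Assumption: two dashes per nesting level.
        events.append((len(dashes) // 2, kind, module, cls, func))
    return events


sample = [
    "--> ('test_flaky', '', 'test_network_remote_connection_failure')",
    "----> ('requests.api', '', 'get')",
    "------------> ('requests.utils', '', 'default_user_agent')",
    "<------------ ('requests.utils', '', 'default_user_agent')",
    "...",
]
events = parse_trace(sample)
for depth, kind, module, cls, func in events:
    qualified = ".".join(part for part in (module, cls, func) if part)
    print(f"{depth} {kind:6} {qualified}")
```

Using ast.literal_eval rather than eval keeps the parsing restricted to literals, which is the safer choice when reading files produced by running third-party test code.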
(Spectrum-based Flaky Fault Localization)
Execute flapy.sh run with the core argument --collect-sqlite-coverage-database
./flapy.sh run \
--out-dir example_results_sffl \
--core-args "--collect-sqlite-coverage-database" \
flapy_input_example_sffl.csv 10
Execute flapy.sh parse to generate the CTA (coverage table accumulated).
This step only produces output if the test actually showed flaky behavior; if it did not, rerun the previous step.
./flapy.sh parse \
ResultsDirCollection --path example_results_sffl \
save_cta_tables \
--cta_save_dir example_results_sffl_cta \
--flaky_col "Flaky_sameOrder_withinIteration" \
--method="accum"
Calculate Suspiciousness scores
./flapy.sh parse \
CtaDir --path example_results_sffl_cta \
calc_and_save_suspiciousness_tables \
--save_dir example_results_sffl_cta_sus \
--sfl_method sffl
Merge with locations (-> EXAM scores & ranks)
./flapy.sh parse \
SuspiciousnessDir --path example_results_sffl_cta_sus \
merge_location_info \
minimal_sffl_example/locations.csv \
minimal_sffl_example/loc.csv \
to_csv --index=False | vd --filetype=csv
(assumes visidata (vd) to be installed)
Clone FlaPy:
git clone https://github.com/se2p/flapy
cd flapy
Building the container image:
We use containers to run the projects' test suites in an isolated environment.
docker build -t my_flapy -f Dockerfile .
This image can be used together with all existing scripts by changing the FLAPY_DOCKER_IMAGE variable in setup_docker_command.sh to localhost/my_flapy.
Prerequisites
- Python version 3.8 or later.
- The latest version of poetry, which can be installed via pip:
pip install poetry
Install FlaPy locally:
poetry install
Build FlaPy using the poetry tool. The following command will build two files in the dist folder: a tar.gz archive and a whl Python wheel file.
poetry build
- Use ordered sets or lists in output csv files to always get the same (string-equivalent) output
- Many columns in passed_failed.csv are sets and their ordering is different from run to run
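The ordering issue in the two points above can be illustrated with a short Python sketch. It is not FlaPy's code; the row contents and the helper names stable_cell and write_rows are hypothetical. Serializing a set as a sorted list makes the resulting CSV text independent of the set's iteration order.

```python
import csv
import io


def stable_cell(values):
    """Serialize a set as a sorted list so the string is reproducible."""
    return repr(sorted(values))


def write_rows(rows):
    """Write (name, set_of_values) rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for name, values in rows:
        writer.writerow([name, stable_cell(values)])
    return buf.getvalue()


# Two sets with the same elements, built in a different order.
first = write_rows([("test_x", {"Passed", "Failed"})])
second = write_rows([("test_x", {"Failed", "Passed"})])
print(first == second)
```

Writing repr(values) directly would instead depend on the set's iteration order, which for strings can change between interpreter runs because of hash randomization.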
If you want to contact me, please find my contact details on my page at the University of Passau.
This project is licensed under the terms of the GNU Lesser General Public License.