Following the principle of separation of concerns, which reduces complexity, improves reusability, and makes evolution simpler cite:hursch1995separation,ossher2001using,tarr1999n, software systems are split into different types of artifacts, each targeting particular domain concerns. Ensuring the quality of those artifacts is thus of the utmost importance.
To gain a clear overview, we can distinguish various types of software artifacts.
The most common in software projects are code, application programming interfaces (APIs), tests, models, scripts, etc.
The boundaries between artifacts are not always sharp: for example, the class model and the API can be completely integrated into the functional implementation. Moreover, an artifact can be partially or completely synthesized or generated from others; for instance, it is possible to extract an API from code or models, and to generate tests from models or functional implementations. As artifacts share common concepts, when one artifact evolves, other artifacts may be impacted and may need to be co-evolved. In this report, we focus on the scenario where code evolves and tests must be co-evolved. For example, moving a method from one class to another makes calls to this method invalid; most importantly, with the right contextual information, it is possible to fix those calls and to co-evolve the tests by moving the related tests to the proper place while fixing other contextual differences.
Unfortunately, test co-evolution remains mainly a manual task for developers, which is tedious, error prone, and time consuming, in particular when hundreds of developers collaborate and those who maintain the tests (testers) are not necessarily those who evolve the code [fn:0].
During the internship, we will address the problem of co-evolving tests using the information available in the rest of the code and in its evolution.
In this article, we establish a state of the art on the co-evolution of code and tests.
Other surveys have targeted the co-evolution of metamodels and models cite:hebig2016approaches, the co-evolution of mutants and tests cite:jia2010analysis, and the generation of tests cite:anand2013orchestrated,andreasen2017survey, but to the best of our knowledge there is no survey or state of the art on the co-evolution of code and tests. This article fills this gap, in preparation for the internship.
The rest of the article is organized as follows:
Section Background presents a short background. Section Methodology gives the methodology used to construct this state of the art. Section Classification of Approaches illustrates the results of the categorization presented in the methodology. Finally, Section Conclusion presents the conclusion and initial research perspectives.
[fn:0] https://github.com/microsoft/onnxruntime.
This first section presents a background on testing and co-evolution. Listing lst:example shows a basic example of code and tests.
export class Counter {
  constructor(private x: number) {}

  count(cb?: (n: number) => number): number {
    if (cb) {
      // apply the given function to x and return the new value
      this.x = cb(this.x);
      return this.x;
    } else {
      // no function given: increment x by one and return the new value
      return ++this.x;
    }
  }
}
test('trivial 1', () => {
  const init = 0;
  const e = new Counter(init);
  expect(e.count()).toBe(init + 1);
});

test('trivial 2', () => {
  const e = new Counter(3);
  expect(e.count(x => x - 2)).toBe(1);
});
<lst:example>
Tests allow us to detect bugs in order to fix them cite:gyimesi2019bugsjs. Testing is also a way to specify functionalities and constraints.
It is not as exhaustive as symbolic analysis, but it is often easier to implement. Compared to declarative specifications, it facilitates the specification of complex functionalities while leaving flexibility in the implementation, by sticking to common concepts of imperative programming.
Quantifying software quality is also a major concern, which is addressed by software testing, e.g., through mutation testing cite:wang2017behavioral, or by comparing test and field behaviors cite:leotta2013capture,jin2012bugredux.
Software testing can take many forms.
Each form focuses on particular aspects of software and serves different goals.
a) Unit tests are the best-known kind of tests: they can detect bugs early in development, they run fast and automatically, and they help find the causes of bugs.
In Listing lst:example, the unit tests on the right target the piece of code on the left. As its name indicates, the class on the left is a counter: its constructor initializes the x attribute, while its method count takes a function as an optional parameter. This function modifies the x attribute; otherwise, x is incremented by one. Both unit tests on the right exercise the count method: the first initializes the counter at 0 and then checks the result of count called without an argument, while the second initializes the counter at 3 and then checks the result of count called with a given lambda function.
b) System tests allow assessing the validity of a program in particular use cases, but contrary to unit tests they are slow, might need human intervention, and do not help much at finding the causes of bugs.
Compared to the unit tests in Listing lst:example, system tests would be much larger and span many classes at once.
Multiple uses of tests also exist depending on additional concerns, such as mock testing, regression testing, performance testing, etc. For example, a) mock testing allows abstracting from dependencies and focusing on small and very controlled parts of programs, while b) regression testing allows comparing different versions of a program to facilitate incremental improvements.
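As a minimal sketch of mock testing (assuming the jest-style framework implied by Listing lst:example and the Counter class in scope), a mock can stand in for the callback dependency of count so that only Counter itself is exercised:

test('count forwards x to its callback', () => {
  // jest.fn creates a mock function that records how it was called
  const cb = jest.fn((n: number) => n);
  new Counter(5).count(cb);
  // the interaction is checked without relying on any real callback behavior
  expect(cb).toHaveBeenCalledWith(5);
});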
With a rough look at most software engineering systems, there are at least a few types of artifacts that are easy to discern, such as an API, a functional implementation of this API, a model (or specification) of the application, and tests to check the implementation against some constraints. But there are many more software artifacts, like traces, binaries, metadata, comments, etc. As a matter of fact, there is no clear boundary between artifacts. For example, the class model and the API can be completely integrated into the functional implementation. An artifact can also be partially or completely synthesized or generated from others; for instance, it is possible to extract an API from code or models, and to generate tests from models or functional implementations. In the same way, those artifacts overlap and depend on each other to work properly: changing one might impact another negatively and hence require co-evolution.
- Definition 1:
- Co-evolution is the process of modifying a given impacted artifact $A$, in response to evolution changes of another artifact $B$.
The co-evolution scenario we will focus on is code evolution and test co-evolution. The co-evolution of tests can be split into amplification and repair. The amplification of tests can be seen as the continuation of test generation in the context of co-evolution, as it considers preexisting tests in relation to evolutions in the code; one of the difficulties of amplification is the readability of the generated tests. Repairing tests by co-evolution, on the other hand, considers changes to the code as a way of detecting and fixing the tests broken by those changes; here the major challenge is to keep the tests correct.
Looking at Listing lst:example, if we rename the method count of the class Counter as update, the calls to the member count on instances of Counter would also need to be renamed. Similarly, if we make the parameter of the method count mandatory, we would need to generate a default value for the calls to count that have no argument. Finally, if we move the method count to another class, the tests of count should be moved to a more appropriate place and the constructor calls pointing to Counter would need to be renamed.
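As a minimal sketch of such a repair (hypothetical: it assumes count has been renamed to update and that its callback parameter has become mandatory), the first test of Listing lst:example could be co-evolved as follows:

test('trivial 1 (co-evolved)', () => {
  const init = 0;
  const e = new Counter(init);
  // the call is renamed, and a synthesized callback replaces
  // the former default increment-by-one behavior
  expect(e.update(x => x + 1)).toBe(init + 1);
});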
The goal is to facilitate and automate the evolution of a specification in reaction to a change in a model or in the code. Two families of co-evolution can be distinguished:
- the co-evolution of models and constraints (UML and OCL) cite:hebig2016approaches
- the co-evolution of code and tests cite:dhondt2002coevolution, cite:zaidman2008mining, then cite:zaidman2011studying
This section presents our methodology. We propose criteria to categorize the approaches that handle the co-evolution of tests. Thanks to those criteria, we will be able to classify the literature and to choose better suited techniques depending on particular concerns. Figure fig:featuretree illustrates those criteria as a feature model.
Another major focus will be to expose the relations between the objects of study and the solving methods.
In the end, this will allow us to find remaining gaps in test co-evolution and to identify research questions and future work perspectives.
This state of the art took inspiration from the survey by Hebig et al. on the co-evolution of models cite:hebig2016approaches.
The bibliographical research started with a set of articles given by my supervisors. I then alternated between keyword searches, mainly on Google Scholar, using terms taken from previously read papers, and following the most relevant references of those papers (the "snowballing" technique).
<fig:featuretree>
Here we would like to look at co-evolution as a two-step process: the first step is to detect and categorize evolutions in the implementation of a program, and the second step is the co-evolution of the tests. We first present criteria that are common to both steps.
One of the first criteria to consider is the degree of automation of the co-evolution. It quantifies the amount of involvement needed from a developer in the process of co-evolution. In the case of full automation, one might only have to confirm the co-evolution, whereas in a semi-automated co-evolution one might need to choose between possible resolutions to apply, or even to create a custom transformation capable of handling some domain-specific evolution.
We consider manual, semi-automated and fully-automated approaches.
The systems that can be co-evolved possess different characteristics, which can particularly be observed from the language point of view. Most software projects use some framework and a multitude of languages, and those languages might share common characteristics. We mainly consider the language paradigm, such as the object-oriented (with the class construct), imperative, or declarative paradigms, and the type system, such as strongly or weakly typed languages.
Detecting and classifying evolution is the first step in any co-evolution of code and tests. Each major criterion composing this step is explained in the following three paragraphs.
The granularity of evolution is very important to the automation of the co-evolution. The simplest kind of evolution is an atomic change; while atomic changes are very simple to detect, they do not carry much information. Additions and deletions are the simplest atomic changes, and often the only ones considered. It is possible to combine atomic changes into composed changes. For example, moving a method from one class to another is composed of a deletion and an addition. Another example of composed change is renaming a method: it is also composed of a deletion and an addition, but here the change is much more localized.
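As an illustration (the data structures below are purely hypothetical and not taken from any cited tool), composed changes can be represented as labeled groups of atomic changes:

// Two granularities of evolution: atomic changes and composed changes.
type AtomicChange =
  | { kind: 'add';    container: string; element: string }
  | { kind: 'delete'; container: string; element: string };

// A composed change groups atomic changes that form one logical evolution.
interface ComposedChange {
  label: 'moveMethod' | 'renameMethod';
  parts: AtomicChange[];
}

// "Move the method count out of Counter" is one deletion plus one addition;
// the target class name (Clock) is purely illustrative.
const moveCount: ComposedChange = {
  label: 'moveMethod',
  parts: [
    { kind: 'delete', container: 'Counter', element: 'count' },
    { kind: 'add',    container: 'Clock',   element: 'count' },
  ],
};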
In every software analysis, the level of abstraction reflects the trade-off made between precision and performance. For example, the file abstraction can be considered a high-level abstraction to detect changes in a codebase; it is what most compilers for procedural languages use to avoid recompiling unchanged files. There is also the class abstraction, one of the most used, as it syntactically and statically carries a large quantity of semantic information. In fact, methods carry the behaviors of objects, and behaviors can be shared through inheritance. But this abstraction requires the analyzed language to be object oriented, with classes or prototypes and an inheritance system. To establish measurements of the impact of changes, it is necessary to look at calls; this abstraction is the call graph. Finally, looking at the level of flow graphs, i.e., blocks of instructions linked by branches, might be necessary for some analyses, but it requires a lot of effort and processing power to compute.
- folder / file
- class / object / function
- method
- parameter
- branch
- instruction
The detection of changes can be done online, by logging the operations made on files, or offline, by comparing the states of files between versions. Detecting changes through online logging is more precise but also more intrusive than offline detection. Online detection can be brittle in case of unlogged changes: all external tools modifying the code would need to provide the set of changes they applied.
- atomic change
  - addition
  - deletion
- composed change
  - move
  - renaming
The type of change might not be very useful for co-evolution; it was mainly used as a distinction in exploratory papers on the statistical analysis of commits, which correlate the type of change with commit messages, but also with the moment of the commit in the schedule (release, etc.).
- (from cite:levin2017co)
- Corrective: fixes faults; corresponds to repair
- Perfective: improves the system and its design; no direct correspondence
- Adaptive: introduces new features; corresponds to generation
Note that assembling atomic changes into complex changes does not split this classification.
Here, we will look at the particular aspects that concern the actual co-evolution of tests.
The impact of code changes on tests needs to be quantified in order to propose relevant co-evolutions.
It allows locating the tests that need to be co-evolved and provides more contextual information on test dependencies.
Two modes of impact analysis can be discerned: offline impact analysis is computed once the developer is done with the current set of changes, while online impact analysis is computed interactively whenever a change happens.
Many analysis methods can precede impact analysis, depending on the language characteristics of the co-evolved artifacts. The main points of analyzing code here are to measure the impact of changes and to extract useful information from programs; being able to measure code allows finding the tests that need to be repaired or relaunched.
Analyzing code can also be useful to harvest data and patterns cite:hindle2012naturalness that allow better test amplification. In addition to static analysis, using the history of changes and the behavior of the program under test might improve the precision and performance of programming assistants.
In the general case, analyzing programs is difficult. The whole stack, from an algorithm to its execution, is complex and diverse. Indeed, many programming languages use different paradigms; for each language, many parsers and compilers exist; and there are many runtimes and intermediate representations. It is thus important to find the points in this stack where analyses are the most efficient.
Static analysis is often the first choice when one wants to analyze a particular program or project. In the best case, a static analysis can prove properties of a program for any given input. Most domains of science and industry that need to prove properties use languages with rich type systems. But annotating programs can be tedious and lead to bugs, which is why analysis tools make heavy use of type inference to lighten the burden of type annotations. Yet type inference has its limits, as uncertainties lower the quality of types throughout the program. Refining those uncertainties is a major lever to improve software quality.
Even if rich type systems are very useful for analysis, programs heavily constrained by types are less flexible, demand more code, and use more complex artifacts to alleviate the overhead of types. There is an obvious trade-off between development flexibility and ease of analysis. Making use of the runtime can disambiguate uncertainties throughout programs and ensure properties with more precision. Combining both static and dynamic analysis offers the possibility to further improve code quality while preserving flexibility.
- Static analysis
- It requires type information (annotated or inferred). It can check properties over infinite domains in an exhaustive way, and it proves efficient on simple programs that accept a large number of inputs. Type rules can even be applied to languages without explicit type annotations: for example, treating C as mono-typed (everything is an int) still allows checking for null dereferencing (that is, dereferencing 0). To improve robustness and flexibility, most analysis tools provide a type that matches all types and a type that matches no type; in practice, this allows incremental typing and type inference. Many tools exist to analyze programs statically; most of them work on a single language (TypeScript, CompCert, Spoon) while some try to be more agnostic (LLVM, semantic, pandoc). Focusing on one language allows finer analyses but might not scale to multi-language projects, whereas tools handling multiple languages may work better on such projects but need an intermediate representation of programs to factor the work done for each language. Static analysis works with semantic models such as class diagrams, type systems, and so on; a small sketch of static detection on the running example is given after this list.
- Dynamic analysis
- It is particularly suitable for highly dynamic and weakly typed languages, but it cannot provide absolute guarantees over an infinite domain, even if it tries to be as close as possible to the actual behavior of the program. Dynamic analysis can be effective on potentially complex programs, while accepting fewer inputs than static analysis. It works with functional models such as finite state machines, memory behavior, and so on.
- Hybrid analysis
- It supports static analysis by providing information that is only easily accessible at runtime, and it supports dynamic analysis by directing it to the sensitive points detected during static analysis. Tests can be used to collect information at runtime and improve the inferences of static analysis; conversely, static analysis can detect sensitive pieces of a program so that they are tested and instrumented to better understand them and detect bugs.
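As a small illustration of static detection on the running example (hypothetical: it assumes the rename of count into update discussed in the Background, and a TypeScript compiler as the static analyzer), the stale call is rejected without executing anything:

// After the rename, the compiler flags the old call statically:
//   e.count();   // error: Property 'count' does not exist on type 'Counter'.
// The co-evolved test type-checks again:
test('trivial 1 (after rename)', () => {
  const e = new Counter(0);
  expect(e.update()).toBe(1);
});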
The kind of tests targeted by a test co-evolution method is also relevant, as system tests are much bigger and take longer to run than unit tests. In a way, the kind of tests handled by a co-evolution method gives a lead on the scalability of the approach.
The target of the co-evolution can be the calls, the inputs of calls, or the expressions of oracles.
Taking the example from the Background, the value given to the class constructor is an input, while the value in the toBe method is part of an oracle. A value can also be used both as an input and as a part of an oracle, like the constant init. From another point of view, an input value goes through what we want to test, while an oracle value avoids passing through what we want to test.
- Calls
- Reproducing functional behaviors observed in production is one of the first requirements to synthesize unit tests from in-field executions. Many techniques proposed in the literature are capable of producing a skeleton of calls for test cases.
- Inputs (either inputs of the test or inputs of calls)
- From an existing test or a skeleton of calls, many tools can produce complete tests (almost complete: in the case of a skeleton of calls, oracles should also be generated).
- Oracles
- They are assertions comparing the input and output values of the tests to decide whether those tests pass or fail. Assertions are tricky to repair and to generate, as they are part of the program specification; the challenge is thus to mine them from somewhere.
For example, in the second test of Listing lst:example, a call to the constructor of Counter is made with the number 3 as an input, then a call to the method count is made with a function as an input. Finally, the oracle checks that the value returned by the previous call is equal to the number 1.
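The same test, rewritten with illustrative comments (a hypothetical variant, not part of Listing lst:example), makes those roles explicit:

test('trivial 2 (annotated)', () => {
  const e = new Counter(3);            // 3: input given to the constructor
  const result = e.count(x => x - 2);  // the lambda: input of the call under test
  expect(result).toBe(1);              // 1: expected value, part of the oracle
});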
Given some evolutions, two types of co-evolution are possible.
Amplification co-evolution creates new tests from other tests through various exploratory methods (genetic algorithms, regression, etc.); a small sketch of an amplified test is given below.
Repair co-evolution modifies existing tests so that they pass compilation or the runtime checks again.
Finally, the benefit class measures the possible impact of co-evolution rules.
See the survey cite:hebig2016approaches for supplementary details in the case of model co-evolution.
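As a minimal sketch of amplification (hypothetical, not taken from any cited tool), a new test can be derived from the first test of Listing lst:example by exploring a different input while keeping the same oracle shape:

// A new starting value is explored; the expected value is derived
// from the same increment-by-one behavior observed in the original test.
test('trivial 1 (amplified)', () => {
  const init = 7;
  const e = new Counter(init);
  expect(e.count()).toBe(init + 1);
});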
We will also look at the correlation between the types of analysis used in the articles and the language paradigms they target, in particular the use of dynamic analysis to compensate for a possible lack of statically accessible data.
In this section, we present the state of the art on the co-evolution of code and tests, following the classification given by the feature model in Figure fig:featuretree. We present some of our results in Table classification1, regarding the classification of approaches that detect and classify evolution in software artifacts, mostly code, and then in Table classification2, regarding the actual co-evolution of software artifacts, mostly tests. It should be noted that approaches mentioned in one table but not in the other either only detect evolution or only improve tests without considering evolution. \newpage
parad | language | test | objectives | automation | granularity | detection | target | type | impact |
---|---|---|---|---|---|---|---|---|---|
C | java | unit | repair | auto | C$parad -> Class | offline | inputs | repair | offline |
main | year | artifacts relation | parad | objectives | my | t1 | analysis | abstraction | ref | t2 | kind | language | dyngranu | granularity | target | detection | type | change | automation | impact | test | usable thg | compare_eval | eval objects, resources | impact Sci | num | reading issues |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
wang \etal | 2017 | system test → unit test | E | find best metric | M1, compare 4 tools | dynamic | instruction | cite:wang2017behavioral | 1 | survey | java | 2 events | / | tests calls | / | generate | no | / | offline | unit | / | coverage,mutation,temporal invariant | JetUML,Log4j,Common {IO,Lang} | more than mutation t | |||
jin \etal | 2012 | production → test | E | repro fail in house | M1, used LLVM | 5 | dynamic | class, flow graph | cite:jin2012bugredux | 2 | technical | C | events | composed | tests calls and inputs | offline | generate | no | auto | N/A[fn:5] | unit | tool(avail) | time space overhead, eff{ctivi,icien} | [16,23]->SIR[21],BugBench[22],exploit-db[23] | in house reprod | ||
kampmann \etal | 2019 | system test → unit test | D | param unit test ?? | M1,Kim et al.[12] | dynamic | flow graph | cite:alex2019bridging | 2 | technical | web-python-sql-C stack | failure | / | whole tests | / | generate | no | auto | offline | unit | proto | coverage,coverage over time,lifting | GNU coreutils,sed,dc | accu of sys to unit | |||
hindle \etal | 2012 | code=language | C,I | java is like eng.? | M1,naturalness software | -5 | static, nlp | word | cite:hindle2012naturalness | study | java,C | / | / | / | / | / | no | / | / | N/A | / | n-gram | many languages | apply nlp to code | |||
jiang \etal | 2006 | runtime->model(fsm) | E,??I | abnorm trace detect | M1,secu | 31 | dynamic | event | cite:jiang2006multiresolution | -1 | technical | N/A | events | composed | tests event | offline | / | single | auto | online | N/A | algo | inject faults | J2EE Pet Store | prove point | ||
beschastnikh \etal | 2013 | spec+production->model(fsm) | E | fsm inference | M1,logic | -3 | dynamic | event | cite:beschastnikh2013unifying | -3 | theo,study | N/A | events | offline | tests event | / | / | no | user spec | / | N/A | ?? algo | declarative vs procedural kTails | logs from prev study[7] | decl ktail is better | ||
tonella \etal | 2014 | test | E | better fsm use | M1, interpolate ngrams | -5 | dynamic | event | cite:tonella2014interpolated | 1 | technical | all | events | composed | tests event | offline | generate | no | semi[fn:3] | offline | all | method,algo | 4 custom metrics used + qualitative | Adobe Flextore,Cyclos,… (java,js,php,…) | prove title | |
hebig \etal | 2016 | co-evo approach | init | cite:hebig2016approaches | survey | many | / repair | 10 | |||||||||||||||||||
khelladi \etal | 2018 | metamodel → model | C | compose resol | init | 0 | static | class | cite:khelladi2018change | -1 | technical | UML class diag. | no | composed | models | online | repair | yes | semi[fn:4] | offline | / | tool | correctness | many models | 0 | ||
khelladi \etal | 2017 | (meta)model/OCL | C,D | also co-evolve OCL | init | 5 | static | class | cite:khelladi2017semi | 1 | technical | OCL | no | composed | whole models and constraints | online | repair | yes | semi[fn:4] | offline | N/A[fn:2] | tool | 1 |
zaidman \etal | 2008 | production - test | N/A | classify evolution | init,redundant | static | SVN, class | cite:zaidman2008mining | study | N/A,(SVN) | / | / | explo tool | 6 | |||||||||||||
zaidman \etal | 2011 | production->test | classify evolution | init | static | SVN, class | cite:zaidman2011studying | study | (SVN) | / | all | ||||||||||||||||
levin \etal | 2017 | code - test | N/A | classify evolution | init,classif_change_t | 10 | static | class, metadata | cite:levin2017co | -5 | tech,study | all,(git) | no | atomic | whole tests | offline | N/A | yes | auto | / | 4 | ||||||
gall \etal | 2009 | change → ? | C | analysis | l’17co,good peda+fig | 1 | static | cite:gall2009change | study,magaz | java,… | atomic | N/A | / | yes | auto | ||||||||||||
martinez \etal | 2019 | code → ? | I,OO | detecting ?? | snd | 1 | pattern, none | class, metadata | cite:martinez2019coming | -2 | analysis | java,(git) | no | atomic | whole tests | offline | N/A | yes | manual | offline | / | ||||||
levin \etal | 2017 | code → ? | predicting | l’17co,classif_change_t | 1 | class, metadata | cite:levin2017boosting | -5 | study | java,(git) | no | atomic | c:all | offline | recommend | yes | semi[fn:3] | ||||||||||
schafer \etal | 2008 | instantiation → ? | mining | r’11sca | 1 | cite:schafer2008mining | study | / | |||||||||||||||||||
andreasen \etal | 2017 | runtime->test | Dy | test gen js | googleS | dynamic | cite:andreasen2017survey | 1 | survey | js | generate | all | |||||||||||||||
zhu \etal | 1997 | base unit test cov | a’04ov,test cov, crit | cite:zhu1997software | book | ??gen | |||||||||||||||||||||
mirshokraie \etal | 2013 | code | Dy | mut, fast/eval test | googleS,mutation | -5 | static,dynamic | call graph | cite:mirshokraie2013efficient | -5 | technical | js | mutation | ? | mut | offline | mut gen | no | auto | offline | all | tool | non-equiv mutant,fault severity | SimpleCart,JQuery,… | |||
gyimesi \etal | 2019 | Dy | bench things | googleS | -1 | / | / | cite:gyimesi2019bugsjs | -1 | benchmark | js | / | / | / | / | / | / | / | / | / | bench | ||||||
anand \/etal | 2013 | I,OO | find new tests | test gen | random | cite:anand2013orchestrated | 1 | survey | java | generate | |||||||||||||||||
xu \etal | 2010 | code → test | I | augmentation | r’11sca | 3 | genetic, symbolic | branch | cite:xu2010directed | 5 | study | C | regression | / | whole tests | augment | yes | auto | offline | unit | from SIR | ||||||
marsavina \etal | 2014 | production → pattern->test | OO | ana-mine-fix | l’17co | 5 | static | all, branch cover | cite:marsavina2014studying | 1 | study | java | composed | whole tests | offline | generate | yes | N/A | offline | CommonsLang,CommonsMath,Gson,PMD,JFreeChart | 5 | ||||||
mirzaaghaei \etal | 2014 | code → test | C,I | repair,8 co-evo pat | init | 10 | static | class | cite:mirzaaghaei2014automatic | 10 | technical | java | no | atomic | whole tests | offline | repair, amplify | yes | auto | offline | all | algo,tool | apply freq, repair effectiveness | JodaTime,Barbecue,JfreChart,PDM,Xstream | handle java patterns | 3 | |
fraser \etal | 2011 | cove->test | OO | gen test suite | googleS | -1 | ? | flow graph | cite:fraser2011evosuite | -1 | technical | java | events | composed | whole tests | offline | amplify | yes | semi | offline | unit | tool | |||||
fraser \etal | 2014 | code->test | C | test java generics | arcuri | -1 | static | cite:fraser2014automated | 1 | technical | java | tests calls and inputs | generate | no | auto | N/A[fn:5] | unit | evosuite | |||||||||
daniel \etal | 2010 | symbolicExec → test | I,OO | repair | r’11sca,symb,literal repair !!! | 3 | static, symbolic | instruction | cite:daniel2010test | 3 | technical | java, .NET | fail | composed[fn:1] | whole tests | offline | repair | no | semi[fn:3] | offline | unit | ||||||
person | 2009 | symbolicExec | symbolic exec | r’11sca | -1 | static, symbolic | instruction | cite:person2009differential | -1 | phd dissert | / | ||||||||||||||||
hassan | 2009 | code->fault | I | predicting,entrop | l’17co,OS,dbms,gui,regression | 35 | static | pattern, metadata | cite:hassan2009predicting | -1 | tech,study | C,C++ | faults | atomic | / | offline | / | yes | auto | / | N/A | eq | complex,faults,modif | ||||
dagenais \etal | 2011 | code->API | C | recommending | r’11sca | 35 | static | metadata | cite:dagenais2011recommending | -1 | technical | java | / | composed[fn:1] | calls in general | offline | call repair | yes | semi[fn:3] | cod offline | N/A | impr SemDiff | 7 | ||||
dagenais \etal | 2014 | code->doc | C | recommending | dagenais | 5 | static | pattern | cite:dagenais2014using | 0 | technical | java | / | composed | references from documentation | offline | doc repair | yes | semi[fn:3] | doc offline | N/A | pattern,tool | |||||
halfond \etal | 2008 | call-< | param mismatch id | r’11sca | 34 | static | calls, data flow | cite:halfond2008automated | 1 | technical | java, PHP, http,… | / | composed | calls in general | offline | repair | yes | semi[fn:3] | offline | ? | proto WAIVE | Daffodil | |||||
vcubranic \etal | 2003 | many->db for human use | all | learning curve | kh’18ch,personized indexing,stats | 35 | static | metadata, … | cite:vcubranic2003hipikat | -1 | technical | all | ? | ? | index | offline | index | ~yes | auto | offline | N/A | hipikat | |||||
xing \etal | 2006 | things in general | C | refactoring how wha | r’11sca,Eclipse refactoring | static, ?? | ?? | cite:xing2006refactoring | -0 | study | java | / | ?? | ?? | ?? | ?? / | ?? | ?? | ?? | N/A | ?? | ||||||
levin \etal | 2016 | things in general | C | predict maintenance | l’17co,classif_change_t | -1 | static | metadata, class | cite:levin2016using | -1 | study | CVS,java | ?? | atomic | ?? | offline | ??/ | yes | ?? | ?? | N/A | ?? | catalog | ||||
memon \etal | 2008 | runtime → test | E | repair,augmentation | r’11sca,GUI,capture&replay,semiauto compo | 35 | dynamic | event | cite:memon2008automatically | 2 | technical | all | EvtF graph | composed | whole tests | offline | repair | yes | semi[fn:4] | offline | unit[fn:6] | tool | CrosswordSage,FreeMind,GanttProject,JMSN | ||||
thummalapenta \etal | 2009 | code → test | OO,?C | generation | r’11sca | 0 | static | class, flow graph | cite:thummalapenta2009mseqgen | 2 | technical | java | ? | ? | tests calls and parameters | ? | generate | no | semi[fn:3] | offline | unit | ||||||
robinson \etal | 2011 | ? ->test | generation | r’11sca | -1 | static | cite:robinson2011scaling | 2 | technical | java | N/A | whole tests | offline | generate | auto | offline | unit[fn:6] | Randoop | |||||||||
tsantalis \etal | 2018 | code → ? | I,OO | detecting ?? ?? git | trd, pattern | 30 | static | instruction, class | cite:tsantalis2018accurate | 0 | technical | java | no | composed[fn:1] | N/A | offline | notify | bit | semi[fn:3] | offline | N/A | RMiner | |||||
galeotti \etal | 2013 | code -> test | C | symb impr test gen | arcuri | -1 | static, symbolic | cite:galeotti2013improving | 2 | technical | java | no | tests calls and inputs | ? | generate | ??no | auto | N/A[fn:5] | unit | ||||||||
arcuri \etal | 2011 | debunk | a’13or | cite:arcuri2011adaptive | study | ??/ | |||||||||||||||||||||
arcuri \etal | 2007 | co-evo | a’13or, too seminal | -1 | static | cite:arcuri2007coevolving | -1 | technical | ??not easy | ??all | |||||||||||||||||
arcuri \etal | 2008 | fix bugs | a’13or,seminal,evolutionary testing | -1 | cite:arcuri2008automation | -1 | ?? technical | yes but | repair | bit | |||||||||||||||||
arcuri \etal | 2008 | improvement | a’13or | -1 | cite:arcuri2008multi | -1 | technical | ??generate | |||||||||||||||||||
arcuri \etal | 2008 | co-evolutionary | a’13or,co-evolutionary | 1 | cite:arcuri2008novel | -1 | technical | java,.NET | all | repair | auto?? | eval | |||||||||||||||
arcuri \etal | implem code | arcuri | -0.5 | static | cite:arcuri2014co | -0.5 | technical | java | |||||||||||||||||||
papadakis \etal | 2019 | prove advances mut | googleS,mut | cite:papadakis2019mutation | 1 | survey | / | ||||||||||||||||||||
jia \etal | 2010 | prove domain growth | tools,mut | cite:jia2010analysis | 1 | survey | java,C, C++,… | / | |||||||||||||||||||
adamapoulos \etal | 2004 | mutant <-> test | I | meta-mut tests | a’13or,meta-mutation testing | 31 | static, genetic | mutant | cite:adamopoulos2004overcome | 1 | technical | Fortan-77 | N/A | N/A | tests inputs | offline | generate | algo | auto | offline | unit | formula | simu of a mut testing tool like Mothra,1993 | rise of mut with GAs | |||
zhang \etal | 2011 | generation | a’13or,gen,evol,symbolic | 1 | cite:zhang2011automatic | -1 | technical | SQL | / | N/A | tests inputs | offline | generate | no | semi | load test | |||||||||||
nistor \etal | 2015 | code | a’13or | cite:nistor2015caramel | fix code | ||||||||||||||||||||||
pinto \etal | 2012 | test ? | OO,C | debunk | l’17co | +0 | static | cite:pinto2012understanding | +0 | study | java | ??/ | yes | all | TestEvol | 9 | |||||||||||
beller \etal | 2015 | code → code | when,how,why | l’17co | -5 | cite:beller2015how | -5 | study (socio) | |||||||||||||||||||
richards \etal | 2010 | Dy | how dyn js work | googleS | -2 | cite:richards2010analysis | -1 | analysis | js | / | |||||||||||||||||
freeman \etal | 2002 | snd | -5 | cite:freeman2002software | -5 | magazine | |||||||||||||||||||||
hedin \etal | 2014 | Dy | googleS,interpreter | -5 | dynamic | cite:hedin2014jsflow | -5 | technical | js | ||||||||||||||||||
dhondt \etal | 2002 | code$\circlearrowleft$ | C,D | try LMP | init,try to reread | -5 | static | cite:dhondt2002coevolution | -1 | technical | java, ??smalltalk | / | 2 | short,complexe | |||||||||||||
leotta \etal | 2013 | test | Dy,E | googleS,on the side | -5 | cite:leotta2013capture | -5 | study | js | C&R,prog.mable | Capture-Replay vs. Programmable Web Testing | 8 | |||||||||||||||
mirshokraie \etal | 2015 | mir | dynamic | mut | cite:mirshokraie2015jseft | 2 | technical | js | event | N/A | whole tests | offline | generate | ?no | auto | offline | unit | ||||||||||
mirshokraie \etal | 2016 | mir | dynamic | mut | cite:mirshokraie2016atrina | 2 | technical | js | event | N/A | whole tests | offline | generate | ?no | auto | offline | unit |
Paper | Paradigm | Analysis method | degree of automation |
Name | Application | Year | Character (General,…)(Opensource,Commercial) | Available | Ref |
main | year | ref | language | granularity | abstraction | detection | automation |
---|---|---|---|---|---|---|---|
Reference | Language | Granularity | Abstraction | Detection | Automation |
import pandas as pd
import re

# `head` and `data` are provided by org-babel from the tables above:
# `head[0]` lists the raw columns to keep (in display order), `data` holds the
# raw classification table (row 0: column names, row 1: separator).
h = head[0]
df = pd.DataFrame(data[2:], columns=data[0])
df = df[h].reindex(h, axis=1)

# Cells of the form "Author etal cite:key trailing-text".
prog = re.compile(r"^(?:(.*?) )?cite:([^\s]*)(.*)$")

def format_cite(x):
    # Normalize org-ref citations inside a cell; leave other cells untouched.
    r = prog.match(str(x))
    if r is None:
        return x
    if r.group(1) is None:
        return 'cite:' + r.group(2)
    return r.group(1) + ' cite:' + r.group(2)

df = df.applymap(format_cite)
# print([h] + [None] + list(map(list, df.values)))  # org-babel table output
Reference | Year | Ref. | Language | Granularity | Abstraction | Detection | Automation | Analysis |
---|---|---|---|---|---|---|---|---|
\rowcolor{gray!25} Vcubranic \etal | 2003 | \cite{vcubranic2003hipikat} | all | ? | meta data, … | offline | auto | static |
Adamopoulos \etal | 2004 | \cite{adamopoulos2004overcome} | Fortran-77 | N/A | mutant | offline | auto | genetic |
\rowcolor{gray!25} Jiang \etal | 2006 | \cite{jiang2006multiresolution} | N/A | ? | event | prod | auto | dynamic |
Halfond \etal | 2008 | \cite{halfond2008automated} | java, PHP, http,… | composed | calls, data flow | offline | semi[fn:3] | static |
\rowcolor{gray!25} Memon \etal | 2008 | \cite{memon2008automatically} | all | composed | event | online | semi[fn:4] | dynamic |
Hassan | 2009 | \cite{hassan2009predicting} | C,C++ | statistical | pattern, meta data | offline | auto | static |
\rowcolor{gray!25} Daniel \etal | 2010 | \cite{daniel2010test} | java, .NET | composed[fn:1] | instruction | offline | semi[fn:3] | symbolic |
Fraser \etal | 2011 | \cite{fraser2011evosuite} | java | composed | flow graph | offline | semi | dynamic |
\rowcolor{gray!25} Dagenais \etal | 2011 | \cite{dagenais2011recommending} | java | composed[fn:1] | metadata | offline | semi[fn:3] | static |
Jin \etal | 2012 | \cite{jin2012bugredux} | C | composed | class, flow graph | offline | auto | dynamic |
\rowcolor{gray!25} Mirzaaghaei \etal | 2014 | \cite{mirzaaghaei2014automatic} | java | atomic | class | offline | auto | static |
Dagenais \etal | 2014 | \cite{dagenais2014using} | java | composed | pattern | offline | semi[fn:3] | static |
\rowcolor{gray!25} Khelladi \etal | 2017 | \cite{khelladi2017semi} | OCL | composed | class | online | semi[fn:4] | static |
Khelladi \etal | 2018 | \cite{khelladi2018change} | UML like | composed | class | online | semi[fn:4] | static |
\rowcolor{gray!25} Tsantalis \etal | 2018 | \cite{tsantalis2018accurate} | java | composed[fn:1] | instruction, class | offline | semi[fn:3] | static |
[fn:1] Only considers in-place compositions. [fn:2] Co-evolves OCL constraints. [fn:3] Makes recommendations on possible co-evolutions. [fn:4] Might sometimes require human design choices. [fn:5] Does not use changes to generate tests. [fn:6] In the context of regression testing.
main | year | ref | language | impact | analysis | test | type | target | automation |
---|---|---|---|---|---|---|---|---|---|
Reference | Language | Impact Ana. Mode | Impact Ana. Method | Kind of test | Type | Target | Auto-mation |
Reference | Year | Ref. | Language | Impact Analysis | Kind of test | Type | Target | Automation | Analysis |
---|---|---|---|---|---|---|---|---|---|
\rowcolor{gray!25} Adamopoulos \etal | 2004 | \cite{adamopoulos2004overcome} | Fortran-77 | offline | unit | amplification | tests inputs | auto | genetic |
Halfond \etal | 2008 | \cite{halfond2008automated} | java, PHP, http,… | offline | ? | repair | calls in general | semi[fn:3] | static |
\rowcolor{gray!25} Memon \etal | 2008 | \cite{memon2008automatically} | all | offline | unit[fn:6] | repair | whole tests | semi[fn:4] | dynamic |
Thummalapenta \etal | 2009 | \cite{thummalapenta2009mseqgen} | java | offline | unit | augmentation | tests calls and parameters | semi[fn:3] | static |
\rowcolor{gray!25} Daniel \etal | 2010 | \cite{daniel2010test} | java, .NET | offline | unit | repair | whole tests | semi[fn:3] | symbolic |
Fraser \etal | 2011 | \cite{fraser2011evosuite} | java | offline | unit | generate | whole tests | semi | dynamic |
\rowcolor{gray!25} Robinson \etal | 2011 | \cite{robinson2011scaling} | java | offline | unit[fn:6] | generate | whole tests | auto | static |
Jin \etal | 2012 | \cite{jin2012bugredux} | C | N/A[fn:5] | unit | generate | tests calls and inputs | auto | dynamic |
\rowcolor{gray!25} Galeotti \etal | 2013 | \cite{galeotti2013improving} | java | N/A[fn:5] | unit | generation | tests calls and inputs | auto | symbolic |
Tonella \etal | 2014 | \cite{tonella2014interpolated} | all | offline | all | generate | tests event | semi[fn:3] | dynamic |
\rowcolor{gray!25} Mirzaaghaei \etal | 2014 | \cite{mirzaaghaei2014automatic} | java | offline | all | repair,generate | whole tests | auto | static |
Fraser \etal | 2014 | \cite{fraser2014automated} | java | N/A[fn:5] | unit | generate | tests calls and inputs | auto | static |
\rowcolor{gray!25} Khelladi \etal | 2017 | \cite{khelladi2017semi} | OCL | offline | N/A[fn:2] | repair | whole models and constraints | semi[fn:4] | static |
Kampmann \etal | 2019 | \cite{alex2019bridging} | web-python-sql-C stack | offline | unit | generate | whole tests | auto | dynamic |
We were able to extract some recurring characteristics across the different approaches. As shown in Tables classification1 and classification2, most of the approaches that we found focus on object-oriented languages; in particular, they use the class construct and rich type systems available statically, as in Java, .NET, and C++. These approaches seem to correlate strongly with techniques such as static analysis and pattern recognition. Nonetheless, some approaches do not rely on particular characteristics of the languages themselves, like classes and static types, but on the runtime behavior of the program. These approaches use events at some point, with dynamic analysis, to produce behavioral models cite:jin2012bugredux,alex2019bridging,memon2008automatically. We also found a few approaches working on declarative constraint systems cite:khelladi2017semi or on database language paradigms cite:zhang2011automatic,alex2019bridging.
The criterion of the degree of automation will be discussed several times in the next sections from other points of view, so here we only make a general comment on the tables classifying the approaches.
In both Tables classification1 and classification2, the automation criterion refers to the approach in general and not only on evolution or co-evolution, as it was very difficult to distinguish both without trying to reproduce experiments.
Table classification1 shows a correlation between the granularity of changes and the automation of the corresponding approach: approaches using composed changes require more manual intervention. The cause of this correlation seems to be that approaches using composed evolutions are more complex, although they can handle a greater variety of evolutions.
All approaches considering evolution use some degree of composed changes. Dagenais et al. use some basic compositions in cite:dagenais2011recommending,dagenais2014using: for example, a renaming is composed of the deletion of a name and the addition of a new one; the authors consider the case of in-place renaming, so it is easier to infer the relation between the deletion and the addition.
It should be noted that some security approaches, which use dynamic analysis and finite state machine inference, perform a special kind of change detection using a fixed initial state cite:jiang2006multiresolution: they actually try to detect behavioral changes at runtime.
\cite{wang2017behavioral} | Behavioral Execution Comparison: Are Tests Representative of Field Behavior? |
\cite{jin2012bugredux} | BugRedux: reproducing field failures for in-house debugging |
\cite{alex2019bridging} | Bridging the Gap between Unit Test Generation and System Test Generation |
\cite{hindle2012naturalness} | On the naturalness of software |
\cite{jiang2006multiresolution} | Multiresolution Abnormal Trace Detection Using Varied-Length $ n $-Grams and Automata |
\cite{beschastnikh2013unifying} | Unifying FSM-inference algorithms through declarative specification |
\cite{tonella2014interpolated} | Interpolated n-grams for model based testing |
\cite{hebig2016approaches} | Approaches to co-evolution of metamodels and models: A survey |
\cite{khelladi2018change} | Change Propagation-based and Composition-based Co-evolution of Transformations with Evolving Metamodels |
\cite{khelladi2017semi} | A semi-automatic maintenance and co-evolution of OCL constraints with (meta) model evolution |
\cite{zaidman2008mining} | Mining software repositories to study co-evolution of production \& test code |
\cite{zaidman2011studying} | Studying the co-evolution of production and test code in open source and industrial developer test processes through repository mining |
\cite{levin2017co} | The co-evolution of test maintenance and code maintenance through the lens of fine-grained semantic changes |
\cite{gall2009change} | Change analysis with evolizer and changedistiller |
\cite{martinez2019coming} | Coming: a tool for mining change pattern instances from git commits |
\cite{levin2017boosting} | Boosting automatic commit classification into maintenance activities by utilizing source code changes |
\cite{schafer2008mining} | Mining framework usage changes from instantiation code |
\cite{andreasen2017survey} | A survey of dynamic analysis and test generation for JavaScript |
\cite{zhu1997software} | Software unit test coverage and adequacy |
\cite{mirshokraie2013efficient} | Efficient JavaScript mutation testing |
\cite{gyimesi2019bugsjs} | Bugsjs: A benchmark of javascript bugs |
\cite{richards2010analysis} | An analysis of the dynamic behavior of JavaScript programs |
\cite{anand2013orchestrated} | An orchestrated survey of methodologies for automated software test case generation |
\cite{pinto2012understanding} | Understanding myths and realities of test-suite evolution |
\cite{arcuri2011adaptive} | Adaptive random testing: An illusion of effectiveness? |
\cite{xu2010directed} | Directed test suite augmentation: techniques and tradeoffs |
\cite{marsavina2014studying} | Studying fine-grained co-evolution patterns of production and test code |
\cite{mirzaaghaei2014automatic} | Automatic test case evolution |
\cite{fraser2011evosuite} | EvoSuite: automatic test suite generation for object-oriented software |
\cite{daniel2010test} | On test repair using symbolic execution |
\cite{arcuri2007coevolving} | Coevolving programs and unit tests from their specification |
\cite{person2009differential} | Differential symbolic execution |
\cite{hassan2009predicting} | Predicting faults using the complexity of code changes |
\cite{dagenais2011recommending} | Recommending adaptive changes for framework evolution |
\cite{halfond2008automated} | Automated identification of parameter mismatches in web applications |
\cite{vcubranic2003hipikat} | Hipikat: Recommending pertinent software development artifacts |
\cite{xing2006refactoring} | Refactoring practice: How it is and how it should be supported-an eclipse case study |
\cite{levin2016using} | Using temporal and semantic developer-level information to predict maintenance activity profiles |
\cite{memon2008automatically} | Automatically repairing event sequence-based GUI test suites for regression testing |
\cite{thummalapenta2009mseqgen} | MSeqGen: Object-oriented unit-test generation via mining source code |
\cite{robinson2011scaling} | Scaling up automated test generation: Automatically generating maintainable regression unit tests for programs |
\cite{tsantalis2018accurate} | Accurate and efficient refactoring detection in commit history |
\cite{arcuri2008automation} | On the automation of fixing software bugs |
\cite{arcuri2008multi} | Multi-objective improvement of software using co-evolution and smart seeding |
\cite{arcuri2008novel} | A novel co-evolutionary approach to automatic software bug fixing |
\cite{papadakis2019mutation} | Mutation testing advances: an analysis and survey |
\cite{jia2010analysis} | An analysis and survey of the development of mutation testing |
\cite{adamopoulos2004overcome} | How to overcome the equivalent mutant problem and achieve tailored selective mutation using co-evolution |
\cite{zhang2011automatic} | Automatic generation of load tests |
\cite{nistor2015caramel} | Caramel: Detecting and fixing performance problems that have non-intrusive fixes |
\cite{beller2015how} | When, how, and why developers (do not) test in their IDEs |
\cite{freeman2002software} | Software testing |
\cite{hedin2014jsflow} | JSFlow: Tracking information flow in JavaScript and its APIs |
\cite{dhondt2002coevolution} | Co-evolution of object-oriented software design and implementation |
\cite{leotta2013capture} | Capture-replay vs. programmable web testing: An empirical assessment during test case evolution |
Both Tables classification1 and classification2 show that the abstraction of choice is the class construct.
Some approaches make use of metadata to mine patterns in version control systems (VCS). Zaidman et al. in cite:zaidman2008mining,zaidman2011studying mine co-evolution patterns in SVN commits, while Martinez et al. in cite:martinez2019coming mine co-evolution patterns in git commits.
We also found some studies classifying changes: they use statistics and learning algorithms to predict the type of changes cite:marsavina2014studying,levin2016using,levin2017boosting. They combine the class abstraction with VCS metadata analyzed through Natural Language Processing (NLP).
As shown in Table classification1, most of the approaches that we found use offline detection, with the exception of Khelladi et al. cite:khelladi2018change,khelladi2017semi, who are able to detect changes online because model design has historically been supported by graphical editors. Offline approaches deal with content that can be edited in many ways, making it difficult to instrument each editing mode. Thus, these articles rely either on file metadata and file diffs to detect changes cite:mirzaaghaei2014automatic,daniel2010test,halfond2008automated, on VCS metadata and blob differences cite:martinez2019coming,hassan2009predicting,dagenais2011recommending,vcubranic2003hipikat,tsantalis2018accurate, or on behavioral differences cite:memon2008automatically,jiang2006multiresolution.
In this section, we found many articles performing test co-evolution. We also found approaches that do not exactly co-evolve tests but are still relevant to consider: they do not call their approach co-evolution, yet they share many tools and algorithms. This increased variety of approaches could be beneficial to the internship.
- Analysis mode
- All the articles retained in this state of the art perform offline impact analysis. Either there is no need for test co-evolution while the code is being changed, or current test co-evolution techniques are too expensive to react to each change.
- Analysis methods
- We found different methods of impact analysis.
Static analysis is the most widespread type of analysis here, as shown in Tables classification1 and classification2. The cause of this distribution seems to be the large amount of semantic and structural information available statically in strongly typed object-oriented languages such as Java.
On the contrary, dynamic analysis does not appear to be very common. It is particularly suitable for highly dynamic and weakly typed languages, such as Javascript and Perl, but it requires going down to the runtime of the program, which causes a performance penalty and an increase in complexity. Nonetheless, in cite:alex2019bridging Kampmann et al. use dynamic analysis to synthesize unit tests from system tests through the use of behavioral models and FSM inference algorithms.
Mirshokraie et al. in cite:mirshokraie2013efficient then in cite:mirshokraie2015jseft,mirshokraie2016atrina combine dynamic analysis and mutation testing to improve tests of Javascript programs.
Hybrid analysis appears in many future-work sections cite:andreasen2017survey, but we did not find approaches explicitly claiming it in the context of co-evolution.
We found no approach claiming to be able to repair or generate system tests, so the hypothesis that such approaches are computationally too expensive does not seem invalid.
Kampmann et al. in cite:alex2019bridging use system tests to generate new unit tests. Mirshokraie et al. in cite:mirshokraie2016atrina also use system tests, in the form of GUI tests, to generate new unit tests.
Memon et al. in cite:memon2008automatically generate unit tests in the particular case of regression testing. Here, regression testing allows the creation of oracles from the program's current behavior.
Khelladi et al. cite:khelladi2017semi do not co-evolve tests but a very close artifact. In fact, they co-evolve OCL, a declarative constraint language on models such as class diagrams. Here specifying constraints is very similar to specifying oracles.
Tonella et al. cite:tonella2014interpolated and Jiang et al. cite:jiang2006multiresolution use traces and FSMs to construct a functional behavioral model of an application, then generate new tests as skeletons of calls from paths in the FSM. Halfond et al. cite:halfond2008automated detect parameter mismatches in multi-language systems. Dagenais et al. cite:dagenais2011recommending,dagenais2014using recommend alternatives for broken calls and for broken references in general. Fraser et al. cite:fraser2014automated produce tests composed of calls and inputs from Java generics.
Daniel et al. cite:daniel2010test compute new inputs for tests that maximize coverage through symbolic execution. Adamopoulos et al. cite:adamopoulos2004overcome amplify inputs through mutation testing and genetic algorithms. Zhang et al. cite:zhang2011automatic amplify tests for database systems through the use of symbolic execution and genetic algorithms.
Table classification2 shows that fully automated approaches generating (non-regression) unit tests do not produce tests with oracles, with Kampmann et al. cite:alex2019bridging and Mirshokraie et al. cite:mirshokraie2015jseft,mirshokraie2016atrina as exceptions, because these approaches borrow oracles from system tests (such as GUI tests) to generate unit tests. In fact, oracles are part of the application specification, and thus they cannot be automatically generated. To overcome this restriction, Mirzaaghaei et al. cite:mirzaaghaei2014automatic reuse oracles from other tests; Kampmann et al. cite:alex2019bridging run system tests with the same inputs as unit tests to reduce the false positives triggered by oracles in unit tests (if an oracle from a unit test fails, the corresponding system test should also fail); and Khelladi et al. cite:khelladi2017semi repair constraints, which is very similar to repairing oracles.
Robinson et al. cite:robinson2011scaling use static analysis to generate tests then mutation testing to refine generated tests.
Kampmann et al. cite:alex2019bridging synthesize unit tests from system tests.
Mirzaaghaei et al. cite:mirzaaghaei2014automatic amplify and repair unit tests using carefully handcrafted patterns that match certain evolutions. However, creating these patterns can be tedious. A partial solution to this problem could come from Khelladi et al. in cite:khelladi2018change, who combine repair rules to co-evolve models given changes in metamodels.
Daniel et al. cite:daniel2010test repair tests using symbolic execution; more specifically, they focus on repairing string literals. Memon et al. cite:memon2008automatically repair regression GUI tests; more specifically, they repair the sequence of GUI events, which sometimes needs manual intervention when the approach does not find an appropriate resolution.
cite:thummalapenta2009mseqgen | generate |
cite:robinson2011scaling | generate |
cite:arcuri2008multi | generate |
cite:arcuri2008novel | generate |
cite:zhang2011automatic | generate |
cite:wang2017behavioral | generate |
cite:jin2012bugredux | generate |
cite:alex2019bridging | generate |
cite:andreasen2017survey | generate |
cite:anand2013orchestrated | generate |
cite:xu2010directed | generate |
cite:marsavina2014studying | generate |
cite:fraser2011evosuite | generate |
cite:arcuri2007coevolving | repair |
cite:levin2017co | repair |
cite:khelladi2017semi | repair |
cite:khelladi2018change | repair |
cite:mirzaaghaei2014automatic | repair |
cite:daniel2010test | repair |
cite:arcuri2008automation | repair |
cite:halfond2008automated | repair |
cite:memon2008automatically | repair |
kind | my | year | main | artifacts relations | parad | ref | language | analysis | objectives | test | histo | dyngranu | granularity | abstraction | detection | target | automation | type | impact | usable thg | T | compare_eval | eval objects, resources | impact Sci | num | reading issues |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
survey | M1, compare 4 tools | 2017 | wang | system test → unit test | E | cite:wang2017behavioral | java | dynamic | find best metric | unit | no | 2 events | / | instruction | / | calls | / | generate | offline | / | TT | coverage,mutation,temporal invariant | JetUML,Log4j,Common {IO,Lang} | more than mutation t | ||
technical | M1, used LLVM | 2012 | jin | production → test | E | cite:jin2012bugredux | C | dynamic | repro fail in house | unit | no | events | / | class-flow graph | / | calls+inputs | auto | generate | no | tool(avail) | time space overhead, eff{ctivi,icien} | [16,23]->SIR[21],BugBench[22],exploit-db[23] | in house reprod | |||
technical | M1,Kim et al.[12] | 2019 | alex | system test → unit test | D | cite:alex2019bridging | web/python/sql/C | dynamic | param unit test ?? | unit | no | failure | / | flow graph | / | all | auto | generate | dyntest | proto | coverage,coverage over time,lifting | GNU coreutils,sed,dc | accu of sys to unit | |||
survey | 2016 | hebig | cite:hebig2016approaches | co-evo approach | / repair | 10 | ||||||||||||||||||||
technical | init | 2018 | khelladi | metamodel → model | C | cite:khelladi2018change | UML like | static | compose resol | / | yes | no | composed | class | online | model | auto,semi | repair | offline | tool | M | correctness | many models | 0 | ||
technical | init | 2017 | khelladi | model/constraint OCL | C,D | cite:khelladi2017semi | OCL | static | also co-evolve OCL | / | yes | no | composed | class | online | model/constr | auto,semi | repair | offline | tool | M | 1 |
survey | googleS | 2017 | andreasen | runtime->test | Dy | cite:andreasen2017survey | js | dynamic | test gen js | all | generate | |||||||||||||||
book | test coverage, criterion | 1997 | zhu | cite:zhu1997software | base unit test cov | ??gen | ||||||||||||||||||||
technical | googleS,mutation | 2013 | mirshokraie | code | Dy | cite:mirshokraie2013efficient | js | static/dyn | mut, fast/eval test | all | no | / | call graph | / | / | auto | / | ??offl | tool | non-equiv mutant,fault severity | SimpleCart,JQuery,… | |||||
benchmark | googleS | 2019 | gyimesi | Dy | cite:gyimesi2019bugsjs | js | / | bench things | / | / | / | / | / | / | / | / | / | / | bench | |||||||
analysis | googleS | 2010 | richards | Dy | cite:richards2010analysis | js | how dyn js work | / | ||||||||||||||||||
survey | test gen | 2013 | anand | I,OO | cite:anand2013orchestrated | java | random | find new tests | generate | G | ||||||||||||||||
study | 2012 | pinto | test ? | cite:pinto2012understanding | debunk | ??/ | 9 | |||||||||||||||||||
study | 2011 | arcuri | cite:arcuri2011adaptive | debunk | ??/ | |||||||||||||||||||||
study | 2010 | xu | code → test | I | cite:xu2010directed | C | gene,symb | augmentation | unit | yes | regression | / | branch | all | auto | generate | offline | from SIR | ||||||||
study | googleS | 2014 | marsavina | production → pattern->test | OO | cite:marsavina2014studying | java | static | ana-mine-fix | yes | composed | all,branch cover | offline | all | >manu | generate | ??off | T | CommonsLang,CommonsMath,Gson,PMD,JFreeChart | 5 | ||||||
technical | init | 2014 | mirzaaghaei | code → test | C,I | cite:mirzaaghaei2014automatic | java | static | repair,8 co-evo pat | Unit/all | yes | no | atomic?? | class | offline | all | auto | repair | offline | algo,tool | T | apply freq, repair effectiveness | JodaTime,Barbecue,JfreChart,PDM,Xstream | handle java patterns | 3 | |
technical | googleS | 2011 | fraser | cove->test | OO | cite:fraser2011evosuite | java | dynamic | gen test suite | ?? unit | no | flow graph | generate | tool | ||||||||||||
technical | 2010 | daniel | symbolicExec → test | I,OO | cite:daniel2010test | java, .NET | symbolic | repair | unit | repair | ||||||||||||||||
technical | 2007 | arcuri | cite:arcuri2007coevolving | static | co-evo | ??all | ||||||||||||||||||||
phd dissert | 2009 | person | symbolicExec | cite:person2009differential | symbolic | symbolic exec | / | |||||||||||||||||||
technical | 2009 | hassan | code | cite:hassan2009predicting | predicting | ??/ | ||||||||||||||||||||
technical | ? | 2011 | dagenais | code-< ?? | cite:dagenais2011recommending | recommending | recommend | ??/ | 7 | |||||||||||||||||
technical | 2008 | halfond | cite:halfond2008automated | param mismatch id | ?? repair | |||||||||||||||||||||
technical | cite:khelladi2018change | 2003 | vcubranic | cite:vcubranic2003hipikat | ??recommendation | ?? / | ||||||||||||||||||||
study | 2006 | cite:xing2006refactoring | refactoring how wha | ?? / | ||||||||||||||||||||||
technical | 2016 | levin | cite:levin2016using | predict maintenance | ??/ | |||||||||||||||||||||
technical | 2008 | memon | ? → test | cite:memon2008automatically | repair | regression | repair | |||||||||||||||||||
technical | 2009 | thummalapenta | code → test | cite:thummalapenta2009mseqgen | generation | unit | generate | |||||||||||||||||||
technical | 2011 | robinson | ? ->test | cite:robinson2011scaling | generation | regression unit | generate | |||||||||||||||||||
technical | 2018 | tsantalis | code → ? | I,OO | cite:tsantalis2018accurate | java | pattern,none | detecting ?? ?? git | ?? / | C | ||||||||||||||||
?? technical | evolutionary testing | 2008 | arcuri | cite:arcuri2008automation | fix bugs | repair | ||||||||||||||||||||
technical | 2008 | arcuri | cite:arcuri2008multi | improvement | ??generate | |||||||||||||||||||||
co-evolutionary | 2008 | arcuri | cite:arcuri2008novel | co-evolutionary | ??generate | |||||||||||||||||||||
survey | 2019 | papadakis | cite:papadakis2019mutation | prove advances mut | / | MT | ||||||||||||||||||||
survey | tools | 2010 | jia | cite:jia2010analysis | Java,C,C++,… | prove domain growth | / | MT | ||||||||||||||||||
technical | mutation testing | 2004 | adamopoulos | mutant <-> test | cite:adamopoulos2004overcome * | mut test | / | rise of mut with GAs |
gen,evol,symbolic | 2011 | zhang | cite:zhang2011automatic | SQL | generation | load test | generate | |||||||||||||||||||
2015 | nistor | cite:nistor2015caramel ? | code | fix code |
kind | type | my | year | main | artifacts relations | parad | ref | language | analysis | objectives | test | histo | dyngranu | granularity | abstraction | detection | target | automation | impact | usable thg | T | compare_eval | eval objects, resources | impact Sci | num | reading issues |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
survey | generate | M1, compare 4 tools | 2017 | wang | system test → unit test | E | cite:wang2017behavioral | java | dynamic | find best metric | unit | no | 2 events | / | instruction | / | calls | / | offline | / | TT | coverage,mutation,temporal invariant | JetUML,Log4j,Common {IO,Lang} | more than mutation t | ||
technical | generate | M1, used LLVM | 2012 | jin | production → test | E | cite:jin2012bugredux | C | dynamic | repro fail in house | unit | no | events | / | class-flow graph | / | calls+inputs | auto | no | tool(avail) | time space overhead, eff{ctivi,icien} | [16,23]->SIR[21],BugBench[22],exploit-db[23] | in house reprod | |||
technical | generate | M1,Kim et al.[12] | 2019 | alex | system test → unit test | D | cite:alex2019bridging | web/python/sql/C | dynamic | param unit test ?? | unit | no | failure | / | flow graph | / | all | auto | dyntest | proto | coverage,coverage over time,lifting | GNU coreutils,sed,dc | accu of sys to unit | |||
technical | repair | init | 2018 | khelladi | metamodel → model | C | cite:khelladi2018change | UML like | static | compose resol | / | yes | no | composed | class | online | model | auto,semi | offline | tool | M | correctness | many models | 0 | ||
technical | repair | init | 2017 | khelladi | model/constraint OCL | C,D | cite:khelladi2017semi | OCL | static | also co-evolve OCL | / | yes | no | composed | class | online | model/constr | auto,semi | offline | tool | M | 1 |
survey | generate | googleS | 2017 | andreasen | runtime->test | Dy | cite:andreasen2017survey | js | dynamic | test gen js | all | |||||||||||||||
survey | generate | test gen | 2013 | anand | I,OO | cite:anand2013orchestrated | java | random | find new tests | G | ||||||||||||||||
study | generate | 2010 | xu | code → test | I | cite:xu2010directed | C | gene,symb | augmentation | unit | yes | regression | / | branch | all | auto | offline | from SIR | ||||||||
study | generate | googleS | 2014 | marsavina | production → pattern->test | OO | cite:marsavina2014studying | java | static | ana-mine-fix | yes | composed | all,branch cover | offline | all | >manu | ??off | T | CommonsLang,CommonsMath,Gson,PMD,JFreeChart | 5 | ||||||
technical | generate | googleS | 2011 | fraser | cove->test | OO | cite:fraser2011evosuite | java | dynamic | gen test suite | ?? unit | no | flow graph | tool | ||||||||||||
technical | repair | init | 2014 | mirzaaghaei | code → test | C,I | cite:mirzaaghaei2014automatic | java | static | repair,8 co-evo pat | Unit/all | yes | no | atomic?? | class | offline | all | auto | offline | algo,tool | T | apply freq, repair effectiveness | JodaTime,Barbecue,JfreChart,PDM,Xstream | handle java patterns | 3 | |
technical | repair | 2010 | daniel | symbolicExec → test | I,OO | cite:daniel2010test | java, .NET | symbolic | repair | unit | ||||||||||||||||
?? technical | repair | evolutionary testing | 2008 | arcuri | cite:arcuri2008automation | fix bugs | ||||||||||||||||||||
technical | ??all | 2007 | arcuri | cite:arcuri2007coevolving | static | co-evo | ||||||||||||||||||||
technical | ?? repair | 2008 | halfond | cite:halfond2008automated | param mismatch id | |||||||||||||||||||||
technical | repair | 2008 | memon | ? → test | cite:memon2008automatically | repair | regression | |||||||||||||||||||
technical | generate | 2009 | thummalapenta | code → test | cite:thummalapenta2009mseqgen | generation | unit | |||||||||||||||||||
technical | generate | 2011 | robinson | ? ->test | cite:robinson2011scaling | generation | regression unit | |||||||||||||||||||
technical | ??generate | 2008 | arcuri | cite:arcuri2008multi | improvement | |||||||||||||||||||||
??generate | co-evolutionary | 2008 | arcuri | cite:arcuri2008novel | co-evolutionary | |||||||||||||||||||||
generate | gen,evol,symbolic | 2011 | zhang | cite:zhang2011automatic | SQL | generation l inputs | load test | |||||||||||||||||||
fix code | 2015 | nistor | cite:nistor2015caramel ? | code | ||||||||||||||||||||||
The problem of co-evolving software has been tackled by many researchers.
For the co-evolution of models, the problems have been extensively investigated; Hebig et al. propose a survey cite:hebig2016approaches.
For the co-evolution of tests, the research is much sparser: to our knowledge, there is no survey on the co-evolution of code and tests. Nonetheless, there are some exploratory studies on the co-evolution of code and tests, where the evolution of tests is empirically assessed over software life-cycles cite:leotta2013capture,zaidman2008mining,zaidman2011studying.
There also exist works neighboring the co-evolution of code and tests, be it on test generation or mutation testing. In cite:anand2013orchestrated, Anand et al. survey recent test generation techniques, while in cite:andreasen2017survey, Andreasen et al. show the difficulties of test generation in dynamic languages. Mutation testing shares some tools and techniques with the co-evolution of code and tests; in cite:jia2010analysis, Jia et al. organize such tools.
In this state of the art we have shown a large variety of approaches to the co-evolution of code and tests. We have seen a majority of approaches working on Java and, more generally, on richly typed OO programming, with approaches capable of co-evolving more and more kinds of evolutions, yet without considering complex evolutions. Moreover, recent works, in the particular case of test generation, have tried to tackle more challenging language constraints such as weakly typed and dynamic languages (JavaScript), but we did not find any approach capable of co-evolving code and tests in such languages.
As future perspectives, this state of the art could lead to a survey, as more time would allow a more systematic review of the field along with checking the availability of tools. A first objective would be to check the feasibility of the co-evolution of tests in contexts where static type information is scarcer, as in dynamic languages. Some of these languages are very popular for their flexibility, but their lack of readily available type information makes them harder to analyze. Part of this difficulty seems to have been mitigated by incremental type systems. Thus, we hope that more incremental approaches to the co-evolution of code and tests would allow making use of tests to further analyze code, which would in turn allow further improving tests. Another objective would be to address test co-evolution for real-world complex evolutions.
bibliographystyle:plain bibliography:references.bib
- pdf Djamel E. Khelladi, Roland Kretschmer, Alexander Egyed: Change Propagation-based and Composition-based Co-evolution of Transformations with Evolving Metamodels. MODELS 2018.
- pdf Djamel E. Khelladi, Reda Bendraou, Regina Hebig, Marie-Pierre Gervais: A semi-automatic maintenance and co-evolution of OCL constraints with (meta)model evolution. JSS 2017.
- pdf Mirzaaghaei, M., Pastore, F., & Pezzè, M. Automatic test case evolution. Software Testing, Verification and Reliability, 24(5), 386-411. 2014.
- pdf Levin, S., & Yehudai, A. The co-evolution of test maintenance and code maintenance through the lens of fine-grained semantic changes. In IEEE International Conference on Software Maintenance and Evolution (ICSME) (pp. 35-46). IEEE. 2017.
- pdf Zaidman, A., Van Rompaey, B., van Deursen, A., & Demeyer, S. Studying the co-evolution of production and test code in open source and industrial developer test processes through repository mining. Empirical Software Engineering Journal, 16(3), 325-364. 2011.
- pdf Co-evolution of object-oriented software design and implementation, T D’Hondt, K De Volder, K Mens, R Wuyts - Software Architectures and …, 2002 - Springer
- Mining software repositories to study co-evolution of production & test code, A Zaidman, B Van Rompaey, S Demeyer… - … on software testing …, 2008 - ieeexplore.ieee.org
- pdf Mirshokraie, Shabnam, Ali Mesbah, and Karthik Pattabiraman. “Efficient JavaScript mutation testing.” 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation. IEEE, 2013.
- pdf Andreasen, Esben, et al. “A survey of dynamic analysis and test generation for JavaScript.” ACM Computing Surveys (CSUR) 50.5 (2017): 66.
- pdf Gyimesi, Péter, et al. “Bugsjs: A benchmark of javascript bugs.” 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST). IEEE, 2019.
- pdf Richards, Gregor, et al. “An analysis of the dynamic behavior of JavaScript programs.” ACM Sigplan Notices. Vol. 45. No. 6. ACM, 2010.
- A Trusted Mechanised JavaScript Specification
- Capture-Replay vs. Programmable Web Testing: An Empirical Assessment during Test Case Evolution
- On the Use of Usage Patterns from Telemetry Data for Test Case Prioritization Tests improvements
- Behavioral Execution Comparison: Are Tests Representative of Field Behavior? paper using synoptic
- https://github.com/INRIA/intertrace
- https://people.inf.ethz.ch/suz/publications/natural.pdf https://github.com/labri-progress/naturalness-js application of natural language processing to computer software
- Bridging the Gap between Unit Test Generation and System Test Generation feedback loop
- http://ceur-ws.org/Vol-971/paper21.pdf
- http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=877A01775995830BB127116FB11BAB49?doi=10.1.1.323.3411&rep=rep1&type=pdf
- Lossless compaction of model execution traces
- https://livablesoftware.com/conflictjs-javascript-libraries-conflicts/
This section will focus on the methods used to extract useful information from programs.
The main point of analyzing the program here is to measure the impact of changes; being capable of measuring it allows finding the tests that need to be repaired or relaunched.
Analyzing code can also be useful to harvest data and patterns cite:hindle2012naturalness that will allow to better amplify tests. In addition to static analysis, using the history of changes and the behavior of the program during tests might improve the precision and performance of programming assistants.
In the general case, analyzing programs is difficult. The whole stack, from an algorithm to its execution, is complex and diverse. Indeed, there are many programming languages that use different paradigms. For each language, many parsers and compilers exist. There are also many runtimes and intermediate representations. It is thus important to find the points in this stack where analyses are the most efficient.
Static analysis is often the first choice when one wants to analyze a particular program or project. In the best case, a static analysis can prove properties of a program for all possible inputs. Most domains of science and industry that need to prove properties use languages with rich type systems. But annotating programs can be tedious and can itself lead to bugs, which is why analysis tools make heavy use of type inference to lighten the burden of type annotation. Yet type inference has its own limits, as uncertainties lower the quality of types throughout the program. Refining those uncertainties is a major lever to improve software quality.
Static analysis produces semantic models (class diagrams, type systems, …), while dynamic analysis produces functional models (finite state machines, memory snapshots, …).
Even if rich type systems are very useful for analysis, programs heavily constrained by types are less flexible, demand more code, and use more complex artifacts to alleviate the overhead of types. There is an obvious trade-off between development flexibility and ease of analysis. Making use of the runtime can disambiguate uncertainties throughout the program and ensure properties with more precision. Combining static and dynamic analysis thus offers the possibility to further improve code quality while preserving flexibility.
Static analysis requires type information (annotated or inferred). It can check properties on infinite domains in an exhaustive way, and proves efficient on simple programs that nonetheless accept a large number of inputs. Even languages without explicitly annotated types can be given type rules: for example, with a mono-typed view of C where everything is an int, one can still check for null dereferencing (that is, dereferencing 0). To improve robustness and flexibility, most analysis tools have a type that matches all types and a type that matches no type, which in practice allows incremental typing and type inference.
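As an illustration (TypeScript, sketch only): the any type plays the role of the type matching all types and never the role of the type matching none, which is what makes incremental typing practical.
// never: the return type of a function that cannot return normally.
function fail(message: string): never {
  throw new Error(message);
}
// any: an escape hatch matching every type, useful while a value is still
// untyped; it can then be refined incrementally with explicit checks.
function parsePort(raw: any): number {
  const port = Number(raw);
  if (Number.isNaN(port)) fail(`not a port number: ${raw}`);
  return port; // statically known to be a number on this path
}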
Many tools exist to analyze programs statically; most of them only work on one language (TypeScript, CompCert, Spoon) while some try to be more agnostic (LLVM, semantic, pandoc). Focusing on one language allows finer analyses but might not scale to multi-language projects. Tools handling multiple languages might work better on multi-language projects, but to leverage the work done for each language such tools need an intermediate representation of programs.
Dynamic analysis is particularly suitable for highly dynamic and weakly typed languages. It cannot provide absolute guarantees on an infinite domain, but it is as close as possible to the actual use of the program. It is effective on potentially complex programs, but only for the few inputs it exercises.
JSFlow cite:hedin2014jsflow. cite:richards2010analysis. cite:andreasen2017survey. cite:jiang2006multiresolution. cite:beschastnikh2013unifying.
Hybrid analysis appears in the future work of many articles in the field, and in some minor contributions. The runtime supports static analysis by providing information that is easily accessible during execution; static analysis supports dynamic analysis by directing it to the sensitive points it detected. Tests can be used to collect information at runtime and improve the inferences of static analysis; conversely, static analysis can detect sensitive pieces of programs and tests, which can then be instrumented to better understand them and detect bugs. cite:andreasen2017survey
Changing the syntax of a program while trying to preserve the same semantics, in order to test particular cases and make the code more robust. cite:mirshokraie2013efficient
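A small sketch of that idea (the sum functions are purely illustrative): two syntactically different but semantically equivalent variants; tests written against one should keep passing against the other, which probes the robustness of both the code and the tests.
// Variant 1: explicit loop.
function sumLoop(xs: number[]): number {
  let total = 0;
  for (const x of xs) total += x;
  return total;
}
// Variant 2: same semantics, different syntax.
const sumReduce = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);
test('both variants agree on the same cases', () => {
  expect(sumLoop([1, 2, 3])).toBe(6);
  expect(sumReduce([1, 2, 3])).toBe(6);
});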
- {Symbolic, Concolic, Abstract} Execution
- Executing a program on abstract values, as opposed to concrete execution.
- Mutation Testing
- introduce small faults (mutants) into the tested code; a mutant is killed when some test detects it, otherwise it survives. It is a way of measuring test quality through the introduction of bugs, and efficient variants try to run fewer mutants while keeping the detection power high. Originally proposed by Hamlet in "Testing programs with the aid of a computer" IEEE SE 3 (1977).
- Search Based Software Engineering (SBSE)
- search algorithms are used to maximize test goals and reduce testing costs.
- Search Based Software Testing (SBST)
- a branch of SBSE; explained in Section 7.1 of cite:anand2013orchestrated.
- Dynamic Symbolic Execution (DSE)
- can be mixed with SBST.
- {{Statement,Branch,Path} coverage, Mutation Adequacy}
- Related to the notion of test adequacy cite:zhu1997software.
- {Functional,Semantic} model
- ?? way of representing things
- {State Based} modeling
- ??
- The infeasibility problem of model based testing
- ??
- LMP
- ?? see dhondt
- Aspect Oriented Programming (AOP)
- ?? dhondt
- Depth First Order (DFO)
- Comes from the dataflow analysis domain.
- Co-evolution of code and test
- bidirectional
- corrective, perfective, and adaptive change
- as defined by Mockus et al. in "Identifying reasons for software changes using historic databases", 2000
- adequacy
- Memon et al. 2001
Discussion on the internship subject in relation to the research questions (to focus objectives), then on the bibliographic report (constraints from the head of M2 and methodology). For the methodology, the reading of papers is standard; see the RAS module and Martin Quinson's personal page. Moreover, I should use some search engines to find papers in a somewhat reproducible way and then filter; exploring through related works is also useful.
- just want to move functions at this point
read Djamel E. Khelladi, Reda Bendraou, Regina Hebig, Marie-Pierre Gervais: A semi-automatic maintenance and co-evolution of OCL constraints with (meta)model evolution. JSS 2017.
challenges of OCL: > the existence of multiple and semantically different resolutions, not consistent with UML in some cases (number of references). > a resolution can be applicable only to a subset of OCL constraints
The 2018 paper is more mature.
read Djamel E. Khelladi, Roland Kretschmer, Alexander Egyed: Change Propagation-based and Composition-based Co-evolution of Transformations with Evolving Metamodels. MODELS 2018.
Diff on some kind of extended UML models (with OCL constraints) to mine transformation rules. Those rules can be composed and applied in particular patterns to properties. change propagation ~ co-evolution
A diff alone should not be enough to grasp composed changes (with a naive diff, a move is an add plus a delete).
The overall approach shown in Figure 3 is really interesting and might be adapted to what I want to do globally, but it needs to be adapted to code. Starting from their tables, I will try to add things on code analysis and dynamic analysis.
read Mirzaaghaei, M., Pastore, F., & Pezzè, M. Automatic test case evolution. Software Testing, Verification and Reliability, 24(5), 386-411. 2014.
TestCareAssistant. Good intro. This article introduces eight test evolution algorithms that automatically generate test cases for the identified test evolution scenarios. The algorithms take as input the original and the modified versions of the software and the set of test cases used to validate the original version, and generate a set of test cases for the modified version.
Evolution of the tests of a given class based on the tests of the parent and sibling class.
Model based techniques use abstract models of either the software behaviour or its environment to generate test cases [5], while code based approaches generate test cases from the software source code [6, 7]. Although approaches of both types generate executable test cases with oracles that checks the runtime software behaviour, the two classes of approaches present different practical limitations: model based approaches need specifications that require much effort to be developed and kept up to date, while code based approaches produce test cases that may not be easily readable and may be hard to evaluate for developers [8].
- Utting M, Pretschner A, Legeard B. A taxonomy of model-based testing approaches. Software Testing, Verification
and Reliability August 2012; 22(5):297–312. DOI: 10.1002/stvr.456.
- Ali S, Briand LC, Hemmati H, Paanesar-Walawege RK. A systematic review of the application and empirical investigation
of search-based test-case generation. IEEE Transactions on Software Engineering 2010; 36(6):742 –762. DOI: 10.1109/TSE.2009.52.
- Cadar C, Godefroid P, Khurshid S, Păsăreanu CS, Sen K, Tillmann N, Visser W. Symbolic execution for software
testing in practice: preliminary assessment. ICSE’11: Proceedings of the 33rd International Conference on Software Engineering, Waikiki, Honolulu, Hawaii, USA, ACM, 2011; 1066–1071. DOI: 10.1145/1985793.1985995.
- Jagannath V, Lee YY, Daniel B, Marinov D. Reducing the costs of bounded-exhaustive testing. FASE ’09: Proceedings
of the 12th International Conference on Fundamental Approaches to Software Engineering, Amsterdam, Springer-Verlag, 2009; 171–185. DOI:10.1007/978-3-642-00593-0_12.
Automatic test case generation techniques usually do not identify the setup actions necessary to execute the test cases, and tend to generate a huge amount of test cases without distinguishing among valid and invalid inputs thus causing many false alarms. Furthermore, automatically generated test inputs are often hard to read and maintain, and their practical applicability is limited to either the regression testing or the detection of unexpected exception conditions [4].
- Robinson B, Ernst MD, Perkins JH, Augustine V, Li N. Scaling up automated test generation: automatically
generating maintainable regression unit tests for programs. ASE’11: Proceedings of the 26th International Conference on Automated Software Engineering, Lawrence, KS, USA, IEEE Computer Society, 2011; 23 –32. DOI: 10.1109/ASE.2011.6100059.
read Levin, S., & Yehudai, A. The co-evolution of test maintenance and code maintenance through the lens of fine-grained semantic changes. In IEEE International Conference on Software Maintenance and Evolution (ICSME) (pp. 35-46). IEEE. 2017.
- how to make evolution happen
- what kind of change happened
Big data approach with spark.
- Corrective
- fix faults
- Perfective
- improve sys and design
- Adaptive
- introduce new features
read Zaidman, A., Van Rompaey, B., van Deursen, A., & Demeyer, S. Studying the co-evolution of production and test code in open source and industrial developer test processes through repository mining. Empirical Software Engineering Journal, 16(3), 325-364. 2011.
Index tests by the functions they called during the previous run. Here, in JS, functions are enough because they are the main way of branching between complex chunks of code. Using the parameters of functions (maybe global variable values can be put in a similar data structure; note that asynchrony is a form of function call), it is possible to be more precise about the impact of some changes (a function can take different paths depending on the context, i.e. its parameters). Use some metric and an order to get the most relevant tests first. Make a diff to get the functions directly modified. Get the tests through the index with the modified functions. Caution with memory shared with workers (multithreading).
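A minimal sketch of this indexing idea (the identifiers and data structures are assumptions, not an existing tool):
// Index built from the previous run: for each test, the functions it called.
type FunctionId = string; // e.g. "src/counter.ts#count"
type TestId = string;
const testIndex = new Map<TestId, Set<FunctionId>>();
// After a change, a diff gives the directly modified functions; the index then
// selects the tests that exercised at least one of them.
function impactedTests(modified: Set<FunctionId>): TestId[] {
  const impacted: TestId[] = [];
  for (const [testId, calledFunctions] of testIndex) {
    for (const fn of calledFunctions) {
      if (modified.has(fn)) { impacted.push(testId); break; }
    }
  }
  return impacted;
}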
Generate new tests consisting of sequences of calls synthesized from in-field execution traces that are not in the unit tests' execution traces.
Evolutions based on types are difficult in loosely typed languages.
Move function to another file: move the tests to the relevant place (some kind of metric between functions and tests?)
Rename function: easy in most cases (almost works with standard tools in JS)
Delete function: find the tests only testing this function; if a test tests something else, try to apply the same method as for function moving.
Function member, think about how this is handled.
Execute tests impacted by the change, then:
Find the subsequences of traces that are not executed anymore
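A naive sketch of that comparison (reduced to individual calls rather than full subsequences; the helper is illustrative):
// Compare the trace of a test before and after the change: entries that are no
// longer executed hint at behavior that disappeared or tests to re-examine.
function vanishedCalls(before: string[], after: string[]): string[] {
  const stillExecuted = new Set(after);
  return before.filter(call => !stillExecuted.has(call));
}
// vanishedCalls(["f", "g", "g"], ["f", "g"]) returns []    (g is still executed)
// vanishedCalls(["f", "g"], ["f"])           returns ["g"] (g is not executed anymore)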
Make a prototype out of the idea of general co-evolution using dynamic analysis. Read the papers more in depth. Find other papers.
Use a counter of finished functions: increment it when an instrumented function finishes and reset it to 0 when a call to an instrumented function is made; add a new column to the call table (or a new kind of entry). Very low cost. A value of 0 for a given call means that it is nested inside the previously called function. What about async features? Would it be easier to put something in the stack frame to match entrances and exits?
A high-performance monitor would ideally be integrated in an existing JavaScript runtime, but they are fast moving targets and focused on advanced performance optimizations. For this reason we have instead chosen to implement our prototype in JavaScript. We believe that our JavaScript implementation finds a sweet spot between implementation effort and usability for research purposes. Thus, performance optimization is a non-goal in the scope of the current work.
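A minimal sketch of the counter idea above (the bookkeeping structures are assumptions): each call record stores how many instrumented functions finished since the previous call, so a value of 0 marks a call nested in the previous one, and try/finally matches entrances and exits.
interface CallRecord { callee: string; finishedSincePreviousCall: number; }
const callTable: CallRecord[] = [];
let finishedCounter = 0;
// Wrap an instrumented function: record the call, reset the counter, and count
// the function as finished on exit (even if it throws).
function instrument<A extends unknown[], R>(name: string, fn: (...args: A) => R) {
  return (...args: A): R => {
    callTable.push({ callee: name, finishedSincePreviousCall: finishedCounter });
    finishedCounter = 0;
    try {
      return fn(...args);
    } finally {
      finishedCounter++;
    }
  };
}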
One promising approach is to use a hybrid analysis, where a static information flow analysis is used to approximate the locations in need of upgrade before entering a secret context.
Chugh et al. [6] present a hybrid approach to handling dynamic execution. Their work is staged where a dynamic residual is statically computed in the first stage, and checked at runtime in the second stage.
read pdf Mirshokraie, Shabnam, Ali Mesbah, and Karthik Pattabiraman. “Efficient JavaScript mutation testing.” 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation. IEEE, 2013.
read pdf Andreasen, Esben, et al. “A survey of dynamic analysis and test generation for JavaScript.” ACM Computing Surveys (CSUR) 50.5 (2017): 66.
Amazing at explaining the challenges of sloppy languages.
read pdf Gyimesi, Péter, et al. “Bugsjs: A benchmark of javascript bugs.” 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST). IEEE, 2019.
read pdf Richards, Gregor, et al. “An analysis of the dynamic behavior of JavaScript programs.” ACM Sigplan Notices. Vol. 45. No. 6. ACM, 2010.
- need to identify nodes in traces (the host app should have that)
- need to piggyback vector clocks on existing messages, or transmit them independently, between nodes
Partial orders of events can represent any program in parallel/event-based systems. They can simplify the behavior of a program in event-based systems: representing the events sequentially with an automaton is vastly more complicated than the equivalent partial order.
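A small vector clock sketch (node identifiers and the event representation are assumptions), showing how such a partial order can be transmitted and compared between nodes:
type NodeId = string;
type VectorClock = Map<NodeId, number>;
// Local event: advance our own component.
function tick(clock: VectorClock, self: NodeId): void {
  clock.set(self, (clock.get(self) ?? 0) + 1);
}
// Message reception: merge the sender's clock into the local one, then tick.
function receive(local: VectorClock, self: NodeId, remote: VectorClock): void {
  for (const [node, n] of remote) {
    local.set(node, Math.max(local.get(node) ?? 0, n));
  }
  tick(local, self);
}
// e1 happened before e2 iff e1 <= e2 component-wise and e1 != e2.
function happenedBefore(e1: VectorClock, e2: VectorClock): boolean {
  for (const [node, n] of e1) {
    if (n > (e2.get(node) ?? 0)) return false;
  }
  for (const [node, n] of e2) {
    if (n > (e1.get(node) ?? 0)) return true;
  }
  return false;
}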
Here the dynamic analysis comes on top of static analysis (SA), mainly to improve knowledge about the symbols in the source code; that is, in the case of a call to a function, getting the position of its declaration. But it can also capture accesses to variables or fields, using for example Proxies (here I think about JavaScript; it might be tricky on non-interpreted programs). This idea comes from the fact that, in the general case, symbolic analysis of source code is difficult; semantic from GitHub tries to achieve it but is not very accurate. There exist many static analyzers capable of linking symbols, but they are language-specific (the TypeScript SA from Microsoft works pretty well but might be slow). In the context of co-evolution, shortening the loop between code update, test run, and test fix might prove beneficial to the analysis of source code, almost independently of the programming language. Symmetrically, improving knowledge of the source code will allow designing better tests and detecting the impact of given changes. Obviously, the limitations of testing (non-exhaustive) and of dynamic analysis (runtime overhead) apply to this method. But it is incremental and easy to implement (just instrument some code, like declarations; see the M1 internship).
let x = true
function f() { if(x) g()}
function g() {}
// TEST 1
f()
g()
// TEST 2
x = false
f()
g()
// TEST 1 trace: f g g ; TEST 2 trace: f g
// TEST 1 positions: :5:1 :2:0 :2:14 :3:0 :6:1 :3:0 ; TEST 2 positions: :9:1 :2:0 :10:1 :3:0
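A minimal sketch of the instrumentation assumed by this example (the instrument helper and the hard-coded positions are illustrative; in practice the declaration positions would come from the static analysis):
type DeclarationPosition = string; // "line:column" of the declaration
const trace: DeclarationPosition[] = [];
// Wrap a function so that each call logs the position of its declaration.
function instrument<A extends unknown[], R>(declaredAt: DeclarationPosition,
                                            fn: (...args: A) => R) {
  return (...args: A): R => {
    trace.push(declaredAt);
    return fn(...args);
  };
}
// Rebuilding the example above: f is declared at 2:0 and g at 3:0.
let x = true;
const g = instrument("3:0", function g() {});
const f = instrument("2:0", function f() { if (x) g(); });
f(); g();  // TEST 1: trace is ["2:0", "3:0", "3:0"] (f, g inside f, g)
x = false;
f(); g();  // TEST 2: trace grows by ["2:0", "3:0"] (f, g)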
What can I get at runtime out of a stack trace?
- given single thread asynchrony (events)
- multithreading
Is trace + link + SA enough to differentiate a nested call from a sequential call? Is trace in / out of decl better?
- need to use try/finally, what overhead?
Using diffs and branching points (calls, conditions), get the lines of code impacted by changes.
- Synthesize new tests from traces, with behavioral models for example. Even prefill function parameters.
- Remove dead code; it would be more of an indication because this is not an exhaustive method.
- Sort tests by comparing the behavioral models of tests and of field usage, thus executing first the tests that correspond to an actual use.
- Execute first the tests impacted by recent changes.
- Provide go-to-declaration from symbols, and the reverse.
- Statistics for given symbols (function usage (in tests, in field))
Evaluate if the following assumption can hold: changes handled by co-evolution are mostly syntactic, not functional nor semantic.
- title chosen
- plan at the section level
- coverage survey
- his survey on models’ co-evolution
- survey on types of tests
llvm semantic pandoc
function f(a){
g0()
g01(g02())
if(g1()){
g2()
g3()
}else{
g4()
if(g5())
return g51()
}
g6()
return g7()
}
Automata file:1:0:10:1 f {
<start> -> 2:5 // g0
2:5 -> 3:10 // g0 -> g02
3:10 -> 3:6
3:6 -> 4:8
4:8 -> 5:7
5:7 -> 6:7
4:8 -> 8:7
8:7 -> 9:7
9:7 -> 10:14
10:14 -> <fin>
6:6 -> 12:5
12:5 -> 13:12
13:12 -> <end>
}
f<start> -> 2:5
g0 ->* g0
2:5 -> 3:10
g02 ->* g02
3:10 -> 3:6
g01 ->* g01
3:6 -> 4:8
g1 ->* g1
4:8 -> 5:7 -> 8:7
function f(a){ instrument(g0,g01,g02,g1,g3,g4,g5,g51,g6,g7) ...
Keep private data local. Improve the user's confidence in software quality. Detect more bugs.
- axis
- type of artifact
- vertex of a tetrahedron
in levin201X
classify what should be co-evolved or not?
Update string literals used in oracles.
Java. While our ideas and the repair process easily generalize to other languages and test frameworks, there is a substantial amount of engineering necessary to reimplement ReAssert for another language. – cite:daniel2010test
ReAssert
Our work applies symbolic execution to the domain of test repair and attempts to find tests that pass. Most other applications of symbolic execution take the opposite approach: they attempt to find test failures [5, 37, 39, 49]. Other researchers have applied symbolic execution to invariant detection [12, 29], security testing [19, 30, 53], string verification [54], and a host of other domains.
f(4)
function f(){}
f(9)
move + rename = no trivial co-evolution
g(4)
f(9)
function g(){}
but if we make use of the change of f(4) to g(4), it is possible to infer the relation f -> g
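A naive sketch of that inference (call sites are reduced to a callee name plus the raw argument text; the helper is purely illustrative):
interface CallSite { callee: string; args: string; } // args kept as raw text
// A call site that keeps its arguments but changes its callee between two
// versions is a hint for a rename: record oldName -> newName.
function inferRenames(before: CallSite[], after: CallSite[]): Map<string, string> {
  const renames = new Map<string, string>();
  for (const oldCall of before) {
    const match = after.find(c => c.args === oldCall.args && c.callee !== oldCall.callee);
    if (match) renames.set(oldCall.callee, match.callee);
  }
  return renames;
}
// before: f(4), f(9)   after: g(4), f(9)
// inferRenames infers { "f" => "g" }, which lets us co-evolve the remaining f(9) into g(9).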
see related file
- Khelladi
- Tsantalis
- Coming
- RefMiner
- RefDiff
be cautious about limiting file system accesses
make the web IDE interactive between code and graph (and maybe changes (the graph might replace it completely))
Local Variables: eval: (require 'ox-extra) eval: (ox-extras-activate '(ignore-headlines)) eval: (setq org-confirm-babel-evaluate nil) eval: (org-babel-do-load-languages 'org-babel-load-languages '( (shell . t) (R . t) (perl . t) (ditaa . t) (typescript . t) (js . t) )) eval: (setq org-latex-listings 'minted) eval: (add-to-list 'org-latex-packages-alist '("" "minted")) eval: (setq org-src-fontify-natively t) eval: (setq org-image-actual-width '(600)) eval: (unless (boundp 'org-latex-classes) (setq org-latex-classes nil)) eval: (setq org-latex-with-hyperref nil) eval: (add-to-list 'org-latex-classes '("llncs" "\documentclass{llncs}\n \[NO-DEFAULT-PACKAGES]\n \[EXTRA]\n" ("\section{%s}" . "\section*{%s}") ("\subsection{%s}" . "\subsection*{%s}") ("\subsubsection{%s}" . "\subsubsection*{%s}") ("\paragraph{%s}" . "\paragraph*{%s}") ("\subparagraph{%s}" . "\subparagraph*{%s}"))) eval: (add-to-list 'org-latex-classes '("sdm" "\documentclass{sdm}\n \[NO-DEFAULT-PACKAGES]\n \[EXTRA]\n" ("\section{%s}" . "\section*{%s}") ("\subsection{%s}" . "\subsection*{%s}") ("\subsubsection{%s}" . "\subsubsection*{%s}") ("\paragraph{%s}" . "\paragraph*{%s}") ("\subparagraph{%s}" . "\subparagraph*{%s}"))) eval: (defun delete-org-comments (backend) (loop for comment in (reverse (org-element-map (org-element-parse-buffer) 'comment 'identity)) do (setf (buffer-substring (org-element-property :begin comment) (org-element-property :end comment))""))) eval: (add-hook 'org-export-before-processing-hook 'delete-org-comments) eval: (setq org-latex-pdf-process (list "latexmk -bibtex -shell-escape -f -pdf %F")) End: