diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
index 08fc76730..027d689b5 100644
--- a/.github/workflows/deploy.yml
+++ b/.github/workflows/deploy.yml
@@ -8,10 +8,6 @@ on:
- '**/src/**'
- '**/pom.xml'
- 'pom.xml'
-
- # Publish `v1.2.3` tags as releases.
- tags:
- - v*
# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
diff --git a/README.md b/README.md
index 036a6ce9a..90b90a5a5 100644
--- a/README.md
+++ b/README.md
@@ -6,37 +6,16 @@
[![Latest Release](https://img.shields.io/github/release/ArDoCo/Core.svg)](https://github.com/ArDoCo/Core/releases/latest)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7274034.svg)](https://doi.org/10.5281/zenodo.7274034)
-The goal of this project is to connect architecture documentation and models with Traceability Link Recovery (TLR) while identifying missing or deviating
-elements (inconsistencies).
+The goal of the ArDoCo project is to connect architecture documentation and models with Traceability Link Recovery (TLR) while identifying missing or deviating elements (inconsistencies).
An element can be any representable item of the model, like a component or a relation.
To do so, we first create trace links and then make use of them and other information to identify inconsistencies.
-ArDoCo is actively developed by researchers of
-the _[Modelling for Continuous Software Engineering (MCSE) group](https://mcse.kastel.kit.edu)_
-of _[KASTEL - Institute of Information Security and Dependability](https://kastel.kit.edu)_ at
-the [KIT](https://www.kit.edu).
+ArDoCo is actively developed by researchers of the _[Modelling for Continuous Software Engineering (MCSE) group](https://mcse.kastel.kit.edu)_ of _[KASTEL - Institute of Information Security and Dependability](https://kastel.kit.edu)_ at the [KIT](https://www.kit.edu).
-## User Interfaces
+This **Core** repository contains the framework and core definitions for the other approaches.
+As such, there is the definition of our pipeline and the data handling as well as the definitions for the various pipeline steps, inputs, outputs, etc.
-To be able to execute the core algorithms from this repository, you can write own user interfaces that (should) use
-the [ArDoCoRunner](https://github.com/ArDoCo/Core/blob/main/pipeline/pipeline-core/src/main/java/edu/kit/kastel/mcse/ardoco/core/execution/runner/ArDoCoRunner.java).
-
-We provide an example Command Line Interface (CLI) at [ArDoCo/CLI](https://github.com/ArDoCo/CLI) as well as a simple Graphical User Interface (GUI)
-at [ArDoCo/GUI](https://github.com/ArDoCo/GUI).
-
-Future user interfaces like an enhanced GUI or a web interface are planned.
-
-## Documentation
-
-For more information about the setup or the architecture have a look on the [Wiki](https://github.com/ArDoCo/Core/wiki).
-The docs are at some points deprecated, the general overview and setup should still hold.
-
-## Case Studies / Benchmarks
-
-To test the Core, you could use case studies and benchmarks provided in ..
-
-* [ArDoCo Benchmark](https://github.com/ArDoCo/Benchmark)
-* [SWATTR](https://github.com/ArDoCo/SWATTR)
+For more information about the setup, the project structure, or the architecture, please have a look at the [Wiki](https://github.com/ArDoCo/Core/wiki).
## Maven
@@ -45,7 +24,7 @@ To test the Core, you could use case studies and benchmarks provided in ..
+ +
 ArDoCo (Architecture Documentation Consistency) is a framework to connect architecture documentation and models while identifying missing or deviating elements (inconsistencies).
 An element can be any representable item of the model, like a component or a relation.
 To do so, ArDoCo first creates trace links and then makes use of them and other information to identify inconsistencies.
-You can find [ArDoCo on GitHub](https://github.com/ArDoCo).
+You can find ArDoCo on the [website](https://ardoco.de) and [on GitHub](https://github.com/ArDoCo).
 Before contributing, please read the [Quickstart Guide](quickstart).
-JavaDocs can be found [here](https://ardoco.github.io/Core-Docs/).
+
+
+To get to know the project, please read the following pages:
+
+* [Core Pipeline Definition](pipeline)
+* [Intermediate Artifacts](intermediate-artifacts)
+* [Text Preprocessing Microservice](Text-Preprocessing-Microservice)
+* [Traceability Link Recovery (TLR)](traceability-link-recovery)
+* [Inconsistency Detection (ID)](inconsistency-detection)
+* [Linking Sketches and Software Architecture (LiSSA)](LiSSA)
+
+## Project Structure
+
+* [Core](https://github.com/ArDoCo/Core): Core framework with framework and API definitions
+* Pipelines
+  * [TLR](https://github.com/ArDoCo/TLR): Traceability Link Recovery (TLR) Modules
+  * [StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service): RESTful web service for text preprocessing
+  * [InconsistencyDetection](https://github.com/ArDoCo/InconsistencyDetection): Inconsistency Detection (ID) Modules
+  * [LiSSA](https://github.com/ArDoCo/LiSSA): Linking Sketches and Software Architecture Modules
+* Testing and Evaluation
+  * [IntegrationTests](https://github.com/ArDoCo/IntegrationTests): Integration Tests
+  * [Benchmark](https://github.com/ArDoCo/Benchmark): Benchmarks
+  * [Evaluator](https://github.com/ArDoCo/Evaluator): Evaluation code that compares CSVs (e.g., output and gold standard)
+  * [SimpleTracelinkDiscovery](https://github.com/ArDoCo/SimpleTracelinkDiscovery): Baseline approach
+* GUIs, CLIs, etc.
+  * [TraceView](https://github.com/ArDoCo/TraceView): WIP visualisation of the outputs for TLR and ID
+  * *outdated* [CLI](https://github.com/ArDoCo/CLI): Command Line Interface (*outdated*)
+* [actions](https://github.com/ArDoCo/actions): Reusable GitHub Actions
 ## System Requirements
-The `complete` profile includes all the requirements that the special profiles also need. This profile is activated by
-default.
+The project requires **JDK 21**.
+Furthermore, we advise at least **4 GB of RAM**.
-All profiles require JDK 21.
+## Benchmarks
-The dependencies of the other profiles at a glance:
+You can test ArDoCo using the projects provided in our [Benchmark repository](https://github.com/ArDoCo/Benchmark).
-* tlr: -
-* inconsistency: -
-* lissa (LInking Sketches and Software Architecture): Docker (local
-  or [remote](https://github.com/ArDoCo/Core/blob/lissa/stages/diagram-recognition/src/main/kotlin/edu/kit/kastel/mcse/ardoco/lissa/diagramrecognition/informants/DockerInformant.kt#L20-L23))
+## Related Publications
-## Case Studies & Benchmarks
+* J. Keim, S. Corallo, D. Fuchß, T. Hey, T. Telge and A. Koziolek. "Recovering Trace Links Between Software Documentation And Code". 2024. In: Proceedings of 46th IEEE International Conference on Software Engineering (ICSE 2024). [doi:10.5445/IR/1000165692](https://doi.org/10.5445/IR/1000165692/post)
-You can test ArDoCo using our case studies and benchmarks provided in ...
+* J. Keim, S. Corallo, D. Fuchß and A. Koziolek. "Detecting Inconsistencies in Software Architecture Documentation Using Traceability Link Recovery". 2023. In: IEEE 20th International Conference on Software Architecture (ICSA 2023). [doi:10.1109/ICSA56044.2023.00021](https://doi.org/10.1109/ICSA56044.2023.00021)
-* [Case Studies](https://github.com/ArDoCo/SWATTR)
-* [Benchmarks](https://github.com/ArDoCo/Benchmark)
+* D. Fuchß, S. Corallo, J. Keim, J. Speit and A. Koziolek. "Establishing a Benchmark Dataset for Traceability Link Recovery between Software Architecture Documentation and Models". 2022. In: 2nd International Workshop on Mining Software Repositories for Software Architecture - Co-located with 16th European Conference on Software Architecture.
-## Publications
+* J. Keim, S. Schulz, D. Fuchß, C. Kocher, J. Speit, A. Koziolek. "Trace Link Recovery for Software Architecture Documentation". 2021. In: Software Architecture: 15th European Conference (ECSA 2021). [doi:10.1007/978-3-030-86044-8_7](https://doi.org/10.1007/978-3-030-86044-8_7)
-Trace Link Recovery for Software Architecture Documentation Keim, J.; Schulz, S.; Fuchß, D.; Kocher, C.; Speit, J.;
-Koziolek, A. 2021. Software Architecture: 15th European Conference, ECSA 2021, Virtual Event, Sweden, September 13-17,
-2021, Proceedings. Ed.: S. Biffl, 101–116, Springer
-Verlag. [doi:10.1007/978-3-030-86044-8_7](https://doi.org/10.1007/978-3-030-86044-8_7)
+* J. Keim and A. Koziolek. "Towards Consistency Checking Between Software Architecture and Informal Documentation". 2019. In: IEEE 16th International Conference on Software Architecture Companion (ICSA-C). [doi:10.1109/ICSA-C.2019.00052](https://doi.org/10.1109/ICSA-C.2019.00052)
-The initial version of ArDoCo is based on the master
-thesis [Linking Software Architecture Documentation and Models](https://publikationen.bibliothek.kit.edu/1000126194).
+
+The initial version of ArDoCo is based on the master thesis [Linking Software Architecture Documentation and Models](https://publikationen.bibliothek.kit.edu/1000126194).
 ## Contact
-This project is currently developed by researchers of the Karlsruhe Institute of Technology.
+This project is currently developed by researchers of the Karlsruhe Institute of Technology (KIT).
+
+You can find us on our websites:
-You find us on our
-websites: [Jan Keim](https://mcse.kastel.kit.edu/staff_Keim_Jan.php), [Sophie Corallo](https://mcse.kastel.kit.edu/staff_sophie_corallo.php),
-and [Dominik Fuchß](https://mcse.kastel.kit.edu/staff_dominik_fuchss.php)
+* [Jan Keim](https://mcse.kastel.kit.edu/staff_Keim_Jan.php),
+* [Sophie Corallo](https://mcse.kastel.kit.edu/staff_sophie_corallo.php), and
+* [Dominik Fuchß](https://mcse.kastel.kit.edu/staff_dominik_fuchss.php)
diff --git a/docs/Inconsistency-Detection.md b/docs/Inconsistency-Detection.md
new file mode 100644
index 000000000..ef8f23f23
--- /dev/null
+++ b/docs/Inconsistency-Detection.md
@@ -0,0 +1,12 @@
+
+Currently, the approach supports two kinds of inconsistencies: Missing Model Elements (MMEs) and Undocumented Model Elements (UMEs).
+
+Undocumented Model Elements (UMEs) are elements within the Software Architecture Model (SAM) that are not documented in the natural language Software Architecture Documentation (SAD).
+Our heuristic looks for model elements whose number of associated trace links is below a certain threshold (by default 1, i.e., elements without any trace links).
+In the configuration options, you can fine-tune the threshold as well as set up a regex-based whitelist.
+
+Missing Model Elements (MMEs) are architecture elements that are described within the SAD but cannot be traced to the SAM.
+For this, we make use of the recommendations from the Recommendation Generator within the [Traceability Link Recovery (TLR)](traceability-link-recovery).
+Each recommendation that is not linked to a model element is a potential inconsistency.
+To further increase precision, we make use of filters.
+For example, we use a filter to get rid of commonly used software (development) terminology that looks similar to, e.g., component names but rarely denotes model elements.
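Review note: the UME heuristic described in docs/Inconsistency-Detection.md above could be sketched roughly as follows. This is a minimal illustration in Java, not ArDoCo's actual implementation. The class `UndocumentedElementSketch`, the method `findUndocumented`, and the representation of trace links as a map from element name to link count are assumptions made purely for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch of the UME heuristic; names and data layout are assumptions. */
public class UndocumentedElementSketch {

    /**
     * Returns the (sorted) names of model elements that have fewer than
     * {@code threshold} trace links, skipping every element whose name
     * fully matches one of the whitelist patterns.
     */
    public static List<String> findUndocumented(Map<String, Integer> traceLinkCounts,
                                                int threshold,
                                                List<Pattern> whitelist) {
        return traceLinkCounts.entrySet().stream()
                // Below the threshold means "not sufficiently documented".
                .filter(entry -> entry.getValue() < threshold)
                .map(Map.Entry::getKey)
                // Whitelisted elements are never reported.
                .filter(name -> whitelist.stream().noneMatch(p -> p.matcher(name).matches()))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("Facade", 3, "Cache", 0, "DummyRecommender", 0);
        List<Pattern> whitelist = List.of(Pattern.compile(".*Dummy.*"));
        // With the default threshold of 1, only elements without any trace link are candidates.
        System.out.println(findUndocumented(counts, 1, whitelist)); // prints [Cache]
    }
}
```

With a threshold of 1, the check reduces to "has no trace link at all"; raising the threshold would also report sparsely documented elements.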
diff --git a/docs/Intermediate-Artifacts.md b/docs/Intermediate-Artifacts.md
new file mode 100644
index 000000000..da69ec7f5
--- /dev/null
+++ b/docs/Intermediate-Artifacts.md
@@ -0,0 +1,128 @@
+
+Currently, there are three kinds of intermediate artifacts.
+First, the input text has an internal representation (cf. [edu/kit/kastel/mcse/ardoco/core/api/text/Text.java](https://github.com/ArDoCo/Core/blob/main/framework/common/src/main/java/edu/kit/kastel/mcse/ardoco/core/api/text/Text.java)) to cover all the annotations from the preprocessing.
+Second, there is the intermediate representation of software architecture models (SAMs) that we cover [below](#software-architecture-models).
+Third, we create a uniform representation for code that we also explain [below](#code).
+
+```mermaid
+classDiagram
+    class ModelElement
+    class Model
+    class Entity
+    class CodeModel
+    class ArchitectureModel
+
+    ModelElement <|-- Entity
+    ModelElement <|-- Model
+    Model <|-- CodeModel
+    Model <|-- ArchitectureModel
+    Model "0..1" o--"*" Entity: elements
+```
+
+## Software Architecture Models
+
+```mermaid
+classDiagram
+    class Entity
+    class ArchitectureItem
+    class Component
+    class Interface
+    class Signature
+
+    Entity <|-- ArchitectureItem
+    ArchitectureItem <|-- Component
+    ArchitectureItem <|-- Interface
+    ArchitectureItem <|-- Signature
+
+    Interface o-- "*" Signature: signatures
+    Interface "*" <-- "*" Component: provided
+    Interface "*" <-- "*" Component: required
+    Component "*" <-- Component: subcomponents
+```
+
+In this software model, each class is categorized as an ArchitectureItem, which inherits properties from Entity, including a name and identifier.
+There are three types of ArchitectureItems: Component, Interface, and Signature.
+
+A Component represents various architectural elements in different modeling languages.
+For instance, it corresponds to a UML Component.
+In the PCM context, it encompasses both BasicComponent and CompositeComponent.
+BasicComponents do not contain sub-components, while CompositeComponents may have sub-components.
+
+Components can either require or provide Interfaces.
+Provided Interfaces are implemented by the Component, while Required Interfaces specify the functionality required by a Component.
+
+An Interface contains multiple method Signatures.
+Signatures are linked to Interfaces in a composite relationship, meaning each Signature is associated with an Interface.
+
+
+## Code
+
+```mermaid
+classDiagram
+    class Entity
+    class CodeItem
+    class Module
+    class Package
+    class CompilationUnit
+    class CodeAssembly
+    class ComputationalObject
+    class ControlElement
+    class Datatype
+    class ClassUnit
+    class InterfaceUnit
+
+    Entity <|-- CodeItem
+    CodeItem <|-- ComputationalObject
+    CodeItem <|-- Module
+    CodeItem <|-- Datatype
+    ComputationalObject <|-- ControlElement
+    Module <|-- Package
+    Module <|-- CompilationUnit
+    Module <|-- CodeAssembly
+    Datatype <|-- ClassUnit
+    Datatype <|-- InterfaceUnit
+
+    Module "0..1" o--> "*" CodeItem: codeElements
+    ClassUnit "0..1" o--> "*" CodeItem: codeElements
+    InterfaceUnit "0..1" o--> "*" CodeItem: codeElements
+    Datatype "*" <-- "*" Datatype: implementedTypes
+    Datatype "*" <-- "*" Datatype: extendedTypes
+```
+
+The intermediate model for code is based on the source code package within the [Knowledge Discovery Metamodel (KDM)](https://www.omg.org/spec/KDM/1.3/PDF).
+
+The different classes in the code model inherit from CodeItem, which itself is a specialized Entity.
+Thus, each class has a name and identifier.
+
+There are three kinds of source code elements: Module, Datatype, and ComputationalObject.
+
+Modules are typically logical components of the system with a certain level of abstraction.
+A Module can contain CodeItems, and there are three kinds of Modules: CompilationUnit, Package, and CodeAssembly.
+
+A CompilationUnit represents a source file where code is stored.
+It includes a relative path to the file's location on disk and its programming language.
+The CompilationUnit is partly based on the InventoryModel from KDM.
+
+A Package is a logical collection of source code elements (i.e., CodeItems).
+Packages can also contain sub-Packages, similar to the structure commonly found in Java.
+
+A CodeAssembly consists of source code artifacts linked together to make them runnable.
+For example, source code files together with their headers are grouped in a CodeAssembly.
+
+There are two kinds of Datatypes: ClassUnit and InterfaceUnit.
+A ClassUnit is akin to a class in Java and can contain other CodeItems like methods and inner classes.
+Similarly, an InterfaceUnit can also contain code elements like methods.
+
+The relationships implementedTypes and extendedTypes from the KDM model are present in the intermediate model.
+A Datatype can have an arbitrary number of extendedTypes relations, which represent inheritance in object-oriented programming languages.
+
+The construction around extendedTypes and implementedTypes also enables interfaces to extend other interfaces, akin to Java.
+Interfaces can also extend classes, a feature present in some programming languages like TypeScript.
+
+The KDM includes several primitive datatypes like boolean, which are not realized within this model as they are not currently needed.
+If future work extends the approaches with a thorough comparison of datatypes, then the intermediate model may need further sub-classing of the KDM.
+
+Currently, there is only one type of ComputationalObject: the ControlElement.
+The ControlElement represents callable parts with specific behaviors, such as functions, procedures, or methods.
+Unlike the KDM, this work does not make a further distinction between CallableUnits and MethodUnits.
+Additionally, it does not utilize parameters, return types, or similar elements of the KDM and therefore does not model them.
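Review note: the Datatype part of the code model in docs/Intermediate-Artifacts.md above could be sketched as below. This is a simplified illustration in Java under stated assumptions, not the actual ArDoCo classes: the class names follow the diagram, but the public fields, constructors, and the helper `allSupertypes` are hypothetical and exist only for this example. It shows how the `extendedTypes` and `implementedTypes` relations let an interface extend another interface and a class implement it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified sketch of the code model; not ArDoCo's real API. */
public class CodeModelSketch {

    /** Every code item carries an identifier and a name (inherited from Entity in the docs). */
    public abstract static class CodeItem {
        public final String id;
        public final String name;

        protected CodeItem(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** A Datatype contains code elements and may extend or implement other Datatypes. */
    public abstract static class Datatype extends CodeItem {
        public final List<CodeItem> codeElements = new ArrayList<>();
        public final List<Datatype> extendedTypes = new ArrayList<>();
        public final List<Datatype> implementedTypes = new ArrayList<>();

        protected Datatype(String id, String name) {
            super(id, name);
        }
    }

    public static final class ClassUnit extends Datatype {
        public ClassUnit(String id, String name) {
            super(id, name);
        }
    }

    public static final class InterfaceUnit extends Datatype {
        public InterfaceUnit(String id, String name) {
            super(id, name);
        }
    }

    /** Callable part such as a function or method; parameters are deliberately not modeled. */
    public static final class ControlElement extends CodeItem {
        public ControlElement(String id, String name) {
            super(id, name);
        }
    }

    /** Collects the names of all direct and transitive supertypes in discovery order. */
    public static Set<String> allSupertypes(Datatype type) {
        Set<String> result = new LinkedHashSet<>();
        Deque<Datatype> todo = new ArrayDeque<>();
        todo.push(type);
        while (!todo.isEmpty()) {
            Datatype current = todo.pop();
            List<Datatype> parents = new ArrayList<>(current.extendedTypes);
            parents.addAll(current.implementedTypes);
            for (Datatype parent : parents) {
                // Only follow a parent the first time it is seen (guards against cycles).
                if (result.add(parent.name)) {
                    todo.push(parent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        InterfaceUnit shape = new InterfaceUnit("i1", "Shape");
        InterfaceUnit drawable = new InterfaceUnit("i2", "Drawable");
        drawable.extendedTypes.add(shape); // an interface extending another interface

        ClassUnit circle = new ClassUnit("c1", "Circle");
        circle.implementedTypes.add(drawable);
        circle.codeElements.add(new ControlElement("m1", "draw"));

        System.out.println(allSupertypes(circle)); // prints [Drawable, Shape]
    }
}
```

Keeping both relations on Datatype, rather than on ClassUnit or InterfaceUnit separately, is what allows combinations such as interfaces extending classes without further subclasses.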
diff --git a/docs/LiSSA.md b/docs/LiSSA.md index 4982a722c..daf7e7347 100644 --- a/docs/LiSSA.md +++ b/docs/LiSSA.md @@ -1,7 +1,8 @@ +# Linking Sketches and Software Architecture (LiSSA) + The LiSSA approach aims to connect sketches and informal diagrams (such as class diagrams, component diagrams, ...) with formal models like component models. -## Linking Sketches and Software Architecture (LiSSA) The following diagram shows the pipeline that is planned for the LiSSA approach. ```mermaid @@ -14,7 +15,7 @@ stateDiagram-v2 RecommendationGeneration ConnectionGeneration InconsistencyDetection - + DiagramDetection --> RecommendationGeneration TextPreprocessing --> TextExtraction ArchitectureModel --> RecommendationGeneration diff --git a/docs/Pipeline.md b/docs/Pipeline.md new file mode 100644 index 000000000..ee5900b99 --- /dev/null +++ b/docs/Pipeline.md @@ -0,0 +1,22 @@ + +```mermaid +classDiagram + class AbstractPipeline + class Pipeline + class PipelineStep + + Pipeline--> "*" AbstractPipeline + Pipeline..|>AbstractPipeline + PipelineStep..|>AbstractPipeline +``` + +For the pipeline definition, we use a composite to allow us to have a multi-level pipeline. +As such, a pipeline consists of an arbitrary number of either *PipelineStep*s or further *Pipeline*s. + +In our approach, we use three levels for our pipeline: +On the first level, the overall pipeline defines multiple *stages*, e.g., text preprocessing or element connection. +Each stage is another pipeline that then defines *agents* that have the purpose of initiating the processing and of collecting the information of the various heuristics. +Agents then use *Informants* as concrete PipelineSteps to execute the processing and heuristics. + +A pipeline step (i.e., an Informant) stores results within a repository that can be universally accessed by all pipeline steps, similarly to a blackboard in the blackboard pattern. 
+This way, each pipeline step and, thus, each heuristic can access the results of previous steps and provide its results for others. diff --git a/docs/Profiles.md b/docs/Profiles.md deleted file mode 100644 index 560a781a1..000000000 --- a/docs/Profiles.md +++ /dev/null @@ -1,32 +0,0 @@ -ArDoCo uses maven profiles to provide subsets of its functionality and speed up development time. - -## Current Profiles - -* **complete** (activated by default) -* **deployment** (profile for deployment to maven central) -* **tlr** (profile for traceability link recovery) -* **inconsistency** (profile for inconsistency detection) - -## Adding new profiles - -In order to add a new profile, you have to extend the profile section in the main pom.xml (as well as in all submodules -that contain submodules; i.e., stages, tests) - -```xml - -