Merge branch 'main' into dependabot/maven/com.networknt-json-schema-v…

…alidator-1.3.3
ArDoCo · Mar 7, 2024 · 2bb412f · 2bb412f
2 parents 0917bc1 + 461a859
commit 2bb412f
Showing 22 changed files with 363 additions and 258 deletions.
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
@@ -8,10 +8,6 @@ on:
       - '**/src/**'
       - '**/pom.xml'
       - 'pom.xml'
-
-    # Publish `v1.2.3` tags as releases.
-    tags:
-      - v*
 
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:

diff --git a/README.md b/README.md
@@ -6,37 +6,16 @@
 [![Latest Release](https://img.shields.io/github/release/ArDoCo/Core.svg)](https://github.com/ArDoCo/Core/releases/latest)
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7274034.svg)](https://doi.org/10.5281/zenodo.7274034)
 
-The goal of this project is to connect architecture documentation and models with Traceability Link Recovery (TLR) while identifying missing or deviating
-elements (inconsistencies).
+The goal of the ArDoCo project is to connect architecture documentation and models with Traceability Link Recovery (TLR) while identifying missing or deviating elements (inconsistencies).
 An element can be any representable item of the model, like a component or a relation.
 To do so, we first create trace links and then make use of them and other information to identify inconsistencies.
 
-ArDoCo is actively developed by researchers of
-the _[Modelling for Continuous Software Engineering (MCSE) group](https://mcse.kastel.kit.edu)_
-of _[KASTEL - Institute of Information Security and Dependability](https://kastel.kit.edu)_ at
-the [KIT](https://www.kit.edu).
+ArDoCo is actively developed by researchers of the _[Modelling for Continuous Software Engineering (MCSE) group](https://mcse.kastel.kit.edu)_ of _[KASTEL - Institute of Information Security and Dependability](https://kastel.kit.edu)_ at the [KIT](https://www.kit.edu).
 
-## User Interfaces
+This **Core** repository contains the framework and core definitions for the other approaches.
+As such, it contains the definition of our pipeline and the data handling, as well as the definitions of the various pipeline steps, inputs, outputs, etc.
 
-To be able to execute the core algorithms from this repository, you can write own user interfaces that (should) use
-the [ArDoCoRunner](https://github.com/ArDoCo/Core/blob/main/pipeline/pipeline-core/src/main/java/edu/kit/kastel/mcse/ardoco/core/execution/runner/ArDoCoRunner.java).
-
-We provide an example Command Line Interface (CLI) at [ArDoCo/CLI](https://github.com/ArDoCo/CLI) as well as a simple Graphical User Interface (GUI)
-at [ArDoCo/GUI](https://github.com/ArDoCo/GUI).
-
-Future user interfaces like an enhanced GUI or a web interface are planned.
-
-## Documentation
-
-For more information about the setup or the architecture have a look on the [Wiki](https://github.com/ArDoCo/Core/wiki).
-The docs are at some points deprecated, the general overview and setup should still hold.
-
-## Case Studies / Benchmarks
-
-To test the Core, you could use case studies and benchmarks provided in ..
-
-* [ArDoCo Benchmark](https://github.com/ArDoCo/Benchmark)
-* [SWATTR](https://github.com/ArDoCo/SWATTR)
+For more information about the setup, the project structure, or the architecture, please have a look at the [Wiki](https://github.com/ArDoCo/Core/wiki).
 
 ## Maven
 
@@ -45,7 +24,7 @@ To test the Core, you could use case studies and benchmarks provided in ..
 <dependencies>
 	<dependency>
 		<groupId>io.github.ardoco.core</groupId>
-		<artifactId>pipeline</artifactId> <!-- or any other subproject -->
+		<artifactId>framework</artifactId> <!-- or any other subproject -->
 		<version>VERSION</version>
 	</dependency>
 </dependencies>
@@ -69,33 +48,8 @@ For snapshot releases, make sure to add the following repository
 </repositories>
 ```
 
-## Microservice for text preprocessing
-
-Text preprocessing works locally, but there is also the option to host a microservice for this.
-The benefit is that the models do not need to be loaded each time, saving some runtime (and local memory).
-
-The microservice can be found at [ArDoCo/StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service/).
-
-The microservice is secured with credentials and the usage of the microservice needs to be activated and the URL of the microservice configured.
-These settings can be provided to the execution via environment variables.
-To do so, set the following variables:
-
-```env
-NLP_PROVIDER_SOURCE=microservice
-MICROSERVICE_URL=[microservice_url]
-SCNLP_SERVICE_USER=[your_username]
-SCNLP_SERVICE_PASSWORD=[your_password]
-```
-
-The first variable `NLP_PROVIDER_SOURCE=microservice` activates the microservice usage.
-The next three variables configure the connection, and you need to provide the configuration for your deployed microservice.
-
-## Attribution
-
-The initial version of this project is based on the master
-thesis [Linking Software Architecture Documentation and Models](https://doi.org/10.5445/IR/1000126194).
-
-## Acknowledgements
-
-This work was supported by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) and by
-KASTEL Security Research Labs (46.23.01).
+## Relevant repositories
+The following is an excerpt of repositories that use this framework and implement the different approaches and pipelines of ArDoCo:
+* [ArDoCo/TLR](https://github.com/ArDoCo/TLR): implementing different traceability link recovery approaches
+* [ArDoCo/InconsistencyDetection](https://github.com/ArDoCo/InconsistencyDetection): implementing inconsistency detection approaches
+* [ArDoCo/LiSSA](https://github.com/ArDoCo/LiSSA): implementing processing of sketches and diagrams for, e.g., TLR
diff --git a/docs/Home.md b/docs/Home.md
@@ -1,49 +1,77 @@
+# ArDoCo
+
+<p align="center">
+ <img alt="ArDoCo" src="https://github.com/ArDoCo/.github/raw/main/profile/logo.png" height="210"/>
+</p>
+
 ArDoCo (Architecture Documentation Consistency) is a framework to connect architecture documentation and models while
 identifying missing or deviating elements (inconsistencies). An element can be any representable item of the model, like
 a component or a relation. To do so, ArDoCo first creates trace links and then makes use of them and other information
 to identify inconsistencies.
 
-You can find [ArDoCo on GitHub](https://github.com/ArDoCo).
+You can find ArDoCo on the [website](https://ardoco.de) and [on GitHub](https://github.com/ArDoCo).
 
 Before contributing, please read the [Quickstart Guide](quickstart).
 
-JavaDocs can be found [here](https://ardoco.github.io/Core-Docs/).
+<!-- JavaDocs can be found [here](https://ardoco.github.io/Core-Docs/). -->
+
+To get to know the project, please read the following pages:
+
+* [Core Pipeline Definition](pipeline)
+* [Intermediate Artifacts](intermediate-artifacts)
+* [Text Preprocessing Microservice](Text-Preprocessing-Microservice)
+* [Traceability Link Recovery (TLR)](traceability-link-recovery)
+* [Inconsistency Detection (ID)](inconsistency-detection)
+* [Linking Sketches and Software Architecture (LiSSA)](LiSSA)
+
+## Project Structure
+
+* [Core](https://github.com/ArDoCo/Core): Core framework and API definitions
+* Pipelines
+  * [TLR](https://github.com/ArDoCo/TLR): Traceability Link Recovery (TLR) Modules
+  * [StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service): RESTful web service for text preprocessing
+  * [InconsistencyDetection](https://github.com/ArDoCo/InconsistencyDetection): Inconsistency Detection (ID) Modules
+  * [LiSSA](https://github.com/ArDoCo/LiSSA): Linking Sketches and Software Architecture Modules
+* Testing and Evaluation
+  * [IntegrationTests](https://github.com/ArDoCo/IntegrationTests): Integration Tests
+  * [Benchmark](https://github.com/ArDoCo/Benchmark): Benchmarks
+  * [Evaluator](https://github.com/ArDoCo/Evaluator): Evaluation code that compares CSVs (e.g., output and gold standard)
+  * [SimpleTracelinkDiscovery](https://github.com/ArDoCo/SimpleTracelinkDiscovery): Baseline approach
+* GUIs, CLIs, etc.
+  * [TraceView](https://github.com/ArDoCo/TraceView): WIP visualisation of the outputs for TLR and ID
+  * [CLI](https://github.com/ArDoCo/CLI): Command Line Interface (*outdated*)
+* [actions](https://github.com/ArDoCo/actions): Reusable GitHub Actions
 
 ## System Requirements
 
-The `complete` profile includes all the requirements that the special profiles also need. This profile is activated by
-default.
+The project requires **JDK 21**.
+Furthermore, we advise at least **4 GB of RAM**.
 
-All profiles require JDK 21.
+## Benchmarks
 
-The dependencies of the other profiles at a glance:
+You can test ArDoCo using the projects provided in our [Benchmark repository](https://github.com/ArDoCo/Benchmark).
 
-* tlr: -
-* inconsistency: -
-* lissa (LInking Sketches and Software Architecture): Docker (local
-  or [remote](https://github.com/ArDoCo/Core/blob/lissa/stages/diagram-recognition/src/main/kotlin/edu/kit/kastel/mcse/ardoco/lissa/diagramrecognition/informants/DockerInformant.kt#L20-L23))
+## Related Publications
 
-## Case Studies & Benchmarks
+* J. Keim, S. Corallo, D. Fuchß, T. Hey, T. Telge, and A. Koziolek. "Recovering Trace Links Between Software Documentation And Code". 2024. In: Proceedings of 46th IEEE International Conference on Software Engineering (ICSE 2024). [doi:10.5445/IR/1000165692](https://doi.org/10.5445/IR/1000165692/post)
 
-You can test ArDoCo using our case studies and benchmarks provided in ...
+* J. Keim, S. Corallo, D. Fuchß, and A. Koziolek. "Detecting Inconsistencies in Software Architecture Documentation Using Traceability Link Recovery". 2023. In: IEEE 20th International Conference on Software Architecture (ICSA 2023). [doi:10.1109/ICSA56044.2023.00021](https://doi.org/10.1109/ICSA56044.2023.00021)
 
-* [Case Studies](https://github.com/ArDoCo/SWATTR)
-* [Benchmarks](https://github.com/ArDoCo/Benchmark)
+* D. Fuchß, S. Corallo, J. Keim, J. Speit, and A. Koziolek. "Establishing a Benchmark Dataset for Traceability Link Recovery between Software Architecture Documentation and Models". 2022. In: 2nd International Workshop on Mining Software Repositories for Software Architecture - Co-located with 16th European Conference on Software Architecture.
 
-## Publications
+* J. Keim, S. Schulz, D. Fuchß, C. Kocher, J. Speit, and A. Koziolek. "Trace Link Recovery for Software Architecture Documentation". 2021. In: Software Architecture: 15th European Conference (ECSA 2021). [doi:10.1007/978-3-030-86044-8_7](https://doi.org/10.1007/978-3-030-86044-8_7)
 
-Trace Link Recovery for Software Architecture Documentation Keim, J.; Schulz, S.; Fuchß, D.; Kocher, C.; Speit, J.;
-Koziolek, A. 2021. Software Architecture: 15th European Conference, ECSA 2021, Virtual Event, Sweden, September 13-17,
-2021, Proceedings. Ed.: S. Biffl, 101–116, Springer
-Verlag. [doi:10.1007/978-3-030-86044-8_7](https://doi.org/10.1007/978-3-030-86044-8_7)
+* J. Keim and A. Koziolek. "Towards Consistency Checking Between Software Architecture and Informal Documentation". 2019. In: IEEE 16th International Conference on Software Architecture Companion (ICSA-C). [doi:10.1109/ICSA-C.2019.00052](https://doi.org/10.1109/ICSA-C.2019.00052)
 
-The initial version of ArDoCo is based on the master
-thesis [Linking Software Architecture Documentation and Models](https://publikationen.bibliothek.kit.edu/1000126194).
+
+The initial version of ArDoCo is based on the master's thesis [Linking Software Architecture Documentation and Models](https://publikationen.bibliothek.kit.edu/1000126194).
 
 ## Contact
 
-This project is currently developed by researchers of the Karlsruhe Institute of Technology.
+This project is currently developed by researchers of the Karlsruhe Institute of Technology (KIT).
+
+You can find us on our websites:
 
-You find us on our
-websites: [Jan Keim](https://mcse.kastel.kit.edu/staff_Keim_Jan.php), [Sophie Corallo](https://mcse.kastel.kit.edu/staff_sophie_corallo.php),
-and [Dominik Fuchß](https://mcse.kastel.kit.edu/staff_dominik_fuchss.php)
+* [Jan Keim](https://mcse.kastel.kit.edu/staff_Keim_Jan.php),
+* [Sophie Corallo](https://mcse.kastel.kit.edu/staff_sophie_corallo.php), and
+* [Dominik Fuchß](https://mcse.kastel.kit.edu/staff_dominik_fuchss.php)
diff --git a/docs/Inconsistency-Detection.md b/docs/Inconsistency-Detection.md
@@ -0,0 +1,12 @@
+
+Currently, there are two kinds of inconsistencies that are supported by the approach: Missing Model Elements (MMEs) and Undocumented Model Elements (UMEs).
+
+Undocumented Model Elements (UMEs) are elements within the Software Architecture Model (SAM) that are not documented in the natural language Software Architecture Documentation (SAD).
+Our heuristic looks for model elements whose number of associated trace links is below a certain threshold (by default 1, i.e., elements without any trace link).
+In the configuration options, you can fine-tune the threshold as well as set up a regex-based whitelist.
+
+Missing Model Elements (MMEs) are architecture elements that are described within the SAD that cannot be traced to the SAM.
+For this, we make use of the recommendations from the Recommendation Generator within the [Traceability Link Recovery (TLR)](traceability-link-recovery).
+Each recommendation that is not linked to a model element is a potential inconsistency.
+To further increase precision, we make use of filters.
+For example, we use a filter to get rid of commonly used software (development) terminology that looks similar to, e.g., components but rarely denotes model elements.
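The UME heuristic described above can be illustrated with a minimal sketch. It is not taken from the ArDoCo code base: the class name, the method `findUndocumented`, and the map-based input are hypothetical, and the sketch assumes that elements whose names match a whitelist pattern are simply not reported.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of the UME heuristic; not the actual ArDoCo implementation.
class UndocumentedModelElementSketch {

    /**
     * Returns the names of model elements that have fewer trace links than the threshold.
     * Assumption: names matching any whitelist pattern are excluded from the result.
     */
    static List<String> findUndocumented(Map<String, Integer> traceLinkCounts, int threshold, List<Pattern> whitelist) {
        return traceLinkCounts.entrySet().stream()
                .filter(entry -> entry.getValue() < threshold)
                .map(Map.Entry::getKey)
                .filter(name -> whitelist.stream().noneMatch(pattern -> pattern.matcher(name).matches()))
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        // Made-up element names and trace link counts.
        Map<String, Integer> counts = Map.of("Cache", 0, "Database", 2, "TestDriver", 0);
        List<Pattern> whitelist = List.of(Pattern.compile("Test.*"));
        System.out.println(findUndocumented(counts, 1, whitelist));
    }
}
```

With the default threshold of 1, only elements without any trace link are reported; raising the threshold makes the check stricter.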
diff --git a/docs/Intermediate-Artifacts.md b/docs/Intermediate-Artifacts.md
@@ -0,0 +1,128 @@
+
+Currently, there are three kinds of intermediate artifacts.
+First, the input text has an internal representation (cf. [edu/kit/kastel/mcse/ardoco/core/api/text/Text.java](https://github.com/ArDoCo/Core/blob/main/framework/common/src/main/java/edu/kit/kastel/mcse/ardoco/core/api/text/Text.java)) to cover all the annotations from the preprocessing.
+Second, there is the intermediate representation of software architecture models (SAMs) that we cover [below](#software-architecture-models).
+Third, we create a uniform representation for code that we also explain [below](#code).
+
+```mermaid
+classDiagram
+    class ModelElement
+    class Model
+    class Entity
+    class CodeModel
+    class ArchitectureModel
+
+    ModelElement <|-- Entity
+    ModelElement <|-- Model
+    Model <|-- CodeModel
+    Model <|-- ArchitectureModel
+    Model "0..1" o--"*" Entity: elements
+```
+
+## Software Architecture Models
+
+```mermaid
+classDiagram
+    class Entity
+    class ArchitectureItem
+    class Component
+    class Interface
+    class Signature
+
+    Entity <|-- ArchitectureItem
+    ArchitectureItem <|-- Component
+    ArchitectureItem <|-- Interface
+    ArchitectureItem <|-- Signature
+
+    Interface o-- "*" Signature: signatures
+    Interface "*" <-- "*" Component: provided
+    Interface "*" <-- "*" Component: required
+    Component "*" <-- Component: subcomponents
+```
+
+In this software model, each class is categorized as an ArchitectureItem, which inherits properties from Entity, including a name and identifier.
+There are three types of ArchitectureItems: Component, Interface, and Signature.
+
+A Component represents various architectural elements in different modeling languages.
+For instance, it corresponds to a UML Component.
+In the PCM context, it encompasses both BasicComponent and CompositeComponent.
+BasicComponents do not contain sub-components, while CompositeComponents may have sub-components.
+
+Components can either require or provide Interfaces.
+Provided Interfaces are implemented by the Component, while Required Interfaces specify the functionality required by a Component.
+
+An Interface contains multiple method Signatures.
+Signatures are linked to Interfaces in a composite relationship, meaning each Signature is associated with an Interface.
+
+
+## Code
+
+```mermaid
+classDiagram
+    class Entity
+    class CodeItem
+    class Module
+    class Package
+    class CompilationUnit
+    class CodeAssembly
+    class ComputationalObject
+    class ControlElement
+    class Datatype
+    class ClassUnit
+    class InterfaceUnit
+
+    Entity <|-- CodeItem
+    CodeItem <|-- ComputationalObject
+    CodeItem <|-- Module
+    CodeItem <|-- Datatype
+    ComputationalObject <|-- ControlElement
+    Module <|-- Package
+    Module <|-- CompilationUnit
+    Module <|-- CodeAssembly
+    Datatype <|-- ClassUnit
+    Datatype <|-- InterfaceUnit
+
+    Module "0..1" o--> "*" CodeItem: codeElements
+    ClassUnit "0..1" o--> "*" CodeItem: codeElements
+    InterfaceUnit "0..1" o--> "*" CodeItem: codeElements
+    Datatype "*" <-- "*" Datatype: implementedTypes
+    Datatype "*" <-- "*" Datatype: extendedTypes
+```
+
+The intermediate model for code is based on the source code package within the [Knowledge Discovery Metamodel (KDM)](https://www.omg.org/spec/KDM/1.3/PDF).
+
+The different classes in the code model inherit from CodeItem, which itself is a specialized Entity.
+Thus, each class has a name and identifier.
+
+There are three kinds of source code elements: Module, Datatype, and ComputationalObject.
+
+Modules are typically logical components of the system with a certain level of abstraction.
+A Module can contain CodeItems, and there are three kinds of Modules: CompilationUnit, Package, and CodeAssembly.
+
+A CompilationUnit represents a source file where code is stored.
+It includes a relative path to the file's location on disk and its programming language.
+The CompilationUnit is partly based on the InventoryModel from KDM.
+
+A Package is a logical collection of source code elements (i.e., CodeItems).
+Packages can also contain sub-Packages, similar to the structure commonly found in Java.
+
+A CodeAssembly consists of source code artifacts linked together to make them runnable.
+For example, source code files together with their headers are grouped in a CodeAssembly.
+
+There are two kinds of Datatypes: ClassUnit and InterfaceUnit.
+A ClassUnit is akin to a class in Java and can contain other CodeItems like methods and inner classes.
+Similarly, an InterfaceUnit can also contain code elements like methods.
+
+The relationships implementedTypes and extendedTypes from the KDM model are present in the intermediate model.
+A Datatype can have an arbitrary number of extendedTypes relations, representing inheritance in object-oriented programming languages.
+
+The construction around extendedTypes and implementedTypes also enables interfaces to extend other interfaces, akin to Java.
+Interfaces can also extend classes, a feature present in some programming languages like TypeScript.
+
+The KDM includes several primitive datatypes like boolean, which are not realized within this model as they are not currently needed.
+If future work extends the approaches with a thorough comparison of datatypes, then the intermediate model may need further sub-classing of the KDM.
+
+Currently, there is only one type of ComputationalObject: the ControlElement.
+The ControlElement represents callable parts with specific behaviors, such as functions, procedures, or methods.
+Unlike the KDM, this work does not make a further distinction between CallableUnits and MethodUnits.
+Additionally, it does not utilize parameters, return types, or similar elements of the KDM and therefore does not model them.
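The code model hierarchy described above could be mirrored in Java roughly as follows. This is a simplified, hypothetical sketch and not the actual ArDoCo classes: `CodeModule` and `CodePackage` are renamed stand-ins for Module and Package (to avoid confusion with `java.lang.Module` and `java.lang.Package`), CodeAssembly is omitted, `codeElements` is placed on `Datatype` for brevity, and the constructor parameters and the helper `countControlElements` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified mirror of the class diagram; NOT the actual ArDoCo classes.
abstract class Entity {
    final String name;
    final String id;

    Entity(String name, String id) {
        this.name = name;
        this.id = id;
    }
}

abstract class CodeItem extends Entity {
    CodeItem(String name, String id) { super(name, id); }

    // Items nested in this one; leaves such as ControlElement have none.
    List<CodeItem> codeElements() { return List.of(); }
}

// Callable behaviour such as a function, procedure, or method.
class ControlElement extends CodeItem {
    ControlElement(String name, String id) { super(name, id); }
}

// Simplification: the diagram attaches codeElements to ClassUnit and InterfaceUnit individually.
abstract class Datatype extends CodeItem {
    final List<Datatype> extendedTypes = new ArrayList<>();
    final List<Datatype> implementedTypes = new ArrayList<>();
    private final List<CodeItem> elements = new ArrayList<>();

    Datatype(String name, String id) { super(name, id); }

    @Override
    List<CodeItem> codeElements() { return elements; }
}

class ClassUnit extends Datatype {
    ClassUnit(String name, String id) { super(name, id); }
}

class InterfaceUnit extends Datatype {
    InterfaceUnit(String name, String id) { super(name, id); }
}

abstract class CodeModule extends CodeItem {
    private final List<CodeItem> elements = new ArrayList<>();

    CodeModule(String name, String id) { super(name, id); }

    @Override
    List<CodeItem> codeElements() { return elements; }
}

class CodePackage extends CodeModule {
    CodePackage(String name, String id) { super(name, id); }
}

// A source file with its relative path and programming language.
class CompilationUnit extends CodeModule {
    final String relativePath;
    final String language;

    CompilationUnit(String name, String id, String relativePath, String language) {
        super(name, id);
        this.relativePath = relativePath;
        this.language = language;
    }
}

class CodeModelSketch {
    // Recursively counts all ControlElements reachable from the given item.
    static int countControlElements(CodeItem item) {
        int count = item instanceof ControlElement ? 1 : 0;
        for (CodeItem child : item.codeElements()) {
            count += countControlElements(child);
        }
        return count;
    }
}
```

Because every container exposes its children through `codeElements()`, traversals such as `countControlElements` work uniformly across packages, compilation units, and datatypes.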
diff --git a/docs/LiSSA.md b/docs/LiSSA.md
@@ -1,7 +1,8 @@
+# Linking Sketches and Software Architecture (LiSSA)
+
 The LiSSA approach aims to connect sketches and informal diagrams (such as class diagrams, component diagrams, ...) with
 formal models like component models.
 
-## Linking Sketches and Software Architecture (LiSSA)
 The following diagram shows the pipeline that is planned for the LiSSA approach.
 
 ```mermaid
@@ -14,7 +15,7 @@ stateDiagram-v2
     RecommendationGeneration
     ConnectionGeneration
     InconsistencyDetection
-    
+
     DiagramDetection --> RecommendationGeneration
     TextPreprocessing --> TextExtraction
     ArchitectureModel --> RecommendationGeneration