diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml index 08fc76730..027d689b5 100644 --- a/.github/workflows/deploy.yml +++ b/.github/workflows/deploy.yml @@ -8,10 +8,6 @@ on: - '**/src/**' - '**/pom.xml' - 'pom.xml' - - # Publish `v1.2.3` tags as releases. - tags: - - v* # Allows you to run this workflow manually from the Actions tab workflow_dispatch: diff --git a/README.md b/README.md index 036a6ce9a..90b90a5a5 100644 --- a/README.md +++ b/README.md @@ -6,37 +6,16 @@ [![Latest Release](https://img.shields.io/github/release/ArDoCo/Core.svg)](https://github.com/ArDoCo/Core/releases/latest) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7274034.svg)](https://doi.org/10.5281/zenodo.7274034) -The goal of this project is to connect architecture documentation and models with Traceability Link Recovery (TLR) while identifying missing or deviating -elements (inconsistencies). +The goal of the ArDoCo project is to connect architecture documentation and models with Traceability Link Recovery (TLR) while identifying missing or deviating elements (inconsistencies). An element can be any representable item of the model, like a component or a relation. To do so, we first create trace links and then make use of them and other information to identify inconsistencies. -ArDoCo is actively developed by researchers of -the _[Modelling for Continuous Software Engineering (MCSE) group](https://mcse.kastel.kit.edu)_ -of _[KASTEL - Institute of Information Security and Dependability](https://kastel.kit.edu)_ at -the [KIT](https://www.kit.edu). +ArDoCo is actively developed by researchers of the _[Modelling for Continuous Software Engineering (MCSE) group](https://mcse.kastel.kit.edu)_ of _[KASTEL - Institute of Information Security and Dependability](https://kastel.kit.edu)_ at the [KIT](https://www.kit.edu). -## User Interfaces +This **Core** repository contains the framework and core definitions for the other approaches. 
+As such, there is the definition of our pipeline and the data handling as well as the definitions for the various pipeline steps, inputs, outputs, etc. -To be able to execute the core algorithms from this repository, you can write own user interfaces that (should) use -the [ArDoCoRunner](https://github.com/ArDoCo/Core/blob/main/pipeline/pipeline-core/src/main/java/edu/kit/kastel/mcse/ardoco/core/execution/runner/ArDoCoRunner.java). - -We provide an example Command Line Interface (CLI) at [ArDoCo/CLI](https://github.com/ArDoCo/CLI) as well as a simple Graphical User Interface (GUI) -at [ArDoCo/GUI](https://github.com/ArDoCo/GUI). - -Future user interfaces like an enhanced GUI or a web interface are planned. - -## Documentation - -For more information about the setup or the architecture have a look on the [Wiki](https://github.com/ArDoCo/Core/wiki). -The docs are at some points deprecated, the general overview and setup should still hold. - -## Case Studies / Benchmarks - -To test the Core, you could use case studies and benchmarks provided in .. - -* [ArDoCo Benchmark](https://github.com/ArDoCo/Benchmark) -* [SWATTR](https://github.com/ArDoCo/SWATTR) +For more information about the setup, the project structure, or the architecture, please have a look at the [Wiki](https://github.com/ArDoCo/Core/wiki). ## Maven @@ -45,7 +24,7 @@ To test the Core, you could use case studies and benchmarks provided in .. io.github.ardoco.core - pipeline + framework VERSION @@ -69,33 +48,8 @@ For snapshot releases, make sure to add the following repository ``` -## Microservice for text preprocessing - -Text preprocessing works locally, but there is also the option to host a microservice for this. -The benefit is that the models do not need to be loaded each time, saving some runtime (and local memory). - -The microservice can be found at [ArDoCo/StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service/). 
- -The microservice is secured with credentials and the usage of the microservice needs to be activated and the URL of the microservice configured. -These settings can be provided to the execution via environment variables. -To do so, set the following variables: - -```env -NLP_PROVIDER_SOURCE=microservice -MICROSERVICE_URL=[microservice_url] -SCNLP_SERVICE_USER=[your_username] -SCNLP_SERVICE_PASSWORD=[your_password] -``` - -The first variable `NLP_PROVIDER_SOURCE=microservice` activates the microservice usage. -The next three variables configure the connection, and you need to provide the configuration for your deployed microservice. - -## Attribution - -The initial version of this project is based on the master -thesis [Linking Software Architecture Documentation and Models](https://doi.org/10.5445/IR/1000126194). - -## Acknowledgements - -This work was supported by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) and by -KASTEL Security Research Labs (46.23.01). +## Relevant repositories +The following is an excerpt of repositories that use this framework and implement the different approaches and pipelines of ArDoCo: +* [ArDoCo/TLR](https://github.com/ArDoCo/TLR): implementing different traceability link recovery approaches +* [ArDoCo/InconsistencyDetection](https://github.com/ArDoCo/InconsistencyDetection): implementing inconsistency detection approaches +* [ArDoCo/LiSSA](https://github.com/ArDoCo/LiSSA): implementing processing of sketches and diagrams for, e.g., TLR \ No newline at end of file diff --git a/docs/Home.md b/docs/Home.md index 597487682..cd5ac064a 100644 --- a/docs/Home.md +++ b/docs/Home.md @@ -1,49 +1,77 @@ +# ArDoCo + +

+ ArDoCo +

+ ArDoCo (Architecture Documentation Consistency) is a framework to connect architecture documentation and models while identifying missing or deviating elements (inconsistencies). An element can be any representable item of the model, like a component or a relation. To do so, ArDoCo first creates trace links and then makes use of them and other information to identify inconsistencies. -You can find [ArDoCo on GitHub](https://github.com/ArDoCo). +You can find ArDoCo on the [website](https://ardoco.de) and [on GitHub](https://github.com/ArDoCo). Before contributing, please read the [Quickstart Guide](quickstart). -JavaDocs can be found [here](https://ardoco.github.io/Core-Docs/). + + +To get to know the project, please read the following pages: + +* [Core Pipeline Definition](pipeline) +* [Intermediate Artifacts](intermediate-artifacts) +* [Text Preprocessing Microservice](Text-Preprocessing-Microservice) +* [Traceability Link Recovery (TLR)](traceability-link-recovery) +* [Inconsistency Detection (ID)](inconsistency-detection) +* [Linking Sketches and Software Architecture (LiSSA)](LiSSA) + +## Project Structure + +* [Core](https://github.com/ArDoCo/Core): Core framework with framework and API definitions +* Pipelines + * [TLR](https://github.com/ArDoCo/TLR): Traceability Link Recovery (TLR) Modules + * [StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service): RESTful web service for text preprocessing + * [InconsistencyDetection](https://github.com/ArDoCo/InconsistencyDetection): Inconsistency Detection (ID) Modules + * [LiSSA](https://github.com/ArDoCo/LiSSA): Linking Sketches and Software Architecture Modules +* Testing and Evaluation + * [IntegrationTests](https://github.com/ArDoCo/IntegrationTests): Integration Tests + * [Benchmark](https://github.com/ArDoCo/Benchmark): Benchmarks + * [Evaluator](https://github.com/ArDoCo/Evaluator): Evaluation code that compares CSVs (e.g., output and gold standard) + * 
[SimpleTracelinkDiscovery](https://github.com/ArDoCo/SimpleTracelinkDiscovery): Baseline approach +* GUIs, CLIs, etc. + * [TraceView](https://github.com/ArDoCo/TraceView): WIP visualisation of the outputs for TLR and ID + * *outdated* [CLI](https://github.com/ArDoCo/CLI): Command Line Interface (*outdated*) +* [actions](https://github.com/ArDoCo/actions): Reusable GitHub Actions ## System Requirements -The `complete` profile includes all the requirements that the special profiles also need. This profile is activated by -default. +The project requires **JDK 21**. +Furthermore, we recommend at least **4 GB of RAM**. -All profiles require JDK 21. +## Benchmarks -The dependencies of the other profiles at a glance: +You can test ArDoCo using the projects provided in our [Benchmark repository](https://github.com/ArDoCo/Benchmark). -* tlr: - -* inconsistency: - -* lissa (LInking Sketches and Software Architecture): Docker (local - or [remote](https://github.com/ArDoCo/Core/blob/lissa/stages/diagram-recognition/src/main/kotlin/edu/kit/kastel/mcse/ardoco/lissa/diagramrecognition/informants/DockerInformant.kt#L20-L23)) +## Related Publications -## Case Studies & Benchmarks +* J. Keim, S. Corallo, D. Fuchß, T. Hey, T. Telge, and A. Koziolek. "Recovering Trace Links Between Software Documentation And Code". 2024. In: Proceedings of 46th IEEE International Conference on Software Engineering (ICSE 2024). [doi:10.5445/IR/1000165692](https://doi.org/10.5445/IR/1000165692/post) -You can test ArDoCo using our case studies and benchmarks provided in ... +* J. Keim, S. Corallo, D. Fuchß, and A. Koziolek. "Detecting Inconsistencies in Software Architecture Documentation Using Traceability Link Recovery". 2023. In: IEEE 20th International Conference on Software Architecture (ICSA 2023). [doi:10.1109/ICSA56044.2023.00021](https://doi.org/10.1109/ICSA56044.2023.00021) -* [Case Studies](https://github.com/ArDoCo/SWATTR) -* [Benchmarks](https://github.com/ArDoCo/Benchmark) +* D. Fuchß, S. 
Corallo, J. Keim, J. Speit, and A. Koziolek. "Establishing a Benchmark Dataset for Traceability Link Recovery between Software Architecture Documentation and Models". 2022. In: 2nd International Workshop on Mining Software Repositories for Software Architecture - Co-located with 16th European Conference on Software Architecture. -## Publications +* J. Keim, S. Schulz, D. Fuchß, C. Kocher, J. Speit, and A. Koziolek. "Trace Link Recovery for Software Architecture Documentation". 2021. In: Software Architecture: 15th European Conference (ECSA 2021). [doi:10.1007/978-3-030-86044-8_7](https://doi.org/10.1007/978-3-030-86044-8_7) -Trace Link Recovery for Software Architecture Documentation Keim, J.; Schulz, S.; Fuchß, D.; Kocher, C.; Speit, J.; -Koziolek, A. 2021. Software Architecture: 15th European Conference, ECSA 2021, Virtual Event, Sweden, September 13-17, -2021, Proceedings. Ed.: S. Biffl, 101–116, Springer -Verlag. [doi:10.1007/978-3-030-86044-8_7](https://doi.org/10.1007/978-3-030-86044-8_7) +* J. Keim and A. Koziolek. "Towards Consistency Checking Between Software Architecture and Informal Documentation". 2019. In: IEEE 16th International Conference on Software Architecture Companion (ICSA-C). [doi:10.1109/ICSA-C.2019.00052](https://doi.org/10.1109/ICSA-C.2019.00052) -The initial version of ArDoCo is based on the master -thesis [Linking Software Architecture Documentation and Models](https://publikationen.bibliothek.kit.edu/1000126194). + +The initial version of ArDoCo is based on the master's thesis [Linking Software Architecture Documentation and Models](https://publikationen.bibliothek.kit.edu/1000126194). ## Contact -This project is currently developed by researchers of the Karlsruhe Institute of Technology. +This project is currently developed by researchers of the Karlsruhe Institute of Technology (KIT). 
+ +You can find us on our websites: -You find us on our -websites: [Jan Keim](https://mcse.kastel.kit.edu/staff_Keim_Jan.php), [Sophie Corallo](https://mcse.kastel.kit.edu/staff_sophie_corallo.php), -and [Dominik Fuchß](https://mcse.kastel.kit.edu/staff_dominik_fuchss.php) +* [Jan Keim](https://mcse.kastel.kit.edu/staff_Keim_Jan.php), +* [Sophie Corallo](https://mcse.kastel.kit.edu/staff_sophie_corallo.php), and +* [Dominik Fuchß](https://mcse.kastel.kit.edu/staff_dominik_fuchss.php) diff --git a/docs/Inconsistency-Detection.md b/docs/Inconsistency-Detection.md new file mode 100644 index 000000000..ef8f23f23 --- /dev/null +++ b/docs/Inconsistency-Detection.md @@ -0,0 +1,12 @@ + +Currently, the approach supports two kinds of inconsistencies: Missing Model Elements (MMEs) and Undocumented Model Elements (UMEs). + +Undocumented Model Elements (UMEs) are elements within the Software Architecture Model (SAM) that are not documented in the natural language Software Architecture Documentation (SAD). +Our heuristic looks for model elements that have no trace links, or fewer than a configurable threshold (default: 1), associated with them. +In the configuration options, you can fine-tune the threshold as well as set up a regex-based whitelist. + +Missing Model Elements (MMEs) are architecture elements that are described within the SAD but cannot be traced to the SAM. +For this, we make use of the recommendations from the Recommendation Generator within the [Traceability Link Recovery (TLR)](traceability-link-recovery). +Each recommendation that is not linked with a model element is a potential inconsistency. +To further increase precision, we make use of filters. +For example, we use a filter to get rid of commonly used software (development) terminology that looks similar to, e.g., components but rarely represents model elements. 
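The UME heuristic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ArDoCo implementation: the class `UmeSketch`, the method `findUndocumented`, and the map from element names to trace link counts are hypothetical, and the sketch assumes that whitelisted elements are simply exempt from being reported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the UME heuristic; names and data layout are assumptions.
final class UmeSketch {
    private UmeSketch() {
    }

    /**
     * Returns the names of model elements that have fewer than {@code threshold}
     * trace links and whose names do not match the whitelist pattern.
     */
    static List<String> findUndocumented(Map<String, Integer> traceLinkCounts, int threshold, Pattern whitelist) {
        List<String> undocumented = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : traceLinkCounts.entrySet()) {
            // Assumption: a whitelisted element is never reported as undocumented.
            boolean whitelisted = whitelist.matcher(entry.getKey()).matches();
            if (!whitelisted && entry.getValue() < threshold) {
                undocumented.add(entry.getKey());
            }
        }
        return undocumented;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("Database", 3);   // documented: three trace links
        counts.put("Cache", 0);      // no trace links: reported
        counts.put("TestDriver", 0); // no trace links, but whitelisted
        System.out.println(findUndocumented(counts, 1, Pattern.compile("Test.*")));
    }
}
```

With the default threshold of 1, only elements without any trace link are reported; raising the threshold makes the check stricter.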
diff --git a/docs/Intermediate-Artifacts.md b/docs/Intermediate-Artifacts.md new file mode 100644 index 000000000..da69ec7f5 --- /dev/null +++ b/docs/Intermediate-Artifacts.md @@ -0,0 +1,128 @@ + +Currently, there are three kinds of intermediate artifacts. +First, the input text has an internal representation (cf. [edu/kit/kastel/mcse/ardoco/core/api/text/Text.java](https://github.com/ArDoCo/Core/blob/main/framework/common/src/main/java/edu/kit/kastel/mcse/ardoco/core/api/text/Text.java)) to cover all the annotations from the preprocessing. +Second, there is the intermediate representation of software architecture models (SAMs) that we cover [below](#software-architecture-models). +Third, we create a uniform representation for code that we also explain [below](#code). + +```mermaid +classDiagram + class ModelElement + class Model + class Entity + class CodeModel + class ArchitectureModel + + ModelElement <|-- Entity + ModelElement <|-- Model + Model <|-- CodeModel + Model <|-- ArchitectureModel + Model "0..1" o--"*" Entity: elements +``` + +## Software Architecture Models + +```mermaid +classDiagram + class Entity + class ArchitectureItem + class Component + class Interface + class Signature + + Entity <|-- ArchitectureItem + ArchitectureItem <|-- Component + ArchitectureItem <|-- Interface + ArchitectureItem <|-- Signature + + Interface o-- "*" Signature: signatures + Interface "*" <-- "*" Component: provided + Interface "*" <-- "*" Component: required + Component "*" <-- Component: subcomponents +``` + +In this software model, each class is categorized as an ArchitectureItem, which inherits properties from Entity, including a name and identifier. +There are three types of ArchitectureItems: Component, Interface, and Signature. + +A Component represents various architectural elements in different modeling languages. +For instance, it corresponds to a UML Component. +In the PCM context, it encompasses both BasicComponent and CompositeComponent. 
+BasicComponents do not contain sub-components, while CompositeComponents may have sub-components. + +Components can either require or provide Interfaces. +Provided Interfaces are implemented by the Component, while Required Interfaces specify the functionality required by a Component. + +An Interface contains multiple method Signatures. +Signatures are linked to Interfaces in a composite relationship, meaning each Signature is associated with an Interface. + + +## Code + +```mermaid +classDiagram + class Entity + class CodeItem + class Module + class Package + class CompilationUnit + class CodeAssembly + class ComputationalObject + class ControlElement + class Datatype + class ClassUnit + class InterfaceUnit + + Entity <|-- CodeItem + CodeItem <|-- ComputationalObject + CodeItem <|-- Module + CodeItem <|-- Datatype + ComputationalObject <|-- ControlElement + Module <|-- Package + Module <|-- CompilationUnit + Module <|-- CodeAssembly + Datatype <|-- ClassUnit + Datatype <|-- InterfaceUnit + + Module "0..1" o--> "*" CodeItem: codeElements + ClassUnit "0..1" o--> "*" CodeItem: codeElements + InterfaceUnit "0..1" o--> "*" CodeItem: codeElements + Datatype "*" <-- "*" Datatype: implementedTypes + Datatype "*" <-- "*" Datatype: extendedTypes +``` + +The intermediate model for code is based on the source code package within the [Knowledge Discovery Metamodel (KDM)](https://www.omg.org/spec/KDM/1.3/PDF). + +The different classes in the code model inherit from CodeItem, which itself is a specialized Entity. +Thus, each class has a name and identifier. + +There are three kinds of source code elements: Module, Datatype, and ComputationalObject. + +Modules are typically logical components of the system with a certain level of abstraction. +A Module can contain CodeItems, and there are three kinds of Modules: CompilationUnit, Package, and CodeAssembly. + +A CompilationUnit represents a source file where code is stored. 
+It includes a relative path to the file's location on disk and its programming language. +The CompilationUnit is partly based on the InventoryModel from KDM. + +A Package is a logical collection of source code elements (i.e., CodeItems). +Packages can also contain sub-Packages, similar to the structure commonly found in Java. + +A CodeAssembly consists of source code artifacts linked together to make them runnable. +For example, source code files together with their headers are grouped in a CodeAssembly. + +There are two kinds of Datatypes: ClassUnit and InterfaceUnit. +A ClassUnit is akin to a class in Java and can contain other CodeItems like methods and inner classes. +Similarly, an InterfaceUnit can also contain code elements like methods. + +The relationships implementedTypes and extendedTypes from the KDM are present in the intermediate model. +A Datatype can have an arbitrary number of extendedTypes relations, representing inheritance in object-oriented programming languages. + +The construction around extendedTypes and implementedTypes also enables interfaces to extend other interfaces, akin to Java. +Interfaces can also extend classes, a feature present in some programming languages like TypeScript. + +The KDM includes several primitive datatypes like boolean, which are not realized within this model as they are not currently needed. +If future work extends the approaches with a thorough comparison of datatypes, then the intermediate model may need further sub-classing of the KDM. + +Currently, there is only one type of ComputationalObject: the ControlElement. +The ControlElement represents callable parts with specific behaviors, such as functions, procedures, or methods. +Unlike the KDM, this work does not make a further distinction between CallableUnits and MethodUnits. +Additionally, it does not utilize parameters, return types, or similar elements of the KDM and therefore does not model them. 
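To make the code model hierarchy more concrete, the following is a simplified sketch of how such classes could look in Java. It is not the ArDoCo source: the fields, constructors, and the `CodeModelSketch` helper are assumptions for illustration, Package and CodeAssembly are omitted, and Module is named `CodeModule` here to avoid a clash with `java.lang.Module`.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch of the intermediate code model; not the ArDoCo classes.
abstract class Entity {
    final String id;
    final String name;

    Entity(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

abstract class CodeItem extends Entity {
    CodeItem(String id, String name) {
        super(id, name);
    }
}

// Callable part such as a function, procedure, or method.
class ControlElement extends CodeItem {
    ControlElement(String id, String name) {
        super(id, name);
    }
}

// A Datatype references the types it extends and implements.
// codeElements is declared here for brevity; the diagram attaches it to ClassUnit and InterfaceUnit.
abstract class Datatype extends CodeItem {
    final List<CodeItem> codeElements = new ArrayList<>();
    final List<Datatype> extendedTypes = new ArrayList<>();
    final List<Datatype> implementedTypes = new ArrayList<>();

    Datatype(String id, String name) {
        super(id, name);
    }
}

class ClassUnit extends Datatype {
    ClassUnit(String id, String name) {
        super(id, name);
    }
}

class InterfaceUnit extends Datatype {
    InterfaceUnit(String id, String name) {
        super(id, name);
    }
}

// Corresponds to Module in the diagram; renamed to avoid clashing with java.lang.Module.
class CodeModule extends CodeItem {
    final List<CodeItem> codeElements = new ArrayList<>();

    CodeModule(String id, String name) {
        super(id, name);
    }
}

// A source file with its relative path and programming language.
class CompilationUnit extends CodeModule {
    final String relativePath;
    final String language;

    CompilationUnit(String id, String name, String relativePath, String language) {
        super(id, name);
        this.relativePath = relativePath;
        this.language = language;
    }
}

final class CodeModelSketch {
    // Builds a tiny model: one source file containing a class that implements an interface.
    static CompilationUnit build() {
        InterfaceUnit repository = new InterfaceUnit("i1", "Repository");
        ClassUnit userRepository = new ClassUnit("c1", "UserRepository");
        userRepository.implementedTypes.add(repository);
        userRepository.codeElements.add(new ControlElement("m1", "findById"));

        CompilationUnit unit = new CompilationUnit("u1", "UserRepository.java", "src/UserRepository.java", "Java");
        unit.codeElements.add(userRepository);
        return unit;
    }
}
```

Because every class derives from Entity, each element carries a name and an identifier, which matches the description above.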
diff --git a/docs/LiSSA.md b/docs/LiSSA.md index 4982a722c..daf7e7347 100644 --- a/docs/LiSSA.md +++ b/docs/LiSSA.md @@ -1,7 +1,8 @@ +# Linking Sketches and Software Architecture (LiSSA) + The LiSSA approach aims to connect sketches and informal diagrams (such as class diagrams, component diagrams, ...) with formal models like component models. -## Linking Sketches and Software Architecture (LiSSA) The following diagram shows the pipeline that is planned for the LiSSA approach. ```mermaid @@ -14,7 +15,7 @@ stateDiagram-v2 RecommendationGeneration ConnectionGeneration InconsistencyDetection - + DiagramDetection --> RecommendationGeneration TextPreprocessing --> TextExtraction ArchitectureModel --> RecommendationGeneration diff --git a/docs/Pipeline.md b/docs/Pipeline.md new file mode 100644 index 000000000..ee5900b99 --- /dev/null +++ b/docs/Pipeline.md @@ -0,0 +1,22 @@ + +```mermaid +classDiagram + class AbstractPipeline + class Pipeline + class PipelineStep + + Pipeline--> "*" AbstractPipeline + Pipeline..|>AbstractPipeline + PipelineStep..|>AbstractPipeline +``` + +For the pipeline definition, we use a composite to allow us to have a multi-level pipeline. +As such, a pipeline consists of an arbitrary number of either *PipelineStep*s or further *Pipeline*s. + +In our approach, we use three levels for our pipeline: +On the first level, the overall pipeline defines multiple *stages*, e.g., text preprocessing or element connection. +Each stage is another pipeline that then defines *agents* that have the purpose of initiating the processing and of collecting the information of the various heuristics. +Agents then use *Informants* as concrete PipelineSteps to execute the processing and heuristics. + +A pipeline step (i.e., an Informant) stores results within a repository that can be universally accessed by all pipeline steps, similarly to a blackboard in the blackboard pattern. 
+This way, each pipeline step and, thus, each heuristic can access the results of previous steps and provide its results for others. diff --git a/docs/Profiles.md b/docs/Profiles.md deleted file mode 100644 index 560a781a1..000000000 --- a/docs/Profiles.md +++ /dev/null @@ -1,32 +0,0 @@ -ArDoCo uses maven profiles to provide subsets of its functionality and speed up development time. - -## Current Profiles - -* **complete** (activated by default) -* **deployment** (profile for deployment to maven central) -* **tlr** (profile for traceability link recovery) -* **inconsistency** (profile for inconsistency detection) - -## Adding new profiles - -In order to add a new profile, you have to extend the profile section in the main pom.xml (as well as in all submodules -that contain submodules; i.e., stages, tests) - -```xml - - - - new-profile-id - - false - - - framework - pipeline - stages - tests - - -``` - - diff --git a/docs/Quickstart.md b/docs/Quickstart.md index aecce250e..78a4a20aa 100644 --- a/docs/Quickstart.md +++ b/docs/Quickstart.md @@ -1,7 +1,5 @@ -The ArDoCo-Core is a maven project and can be embedded by using its specs (from -the [pom](https://github.com/ArDoCo/Core/blob/main/pom.xml)). -You can run and configure the execution with the CLI. +ArDoCo is a Maven project and can be embedded by using its specs (from the [pom](https://github.com/ArDoCo/Core/blob/main/pom.xml)). Please acknowledge the [code of conduct](https://github.com/ArDoCo/Core/blob/main/CODE_OF_CONDUCT.md). @@ -29,32 +27,10 @@ Follow the following steps to do so: Please use the provided [formatter](https://github.com/ArDoCo/Core/blob/main/formatter.xml) when contributing. -Additionally, make use of the spotless-plugin for maven to format your code. You can run it via mvn spotless: -apply ([more info](https://github.com/diffplug/spotless/tree/main/plugin-maven)). +Additionally, please make use of the spotless-plugin for maven to format your code. 
You can run it via `mvn spotless:apply` ([more info about spotless](https://github.com/diffplug/spotless/tree/main/plugin-maven)). -### Documentation -⚠️ WIP - -## Command Line Interface (CLI) - -[ArDoCo CLI](https://github.com/ArDoCo/CLI) contains a CLI that supports the execution of ArDoCo. - -It is necessary to specify an input model as well as a textual documentation. Usually, our model is an architectural -model. However, the model can also contain a (Java) code model that you can insert using -the [CodeModelExtractors](https://github.com/ArDoCo/Core/tree/main/framework/java-model-extractor). - -All results (trace links, inconsistencies, etc. between the input model and documentation) are written to the specified -output location. - -The [CLI](https://github.com/ArDoCo/CLI/blob/main/src/main/java/edu/kit/kastel/mcse/ardoco/core/pipeline/ArDoCoCLI.java) -is part of the [CLI project](https://github.com/ArDoCo/CLI) of ArDoCo. - -## Standard Configuration - -⚠️ WIP - -## Save Actions (Eclipse) +### Save Actions (Eclipse) Go to your Eclipse Workspace folder and open the file `.metadata/.plugins/org.eclipse.core.runtime/.settings/org.eclipse.jdt.ui.prefs`. diff --git a/docs/SAD-SAM-Code-TLR.md b/docs/SAD-SAM-Code-TLR.md deleted file mode 100644 index 2fe548032..000000000 --- a/docs/SAD-SAM-Code-TLR.md +++ /dev/null @@ -1,6 +0,0 @@ -# Traceability Link Recovery between Software Architecture Documentations (SADs) and Code via Software Architecture Models (SAMs) - -To recover trace links between SADs and code, we combine the traceability link recovery between [SAD-SAM](SAD-SAM-TLR.md) and [SAM-Code](SAM-Code-TLR.md). -Both approaches are executed and their resulting trace links combined via `TransitiveTraceLinks` that match the parts of the documentation to the parts of code -using the model. 
- diff --git a/docs/SAD-SAM-TLR.md b/docs/SAD-SAM-TLR.md deleted file mode 100644 index d894c0760..000000000 --- a/docs/SAD-SAM-TLR.md +++ /dev/null @@ -1,9 +0,0 @@ -# Traceability Link Recovery between Software Architecture Documentations (SADs) and Software Architecture Models (SAMs) - -To recover trace links between SADs and SAMs, we use a pipeline approach with different major processing steps: - -1. Model Extraction: Processes the architecture model. -2. Text Preprocessing: Processes the text initially, including basic text processing like tokenization, part-of-speech tagging, dependency parsing. -3. Text Extraction: Identification of potential parts of interest in the text. -4. Recommendation Generator: Further processing of interesting parts of text to generate recommendations for parts that should/could be model elements. -5. Connection Generator: Mapping of recommended parts to model parts. diff --git a/docs/SAM-Code-TLR.md b/docs/SAM-Code-TLR.md deleted file mode 100644 index 873819ffd..000000000 --- a/docs/SAM-Code-TLR.md +++ /dev/null @@ -1,11 +0,0 @@ -# Traceability Link Recovery between Software Architecture Models (SAMs) and Code - -The sub-project ARCOTL (Architecture-Code-Trace Links) aims to automatically generate trace links between code and a model of the architecture. -It supports multiple programming languages for the code (Java and Shell) and metamodels for the architecture model (PCM and UML). -To this end the project introduces intermediate models for the architecture (AMTL - Architecture Model for Trace Links) and Code (CMTL - Code Model for Trace -Links). -Trace links are created between instances of these metamodels. -The trace links each have exactly one architecture endpoint and one code endpoint. This is specified by the TLM (Trace Link Model). -The AMTL- and CMTL-instances get extracted from the architecture model and from the code. 
- - diff --git a/docs/Text-Preprocessing-Microservice.md b/docs/Text-Preprocessing-Microservice.md new file mode 100644 index 000000000..5d5466828 --- /dev/null +++ b/docs/Text-Preprocessing-Microservice.md @@ -0,0 +1,19 @@ + +Text preprocessing works locally, but there is also the option to host a **microservice** for this. +The benefit is that the models do not need to be loaded each time, saving some runtime (and local memory). + +The microservice can be found at [ArDoCo/StanfordCoreNLP-Provider-Service](https://github.com/ArDoCo/StanfordCoreNLP-Provider-Service/). + +The microservice is secured with credentials and the usage of the microservice needs to be activated and the URL of the microservice configured. +These settings can be provided to the execution via environment variables. +To do so, set the following variables: + +```env +NLP_PROVIDER_SOURCE=microservice +MICROSERVICE_URL=[microservice_url] +SCNLP_SERVICE_USER=[your_username] +SCNLP_SERVICE_PASSWORD=[your_password] +``` + +The first variable `NLP_PROVIDER_SOURCE=microservice` activates the microservice usage. +The next three variables configure the connection, and you need to provide the configuration for your deployed microservice. \ No newline at end of file diff --git a/docs/Traceability-Link-Recovery.md b/docs/Traceability-Link-Recovery.md new file mode 100644 index 000000000..fddf6de94 --- /dev/null +++ b/docs/Traceability-Link-Recovery.md @@ -0,0 +1,32 @@ + +There are currently three kinds of TLR approaches that we describe in their corresponding sections: + +* [Software Architecture Documentation (SAD) to Model (SAM)](#sad-sam) +* [Software Architecture Model (SAM) to Code](#sam-code) +* [SAD to Code via SAM](#sad-sam-code) + + +## SAD-SAM + +For Traceability Link Recovery between Software Architecture Documentations (SADs) and Software Architecture Models (SAMs), we use a pipeline approach with different major processing steps: + +1. Model Extraction: Processes the architecture model. +2. 
Text Preprocessing: Processes the text initially, including basic text processing like tokenization, part-of-speech tagging, and dependency parsing. +3. Text Extraction: Identification of potential parts of interest in the text. +4. Recommendation Generator: Further processing of interesting parts of text to generate recommendations for parts that should/could be model elements. +5. Connection Generator: Mapping of recommended parts to model parts. + +## SAM-Code + +The project ARCOTL (Architecture-Code-Trace Links) aims to automatically generate trace links between Code and a Software Architecture Model (SAM). +It supports multiple programming languages for the code (Java and Shell) and metamodels for the architecture model (PCM and UML). +To this end, the project introduces intermediate models for the architecture (AMTL - Architecture Model for Trace Links) and Code (CMTL - Code Model for Trace Links). +Trace links are created between instances of these metamodels. +The trace links each have exactly one architecture endpoint and one code endpoint. This is specified by the TLM (Trace Link Model). +The AMTL- and CMTL-instances are extracted from the architecture model and from the code. + +## SAD-SAM-Code + +To recover trace links between SADs and code, we combine the traceability link recovery between [SAD-SAM](#sad-sam) and [SAM-Code](#sam-code). +Both approaches are executed, and their resulting trace links are combined via `TransitiveTraceLinks` that match the parts of the documentation to the parts of code using the model. 
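The transitive combination described for SAD-SAM-Code can be pictured as a join over the shared model element. The sketch below is only an assumption-laden illustration and does not reproduce ArDoCo's `TransitiveTraceLinks` class: the records `SadSamLink`, `SamCodeLink`, and `SadCodeLink`, the method `combine`, and the use of sentence numbers and file paths as endpoints are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical link types; the real ArDoCo trace link classes may look different.
record SadSamLink(int sentence, String modelElementId) {
}

record SamCodeLink(String modelElementId, String codePath) {
}

record SadCodeLink(int sentence, String codePath) {
}

final class TransitiveLinksSketch {
    private TransitiveLinksSketch() {
    }

    /**
     * Joins documentation-to-model links with model-to-code links on the model element,
     * yielding documentation-to-code links without duplicates.
     */
    static List<SadCodeLink> combine(List<SadSamLink> sadSam, List<SamCodeLink> samCode) {
        List<SadCodeLink> result = new ArrayList<>();
        for (SadSamLink first : sadSam) {
            for (SamCodeLink second : samCode) {
                if (first.modelElementId().equals(second.modelElementId())) {
                    SadCodeLink link = new SadCodeLink(first.sentence(), second.codePath());
                    if (!result.contains(link)) {
                        result.add(link);
                    }
                }
            }
        }
        return result;
    }
}
```

A sentence linked to a component that is in turn linked to two source files thus yields two documentation-to-code links, while a sentence linked to a component without code links yields none.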
diff --git a/docs/_Footer.md b/docs/_Footer.md new file mode 100644 index 000000000..41b176f5a --- /dev/null +++ b/docs/_Footer.md @@ -0,0 +1 @@ +[ArDoCo](https://ardoco.de): **Ar**chitecture **Do**cumentation **Co**nsistency - Providing TLR and Inconsistency Detection between formal models and informal documentation diff --git a/docs/_Sidebar.md b/docs/_Sidebar.md new file mode 100644 index 000000000..f3f660e84 --- /dev/null +++ b/docs/_Sidebar.md @@ -0,0 +1,15 @@ +# Table of Contents + +1. [Home](ardoco) +2. [Quickstart](quickstart) +3. [Pipeline](pipeline) +4. [Intermediate Artifacts](intermediate-artifacts) + 1. [SAM](intermediate-artifacts#software-architecture-models) + 1. [Code](intermediate-artifacts#code) +5. [Text Preprocessing Microservice](Text-Preprocessing-Microservice) +6. [Traceability Link Recovery (TLR)](traceability-link-recovery) + 1. [SAD-SAM](traceability-link-recovery#sad-sam) + 1. [SAM-Code](traceability-link-recovery#sam-code) + 1. [SAD-SAM-Code](traceability-link-recovery#sad-sam-code) +7. [Inconsistency Detection (ID)](Inconsistency-Detection) +8. 
[LiSSA](lissa) \ No newline at end of file diff --git a/framework/common/pom.xml b/framework/common/pom.xml index 6ea57e8fd..3d76fb603 100644 --- a/framework/common/pom.xml +++ b/framework/common/pom.xml @@ -8,9 +8,13 @@ common - Common Utilities and Pipeline Definitions + ArDoCo Common Utilities and Pipeline Definitions + + com.fasterxml.jackson.core + jackson-annotations + com.fasterxml.jackson.core jackson-core @@ -24,9 +28,13 @@ commons-io - info.debatty - java-string-similarity - 2.0.0 + org.apache.commons + commons-compress + 1.26.0 + + + org.apache.commons + commons-lang3 org.apache.commons @@ -37,6 +45,11 @@ jena-arq 4.10.0 + + org.apache.jena + jena-core + 4.10.0 + org.apache.opennlp opennlp-tools @@ -46,6 +59,10 @@ org.eclipse.collections eclipse-collections + + org.eclipse.collections + eclipse-collections-api + org.eclipse.jgit org.eclipse.jgit @@ -60,16 +77,6 @@ jsoup 1.17.2 - - org.junit.jupiter - junit-jupiter-engine - test - - - org.slf4j - slf4j-simple - test - org.xerial sqlite-jdbc diff --git a/framework/pom.xml b/framework/pom.xml index f12825792..9c6f7b4ab 100644 --- a/framework/pom.xml +++ b/framework/pom.xml @@ -10,12 +10,8 @@ framework pom - InFormALin Framework - The goal of this project was to connect informal artifacts like architecture documentation and formal - artifacts like models. The InFormALin Framework is actively developed by researchers of the Modelling for - Continuous Software Engineering (MCSE) group of KASTEL - Institute of Information Security and Dependability at - the KIT. This work was supported by funding from the topic Engineering Secure Systems of the Helmholtz - Association (HGF) and by KASTEL Security Research Labs (46.23.01). + ArDoCo Framework + This framework contains the code for defining the architecture, data structures, and interfaces of the ArDoCo approach as well as the common code. 
common diff --git a/framework/text-provider-json/pom.xml b/framework/text-provider-json/pom.xml index 1e9115ceb..1e6d95b32 100644 --- a/framework/text-provider-json/pom.xml +++ b/framework/text-provider-json/pom.xml @@ -11,22 +11,22 @@ text-provider-json jar TextProvider JSON + Definition of the JSON schema for the TextProvider com.fasterxml.jackson.core - jackson-databind - ${jackson.version} + jackson-annotations - com.fasterxml.jackson.datatype - jackson-datatype-jsr310 + com.fasterxml.jackson.core + jackson-databind ${jackson.version} com.networknt json-schema-validator - 1.3.2 + 1.3.3 io.github.ardoco.core @@ -34,19 +34,12 @@ ${ardoco.version} - io.vertx - vertx-json-schema - 4.5.3 - - - org.junit.jupiter - junit-jupiter-engine - test + org.apache.commons + commons-lang3 - org.slf4j - slf4j-simple - test + org.eclipse.collections + eclipse-collections-api diff --git a/pipeline-core/pom.xml b/pipeline-core/pom.xml index acfd402ca..e8a773cc1 100644 --- a/pipeline-core/pom.xml +++ b/pipeline-core/pom.xml @@ -9,6 +9,8 @@ pipeline-core + Pipeline Core + This module contains the pipeline definition or ArDoCo. @@ -17,23 +19,13 @@ ${revision} - org.junit.jupiter - junit-jupiter-engine - test + org.eclipse.collections + eclipse-collections-api org.reflections reflections - - org.slf4j - log4j-over-slf4j - - - org.slf4j - slf4j-simple - test - diff --git a/pom.xml b/pom.xml index 2bacee417..687612ff2 100644 --- a/pom.xml +++ b/pom.xml @@ -7,15 +7,12 @@ ${revision} pom - ArDoCo (Core) - The Consistency Analyzer + ArDoCo - The Consistency Analyzer: Core Framework The goal of this project is to connect architecture documentation and models while identifying missing or deviating elements (inconsistencies). An element can be any representable item of the model, like a component or a relation. To do so, we first create trace links and then make use of them and other information to identify inconsistencies. 
ArDoCo is actively developed by researchers of the Modelling for Continuous Software - Engineering (MCSE) group of KASTEL - Institute of Information Security and Dependability at the KIT. - This work was supported by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) - and by KASTEL Security Research Labs - (46.23.01). + Engineering (MCSE) group of KASTEL - Institute of Information Security and Dependability at the KIT. https://github.com/ArDoCo/Core @@ -71,7 +68,7 @@ - 0.42.0-SNAPSHOT + 1.0.0-SNAPSHOT ${revision} UTF-8 UTF-8 @@ -145,7 +142,6 @@ 2.15.1 - org.apache.commons commons-lang3 @@ -173,8 +169,6 @@ eclipse-collections ${eclipse-collections.version} - - org.eclipse.collections eclipse-collections-api @@ -185,30 +179,6 @@ org.eclipse.jgit 6.8.0.202311291450-r - - org.junit.jupiter - junit-jupiter-api - ${junit.version} - test - - - org.junit.jupiter - junit-jupiter-engine - ${junit.version} - test - - - org.junit.jupiter - junit-jupiter-params - ${junit.version} - test - - - org.junit.vintage - junit-vintage-engine - ${junit.version} - test - org.mockito mockito-core @@ -253,6 +223,30 @@ error_prone_core ${error-prone.version} + + org.junit.jupiter + junit-jupiter-api + ${junit.version} + test + + + org.junit.jupiter + junit-jupiter-engine + ${junit.version} + test + + + org.junit.jupiter + junit-jupiter-params + ${junit.version} + test + + + org.junit.vintage + junit-vintage-engine + ${junit.version} + test + org.slf4j slf4j-api diff --git a/report/pom.xml b/report/pom.xml index 8ed3186d5..10b3820f1 100644 --- a/report/pom.xml +++ b/report/pom.xml @@ -10,6 +10,8 @@ report jar + Report + This module is used to create reports for SonarCloud and similar. true @@ -34,11 +36,6 @@ text-provider-json ${revision} - - org.junit.jupiter - junit-jupiter-engine - test -