From 5ad0ab9012f9e547eb7349a341d3069ac3476423 Mon Sep 17 00:00:00 2001 From: Daniel Hahne Date: Fri, 13 Sep 2024 13:01:18 +0200 Subject: [PATCH] WIP TODO: Check all documentation files for TODOs and resolve them --- .../src/test/resources/application.yaml | 2 +- docs/development/index.md | 44 ++- docs/development/trustcenter-agent.md | 34 +- docs/usage/clinical-domain-agent.md | 297 ++++++++++++++++++ docs/usage/index.md | 93 +++++- docs/usage/research-domain-agent.md | 110 +++++++ docs/usage/trustcenter-agent.md | 65 ++++ 7 files changed, 634 insertions(+), 11 deletions(-) create mode 100644 docs/usage/clinical-domain-agent.md create mode 100644 docs/usage/research-domain-agent.md create mode 100644 docs/usage/trustcenter-agent.md diff --git a/clinical-domain-agent/src/test/resources/application.yaml b/clinical-domain-agent/src/test/resources/application.yaml index e852e22c..048562ef 100644 --- a/clinical-domain-agent/src/test/resources/application.yaml +++ b/clinical-domain-agent/src/test/resources/application.yaml @@ -18,7 +18,7 @@ spring: runner: maxConcurrency: 128 maxProcesses: 4 - processTtlSeconds: 86400 + processTtl: P1D server: diff --git a/docs/development/index.md b/docs/development/index.md index aef6da34..b18e8184 100644 --- a/docs/development/index.md +++ b/docs/development/index.md @@ -7,6 +7,48 @@ intended to guide developers through the codebase, architecture, and development actively working to enhance and update this guide to provide a more comprehensive and up-to-date resource for developers. 
Your feedback is valuable in this ongoing improvement process.* +```mermaid +sequenceDiagram + box Clinical Domain + participant cd_hds + participant CDA + end + box Trustcenter Domain + participant TCA + participant gICS + participant gPAS + end + box Research Domain + participant RDA + participant rd_hds + end + + CDA ->> TCA: cd/consented-patients/{fetch,fetch-all} + TCA ->> gICS: {$allConsentsForDomain,$allConsentsForPerson} + gICS ->> TCA: [Patient ID] + TCA ->> CDA: [Patient ID] + + loop Patient ID + CDA ->> cd_hds: fetch Patient ID + cd_hds ->> CDA: Patient + CDA ->> CDA: deidentify Patient + CDA ->> TCA: cd/transport-mapping(Patient ID, [ID]) + TCA ->> gPAS: generate Secure ID + TCA ->> gPAS: generate ID Salt + TCA ->> gPAS: generate Date Shift Salt + TCA ->> TCA: generate [Transport ID] and Date Shift + TCA ->> CDA: mapName, [ID -> Transport ID] and Date Shift + CDA ->> RDA: process/{project}/patient(PatientBundle, mapName) + RDA ->> CDA: PROCESS_ID + RDA ->> TCA: rd/research-mapping(mapName) + TCA ->> RDA: [Transport ID -> Research ID], Date Shift Value + RDA ->> RDA: deidentify Patient + RDA ->> rd_hds: Bundle + CDA ->> RDA: status/PROCESS_ID + RDA ->> CDA: return Status + end +``` + ## Repository Structure The project follows a structured organization to enhance readability and maintainability. @@ -18,7 +60,7 @@ The project follows a structured organization to enhance readability and maintai Markdown files with examples and detailed documentation for users and developers. Includes user guides, developer guides, release steps, and more. -- [clinical-domain-agent/](clinical-domain-agent)` +- [clinical-domain-agent/](clinical-domain-agent) Java code, Dockerfile, CI config snippets, and Maven configuration (`pom.xml`) for the Clinical Domain Agent. 
diff --git a/docs/development/trustcenter-agent.md b/docs/development/trustcenter-agent.md index ede72202..1464236a 100644 --- a/docs/development/trustcenter-agent.md +++ b/docs/development/trustcenter-agent.md @@ -1,11 +1,34 @@ # Trustcenter Agent (TCA) +## Overview + +The TCA is required to ensure that + ## Consent -The TCA offers an endpoint to receive a cohort of consented patients from gICS. +The TCA offers an endpoint to receive a cohort of consented patients +from [gICS](https://www.ths-greifswald.de/forscher/gics/). ## De-identification +The de-identification process maps the original IDs (oID) from the clinical domain to +pseudonyms (sID) in the research domain and shifts all dates by a random amount that is +fixed for each patient. +In the clinical domain, the oIDs are replaced with transport IDs (tID) before resources are sent; +the tIDs are then replaced with their corresponding sIDs in the research domain. +The dates are shifted twice: +first in the clinical domain and then a second time in the research domain. +The date shift value applied in the research domain includes the negated date shift value of the +clinical domain, i.e., +the first date shift is undone. +This leads to a uniform distribution (w.r.t. all patients) of the date shift values. + +The de-identification process works by generating a pseudonym for the patient ID, which can be used +to re-identify patients. +The other IDs are hashed with SHA-256. +For this purpose, a second pseudonym is generated and used as salt for the hashing algorithm. +Next, for each id a random transport id (tID) is generated and a Map of the oIDs to + The role of the TCA for the de-identification consists of two parts: First, the Pseudonym Provider, provides a mechanism that replaces the IDs of the CDA domain with pseudonyms for the RDA domain, such that it is, without the TCA, impossible to re-identify @@ -15,7 +38,8 @@ Second, the Shifted Dates Provider, offers a way to time-shift dates. 
### Pseudonym Provider We distinguish between the Patient ID (PID) and other Resource IDs (RID)s. -The original PID (oPID) from the CDA domain is sent to gPAS where a pseudonym or secure PID (sPID) +The original PID (oPID) from the CDA domain is sent +to [gPAS](https://www.ths-greifswald.de/forscher/gpas/) where a pseudonym or secure PID (sPID) is created. The oRIDs are hashed with sha256 to create secure RIDs (sRID)s. Since we have no influence about the ids' length we add salt to the hash function. @@ -34,9 +58,9 @@ sequenceDiagram TCA ->> gPAS: pseudomizeOrCreate oPID, Salt_oPID gPAS ->> TCA: oPID ➙ sPID, Salt_oPID ➙ Salt TCA ->> Keystore: idMapName & idMap: > - TCA ->> CDA: IdMapName & IdMap: > & oPID ➙ tPID - CDA ->> RDA: Transport IDs: > & tPID - RDA ->> TCA: Transport IDs: > & tPID + TCA ->> CDA: idMapName & IdMap: > & oPID ➙ tPID + CDA ->> RDA: idMapName & Bundles + RDA ->> TCA: idMapName TCA ->> Keystore: idMapName Keystore ->> TCA: idMap: Map & sPID TCA ->> RDA: idMap: Map & sPID diff --git a/docs/usage/clinical-domain-agent.md b/docs/usage/clinical-domain-agent.md new file mode 100644 index 00000000..72300063 --- /dev/null +++ b/docs/usage/clinical-domain-agent.md @@ -0,0 +1,297 @@ +# Clinical Domain Agent (CDA) + +The CDA's task is to select a cohort of patients, de-identify their data, and transfer it to the +research domain. + +The transfer process consists of four steps, namely CohortSelection, DataSelection, +De-identification, and Bundle sending. +During CohortSelection, the list of patients who gave the consents required by the +process definition is retrieved. +The DataSelection fetches the patients' data bundles from the HDS, +which are pseudonymized and date shifted during De-identification and finally sent to the research +domain. + +## Setup + +### Docker + +```shell +# TODO: replace the placeholder image name below +docker pull ftsnext/cda +``` + +## Configuration + +The configuration is separated into server and project parts. 
+The server is configured once via its `application.yaml` while each project has its own config file. + +### Server Configuration + +TODO: add comments to yaml + +#### application.yml +```yaml +projects: + directory: "src/test/resources/projects" # path with project config files + +spring: + ssl: + bundle: + pem: + server: + keystore: + certificate: target/test-classes/server.crt + private-key: target/test-classes/server.key + truststore: + certificate: target/test-classes/ca.crt + client: + truststore: + certificate: target/test-classes/ca.crt + +# Concurrency settings +runner: + # The concurrency for a transfer process is: + # processConcurrency = maxConcurrency / maxProcesses + maxConcurrency: 128 + maxProcesses: 4 + processTtl: P1D # how long to keep the finished process in memory, ISO8601 + +server: + ssl: + bundle: server + +test: + webclient: + default: + ssl: + bundle: client + +security: + endpoints: + - path: /api/v2/** + role: client + +management: + endpoints: + web: + exposure: + include: [ "health", "info", "prometheus" ] + + metrics: + distribution: + slo: + http.server.requests: 25,100,250,500,1000,10000 + http.client.requests: 25,100,250,500,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000 + deidentify: 25,100,250,500,1000,10000 +``` + +### Project Configuration + +This section describes the configuration settings for a medical data processing system that handles +cohort selection, data selection, de-identification, and data transmission to a research domain. + +Yaml files put into the directory defined by `projects.directory` of the server configuration are +considered project configuration files. 
+They must follow the structure of the following example: + +#### example.yml +```yaml +cohortSelector: + trustCenterAgent: + server: + baseUrl: http://tc-agent:8080 + domain: MII # domain in gICS + patientIdentifierSystem: "https://ths-greifswald.de/fhir/gics/identifiers/Pseudonym" + policySystem: "https://ths-greifswald.de/fhir/CodeSystem/gics/Policy" + policies: [ "IDAT_erheben", "IDAT_speichern_verarbeiten", "MDAT_erheben", "MDAT_speichern_verarbeiten" ] + +dataSelector: + everything: + fhirServer: + baseUrl: http://cd-hds:8080/fhir + resolve: + patientIdentifierSystem: http://fts.smith.care + +deidentificator: + deidentifhir: + tca: + server: + baseUrl: http://tc-agent:8080 + domains: + # domains in gPAS for pseudonym and salt storage + # the same domain may be used for all three + pseudonym: MII + salt: MII-ID-Salt + dateShift: MII-DateShift-Salt + maxDateShift: P14D # ISO 8601 + deidentifhirConfig: /app/config/deidentifhir/CDtoTransport.profile + scraperConfig: /app/config/deidentifhir/IDScraper.profile + +bundleSender: + researchDomainAgent: + server: + baseUrl: http://rd-agent:8080 + project: example # name of project in the research domain +``` + +The domains must already exist in gICS and gPAS: they are not created automatically, and the +transfer process fails if they are missing. + +The system consists of four main components: + +1. Cohort Selector +2. Data Selector +3. Deidentificator +4. Bundle Sender + +#### Cohort Selector + +The cohort selector component interfaces with the Trust Center Agent to manage patient cohorts and +their consent policies. 
+ +```yaml +cohortSelector: + trustCenterAgent: + server: + baseUrl: http://tc-agent:8080 + domain: MII + patientIdentifierSystem: "https://ths-greifswald.de/fhir/gics/identifiers/Pseudonym" + policySystem: "https://ths-greifswald.de/fhir/CodeSystem/gics/Policy" + policies: [ "IDAT_erheben", "IDAT_speichern_verarbeiten", "MDAT_erheben", "MDAT_speichern_verarbeiten" ] +``` + +**Configuration Parameters:** + +- `baseUrl`: The Trust Center Agent server endpoint +- `domain`: The domain identifier in gICS +- `patientIdentifierSystem`: The system URI for patient pseudonyms +- `policySystem`: The system URI for consent policies +- `policies`: Array of required consent policies + +#### Data Selector + +The data selector component retrieves patient data from the FHIR server. + +```yaml +dataSelector: + everything: + fhirServer: + baseUrl: http://cd-hds:8080/fhir + resolve: + patientIdentifierSystem: http://fts.smith.care +``` + +**Configuration Parameters:** + +- `baseUrl`: The FHIR server endpoint +- `patientIdentifierSystem`: The system URI for resolving patient identifiers + +#### Deidentificator + +The Deidentificator component handles data de-identification using the Deidentifhir service. 
+ +```yaml +deidentificator: + deidentifhir: + tca: + server: + baseUrl: http://tc-agent:8080 + domains: + pseudonym: MII + salt: MII-ID-Salt + dateShift: MII-DateShift-Salt + maxDateShift: P14D + deidentifhirConfig: /app/config/deidentifhir/CDtoTransport.profile + scraperConfig: /app/config/deidentifhir/IDScraper.profile +``` + +**Configuration Parameters:** + +- `baseUrl`: The Trust Center Agent server endpoint +- `domains`: Domain configurations for different aspects of de-identification +- `pseudonym`: Domain for pseudonym storage +- `salt`: Domain for pseudonym salt storage +- `dateShift`: Domain for date shift salt storage +- `maxDateShift`: Maximum date shift in ISO 8601 duration format +- `deidentifhirConfig`: Path to the Deidentifhir configuration profile +- `scraperConfig`: Path to the ID scraper configuration profile + +#### Bundle Sender + +The bundle sender component transmits the processed data to the research domain. + +```yaml +bundleSender: + researchDomainAgent: + server: + baseUrl: http://rd-agent:8080 + project: example +``` + +**Configuration Parameters:** + +- `baseUrl`: The Research Domain Agent server endpoint +- `project`: The target project name in the research domain + +## Usage + +For each project, the CDA offers an endpoint to start the transfer process. +The caller may add a list of patient IDs to be transferred to the request's body. +If no IDs are submitted, all consented patients are fetched from gICS and processed. + +To start a transfer process for the project `$PROJECT`, run + +```shell +curl -X POST -w "%header{Content-Location}" "http://cd-agent:8080/api/v2/process/${PROJECT}/start" +``` + +or + +```shell +# TODO: check if correct +PATIENT_IDS='["id1", "id2", "id3"]' +curl -X POST --data "${PATIENT_IDS}" -H "Content-Type: application/json" \ + -w "%header{Content-Location}" "http://cd-agent:8080/api/v2/process/${PROJECT}/start" +``` + +The Content-Location field of the response's header contains a link to retrieve the process's +status. 
+ +**Example** + +```shell +curl -sf "http://cd-agent:8080/api/v2/process/status/e17d319e-d967-467e-8c8a-0c464bb14951" +{"processId":"e17d319e-d967-467e-8c8a-0c464bb14951","phase":"COMPLETED","createdAt":[2024,11,13,8,35,35,262354492],"finishedAt":[2024,11,13,8,36,17,358171815],"totalPatients":100,"totalBundles":119,"deidentifiedBundles":118,"sentBundles":118,"skippedBundles":0} +``` + +### Status Fields Description + +processId +: process ID + +phase +: QUEUED, RUNNING, COMPLETED + +createdAt +: point in time when the process was created + +finishedAt +: point in time when the process finished + +totalPatients +: Total number of patients to be processed, may change while the process is running + +totalBundles +: Total number of bundles to be processed + +deidentifiedBundles +: Number of bundles after deidentification + +sentBundles +: Number of bundles sent to RDA + +skippedBundles +: Number of skipped bundles + +If the number of skippedBundles is greater than zero one should look into the logs to find the +cause. diff --git a/docs/usage/index.md b/docs/usage/index.md index b71a5521..07c8643f 100644 --- a/docs/usage/index.md +++ b/docs/usage/index.md @@ -1,7 +1,92 @@ # FHIR Transfer Services Usage Guide Welcome to the FHIR Transfer Services user guide! This guide will help you get started with running -the FHIR Transfer Services components: Clinical Domain Transfer Agent (CDA), Research Domain -Transfer Agent (RDA), and Trust Center Agent (TCA). As the three components are intended to be run -in different "domains" of a data integration center, running and configuring each component will be -described in respective documents. +the FHIR Transfer Services +components: [Clinical Domain Agent (CDA)](clinical-domain-agent), [Research Domain Agent (RDA)](research-domain-agent), +and [Trust Center Agent (TCA)](trustcenter-agent). 
+As the three components are intended to be run in different "domains" of a data integration center, +running and configuring each component is described in its own document. + +## Overview + +FTS is built for the transfer of FHIR data from the clinical domain to the research domain +while ensuring the patients' anonymity. +To this end, the data are de-identified by removing specific data, +replacing IDs with pseudonyms, and shifting the dates by a random value. + +The following sequence diagram gives an overview of FTSnext's functionality. + +```mermaid +sequenceDiagram + CDA ->> TCA: request consented Patients + TCA ->> CDA: List of Patient IDs + + loop for each Patient ID + CDA ->> CDA: deidentify Patient + CDA ->> TCA: request Transport Mapping + TCA ->> CDA: Transport Mapping and Date Shift Value + CDA ->> RDA: Patient Bundle + RDA ->> TCA: request Research Mapping + TCA ->> RDA: Research Mapping + RDA ->> RDA: deidentify Patient + end +``` + +## Monitoring + +FTSnext provides a monitoring docker container with Grafana dashboards that visualize the agents' metrics. +For the dashboards to work, the agent networks in `monitoring/compose.yaml` must be set accordingly. + +## Deployment + +For each agent, we offer a template docker setup for download. 
+For example, the cd-agent can be downloaded and unpacked with: + +```shell +TODO +curl download-link-to/cd-agent-template.zip +unzip cd-agent-template.zip +``` + +It will provide the following directory structure: + +```shell +cd-agent/ +├── application.yml +├── compose.yml +└── projects + ├── example + │ └── deidentifhir + └── example.yml +``` + +### compose.yml + +```yaml +name: clinical-domain + +services: + cd-agent: + image: ghcr.io/medizininformatik-initiative/fts/clinical-domain-agent:latest + ports: [ ":8080" ] + volumes: + - ./projects:/app/projects + healthcheck: + test: [ "CMD", "wget", "-qO-", "http://localhost:8080/actuator/health" ] + interval: 10s + timeout: 5s + retries: 3 + start_period: 60s +``` + +### application.yml + +```yaml + +``` + +### example.yml + +```yaml + +``` diff --git a/docs/usage/research-domain-agent.md b/docs/usage/research-domain-agent.md new file mode 100644 index 00000000..323bc62a --- /dev/null +++ b/docs/usage/research-domain-agent.md @@ -0,0 +1,110 @@ +# Research Domain Agent (RDA) + +## Configuration + + +```yaml +projects: + directory: "src/test/resources/projects" + +spring: + ssl: + bundle: + pem: + server: + keystore: + certificate: target/test-classes/server.crt + private-key: target/test-classes/server.key + truststore: + certificate: target/test-classes/ca.crt + client: + truststore: + certificate: target/test-classes/ca.crt + +server: + ssl: + bundle: server + +test: + webclient: + default: + ssl: + bundle: client + +security: + endpoints: + - path: /api/v2/** + role: cd-agent + +management: + endpoints: + web: + exposure: + include: [ "health", "info", "prometheus" ] + + metrics: + distribution.slo: + http.server.requests: 25,100,250,500,1000,10000 + http.client.requests: 25,100,250,500,1000,10000 + fetchResearchMapping: 5,10,25,100,250,500,1000,5000,10000 + deidentify: 25,100,250,500,1000,10000 + sendBundleToHds: 25,50,100,250,500,1000,2000,5000,10000 +``` + +### Project Configuration + +This section describes 
the project configuration settings that differ from those of the +Clinical Domain Agent. + +```yaml +deidentificator: + deidentifhir: + tca: + server: + baseUrl: http://tc-agent:8080 + deidentifhirConfig: /app/config/deidentifhir/TransportToRD.profile + +bundleSender: + fhirStore: + server: + baseUrl: http://rd-hds:8080/fhir +``` + +#### Deidentificator + +In the RDA, the deidentificator component uses a different Deidentifhir configuration +profile. + +```yaml +deidentificator: + deidentifhir: + tca: + server: + baseUrl: http://tc-agent:8080 + deidentifhirConfig: /app/config/deidentifhir/TransportToRD.profile +``` + +**Configuration Parameters:** + +- `baseUrl`: The Trust Center Agent server endpoint +- `deidentifhirConfig`: Path to the Deidentifhir configuration profile, + `TransportToRD.profile` + +This profile, `TransportToRD.profile`, is likely tailored to the specific requirements of +transferring de-identified data into the research domain. + +#### Bundle Sender + +In the RDA, the bundle sender component sends to a FHIR store instead of the Research Domain +Agent. + +```yaml +bundleSender: + fhirStore: + server: + baseUrl: http://rd-hds:8080/fhir +``` + +**Configuration Parameters:** + +- `baseUrl`: The FHIR server endpoint for the Research Domain's FHIR store diff --git a/docs/usage/trustcenter-agent.md b/docs/usage/trustcenter-agent.md new file mode 100644 index 00000000..c28b95f5 --- /dev/null +++ b/docs/usage/trustcenter-agent.md @@ -0,0 +1,65 @@ +# Trustcenter Agent (TCA) + +The TCA is responsible for cohort selection and de-identification. +In the current implementation, the Greifswald +tools [gICS](https://www.ths-greifswald.de/forscher/gics/) +and [gPAS](https://www.ths-greifswald.de/forscher/gpas/) +are used for cohort selection and de-identification, respectively. 
+ +## Configuration + +```yaml +consent: + gics: + fhir: + baseUrl: http://gics:8080/ttp-fhir/fhir/gics + defaultPageSize: 200 + auth: + none: { } + +security: + endpoints: + - path: /api/v2/cd/** + role: cd + - path: /api/v2/rd/** + role: rd + +deIdentification: + keystoreUrl: redis://valkey:6379 + gpas: + fhir: + baseUrl: http://gpas:8080/ttp-fhir/fhir/gpas + auth: + none: { } + transport: + ttl: PT10M + +spring: + main: + allow-bean-definition-overriding: true + ssl: + bundle: + pem: + server: + keystore: + certificate: target/test-classes/server.crt + private-key: target/test-classes/server.key + truststore: + certificate: target/test-classes/ca.crt + client: + truststore: + certificate: target/test-classes/ca.crt + +server: + ssl: + bundle: server + +test: + webclient: + cd-agent: + ssl: + bundle: client + rd-agent: + ssl: + bundle: client +```
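The `transport.ttl: PT10M` entry above, like `processTtl: P1D` and `maxDateShift: P14D` in the CDA examples, is an ISO 8601 duration. As a rough illustration of how such values read, the following Python sketch parses the day/hour/minute/second subset that appears in these examples. `parse_duration` is a hypothetical helper written for this note; it is not part of FTSnext, and the agents themselves may parse durations differently.

```python
import re
from datetime import timedelta

# Hypothetical helper, not part of FTSnext. It only covers the subset of
# ISO 8601 durations seen in the configuration examples: PnDTnHnMnS.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as 'PT10M' or 'P14D' into a timedelta."""
    match = _DURATION.match(text)
    # Reject strings without any component ("P", "PT") or a dangling "T".
    if match is None or text in ("P", "PT") or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


if __name__ == "__main__":
    for value in ("PT10M", "P1D", "P14D"):
        print(value, "->", parse_duration(value))
```

Read this way, `PT10M` keeps a transport mapping for ten minutes, `P1D` keeps a finished process for one day, and `P14D` bounds the date shift at fourteen days.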