WIP
TODO: Check all documentation files for TODOs and resolve them
trobanga committed Nov 12, 2024
1 parent b616891 commit 795c6d5
Showing 7 changed files with 537 additions and 11 deletions.
2 changes: 1 addition & 1 deletion clinical-domain-agent/src/test/resources/application.yaml
@@ -18,7 +18,7 @@ spring:
runner:
maxConcurrency: 128
maxProcesses: 4
-processTtlSeconds: 86400
+processTtl: P1D


server:
44 changes: 43 additions & 1 deletion docs/development/index.md
@@ -7,6 +7,48 @@ intended to guide developers through the codebase, architecture, and development
actively working to enhance and update this guide to provide a more comprehensive and up-to-date
resource for developers. Your feedback is valuable in this ongoing improvement process.*

```mermaid
sequenceDiagram
box Clinical Domain
participant cd_hds
participant CDA
end
box Trustcenter Domain
participant TCA
participant gICS
participant gPAS
end
box Research Domain
participant RDA
participant rd_hds
end
CDA ->> TCA: cd/consented-patients/{fetch,fetch-all}
TCA ->> gICS: {$allConsentsForDomain,$allConsentsForPerson}
gICS ->> TCA: [Patient ID]
TCA ->> CDA: [Patient ID]
loop Patient ID
CDA ->> cd_hds: fetch Patient ID
cd_hds ->> CDA: Patient
CDA ->> CDA: deidentify Patient
CDA ->> TCA: cd/transport-mapping(Patient ID, [ID])
TCA ->> gPAS: generate Secure ID
TCA ->> gPAS: generate ID Salt
TCA ->> gPAS: generate Date Shift Salt
TCA ->> TCA: generate [Transport ID] and Date Shift
TCA ->> CDA: mapName, [ID -> Transport ID] and Date Shift
CDA ->> RDA: process/{project}/patient(PatientBundle, mapName)
RDA ->> CDA: PROCESS_ID
RDA ->> TCA: rd/research-mapping(mapName)
TCA ->> RDA: [Transport ID -> Research ID], Date Shift Value
RDA ->> RDA: deidentify Patient
RDA ->> rd_hds: Bundle
CDA ->> RDA: status/PROCESS_ID
RDA ->> CDA: return Status
end
```

## Repository Structure

The project follows a structured organization to enhance readability and maintainability.
@@ -18,7 +60,7 @@ The project follows a structured organization to enhance readability and maintai
Markdown files with examples and detailed documentation for users and developers. Includes user
guides, developer guides, release steps, and more.

-- [clinical-domain-agent/](clinical-domain-agent)`
+- [clinical-domain-agent/](clinical-domain-agent)
Java code, Dockerfile, CI config snippets, and Maven configuration (`pom.xml`) for the Clinical
Domain Agent.

34 changes: 29 additions & 5 deletions docs/development/trustcenter-agent.md
@@ -1,11 +1,34 @@
# Trustcenter Agent (TCA)

## Overview

The TCA is required to ensure that

## Consent

-The TCA offers an endpoint to receive a cohort of consented patients from gICS.
+The TCA offers an endpoint to receive a cohort of consented patients
+from [gICS](https://www.ths-greifswald.de/forscher/gics/).

## De-identification

The de-identification process maps the original IDs (oID) from the clinical domain to
pseudonyms (sID) in the research domain and shifts all dates by an amount that is random but
fixed for each patient.
In the clinical domain, the oIDs are replaced with transport IDs (tID) before the resources are
sent; in the research domain, the tIDs are replaced with their corresponding sIDs.
The dates are shifted twice: first in the clinical domain and then again in the research domain.
The date shift value that the research domain applies contains the negative date shift value of
the clinical domain, i.e. the first date shift is undone.
This leads to a uniform distribution (w.r.t. all patients) of the date shift values.
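
The two-step shift can be illustrated with a short Python sketch. It assumes that a final shift is chosen per patient and that the research-domain shift is the final shift minus the clinical-domain shift. How the TCA actually derives these values (for example from the salts stored in gPAS) is not shown, and the function names are hypothetical.

```python
import random
from datetime import date, timedelta


def plan_shifts(max_days: int, rng: random.Random) -> tuple[int, int]:
    """Return (cd_shift, rd_shift) in days (hypothetical helper).

    The research-domain shift contains the negative clinical-domain shift,
    so applying both shifts leaves only the final shift.
    """
    cd_shift = rng.randint(-max_days, max_days)     # applied in the clinical domain
    final_shift = rng.randint(-max_days, max_days)  # shift that should remain at the end
    rd_shift = final_shift - cd_shift               # undoes cd_shift, then applies final_shift
    return cd_shift, rd_shift


def shift(d: date, days: int) -> date:
    """Move a date by a number of days (negative values move it back)."""
    return d + timedelta(days=days)


rng = random.Random(42)  # fixed seed so the example is repeatable
cd_shift, rd_shift = plan_shifts(14, rng)
original = date(2024, 11, 12)
in_transport = shift(original, cd_shift)     # date as it leaves the clinical domain
in_research = shift(in_transport, rd_shift)  # date as stored in the research domain
print(original, in_transport, in_research)
```

Because the first shift cancels out, the date stored in the research domain differs from the original by the final shift only, which stays within the configured maximum.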

The de-identification process generates a pseudonym for the patient ID, which can be used
to re-identify patients.
The other IDs are hashed with SHA-256.
For this purpose, a second pseudonym is generated and used as the salt for the hashing algorithm.
Next, for each ID a random transport ID (tID) is generated and a Map of the oIDs to

The role of the TCA for the de-identification consists of two parts:
First, the Pseudonym Provider provides a mechanism that replaces the IDs of the CDA domain with
pseudonyms for the RDA domain, such that it is, without the TCA, impossible to re-identify
@@ -15,7 +38,8 @@ Second, the Shifted Dates Provider, offers a way to time-shift dates.
### Pseudonym Provider

We distinguish between the Patient ID (PID) and other Resource IDs (RIDs).
-The original PID (oPID) from the CDA domain is sent to gPAS where a pseudonym or secure PID (sPID)
+The original PID (oPID) from the CDA domain is sent
+to [gPAS](https://www.ths-greifswald.de/forscher/gpas/) where a pseudonym or secure PID (sPID)
is created.
The oRIDs are hashed with SHA-256 to create secure RIDs (sRIDs).
Since we have no influence over the length of the IDs, we add a salt to the hash function.
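
As a rough illustration of salted hashing, the following Python sketch derives an sRID from an oRID. Prefixing the ID with the salt is an assumption made for this example (the text only states that a salt is added), and the function name and sample values are hypothetical.

```python
import hashlib


def secure_resource_id(original_id: str, salt: str) -> str:
    """Hash a resource ID with SHA-256 after prefixing it with a salt.

    Assumption: salt and ID are simply concatenated; the TCA may combine
    them differently.
    """
    return hashlib.sha256((salt + original_id).encode("utf-8")).hexdigest()


# Hypothetical values: the same oRID hashed with two different salts.
s1 = secure_resource_id("Encounter/123", "salt-for-patient-a")
s2 = secure_resource_id("Encounter/123", "salt-for-patient-b")
print(len(s1), s1 != s2)
```

With a salt, short or guessable IDs can no longer be recovered by hashing candidate values unless the salt is known.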
@@ -34,9 +58,9 @@ sequenceDiagram
TCA ->> gPAS: pseudomizeOrCreate oPID, Salt_oPID
gPAS ->> TCA: oPID ➙ sPID, Salt_oPID ➙ Salt
TCA ->> Keystore: idMapName & idMap: <Map<oRID, sRID>>
-TCA ->> CDA: IdMapName & IdMap: <Map<oRID, tRID>> & oPID ➙ tPID
-CDA ->> RDA: Transport IDs: <Set<tRID>> & tPID
-RDA ->> TCA: Transport IDs: <Set<tRID>> & tPID
+TCA ->> CDA: idMapName & IdMap: <Map<oRID, tRID>> & oPID ➙ tPID
+CDA ->> RDA: idMapName & Bundles
+RDA ->> TCA: idMapName
TCA ->> Keystore: idMapName
Keystore ->> TCA: idMap: Map<tRID, sRID> & sPID
TCA ->> RDA: idMap: Map<tRID, sRID> & sPID
254 changes: 254 additions & 0 deletions docs/usage/clinical-domain-agent.md
@@ -0,0 +1,254 @@
# Clinical Domain Agent (CDA)

The CDA's task is to select a cohort of patients, de-identify their data, and transfer it to the
research domain.

The transfer process consists of four steps, namely CohortSelection, DataSelection,
De-identification, and Bundle sending.
During CohortSelection, a list of patients who gave the consents required by the process
definition is determined.
DataSelection then fetches each patient's data bundle from the HDS.
The bundles are pseudonymized and date-shifted during De-identification and finally sent to the
research domain.

## Setup

### Docker

```shell
# TODO: confirm the image name; ftsnext/cda is a placeholder
docker pull ftsnext/cda
```

## Configuration

The configuration is separated into server and project parts.
The server is configured once via its `application.yaml` while each project has its own config file.

### Server Configuration

TODO: add comments to yaml
```yaml
projects:
  directory: "src/test/resources/projects" # path with project config files

spring:
  ssl:
    bundle:
      pem:
        server:
          keystore:
            certificate: target/test-classes/server.crt
            private-key: target/test-classes/server.key
          truststore:
            certificate: target/test-classes/ca.crt
        client:
          truststore:
            certificate: target/test-classes/ca.crt

# Concurrency settings
runner:
  # The concurrency for a transfer process is:
  # processConcurrency = maxConcurrency / maxProcesses
  maxConcurrency: 128
  maxProcesses: 4
  processTtl: P1D # how long to keep the finished process in memory, ISO8601

server:
  ssl:
    bundle: server

test:
  webclient:
    default:
      ssl:
        bundle: client

security:
  endpoints:
    - path: /api/v2/**
      role: client

management:
  endpoints:
    web:
      exposure:
        include: [ "health", "info", "prometheus" ]

  metrics:
    distribution:
      slo:
        http.server.requests: 25,100,250,500,1000,10000
        http.client.requests: 25,100,250,500,1000,2000,3000,4000,5000,6000,7000,8000,9000,10000
        deidentify: 25,100,250,500,1000,10000
```
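
The concurrency formula from the `runner` comments can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is hypothetical, and rounding down when the values do not divide evenly is an assumption.

```python
def process_concurrency(max_concurrency: int, max_processes: int) -> int:
    """Concurrency available to one transfer process.

    Assumption: integer division, i.e. any remainder is dropped.
    """
    if max_processes <= 0:
        raise ValueError("maxProcesses must be positive")
    return max_concurrency // max_processes


# Values from the example configuration: 128 / 4
print(process_concurrency(128, 4))  # prints 32
```

With the example values, each of the four processes may run up to 32 operations concurrently.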

### Project Configuration

This section describes the project configuration, which covers cohort selection, data selection,
de-identification, and data transmission to the research domain.
YAML files placed in the directory defined by `projects.directory` in the server configuration are
treated as project configuration files.
They must follow this schema:

```yaml
cohortSelector:
  trustCenterAgent:
    server:
      baseUrl: http://tc-agent:8080
    domain: MII # domain in gICS
    patientIdentifierSystem: "https://ths-greifswald.de/fhir/gics/identifiers/Pseudonym"
    policySystem: "https://ths-greifswald.de/fhir/CodeSystem/gics/Policy"
    policies: [ "IDAT_erheben", "IDAT_speichern_verarbeiten", "MDAT_erheben", "MDAT_speichern_verarbeiten" ]
dataSelector:
  everything:
    fhirServer:
      baseUrl: http://cd-hds:8080/fhir
    resolve:
      patientIdentifierSystem: http://fts.smith.care
deidentificator:
  deidentifhir:
    tca:
      server:
        baseUrl: http://tc-agent:8080
      domains:
        # domains in gPAS for pseudonym and salt storage
        # the same domain may be used for all three
        pseudonym: MII
        salt: MII-ID-Salt
        dateShift: MII-DateShift-Salt
    maxDateShift: P14D # ISO 8601
    deidentifhirConfig: /app/config/deidentifhir/CDtoTransport.profile
    scraperConfig: /app/config/deidentifhir/IDScraper.profile
bundleSender:
  researchDomainAgent:
    server:
      baseUrl: http://rd-agent:8080
    project: example # name of project in the research domain
```
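
A project file can be sanity-checked before it is deployed. The sketch below assumes the file has already been parsed into a dictionary (for example with a YAML library) and that all four top-level sections are mandatory; the helper name is hypothetical and the check is far less strict than whatever validation the CDA itself performs.

```python
REQUIRED_SECTIONS = ("cohortSelector", "dataSelector", "deidentificator", "bundleSender")


def missing_sections(config: dict) -> list[str]:
    """Return the top-level sections that are absent from a parsed project config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# Hypothetical, incomplete configuration: bundleSender is missing.
incomplete = {
    "cohortSelector": {},
    "dataSelector": {},
    "deidentificator": {},
}
print(missing_sections(incomplete))
```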

The domains in gICS and gPAS must already exist. They are not created automatically, and the
transfer process fails if they are missing.

The system consists of four main components:

1. Cohort Selector
2. Data Selector
3. Deidentificator
4. Bundle Sender

#### Cohort Selector

The cohort selector component interfaces with the Trust Center Agent to manage patient cohorts and
their consent policies.

```yaml
cohortSelector:
  trustCenterAgent:
    server:
      baseUrl: http://tc-agent:8080
    domain: MII
    patientIdentifierSystem: "https://ths-greifswald.de/fhir/gics/identifiers/Pseudonym"
    policySystem: "https://ths-greifswald.de/fhir/CodeSystem/gics/Policy"
    policies: [ "IDAT_erheben", "IDAT_speichern_verarbeiten", "MDAT_erheben", "MDAT_speichern_verarbeiten" ]
```

**Configuration Parameters:**

- `baseUrl`: The Trust Center Agent server endpoint
- `domain`: The domain identifier in gICS
- `patientIdentifierSystem`: The system URI for patient pseudonyms
- `policySystem`: The system URI for consent policies
- `policies`: Array of required consent policies
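
To make the role of `policies` concrete, the following Python sketch filters a cohort by required policies. It assumes that a patient qualifies only if every listed policy has been consented to. The patient data and function name are hypothetical, and the real check happens in the TCA and gICS.

```python
REQUIRED_POLICIES = {
    "IDAT_erheben",
    "IDAT_speichern_verarbeiten",
    "MDAT_erheben",
    "MDAT_speichern_verarbeiten",
}


def has_required_consent(consented: set[str], required: set[str]) -> bool:
    """True if every required policy is among the consented ones (assumed rule)."""
    return required <= consented


# Hypothetical consent data keyed by patient ID.
consents = {
    "patient-1": set(REQUIRED_POLICIES),
    "patient-2": {"IDAT_erheben", "MDAT_erheben"},
}
cohort = sorted(pid for pid, granted in consents.items()
                if has_required_consent(granted, REQUIRED_POLICIES))
print(cohort)
```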

#### Data Selector

The data selector component retrieves patient data from the FHIR server.

```yaml
dataSelector:
  everything:
    fhirServer:
      baseUrl: http://cd-hds:8080/fhir
    resolve:
      patientIdentifierSystem: http://fts.smith.care
```

**Configuration Parameters:**

- `baseUrl`: The FHIR server endpoint
- `patientIdentifierSystem`: The system URI for resolving patient identifiers

#### Deidentificator

The Deidentificator component handles data de-identification using the Deidentifhir service.

```yaml
deidentificator:
  deidentifhir:
    tca:
      server:
        baseUrl: http://tc-agent:8080
      domains:
        pseudonym: MII
        salt: MII-ID-Salt
        dateShift: MII-DateShift-Salt
    maxDateShift: P14D
    deidentifhirConfig: /app/config/deidentifhir/CDtoTransport.profile
    scraperConfig: /app/config/deidentifhir/IDScraper.profile
```

**Configuration Parameters:**

- `baseUrl`: The Trust Center Agent server endpoint
- `domains`: Domain configurations for different aspects of de-identification
  - `pseudonym`: Domain for pseudonym storage
  - `salt`: Domain for pseudonym salt storage
  - `dateShift`: Domain for date shift salt storage
- `maxDateShift`: Maximum date shift in ISO 8601 duration format
- `deidentifhirConfig`: Path to the Deidentifhir configuration profile
- `scraperConfig`: Path to the ID scraper configuration profile
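
Values such as `P14D` are ISO 8601 durations. The Python sketch below parses the day and time components of that notation to show what such a value means. It is a simplified, hypothetical parser (no years, months, weeks, or fractions) and not the parser the agent uses.

```python
import re
from datetime import timedelta

# Supports only the P[n]DT[n]H[n]M[n]S subset of ISO 8601 durations.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a simple ISO 8601 duration such as 'P14D' or 'PT12H' to a timedelta."""
    match = _DURATION.match(text)
    parts = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not parts or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


print(parse_duration("P14D").days)  # prints 14
```

Under this reading, `maxDateShift: P14D` limits the date shift to 14 days.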

#### Bundle Sender

The bundle sender component transmits the processed data to the research domain.

```yaml
bundleSender:
  researchDomainAgent:
    server:
      baseUrl: http://rd-agent:8080
    project: example
```

**Configuration Parameters:**

- `baseUrl`: The Research Domain Agent server endpoint
- `project`: The target project name in the research domain

## Usage

For each project, the CDA offers an endpoint to start the transfer process.
The caller may pass a list of patient IDs to be transferred in the request body.
If no IDs are submitted, all consents are fetched from gICS and processed.

To start a transfer process for the project `$PROJECT`, run

```shell
curl -X POST -w "%header{Content-Location}" http://cd-agent:8080/api/v2/process/${PROJECT}/start
```

or

```shell
# TODO: check if correct
curl -X POST --data '["id1", "id2", "id3"]' -H "Content-Type: application/json" \
  -w "%header{Content-Location}" "${cd_agent_base_url}/api/v2/process/${PROJECT}/start"
```

The `Content-Location` header of the response contains a link for retrieving the status of the
process.
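
The same request can be prepared from a script. The following Python sketch builds the start request with the standard library only. The helper name is hypothetical, the host and project are the example values used above, and the commented-out call at the end needs a reachable agent.

```python
import json
import urllib.request
from typing import Optional, Sequence


def build_start_request(base_url: str, project: str,
                        patient_ids: Optional[Sequence[str]] = None) -> urllib.request.Request:
    """Build the POST request that starts a transfer process (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/api/v2/process/{project}/start"
    if patient_ids:
        # Restrict the transfer to the given patient IDs.
        body = json.dumps(list(patient_ids)).encode("utf-8")
        return urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
    # No body: the transfer is not restricted to specific patients.
    return urllib.request.Request(url, method="POST")


request = build_start_request("http://cd-agent:8080", "example", ["id1", "id2"])
print(request.get_method(), request.full_url)

# Sending the request and reading the status link:
# with urllib.request.urlopen(request) as response:
#     status_url = response.headers["Content-Location"]
```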