Skip to content

Commit

Permalink
Merge pull request #31 from swisscom/bugfix/retry-quits-after-n-tries…
Browse files Browse the repository at this point in the history
…-and-leaves-upload-file

Fix bug introduced in one of the recent refactorings: retry on server…
  • Loading branch information
rsteppac authored Feb 12, 2025
2 parents ce71497 + 6a3cf16 commit 46921c8
Show file tree
Hide file tree
Showing 6 changed files with 115 additions and 66 deletions.
58 changes: 42 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
The Swisscom Health Confidential Data Routing (CDR) Client

## Installation / Run the client
> Improvements for the installation are planned. For now, the client is only available as a jar file with manual steps required for the installation.
> Improvements for the installation are planned. For now, the client is only available as a jar file with manual steps
> required for the installation.
Pre-Requirements:
* Java 17 (or higher) installed
Expand All @@ -16,26 +17,43 @@ The application-customer.yaml file should contain the configuration for the clie
An example can be found [here](#application-customer-yaml-example).

Open a terminal and navigate to the directory where the jar file is located.
Run the following command to start the client (check the jar name and replace it in the command or rename the jar itself):
Run the following command to start the client (check the jar name and replace it in the command or rename the jar
itself):
> The -D parameters need to be placed before the "-jar cdr-client.jar".<p>
> The quotes are necessary for Windows, but not for Unix systems
```shell
java "-Dspring.profiles.active=customer" "-Dspring.config.additional-location=./application-customer.yaml" -jar cdr-client.jar
```

Check that no error messages are present in the terminal (or have a look at the "cdr-client.log" file that is created in the same folder as you've placed tha jar file)
and that the client is running.
Check that no error messages are present in the terminal (or have a look at the "cdr-client.log" file that is created
in the same folder as you've placed tha jar file) and that the client is running.

Configure an OS appropriate service to run the client as a background service.

## API
There is no endpoint offered here.

The CDR Client is triggered by a scheduler and downloads by the given delay time the files from the CDR API.
File uploads are triggered by the file system events.
Document downloads from the CDR API are triggered by a scheduler. Document uploads are triggered by file system
events (if available) and a scheduler.

### Functionality
For each defined connector the CDR Client calls the defined endpoint of the CDR API.
For each defined connector the CDR Client calls the CDR API endpoints for document download and document upload.

Optionally the client can be configured to renew its client credential on startup.

#### Client Credential Renewal

The client can be configured to renew its credential on startup. The default behavior is to not renew the credential.

Credential renewal only works if the client credential is configured in a properties or YAML file. If the client
detects multiple sources (system property, environment variable, etc.) for the credential or the source is not of
either of the two file types, then credential renewal won't be attempted. Likewise, if the user that runs the client
process does not have write permissions on the file.

During a successful client credential renewal all pre-existing client credentials are deleted. So, if you source the
client credential from a secure location (Hashicorp Vault, Azure Keyvault, etc., like you should) as part of your
deployment process, then, obviously, you should not enable credential renewal. A re-deployment would restore a
previous credential that is no longer valid, and you would be locked out of the client account until you manually
re-create a credential on the CDR website.

#### Document Download

Expand All @@ -44,7 +62,7 @@ The file is named after the received 'cdr-document-uuid' header that is a unique
After saving the file to the temporary folder, a delete request for the given 'cdr-document-uuid' is sent to the CDR API.
After successfully deleting the file in the CDR API, the file is moved to the connector defined 'target-folder'.

The temporary folders need to be monitored by another tool to make sure that no files are forgotten (should only happen if the move
The temporary folders need to be monitored make sure that no files get stranded there (should only happen if the move
to the destination folder is failing).

#### Document Upload
Expand All @@ -54,17 +72,18 @@ contents of every source folder at the configured interval and uploads all `.xml
event driven process listens for filesystem events from the same directories and uploads all `.xml` files as they
are created. The two approaches are combined so

* at start of the client all files that might have arrived while the client was not running are uploaded
* at the start of the client all files are uploaded that might have arrived while the client was not running
* folders on (remote) filesystems that do not support filesystem events can be used as source folders

If the filesystem that hosts a source folder supports filesystem events, then the polling process normally won't find
any files to process and immediately goes back to sleep. If the polling process wakes up right at the moment a new file
arrives, it might happen that both processes pick up the same file for processing. However, only one of the two will
continue to process the file, depending on which one is first to register the file for processing.

After the file is successfully uploaded it will be deleted.
If the upload failed with a response code of 4xx the file will be appended with '.error' and an additional file with the same name as the sent file, but with
the extension '.log' will be created and the received response body will be saved to this file.
After the file is successfully uploaded it is either deleted or archived, depending on the connector configuration.
If the upload failed with a response code of 4xx the file will be appended with '.error' and an additional file with
the same name as the file sent, but with
the extension '.log', will be created and the received response body will be saved to this file.
If the upload failed with a response code of 5xx the file will be retried indefinitely, assuming the root cause is
an infrastructure issue that will ultimately be resolved (and uploading another file would fail too, for the same
reason). See retry-delay in the [application-client.yaml](./src/main/resources/config/application-client.yaml) file.
Expand Down Expand Up @@ -126,10 +145,18 @@ client:
local-folder: /tmp/download/in-flight # temporary folder for files that are currently downloaded from CDR API
idp-credentials:
tenant-id: swisscom-health-tenant-id # provided by Swisscom Health
client-id: my-client-id # Self created through CDR UI
client-secret: my-secret # Self created through CDR UI
client-id: my-client-id # Self-service on CDR website
client-secret: my-secret # Self-service on CDR website
renew-credential-at-startup: false
cdr-api:
host: cdr.health.swisscom.ch
retry-delay:
- 1s # delay on first retry
- 2s
- 8s
- 32s
- 10m # delay after fifth retry and all following retries
file-busy-test-strategy: never_busy # valid values are `never_busy` and `file_size_changed`
customer:
- connector-id: 8000000000000 # provided by Swisscom Health
content-type: application/forumdatenaustausch+xml;charset=UTF-8
Expand All @@ -141,7 +168,6 @@ client:
target-folder: /tmp/download/8000000000000
source-folder: /tmp/upload/8000000000000
mode: production
```

Some information can also be set as environment variables. See [application-client.yaml](./src/main/resources/config/application-client.yaml) for variable names.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,11 @@ class CdrClientContext {

/**
* Creates a coroutine dispatcher for blocking I/O operations with limited parallelism.
*
* Note: As we use the non-blocking `delay()` to wait for the file-size-growth test and
* to wait in case a re-try of an upload is required, a lot more files than the configured
* thread pool size can be enrolled in the upload process at the same time. Only the
* blocking i/o operations are limited to the configured thread pool size.
*/
@OptIn(ExperimentalCoroutinesApi::class)
@Bean(name = ["limitedParallelismCdrUploadsDispatcher"])
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ class CdrApiClient(
}

else -> {
logger.error(t) { "Renewing client secret for client id '${cdrClientConfig.idpCredentials.clientId}' failed: ${t.message}" }
logger.error { "Renewing client secret for client id '${cdrClientConfig.idpCredentials.clientId}' failed: '$t'" }
RenewClientSecretResult.RenewError(message = t.message ?: "Unknown error", cause = t)
}
}
Expand Down Expand Up @@ -200,7 +200,7 @@ class CdrApiClient(
}.fold(
onSuccess = { it },
onFailure = { t ->
logger.error(t) { "Upload file '$file' failed: ${t.message}" }
logger.error { "Upload file '$file' failed: ${t.message}" }
UploadDocumentResult.UploadError(message = t.message ?: "Unknown error", t = t)
}
)
Expand Down Expand Up @@ -264,7 +264,7 @@ class CdrApiClient(
}.fold(
onSuccess = { it },
onFailure = { t ->
logger.error(t) { "Request file failed: ${t.message}" }
logger.error { "Request file failed: $t" }
DownloadDocumentResult.DownloadError(message = t.message ?: "Unknown error", t = t)
}
)
Expand Down Expand Up @@ -308,7 +308,7 @@ class CdrApiClient(
}.fold(
onSuccess = { it },
onFailure = { t ->
logger.error(t) { "Acknowledging pulled file with id '$downloadId' failed: '${t.message}'" }
logger.error { "Acknowledging pulled file with id '$downloadId' failed: '$t'" }
DownloadDocumentResult.DownloadError(message = t.message ?: "Unknown error", t = t)
}
)
Expand Down Expand Up @@ -430,7 +430,7 @@ class CdrApiClient(
}.fold(
onSuccess = { token: String -> token },
onFailure = { e ->
logger.error(e) { "Failed to get access token: ${e.message}" }
logger.error { "Failed to get access token: $e" }
""
}
)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ import com.swisscom.health.des.cdr.client.config.CdrClientConfig
import com.swisscom.health.des.cdr.client.handler.CdrApiClient.UploadDocumentResult
import io.github.oshai.kotlinlogging.KotlinLogging
import io.micrometer.tracing.Tracer
import kotlinx.coroutines.delay
import kotlinx.coroutines.CancellationException
import org.springframework.stereotype.Component
import java.nio.file.Files
import java.nio.file.Path
Expand All @@ -14,6 +14,9 @@ import kotlin.io.path.deleteIfExists
import kotlin.io.path.moveTo
import kotlin.io.path.name
import kotlin.io.path.nameWithoutExtension
import kotlin.math.min
import kotlinx.coroutines.time.delay
import kotlin.io.path.exists


private val logger = KotlinLogging.logger {}
Expand All @@ -36,66 +39,79 @@ class RetryUploadFileHandling(
/**
* Retries the upload of a file until it is successful or a 4xx error occurred.
*/
@Suppress("NestedBlockDepth")
@Suppress("NestedBlockDepth", "LongMethod")
suspend fun uploadRetrying(file: Path, connector: CdrClientConfig.Connector) {
logger.debug { "Uploading file '$file'" }
var retryCount = -1
var retryCount = 0
var retryNeeded = false

// a successful rename of the file to upload should guarantee that we can also delete it after a successful upload,
// and thus prevent duplicate uploads of a file if we fail to delete or archive it after a successful upload
val uploadFile = file.moveTo(file.resolveSibling("${file.nameWithoutExtension}.upload"))

do {
if (retryNeeded) {
// FIXME: Cannot use delay() here because we might continue on another thread and we would either loose the span id or continue with
// the wrong span id (thread local)
delay(cdrClientConfig.retryDelay[retryCount].toMillis())
}
runCatching {
do {
val retryIndex = min(retryCount, cdrClientConfig.retryDelay.size - 1)

val response: UploadDocumentResult = cdrApiClient.uploadDocument(
contentType = connector.contentType.toString(),
file = uploadFile,
connectorId = connector.connectorId,
mode = connector.mode,
traceId = tracer.currentSpan()?.context()?.traceId() ?: ""
)

retryNeeded = when (response) {
is UploadDocumentResult.Success -> {
logger.debug { "File '${uploadFile.fileName}' successfully synchronized." }
deleteOrArchiveFile(uploadFile)
false
}
val response: UploadDocumentResult = cdrApiClient.uploadDocument(
contentType = connector.contentType.toString(),
file = uploadFile,
connectorId = connector.connectorId,
mode = connector.mode,
traceId = tracer.currentSpan()?.context()?.traceId() ?: ""
)

is UploadDocumentResult.UploadClientErrorResponse -> {
logger.error {
"File synchronization failed for '${uploadFile.fileName}'. Received a 4xx client error (response code: '${response.code}'). " +
"No retry will be attempted due to client-side issue."
retryNeeded = when (response) {
is UploadDocumentResult.Success -> {
logger.debug { "File '${uploadFile.fileName}' successfully synchronized." }
deleteOrArchiveFile(uploadFile)
false
}
renameFileToErrorAndCreateLogFile(uploadFile, response.responseBody)
false
}

is UploadDocumentResult.UploadServerErrorResponse -> {
logger.error {
"Failed to sync file '${uploadFile.fileName}', retry will be attempted in '${cdrClientConfig.retryDelay[retryCount + 1]}' - " +
"server response: '${response.responseBody}'"
is UploadDocumentResult.UploadClientErrorResponse -> {
logger.error {
"File synchronization failed for '${uploadFile.fileName}'. Received a 4xx client error (response code: '${response.code}'). " +
"No retry will be attempted due to client-side issue."
}
renameFileToErrorAndCreateLogFile(uploadFile, response.responseBody)
false
}
true
}

is UploadDocumentResult.UploadError -> {
logger.error {
"Failed to sync file '${uploadFile.fileName}', retry will be attempted in '${cdrClientConfig.retryDelay[retryCount + 1]}' - " +
"exception message: '${response.t?.message}'"
is UploadDocumentResult.UploadServerErrorResponse -> {
logger.error {
"Failed to sync file '${uploadFile.fileName}', retry will be attempted in '${cdrClientConfig.retryDelay[retryIndex]}' - " +
"server response: '${response.responseBody}'"
}
true
}

is UploadDocumentResult.UploadError -> {
logger.error {
"Failed to sync file '${uploadFile.fileName}', retry will be attempted in '${cdrClientConfig.retryDelay[retryIndex]}' - " +
"exception message: '${response.t?.message}'"
}
true
}
true
}
}

retryCount++
} while (retryNeeded && retryCount < cdrClientConfig.retryDelay.size)
if (retryNeeded) {
// FIXME: Cannot use delay() here because we might continue on another thread and we would either loose the span id or continue with
// the wrong span id (thread local)
delay(cdrClientConfig.retryDelay[retryIndex])
retryCount++
logger.info { "Retry attempt '#$retryCount' for file '${uploadFile.fileName}'" }
}
} while (retryNeeded)
}.fold(
onSuccess = { it },
onFailure = { t: Throwable ->
if (t is CancellationException) {
// we are getting shut down; moving the file back to its original location so it gets picked up again on restart
runCatching {if (uploadFile.exists()) uploadFile.moveTo(file) }
}
throw t
}
)
}

private fun deleteOrArchiveFile(file: Path): Unit = runCatching {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ class DocumentDownloadScheduler(
runCatching {
pullFileHandling.pullSyncConnector(connector)
}.onFailure {
logger.error(it) { "Error syncing connector '${connector.connectorId}'. Reason: ${it.message}" }
logger.error { "Error syncing connector '${connector.connectorId}'. Reason: $it" }
}
}
}
Expand Down
2 changes: 2 additions & 0 deletions src/main/resources/config/application-dev.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@ client:
base-path: documents
credential-api:
base-path: client-credentials
retry-template:
retries: 1
customer:
- connector-id: 1
content-type: application/forumdatenaustausch+xml;charset=UTF-8
Expand Down

0 comments on commit 46921c8

Please sign in to comment.