## Malware Scanner

This proof of concept shows how Spring Boot, Apache Kafka and iText can be combined to run long-running tasks asynchronously behind a REST API. The service checks whether PDF files contain IBANs that are suspected of being used for money laundering. The design allows further checks for PDF files to be added later.

### Built With

* [Java](https://en.wikipedia.org/wiki/Java_(programming_language))
* [Spring Boot](https://spring.io/projects/spring-boot)
* [Apache Kafka](https://kafka.apache.org/)
* [iText](https://kb.itextpdf.com/)
* [H2 Database](https://www.h2database.com/html/main.html)
* [Docker](https://www.docker.com/)
* [Apache Maven](https://maven.apache.org/)

### Prerequisites

Kafka is started with [Docker Compose](https://docs.docker.com/compose/), whose support is built into Spring Boot. A working Docker installation is therefore required to start the project. Java 21 and Maven are also required.

### Build & Run

1. Clone the repository and change into the project directory:
   ```sh
   git clone https://github.com/murygin/malware-scanner.git
   cd malware-scanner
   ```
2. Compile the project:
   ```sh
   ./mvnw clean compile
   ```
3. Run the application:
   ```sh
   ./mvnw spring-boot:run
   ```

### IBAN Blacklist

The blacklist of suspicious IBANs is configured with the property `iban.blacklist` in [src/main/resources/application.properties](src/main/resources/application.properties). Multiple IBANs are separated by commas:

```properties
iban.blacklist=BG18RZBB91550123456789,FO9264600123456789,GB33BUKB20201555555555
```
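
The property value is a single comma-separated string. The plain-Java sketch below shows one way it could be turned into a set for fast lookups. The class name `IbanBlacklist` and the normalization (removing whitespace, upper-casing) are assumptions for illustration; how the project actually reads the property is not shown here.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: turns the iban.blacklist property value into a lookup set. */
final class IbanBlacklist {

    private final Set<String> ibans;

    IbanBlacklist(String propertyValue) {
        // Split on commas, normalize each entry and drop empty entries
        // (for example from a trailing comma).
        this.ibans = Arrays.stream(propertyValue.split(","))
                .map(IbanBlacklist::normalize)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toUnmodifiableSet());
    }

    /** Removes whitespace and upper-cases, so "gb33 bukb ..." matches "GB33BUKB...". */
    static String normalize(String iban) {
        return iban.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }

    boolean isSuspicious(String iban) {
        return ibans.contains(normalize(iban));
    }

    int size() {
        return ibans.size();
    }
}
```

Normalizing both the configured entries and the IBANs found in a PDF keeps the comparison independent of how the IBAN was printed.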

### Usage

The API provides one endpoint for starting the check of a PDF file and one for loading the result. If the service is started with `./mvnw spring-boot:run`, the base URL is `http://localhost:8080`.

#### `POST /check/files`

Starts the check of a PDF file. Because PDFs are checked asynchronously, the response does not contain the result of the check. Instead, it confirms that the check was created and contains the ID of the check. The `Location` response header contains the URL for loading the result.

Request:
```json
{
  "url": "http://localhost:9090/pdf-with-iban.pdf",
  "file-type": "pdf"
}
```

Response:
- Status: `202 Accepted`
- Header: `Location: /check/files/b3a5896f-387b-4363-a631-cfbf467db1ce`

```json
{
  "state": "CREATED",
  "results": [],
  "id": "b3a5896f-387b-4363-a631-cfbf467db1ce"
}
```
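
A client can start a check with any HTTP library. The sketch below uses the JDK's `java.net.http.HttpClient` (Java 11 and later). The class name `CheckFilesClient`, the `Content-Type: application/json` header and the naively string-built JSON body are assumptions, not taken from the project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client for POST /check/files using the JDK HTTP client. */
final class CheckFilesClient {

    private final URI baseUri;
    private final HttpClient http = HttpClient.newHttpClient();

    CheckFilesClient(URI baseUri) {
        this.baseUri = baseUri;
    }

    /** Builds the request that starts a check; the body mirrors the example request. */
    HttpRequest buildStartRequest(String pdfUrl) {
        // Naive JSON for the sketch: a real client should use a JSON library
        // so that quotes or backslashes in the URL are escaped.
        String body = "{\"url\":\"" + pdfUrl + "\",\"file-type\":\"pdf\"}";
        return HttpRequest.newBuilder(baseUri.resolve("/check/files"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Sends the request and returns the absolute result URL taken from the Location header. */
    URI startCheck(String pdfUrl) throws IOException, InterruptedException {
        HttpResponse<String> response =
                http.send(buildStartRequest(pdfUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 202) {
            throw new IllegalStateException("Unexpected status: " + response.statusCode());
        }
        String location = response.headers().firstValue("Location")
                .orElseThrow(() -> new IllegalStateException("Missing Location header"));
        // The Location header in the example is relative, so resolve it against the base URL.
        return baseUri.resolve(location);
    }
}
```

Separating `buildStartRequest` from `startCheck` keeps the request construction testable without a running service.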

#### `GET /check/files/<UUID>`

Loads the result of a PDF check. Because PDFs are checked asynchronously, the `state` field shows how far the check has progressed: `CREATED` if the check has not started yet, `RUNNING` while it is in progress, and `FINISHED` once it has completed, in which case `results` contains the outcome.

Response:
- Status: `200 OK`

```json
{
  "state": "FINISHED",
  "results": [
    {
      "state": "SUSPICIOUS",
      "name": "money-laundering",
      "details": "Unique IBANs: 111, suspicious IBANs: 2"
    }
  ],
  "id": "b3a5896f-387b-4363-a631-cfbf467db1ce"
}
```
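
Since the result is not available immediately, a client has to poll this endpoint until `state` is `FINISHED`. The helper below sketches such a loop. `fetchState` is a hypothetical callback that would perform the `GET` request and return the `state` field; the attempt limit and delay are arbitrary choices, not values from the project.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Hypothetical polling helper for GET /check/files/<UUID>. */
final class CheckPoller {

    private CheckPoller() {}

    /**
     * Calls fetchState until it returns "FINISHED" and returns the attempt number
     * on which that happened. Throws if the check is not finished after maxAttempts.
     */
    static int waitUntilFinished(Supplier<String> fetchState, int maxAttempts, Duration delay) {
        String state = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            state = fetchState.get();
            if ("FINISHED".equals(state)) {
                return attempt;
            }
            // CREATED and RUNNING both mean "not done yet": wait before the next attempt.
            if (attempt < maxAttempts) {
                sleep(delay);
            }
        }
        throw new IllegalStateException(
                "Check not finished after " + maxAttempts + " attempts, last state: " + state);
    }

    private static void sleep(Duration delay) {
        try {
            Thread.sleep(delay.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            throw new IllegalStateException("Interrupted while polling", e);
        }
    }
}
```

Taking the state as a `Supplier` keeps the loop independent of the HTTP client and JSON library in use.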

## How it works

The Spring Boot REST controller [o.d.m.rest.MalwareScannerController](src/main/java/org/dm/malwarescanner/rest/MalwareScannerController.java) contains the methods that are executed when the endpoints are called. A PDF check is triggered with the endpoint `POST /check/files`, which calls the controller's `create` method. The controller is only a facade and passes all calls on to the [o.d.m.service.CheckJobService](src/main/java/org/dm/malwarescanner/service/CheckJobService.java).

When a new check is requested, the controller calls the method `createCheckJob` in the `CheckJobService`. This method creates a [o.d.m.model.CheckJob](src/main/java/org/dm/malwarescanner/model/CheckJob.java) with the status `CREATED`, saves it in the database and then sends a [o.d.m.model.CheckEvent](src/main/java/org/dm/malwarescanner/model/CheckEvent.java) to the [event streaming platform Kafka](https://kafka.apache.org/). The check itself is not started here; it is only triggered when the Kafka event is consumed. This way, the caller of the REST endpoint is not blocked until the check is done but receives a response immediately.
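
The following plain-Java sketch models this flow without Spring, H2 or Kafka: a `ConcurrentHashMap` stands in for the database and a `BlockingQueue` for the Kafka topic. Apart from the method names `createCheckJob` and `create`, all names (`Job`, `JobEvent`, `InMemoryCheckJobService`, `FacadeController`, `Accepted`) are hypothetical simplifications, not the project's classes.

```java
import java.net.URI;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

enum JobState { CREATED, RUNNING, FINISHED }

/** Hypothetical, simplified stand-ins for a check job and its event. */
record Job(UUID id, String url, JobState state) {}
record JobEvent(UUID jobId, String url) {}

/** Sketch of createCheckJob: persist the job, publish an event, never run the check inline. */
final class InMemoryCheckJobService {
    final Map<UUID, Job> repository = new ConcurrentHashMap<>();       // stands in for H2
    final BlockingQueue<JobEvent> topic = new LinkedBlockingQueue<>(); // stands in for the Kafka topic

    Job createCheckJob(String url) {
        Job job = new Job(UUID.randomUUID(), url, JobState.CREATED);
        repository.put(job.id(), job);          // 1. save the job with state CREATED
        topic.add(new JobEvent(job.id(), url)); // 2. publish the event for a consumer
        return job;                             // 3. return at once; no check has run yet
    }
}

/** Sketch of the controller facade: delegate, then answer 202 Accepted with a Location. */
final class FacadeController {
    record Accepted(int status, URI location, Job body) {}

    private final InMemoryCheckJobService service;

    FacadeController(InMemoryCheckJobService service) {
        this.service = service;
    }

    Accepted create(String url) {
        Job job = service.createCheckJob(url);
        return new Accepted(202, URI.create("/check/files/" + job.id()), job);
    }
}
```

In a Spring MVC controller, an equivalent response could be built with `ResponseEntity.accepted().location(uri).body(job)`. Saving the job before publishing the event ensures that a consumer never receives an event for a job that does not exist yet.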

The `CheckEvent`s are consumed by the [o.d.m.kafka.KafkaTopicListener](src/main/java/org/dm/malwarescanner/kafka/KafkaTopicListener.java). After receiving an event, the `KafkaTopicListener` sets the status of the `CheckJob` to `RUNNING` and starts the check by calling the `checkPDFFile` method in the [o.d.m.service.IBANCheckService](src/main/java/org/dm/malwarescanner/service/IBANCheckService.java).
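
The listener therefore has two responsibilities: reacting to a check request (described here) and, later, storing the result (described in the next paragraph). The sketch below models both as plain methods; the Kafka wiring (for example Spring for Apache Kafka's `@KafkaListener`) is deliberately left out, and all names (`TrackedJob`, `CheckRequested`, `CheckCompleted`, `CheckEventHandler`) are hypothetical.

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.Consumer;

enum CheckStatus { CREATED, RUNNING, FINISHED }

/** Hypothetical mutable job; fields are volatile because events and REST reads may use different threads. */
final class TrackedJob {
    final UUID id;
    volatile CheckStatus status = CheckStatus.CREATED;
    volatile String result;

    TrackedJob(UUID id) {
        this.id = id;
    }
}

/** Hypothetical event payloads: one requests a check, the other carries its result. */
record CheckRequested(UUID jobId, String url) {}
record CheckCompleted(UUID jobId, String result) {}

/** Sketch of the listener's two responsibilities. */
final class CheckEventHandler {
    private final Map<UUID, TrackedJob> jobs;            // stands in for the database
    private final Consumer<CheckRequested> checkService; // stands in for IBANCheckService.checkPDFFile

    CheckEventHandler(Map<UUID, TrackedJob> jobs, Consumer<CheckRequested> checkService) {
        this.jobs = jobs;
        this.checkService = checkService;
    }

    /** Check request received: mark the job as RUNNING, then start the actual check. */
    void onCheckRequested(CheckRequested event) {
        find(event.jobId()).status = CheckStatus.RUNNING;
        checkService.accept(event);
    }

    /** Result received: store the result first, then mark the job as FINISHED. */
    void onCheckCompleted(CheckCompleted event) {
        TrackedJob job = find(event.jobId());
        job.result = event.result();
        job.status = CheckStatus.FINISHED; // written last, so a reader that sees FINISHED also sees the result
    }

    private TrackedJob find(UUID id) {
        TrackedJob job = jobs.get(id);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job " + id);
        }
        return job;
    }
}
```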

The `checkPDFFile` method in the `IBANCheckService` first loads the PDF file from the URL and finds all IBANs in it. It then checks whether any of the IBANs found are on the blacklist of IBANs suspected of being used for money laundering. To search the PDF file for IBANs, an instance of [o.d.m.service.IBANFinder](src/main/java/org/dm/malwarescanner/service/IBANFinder.java) is created; after its `run` method has been called, the `IBANFinder` holds all IBANs found in a `Set`. The [iText library](https://kb.itextpdf.com/) is used to read the PDF files.

Once the check in the `IBANCheckService` is complete, an [o.d.m.model.CheckResultEvent](src/main/java/org/dm/malwarescanner/model/CheckResultEvent.java) is sent to Kafka. The `CheckResultEvent` is also consumed by the `KafkaTopicListener`, which takes the result of the check from the event, saves it in the `CheckJob` and sets the status of the job to `FINISHED`. The client can now load the result via the REST endpoint `GET /check/files/<UUID>`.
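
The sketch below illustrates one possible way to find IBANs and compare them with the blacklist. It starts from text that has already been extracted from the PDF (the iText extraction step is omitted) and uses a regular expression for the compact IBAN notation, filtered by the ISO 13616 mod-97 checksum. This technique and the names `IbanTextScanner` and `IbanCheckResult` are assumptions; the project's `IBANFinder` may work differently. `details()` only reproduces the format of the example response.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical result type; details() follows the format of the example response. */
record IbanCheckResult(Set<String> uniqueIbans, Set<String> suspiciousIbans) {
    boolean isSuspicious() {
        return !suspiciousIbans.isEmpty();
    }

    String details() {
        return "Unique IBANs: " + uniqueIbans.size() + ", suspicious IBANs: " + suspiciousIbans.size();
    }
}

final class IbanTextScanner {
    // Compact notation only: country code, two check digits, 11 to 30 alphanumerics (15 to 34 in total).
    private static final Pattern IBAN = Pattern.compile("\\b[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}\\b");

    /** Collects regex matches that also pass the mod-97 checksum; the Set removes duplicates. */
    static Set<String> findIbans(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = IBAN.matcher(text);
        while (m.find()) {
            if (hasValidChecksum(m.group())) {
                found.add(m.group());
            }
        }
        return found;
    }

    /** Moves the first four characters to the end, maps A..Z to 10..35 and expects remainder 1. */
    static boolean hasValidChecksum(String iban) {
        String rearranged = iban.substring(4) + iban.substring(0, 4);
        int remainder = 0;
        for (char c : rearranged.toCharArray()) {
            int value = (c >= '0' && c <= '9') ? c - '0' : c - 'A' + 10;
            // Letters expand to two decimal digits, digits to one; fold them into the running remainder.
            remainder = (value >= 10 ? remainder * 100 : remainder * 10) + value;
            remainder %= 97;
        }
        return remainder == 1;
    }

    static IbanCheckResult check(String text, Set<String> blacklist) {
        Set<String> unique = findIbans(text);
        Set<String> suspicious = new LinkedHashSet<>();
        for (String iban : unique) {
            if (blacklist.contains(iban)) {
                suspicious.add(iban);
            }
        }
        return new IbanCheckResult(unique, suspicious);
    }
}
```

The checksum filter discards strings that merely look like IBANs, such as a mistyped digit. IBANs printed in groups of four (for example `GB33 BUKB 2020 ...`) are not matched by this pattern and would need additional handling.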

## Improvements & Enhancements

- The service should only be usable by authenticated clients.
- Loading arbitrary external resources at runtime is a major security risk (server-side request forgery). The PDF URL that the client sends to the service must not be trusted: it must be validated before it is processed, and data should only be loaded from selected hosts.
- The API should be documented with OpenAPI and Swagger UI, for example using springdoc-openapi.
- Test coverage should be improved. Integration tests should be added for the controller endpoints and for the processing of Kafka events.
- A load test should be written to determine how the system performs when many requests have to be processed simultaneously.
- Additional check handlers can be added that consume Kafka events and, for example, check whether an IBAN actually exists.
- Error handling should be improved for invalid request bodies sent to the `POST /check/files` endpoint.
- Error handling should be improved for errors that occur while a file check is being executed.
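
For the URL validation improvement, a host allowlist could look like the sketch below. The class name `PdfUrlValidator`, the accepted schemes and the exact-match host rule are assumptions, not part of the current implementation; the allowed host names are expected in lower case.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical allowlist check for PDF URLs submitted to POST /check/files. */
final class PdfUrlValidator {
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    private final Set<String> allowedHosts; // lower-case host names

    PdfUrlValidator(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isAllowed(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return false; // not a syntactically valid URI
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        // Rejects relative URIs, URIs without a host (such as file:///...) and user-info tricks.
        if (scheme == null || host == null || uri.getUserInfo() != null) {
            return false;
        }
        return ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))
                && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

A host check alone is not sufficient if the HTTP client follows redirects to other hosts; the JDK `HttpClient`, for instance, only follows redirects when configured with an `HttpClient.Redirect` policy other than the default `NEVER`.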

## Articles

The following articles provide more background on the frameworks and systems used in this application.

**Kafka**
- [Apache Kafka Quickstart](https://kafka.apache.org/quickstart)
- [Run Kafka Streams Demo Application](https://kafka.apache.org/documentation/streams/quickstart)
- [Is a Key Required as Part of Sending Messages to Kafka?](https://www.baeldung.com/java-kafka-message-key)
- [What should I use as the key for my Kafka message?](https://forum.confluent.io/t/what-should-i-use-as-the-key-for-my-kafka-message/312/2)

**API Design**
- [REST API Design for Long-Running Tasks](https://restfulapi.net/rest-api-design-for-long-running-tasks/)

**Spring Boot**
- [Docker Compose Support in Spring Boot 3.1](https://spring.io/blog/2023/06/21/docker-compose-support-in-spring-boot-3-1)
- [Getting started with Spring Boot 3, Kafka over docker with docker-compose.yaml](https://www.geeksforgeeks.org/getting-started-with-spring-boot-3-kafka-over-docker-with-docker-composeyaml/)
- [Building REST services with Spring](https://spring.io/guides/tutorials/rest)
- [Spring Boot With H2 Database](https://www.baeldung.com/spring-boot-h2-database)
- [Getting started with unit testing in spring boot](https://medium.com/javarevisited/getting-started-with-unit-testing-in-spring-boot-bada732a5baa)

**IBAN**
- [Register of countries using the IBAN standard](https://www.iban.com/structure)
- [IBAN Validation API V4 Documentation](https://www.iban.com/validation-api)
- [IBAN Validation and Calculation - openiban](https://openiban.com/)
- [Global IBAN regex](https://gist.github.com/akndmr/7ba7af0c07a3ec517c651bc6f1c508d5)

## Contact

Daniel Murygin - [linkedin.com/in/murygin](https://www.linkedin.com/in/murygin/)

Project Link: [https://github.com/murygin/malware-scanner](https://github.com/murygin/malware-scanner)