Commit

minor edits
arinkulshi-skylight committed Jan 9, 2025
1 parent 3c36b62 commit 39ff224
Showing 4 changed files with 121 additions and 76 deletions.
133 changes: 84 additions & 49 deletions OCR/README.md
@@ -8,11 +8,14 @@ The **OCR Layer** in the ReportVision project processes document images, perform
1. [Introduction](#introduction)
2. [Installation](#installation)
3. [Running the Application](#running-the-application)
4. [Testing](#testing)
5. [End-to-End Benchmarking](#end-to-end-benchmarking)
6. [Dockerized Development](#dockerized-development)
7. [Development Tools](#development-tools)
8. [Contributing](#contributing)
4. [Development Tools](#development-tools)
5. [Testing](#testing)
6. [End-to-End Benchmarking](#end-to-end-benchmarking)
7. [Dockerized Development](#dockerized-development)
8. [Benchmarking](#end-to-end-benchmarking)
9. [Project Architecture](#project-architecture)
10. [API Endpoints](#api-endpoints)


---

@@ -23,8 +26,6 @@ The OCR layer uses **Poetry** for dependency management and virtual environment
- Support for benchmarking OCR accuracy.
- Configuration for different OCR models and segmentation templates.



### Installation

### Prerequisites
@@ -56,46 +57,6 @@ Run unit tests
poetry run pytest
```

### End to End Benchmarking


#### Overview
End-to-end benchmarking evaluates OCR accuracy. The benchmarking scripts can:

1. Segment and run OCR on a folder of images using given segmentation template and labels file.
2. Compare OCR outputs to ground truth data based on matching file names.
3. Write metrics (confidence, raw distance, Hamming distance, Levenshtein distance) as well as total metrics to a CSV file.


To run benchmarking:

1. Locate file `benchmark_main.py`
2. Ensure all the paths/folders exist by downloading from [Google Drive for all segmentation/label files](https://drive.google.com/drive/folders/1WS2FYn0BTxWv0juh7lblzdMaFlI7zbDd?usp=sharing)
3. Ensure `ground_truth` folder and files exist
4. Ensure `labels.json` is in the correct format (see `tax_form_segmented_labels.json` as an example)
5. When running make sure to pass arguments in this order:

* `/path/to/image/folder` (path to the original image files which we need to run OCR on)
* `/path/to/segmentation_template.png` (single file)
* `/path/to/labels.json` (single file)
* `/path/to/output/folder` (path to folder where the output would be. This should exist but can be empty)
* `/path/to/ground/truth_folder` (path to folder for metrics that we would compare against)
* `/path/to/csv_out_folder` (path to folder where all metrics would be. This should exist but can be empty)

By default, segmentation, OCR, and metrics computation are all run together. To disable one or the other, pass the `--no-ocr` or `--no-metrics` flags. You can change the backend model by passing `--model=...` as well.

Run notes:
* Benchmark takes one second per segment for OCR using the default `trocr` model. Please be patient or set a counter to limit the number of files processed.
* Only one segment can be input at a time


### Test Data Sets

You can run the script `pytest run reportvision-dataset-1/medical_report_import.py` to pull in all relevant data.


### Development Tools

Adding new dependencies
@@ -147,8 +108,6 @@ To build the OCR service into an executable artifact
poetry run build
```



### Dockerized Development

It is also possible to run the project in a collection of docker containers. This is useful for development and testing purposes as it doesn't require any additional dependencies to be installed.
@@ -169,6 +128,45 @@ The frontend container will automatically reload when changes are made to the fr
The OCR service container will restart automatically when changes are made to the OCR code. To access the API, navigate to http://localhost:8000/ in your browser.


### End to End Benchmarking

#### Overview
End-to-end benchmarking evaluates OCR accuracy. The benchmarking scripts can:

1. Segment and run OCR on a folder of images using given segmentation template and labels file.
2. Compare OCR outputs to ground truth data based on matching file names.
3. Write metrics (confidence, raw distance, Hamming distance, Levenshtein distance) as well as total metrics to a CSV file.
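
For readers unfamiliar with the two named distance metrics, the following is a minimal, generic sketch of Hamming and Levenshtein distance in Python. It is not taken from the benchmarking scripts; the function names are hypothetical, and how the scripts treat strings of unequal length for Hamming distance is not stated here, so this sketch simply raises an error in that case.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption: unequal lengths are rejected; the real scripts may differ.
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] is the distance between the already-processed prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

A lower distance between the OCR output and the ground truth text means a closer match; a distance of 0 means the strings are identical.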


To run benchmarking:

1. Locate file `benchmark_main.py`
2. Ensure all the paths/folders exist by downloading from [Google Drive for all segmentation/label files](https://drive.google.com/drive/folders/1WS2FYn0BTxWv0juh7lblzdMaFlI7zbDd?usp=sharing)
3. Ensure `ground_truth` folder and files exist
4. Ensure `labels.json` is in the correct format (see `tax_form_segmented_labels.json` as an example)
5. When running make sure to pass arguments in this order:

* `/path/to/image/folder` (path to the original image files which we need to run OCR on)
* `/path/to/segmentation_template.png` (single file)
* `/path/to/labels.json` (single file)
* `/path/to/output/folder` (path to folder where the output would be. This should exist but can be empty)
* `/path/to/ground/truth_folder` (path to folder for metrics that we would compare against)
* `/path/to/csv_out_folder` (path to folder where all metrics would be. This should exist but can be empty)

By default, segmentation, OCR, and metrics computation are all run together. To disable one or the other, pass the `--no-ocr` or `--no-metrics` flags. You can change the backend model by passing `--model=...` as well.

Run notes:
* Benchmark takes one second per segment for OCR using the default `trocr` model. Please be patient or set a counter to limit the number of files processed.
* Only one segment can be input at a time
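
Because the positional arguments must be passed in a fixed order, it can help to assemble the command programmatically. The sketch below only builds the argument list; the helper name, the `python benchmark_main.py` launcher, and placing the optional flags after the positional arguments are assumptions, while the argument order and the `--no-ocr`, `--no-metrics`, and `--model=` flags come from the instructions above.

```python
from typing import List, Optional


def build_benchmark_command(
    image_folder: str,
    segmentation_template: str,
    labels_file: str,
    output_folder: str,
    ground_truth_folder: str,
    csv_out_folder: str,
    run_ocr: bool = True,
    run_metrics: bool = True,
    model: Optional[str] = None,
) -> List[str]:
    """Return the benchmark command as a list of arguments in the documented order."""
    # Assumption: the script is launched directly with the Python interpreter.
    command = [
        "python",
        "benchmark_main.py",
        image_folder,
        segmentation_template,
        labels_file,
        output_folder,
        ground_truth_folder,
        csv_out_folder,
    ]
    if not run_ocr:
        command.append("--no-ocr")
    if not run_metrics:
        command.append("--no-metrics")
    if model is not None:
        command.append(f"--model={model}")
    return command
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the directory that contains `benchmark_main.py`.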


### Test Data Sets

You can run the script `pytest run reportvision-dataset-1/medical_report_import.py` to pull in all relevant data.


## Project Architecture

The OCR Layer is organized as follows:
@@ -199,3 +197,40 @@ The OCR Layer is organized as follows:

- **`poetry.lock`**: Lock file generated by Poetry to ensure dependency consistency.

## API Endpoints

The OCR service exposes the following API endpoints:

#### Health Check
- **`GET /`**
- **Description**: Returns the status of the OCR service.
- **Response**: Status message indicating the service's health.

#### Image Alignment
- **`POST /image_alignment/`**
- **Description**: Aligns a source image with a segmentation template.
- **Request Body**:
- `source_image` (Base64-encoded string): The source image to align.
- `segmentation_template` (Base64-encoded string): The segmentation template to align with.
- **Response**:
- Base64-encoded string of the aligned image.

#### Image File to Text
- **`POST /image_file_to_text/`**
- **Description**: Processes an image file and a segmentation template to extract text based on labeled regions.
- **Request Body**:
- `source_image` (file): The uploaded source image file.
- `segmentation_template` (file): The uploaded segmentation template file.
- `labels` (JSON string): Defines labeled regions in the segmentation template.
- **Response**:
- JSON object containing text extracted from labeled regions.

#### Image to Text
- **`POST /image_to_text`**
- **Description**: Processes Base64-encoded images and extracts text from labeled regions.
- **Request Body**:
- `source_image` (Base64-encoded string): The source image.
- `segmentation_template` (Base64-encoded string): The segmentation template.
- `labels` (JSON string): Defines labeled regions in the segmentation template.
- **Response**:
- JSON object containing text extracted from labeled regions.
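
As an illustration of the Base64-based request body described for `POST /image_to_text`, the sketch below prepares the three documented fields from raw image bytes. The helper name is hypothetical, and two details are assumptions not confirmed above: that the images are sent as plain Base64 (no data-URI prefix), and that the caller decides how the resulting fields are encoded on the wire.

```python
import base64
import json
from typing import Any, Dict


def build_image_to_text_payload(
    source_image: bytes,
    segmentation_template: bytes,
    labels: Any,
) -> Dict[str, str]:
    """Prepare the `source_image`, `segmentation_template`, and `labels` fields."""
    return {
        # Assumption: plain Base64 text without a "data:image/..." prefix.
        "source_image": base64.b64encode(source_image).decode("ascii"),
        "segmentation_template": base64.b64encode(segmentation_template).decode("ascii"),
        # `labels` is documented as a JSON string, so serialize it here.
        "labels": json.dumps(labels),
    }
```

The same Base64 encoding step would apply to the two image fields of `POST /image_alignment/`, which takes no `labels` field.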
10 changes: 5 additions & 5 deletions backend/README.md
@@ -1,6 +1,6 @@
# Backend Middleware - Spring Boot Application

This document provides a guide for the **Backend Middleware** of the ReportVision project. This middleware bridges the **frontend React app** with the **OCR backend**
This document provides a guide for the **Backend Middleware** of the ReportVision project. This middleware bridges the **frontend app** with the **OCR backend**

---

@@ -18,7 +18,7 @@ This document provides a guide for the **Backend Middleware** of the ReportVisio

The backend of ReportVision is a **Spring Boot** application designed to:
- Serve as middleware connecting the frontend with OCR.
- Manage template storage
- Manage storage of templates in the DB
- Act as a middle layer to pass data for OCR extraction


@@ -57,15 +57,15 @@ docker exec -it <CONTAINER_ID> /bin/bash

## Project Architecture

The backend is organized into the following key directories and files:
The backend is organized into the following directories and files:

- **`src/main/java/gov/cdc/reportvision/`**:
- **`controllers/`**: handle API requests from the frontend.
- **`services/`**: service layer for managing templates, data extraction, and interactions with the OCR backend.
- **`models/`**: Data models representing application entities
- **`repositories/`**: Interfaces for database operations.
- **`config/`**: Configuration files for security, database connections, and CORS policies.
- **`utils/`**: Utility classes for tasks like validation, logging, and file manipulation.
- **`utils/`**: Utility classes for validation, logging, and file manipulation.
- **`src/test/`**: Includes unit and integration tests for the backend.
- **`Dockerfile`**: Docker configuration file for containerizing the application.

@@ -104,7 +104,7 @@ The backend middleware exposes the following RESTful API endpoints:
#### Health Check
- **`GET /api/health`**
- **Description**: Returns the status of the backend server.
- **Response**: A status message indicating the server's health.
- **Response**: Status message indicating the server's health.
#### Template Management
- **`POST /api/templates`**
16 changes: 7 additions & 9 deletions frontend/README.md
@@ -8,10 +8,9 @@ Welcome to the **Frontend React App** for the ReportVision project. This guide p
## Table of Contents
1. [Introduction](#introduction)
2. [Setup and Installation](#setup-and-installation)
3. [Development Workflow](#development-workflow)
4. [Testing and E2E Commands](#testing-and-e2e-commands)
5. [Frontend Architecture](#project-architecture)
8. [Troubleshooting](#troubleshooting)
3. [Testing](#testing)
4. [Frontend Architecture](#project-architecture)
5. [Troubleshooting](#troubleshooting)



@@ -33,7 +32,6 @@ Make sure you have the following installed on your machine:
```shell
git clone https://github.com/CDCgov/ReportVision.git
cd ReportVision/frontend

2. Install Dependencies:

```shell
@@ -52,7 +50,7 @@ npm run dev
npm run tests
```

### Testing and E2E Commands
### Testing


Runs the end-to-end tests.
@@ -67,7 +65,7 @@ Starts the interactive UI mode.
npx playwright test --ui
```

Runs the tests only on Desktop Chrome.
Runs the tests only on Chrome.

```shell
npx playwright test --project=chromium
@@ -93,7 +91,7 @@ npx playwright codegen

#### Fast Refresh

Currently, two official plugins are available:
Currently, two plugins are available:

- [@vitejs/plugin-react](https://github.com/vitejs/vite-plugin-react/blob/main/packages/plugin-react/README.md) uses [Babel](https://babeljs.io/) for Fast Refresh
- [@vitejs/plugin-react-swc](https://github.com/vitejs/vite-plugin-react-swc) uses [SWC](https://swc.rs/) for Fast Refresh
@@ -110,7 +108,7 @@ Currently, two official plugins are available:

### Description of Key Directories and Files in the frontend:
- **`public/`**: Holds public static files like images, logos, and `index.html`. These files are directly served by the development and production servers.
- **`src/`**: Contains the core application code, including React components, pages, styles, and utilities.
- **`src/`**: Contains the application code, including React components, pages, styles, and utilities.
- **`components/`**: Houses UI components.
- **`pages/`**: Organizes page-level components corresponding to application routes.
- **`styles/`**: Includes global and component-specific styles.
38 changes: 25 additions & 13 deletions user_guide.md
@@ -9,13 +9,15 @@ ReportVision is a tool that automates the reading and extracting of labs from PD
2. Extract Data based on selected annotations
3. Conversion of Extracted Data to PDFs

Please see "how to" instructions in order to understand features of the Application in more detail.

### Getting Started

#### Prerequisites

1. [Python3.8](https://www.python.org/downloads/)
2. [Node23.1](https://nodejs.org/en/download)
3. [Tesseract5.5](https://formulae.brew.sh/formula/tesseract) (brew install tesseract)
1. [Python 3.8](https://www.python.org/downloads/)
2. [Node 23.1](https://nodejs.org/en/download)
3. [Tesseract 5.5](https://formulae.brew.sh/formula/tesseract) (brew install tesseract)
4. [Java 21](https://www.oracle.com/java/technologies/downloads/)
5. [PostgreSQL](https://www.postgresql.org/)
6. [Docker](https://www.docker.com/) (required for DB and middleware set up)
@@ -30,21 +32,31 @@ ReportVision is a tool that automates the reading and extracting of labs from PD

![](arcdiagram.png)

A React-based Single Page Application: This serves as the front-end user interface for the application.
The **ReportVision** application is composed of the following core components:

## Components

### 1. **React-Based Single Page Application (SPA)**
- **Purpose**: Serves as the user interface for the application.

### 2. **ReportVision Middleware**
- **Purpose**: Acts as middleware to handle communication between the UI, OCR API, and data storage.

### 3. **OCR API**
- **Purpose**: Performs Optical Character Recognition (OCR) on provided input.

### 4. **Data Storage (PostgreSQL)**
- **Purpose**: Stores saved templates and extracted data.

ReportVision Middleware: Acts as middleware to handle communication between the UI, OCR API, and data storage.
Responsible for coordinating requests, processing logic, and integrating with other components.

OCR API: Runs the Optical Character Recognition (OCR) process.
Receives data from the backend, performs OCR on the provided input, and returns the extracted information to the backend.
## Infrastructure and Cloud Components

Data Storage (Postgres):A managed database for data persistence.
Stores data processed by the backend and results generated by the OCR API.
Handles both structured and unstructured data related to the application.
### Hosting
- The application is hosted in **Azure**

### Infrastructure aod Cloud Components
### Infrastructure Guide
- For detailed information on how the application is deployed and managed in Azure, refer to our [Infrastructure Guide](./infrastructure/README.md).

The application is hosted in Azure. Please see our infrastructure guide here to learn more


