This project implements a pipeline for detecting tables in images and recognizing their structure using the Table Transformer (TATR) model. It includes table detection, structure recognition, and OCR capabilities.
This tool uses deep learning models to extract tabular data from images. It applies the Table Transformer (TATR) model to both table detection and structure recognition, then runs Optical Character Recognition (OCR) to extract the text content. The pipeline is useful for digitizing printed documents, analyzing scanned reports, or processing image-based datasets that contain tabular information.
The workflow consists of three main steps:
- Table Detection: Identifies and localizes tables within the input image.
- Structure Recognition: Analyzes the structure of detected tables, identifying rows, columns, and individual cells.
- OCR: Extracts the text content from each cell of the recognized table structure.
The results are visualized at each step and the final extracted data is saved in a convenient CSV format for further analysis or integration into other workflows.
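To make the last two steps concrete, here is a minimal sketch of how row and column boxes from structure recognition could be combined into cells and then written out as CSV. It assumes boxes are `(xmin, ymin, xmax, ymax)` tuples in pixel coordinates; the helper names `cell_boxes` and `write_csv` are hypothetical, and the OCR step is replaced by placeholder strings. The project's script may organize this differently.

```python
import csv
import io


def cell_boxes(rows, cols):
    """Intersect row and column boxes into a grid of cell boxes.

    Boxes are assumed to be (xmin, ymin, xmax, ymax) tuples.
    """
    rows = sorted(rows, key=lambda box: box[1])  # top to bottom
    cols = sorted(cols, key=lambda box: box[0])  # left to right
    # A cell takes its horizontal extent from the column and its
    # vertical extent from the row.
    return [[(c[0], r[1], c[2], r[3]) for c in cols] for r in rows]


def write_csv(text_grid, handle):
    """Write a list of rows (lists of strings) to an open text handle."""
    csv.writer(handle).writerows(text_grid)


# Two rows (given out of order) and two columns of a 200x80 table crop.
rows = [(0, 40, 200, 80), (0, 0, 200, 40)]
cols = [(0, 0, 100, 80), (100, 0, 200, 80)]
grid = cell_boxes(rows, cols)

# Placeholder for OCR: a real run would read the text inside each cell box.
texts = [[f"r{i}c{j}" for j in range(len(row))] for i, row in enumerate(grid)]

buffer = io.StringIO()
write_csv(texts, buffer)
print(buffer.getvalue())
```

Sorting the boxes first means the grid comes out in reading order even when the model returns rows or columns in arbitrary order.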
- Table detection in images
- Table structure recognition
- Optical Character Recognition (OCR) for table contents
- Visualization of detected tables and recognized structures
- Export of table contents to CSV
- Python 3.7+
- CUDA-capable GPU (recommended for faster processing)
For a complete list of required packages, see `requirements.txt`.
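Since a GPU is recommended but not required, a quick check like the one below can tell you which device a run would use. This is a sketch that assumes the pipeline runs on PyTorch; `pick_device` is a hypothetical helper, not part of this project.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to query
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```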
- Clone this repository:
  git clone https://github.com/yourusername/table-transformer-project.git
  cd table-transformer-project
- Create a virtual environment (optional but recommended):
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required packages:
  pip install -r requirements.txt
- Prepare your input image(s) containing tables.
- Update the `image_path` and `output_dir` variables in the script:
  image_path = "/path/to/your/image.png"  # Replace with your image path
  output_dir = "/path/to/output/directory"  # Replace with your desired output directory
- Run the script:
  python table_transformer_script.py
- Check the output directory for results:
  - `detected_tables.jpg`: Visualization of detected tables
  - `table_1.jpg`, `table_2.jpg`, etc.: Cropped images of individual tables
  - `table_structure.jpg`: Visualization of recognized table structure
  - `output.csv`: Extracted table contents in CSV format
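The steps above can be bracketed by two small checks: confirm the paths before running, and load `output.csv` afterwards. The sketch below is illustrative only. `prepare_paths` and `load_table` are hypothetical helpers, the CSV is assumed to be UTF-8 encoded, and the demo uses a temporary directory with stand-in files rather than real pipeline output.

```python
import csv
import tempfile
from pathlib import Path


def prepare_paths(image_path, output_dir):
    """Check that the input image exists and create the output directory."""
    image = Path(image_path)
    if not image.is_file():
        raise FileNotFoundError(f"Input image not found: {image}")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return image, out


def load_table(csv_path):
    """Read a CSV file into a list of rows (assumes UTF-8 encoding)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


# Demo with stand-in files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    fake_image = Path(tmp) / "image.png"
    fake_image.write_bytes(b"")  # empty placeholder, not a real image
    image, out_dir = prepare_paths(fake_image, Path(tmp) / "results")

    # Stand-in for the output.csv that the script would produce.
    (out_dir / "output.csv").write_text("Name,Qty\nBolt,4\n", encoding="utf-8")
    table = load_table(out_dir / "output.csv")

print(table)
```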
This project is licensed under the Apache License 2.0.
- Microsoft for the Table Transformer model
- The EasyOCR team for their OCR engine
- Hugging Face for their Transformers library
Contributions are welcome! Please feel free to submit a Pull Request.