PDF Processing - German Dissertations

Several approaches to text extraction from PDF files were tried. A YOLOv5 model trained on the DocLayNet dataset gave the best results.

Processing took 926.5365602970123 seconds for 25 (minified) PDFs with 1068 pages in total, i.e. about 0.87 seconds per page.
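The per-page figure is simply the total runtime divided by the page count. A minimal sketch of that arithmetic follows; the helper name `seconds_per_page` is hypothetical and not part of this repository, and the per-PDF average is derived here from the reported numbers rather than stated in the original.

```python
def seconds_per_page(total_seconds: float, page_count: int) -> float:
    """Return the average processing time per page (hypothetical helper)."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return total_seconds / page_count


# Figures reported in this README: 926.5365602970123 s, 1068 pages, 25 PDFs.
total_seconds = 926.5365602970123
pages = 1068
pdfs = 25

per_page = seconds_per_page(total_seconds, pages)
per_pdf = total_seconds / pdfs

print(f"{per_page:.2f} s/page")  # 0.87 s/page
print(f"{per_pdf:.1f} s/PDF")    # 37.1 s/PDF
```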

pip install -r requirements.txt
python src/main.py