OCR-tesseract-SE

An OCR program based on pytesseract, a Python wrapper for the Tesseract OCR engine. It includes language models to improve OCR performance.

Getting started

Install Tesseract
- For Mac users: brew install tesseract
- For Windows users: The latest installer can be downloaded from here.
- For Linux users: sudo apt install tesseract-ocr -y
Add the Tesseract install path to your system's PATH environment variable.
Download language models here.
Google Colab notebook
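
To confirm that the installation steps above worked, a quick check from Python can help. The following is a minimal sketch using only the standard library. The helper names `find_tesseract` and `installed_languages` are hypothetical and not part of this repository, and the parsing assumes that `tesseract --list-langs` prints a header line starting with "List of" followed by one language code per line.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary if it is on PATH, else None."""
    return shutil.which(executable)


def installed_languages(tesseract_path):
    """Return the language codes reported by `tesseract --list-langs`."""
    result = subprocess.run(
        [tesseract_path, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some versions print the list to stderr
        text=True,
        check=True,
    )
    # Assumption: the header line starts with "List of"; the rest are codes.
    lines = [line.strip() for line in result.stdout.splitlines()]
    return [line for line in lines if line and not line.startswith("List of")]


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH")
    else:
        print("tesseract found at", path)
        print("languages:", installed_languages(path))
```

If the binary is not found, the PATH step above is the most likely thing to revisit. If a language you need is missing from the list, the corresponding language model has probably not been placed in Tesseract's data directory.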

Usage

Run OCR_singleImage.ipynb to produce a searchable PDF and the OCR text as output.
Run OCR_multipleImages.ipynb to process multiple images in batches.
Run OCR_textExtractor.ipynb to perform text detection using pytesseract.
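
For readers who want to see roughly what this kind of processing involves outside the notebooks, here is a minimal sketch with pytesseract. It is not the notebooks' actual code. The function names, the `output` folder, and the default language code `swe` are assumptions made for illustration, and the sketch requires the `pytesseract` and `Pillow` packages in addition to Tesseract itself.

```python
from pathlib import Path


def output_paths(image_path, out_dir="output"):
    """Return the (PDF, text) output paths for an input image."""
    stem = Path(image_path).stem
    out = Path(out_dir)
    return out / f"{stem}.pdf", out / f"{stem}.txt"


def ocr_image(image_path, out_dir="output", lang="swe"):
    """OCR one image and write a searchable PDF plus a plain-text file."""
    # Imported here so the path helper above works without these packages.
    import pytesseract
    from PIL import Image

    pdf_path, txt_path = output_paths(image_path, out_dir)
    pdf_path.parent.mkdir(parents=True, exist_ok=True)

    with Image.open(image_path) as image:
        # Plain text of the page.
        text = pytesseract.image_to_string(image, lang=lang)
        # Searchable PDF (image with a hidden text layer), returned as bytes.
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(
            image, lang=lang, extension="pdf"
        )

    txt_path.write_text(text, encoding="utf-8")
    pdf_path.write_bytes(pdf_bytes)
    return pdf_path, txt_path


def ocr_folder(folder, out_dir="output", lang="swe", patterns=("*.jpg", "*.png")):
    """Run ocr_image on every matching image in a folder (assumed extensions)."""
    results = []
    for pattern in patterns:
        for image_path in sorted(Path(folder).glob(pattern)):
            results.append(ocr_image(image_path, out_dir, lang))
    return results
```

Calling `ocr_image("page.jpg")` (a placeholder file name) would write `output/page.pdf` and `output/page.txt`, provided the chosen language model is installed. `ocr_folder` simply repeats this for each image in a directory.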

For non-technical users

If you come from a non-technical background and would like to set up pytesseract on your computer from scratch, please refer to the instructions here: Mac, Windows. The guide also covers setting up Python and a virtual environment.

Acknowledgements

Our implementation is based on Google's Tesseract OCR engine.

Contact

Ekta Vats ([email protected])
Centre for Digital Humanities
Uppsala University
Sweden
