Examples of implementing OCR (Optical Character Recognition) with Tesseract in Python
- Install tesseract-ocr using this command:
  - On Ubuntu
    sudo apt-get install tesseract-ocr
  - On Mac
    brew install tesseract
  - On Windows, download the installer from here
- Install the Python binding for Tesseract, pytesseract, using this pip command:
  pip install pytesseract
- Install the Python image processing library, Pillow, using this pip command:
  pip install pillow
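
Once these packages are installed, a basic OCR call might look like the sketch below. It is a minimal illustration, not a definitive implementation: the helper names `tesseract_available` and `image_to_text` are hypothetical, and the default language code `eng` is an assumption about which Tesseract language data is installed. The third-party imports sit inside the function so that the file can still be loaded before the packages are installed.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    return shutil.which("tesseract") is not None


def image_to_text(image_path: str, lang: str = "eng") -> str:
    """Run OCR on a single image file and return the recognized text.

    `lang` is passed to Tesseract as-is; "eng" is an assumed default.
    """
    # Imported lazily: both packages come from the pip commands above.
    from PIL import Image
    import pytesseract

    # Pillow opens the image; pytesseract hands it to the tesseract binary.
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

Calling `image_to_text("sample.png")` (a hypothetical file name) should return the recognized text as a string, provided `tesseract_available()` is true on the machine.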
For working with PDF files:
- Install ImageMagick using this command:
  - On Ubuntu
    sudo apt-get install imagemagick
  - For other platforms, download the installer from here
- Install the Python binding for ImageMagick, Wand, using this pip command:
  pip install wand
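
With Wand in place, one way to OCR a PDF is to render each page to an image and pass it to pytesseract. The sketch below assumes that approach; the function names `pdf_to_text` and `join_pages`, the 300 DPI resolution, and the `eng` language code are illustrative choices, not something these notes prescribe. ImageMagick typically relies on Ghostscript to read PDFs, so that may need to be installed as well.

```python
import io
from typing import List


def pdf_to_text(pdf_path: str, resolution: int = 300, lang: str = "eng") -> List[str]:
    """Render each PDF page to PNG with Wand, then OCR it with pytesseract.

    Returns one string per page. The 300 DPI default is an assumed value.
    """
    # Imported lazily: these come from the pip commands above.
    from PIL import Image as PILImage
    from wand.image import Image as WandImage
    import pytesseract

    texts = []
    with WandImage(filename=pdf_path, resolution=resolution) as pdf:
        for page in pdf.sequence:
            # Copy the page into a standalone image and encode it as PNG bytes.
            with WandImage(image=page) as page_img:
                png_bytes = page_img.make_blob("png")
            # Pillow reads the PNG bytes; pytesseract runs OCR on the result.
            with PILImage.open(io.BytesIO(png_bytes)) as img:
                texts.append(pytesseract.image_to_string(img, lang=lang))
    return texts


def join_pages(pages: List[str]) -> str:
    """Join per-page text with blank lines, skipping pages with no text."""
    return "\n\n".join(p.strip() for p in pages if p.strip())
```

For example, `join_pages(pdf_to_text("scan.pdf"))` (a hypothetical file name) would give the text of the whole document as one string.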