This is a term project from my undergraduate studies. It does not reflect my current programming skills.
TEXTUAL VIDEO TO SPEECH INTERFACE
The objective is the development of an interface to textual information for the visually impaired that uses video, image processing, optical character recognition (OCR), and text-to-speech (TTS). The video provides a sequence of low-resolution images in which text must be detected, rectified, and converted into high-resolution rectangular blocks suitable for analysis by off-the-shelf OCR. To achieve this, various problems related to feature detection, mosaicing, binarization, and systems integration were solved in the development of the system.
To obtain the image sequence, frames are extracted from the video at regular intervals and each frame is pre-processed to produce a clearer image. The frames are then merged into a single image of the whole text using the image stitching tools of OpenCV Python. That stitched image is passed to the OCR engine (Tesseract), whose output is fed to the Google Text-to-Speech engine (gTTS) to produce the final audio output (see the sketch below).
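A minimal sketch of this pipeline in Python, assuming OpenCV's built-in stitcher, pytesseract, and gTTS; the file names, frame interval, and pre-processing step are illustrative and are not taken from app.py:

```python
# Hedged sketch of the frame-extraction -> stitching -> OCR -> TTS pipeline.
import cv2
import pytesseract
from gtts import gTTS

def extract_frames(video_path, interval=30):
    """Grab every `interval`-th frame and lightly pre-process it."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            # Simple illustrative pre-processing: grayscale to reduce noise,
            # converted back to BGR because the stitcher expects 3 channels.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            frames.append(cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR))
        idx += 1
    cap.release()
    return frames

def run_pipeline(video_path, audio_path="output.mp3"):
    frames = extract_frames(video_path)
    # Stitch the frames into one panoramic image of the whole text.
    stitcher = cv2.Stitcher_create()
    status, stitched = stitcher.stitch(frames)
    if status != cv2.Stitcher_OK:
        raise RuntimeError(f"Stitching failed with status {status}")
    # OCR the stitched image, then synthesize speech from the recognized text.
    text = pytesseract.image_to_string(stitched)
    gTTS(text=text, lang="en").save(audio_path)
    return stitched, text

if __name__ == "__main__":
    run_pipeline("sample.mp4")
```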
- Programming Language : Python (OpenCV)
- User Interface : Python Flask, HTML
- Optical Character Recognition : Pytesseract
- Text to Speech : gTTS
An example of input and output can be found here.
- Terminal : python app.py
- Browser : http://127.0.0.3:5000/ or localhost:5000
Note : 5000 is the port number used in the code. If it does not work, change it in app.py (the port is set in the app.run call; see the app.py sketch below).
- app.py Main Python code that implements the features (see the sketch after this list)
- index.html First screen of the software
- index1.html Webpage that shows the final image, text & mp3
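For orientation, here is a minimal sketch of how such an app.py could wire the upload form to the processing pipeline and the result page. The route names, form field name, and the pipeline import are assumptions for illustration, not the actual code:

```python
# Hypothetical sketch, not the actual app.py.
import os

import cv2
from flask import Flask, render_template, request
from werkzeug.utils import secure_filename

from pipeline import run_pipeline  # hypothetical module holding the pipeline sketch above

app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = "static"

@app.route("/")
def index():
    # First screen: the upload form (index.html).
    return render_template("index.html")

@app.route("/upload", methods=["POST"])
def upload():
    # Save the uploaded .mp4 under static/, then run the pipeline on it.
    video = request.files["video"]
    video_path = os.path.join(app.config["UPLOAD_FOLDER"], secure_filename(video.filename))
    video.save(video_path)
    audio_path = video_path + ".mp3"
    stitched_image, text = run_pipeline(video_path, audio_path=audio_path)
    stitched_path = video_path + ".png"
    cv2.imwrite(stitched_path, stitched_image)
    # Result screen: stitched image, recognized text, and the mp3 (index1.html).
    return render_template("index1.html", image=stitched_path, text=text, audio=audio_path)

if __name__ == "__main__":
    # Port 5000, as noted above; change it here if it is already taken.
    app.run(port=5000)
```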
- python
- pip
- OpenCV
- Pytesseract
- gTTS
- Pillow
- imutils
- numpy
- matplotlib
- flask
- werkzeug
- Note : the os and logging packages are part of the Python standard library and do not need to be installed with pip
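The third-party packages listed above can be installed with pip; the exact PyPI distribution names below (for example opencv-python for OpenCV) are assumptions and may differ from the versions originally used :

pip install opencv-python pytesseract gTTS Pillow imutils numpy matplotlib flask werkzeug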
- Browse to the .mp4 video and select the type of text
- Press the upload button
- Wait for the process to complete
- The output is shown on screen
- The output is saved in the 'static/video_name.mp4/' folder