This is a term project from my undergraduate studies. It does not reflect my current programming skills.
TEXTUAL VIDEO TO SPEECH INTERFACE
The objective is the development of an interface to textual information for the visually impaired that uses video, image processing, optical character recognition (OCR), and text-to-speech (TTS). The video provides a sequence of low-resolution images in which text must be detected, rectified, and converted into high-resolution rectangular blocks suitable for analysis by off-the-shelf OCR. To achieve this, various problems related to feature detection, mosaicing, binarization, and systems integration were solved in the development of the system.
To obtain the image sequence, frames are extracted from the video at regular intervals and each frame is pre-processed to produce a clearer image. The frames are then merged into a single image of the whole text using the image stitching tools of OpenCV Python. That stitched image is passed to the OCR engine (Tesseract), whose output is fed to the Google Text-to-Speech engine (gTTS) to produce the final audio output (see the sketch below).
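A minimal sketch of this pipeline in Python, assuming OpenCV's built-in stitcher, pytesseract, and gTTS; the file names, frame interval, and pre-processing step are illustrative and are not taken from app.py:

```python
# Hedged sketch of the frame-extraction -> stitching -> OCR -> TTS pipeline.
import cv2
import pytesseract
from gtts import gTTS

def extract_frames(video_path, interval=30):
    """Grab every `interval`-th frame and lightly pre-process it."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            # Simple illustrative pre-processing: grayscale to reduce noise,
            # converted back to BGR because the stitcher expects 3 channels.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            frames.append(cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR))
        idx += 1
    cap.release()
    return frames

def run_pipeline(video_path, audio_path="output.mp3"):
    frames = extract_frames(video_path)
    # Stitch the frames into one panoramic image of the whole text.
    stitcher = cv2.Stitcher_create()
    status, stitched = stitcher.stitch(frames)
    if status != cv2.Stitcher_OK:
        raise RuntimeError(f"Stitching failed with status {status}")
    # OCR the stitched image, then synthesize speech from the recognized text.
    text = pytesseract.image_to_string(stitched)
    gTTS(text=text, lang="en").save(audio_path)
    return stitched, text

if __name__ == "__main__":
    run_pipeline("sample.mp4")
```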
- Programming Language : Python (OpenCV)
- User Interface : Python Flask, HTML
- Optical Character Recognition : Pytesseract
- Text to Speech : gTTS
An example of input and output can be found here.
- Terminal : python app.py
- Browser : http://127.0.0.3:5000/ or localhost:5000
Note : 5000 is the port number used in the code. If it does not work, change it in app.py (the port is set in the app.run call; see the app.py sketch below).
- app.py Main Python code that implements the features (see the sketch after this list)
- index.html First screen of the software
- index1.html Webpage that shows the final image, text & mp3
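For orientation, here is a minimal sketch of how such an app.py could wire the upload form to the processing pipeline and the result page. The route names, form field name, and the pipeline import are assumptions for illustration, not the actual code:

```python
# Hypothetical sketch, not the actual app.py.
import os

import cv2
from flask import Flask, render_template, request
from werkzeug.utils import secure_filename

from pipeline import run_pipeline  # hypothetical module holding the pipeline sketch above

app = Flask(__name__)
app.config["UPLOAD_FOLDER"] = "static"

@app.route("/")
def index():
    # First screen: the upload form (index.html).
    return render_template("index.html")

@app.route("/upload", methods=["POST"])
def upload():
    # Save the uploaded .mp4 under static/, then run the pipeline on it.
    video = request.files["video"]
    video_path = os.path.join(app.config["UPLOAD_FOLDER"], secure_filename(video.filename))
    video.save(video_path)
    audio_path = video_path + ".mp3"
    stitched_image, text = run_pipeline(video_path, audio_path=audio_path)
    stitched_path = video_path + ".png"
    cv2.imwrite(stitched_path, stitched_image)
    # Result screen: stitched image, recognized text, and the mp3 (index1.html).
    return render_template("index1.html", image=stitched_path, text=text, audio=audio_path)

if __name__ == "__main__":
    # Port 5000, as noted above; change it here if it is already taken.
    app.run(port=5000)
```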
- python
- pip
- OpenCV
- Pytesseract
- gTTS
- Pillow
- imutils
- numpy
- matplotlib
- flask
- werkzeug
- Note : the os and logging packages are part of the Python standard library and do not need to be installed with pip
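The third-party packages listed above can be installed with pip; the exact PyPI distribution names below (for example opencv-python for OpenCV) are assumptions and may differ from the versions originally used :

pip install opencv-python pytesseract gTTS Pillow imutils numpy matplotlib flask werkzeug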
- Browse to the .mp4 video and select the type of text
- Press the upload button
- Wait for the process to complete
- The output is shown on screen
- The output is saved in the 'static/video_name.mp4/' folder