# LiptoSpeech

Lip reading using an end-to-end sentence-level model.

## Problem Statement

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches split the problem into two stages: designing or learning visual features, then predicting text from them. Here, a single end-to-end model maps video frames of the mouth directly to the spoken sentence.

- Input: a video file of a person speaking a word or phrase.
- Output: the predicted word or phrase the person was speaking.

## Dataset

- [GRID corpus](http://spandh.dcs.shef.ac.uk/gridcorpus/)
- [LRW (Lip Reading in the Wild)](https://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrw1.html)
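
Each GRID video comes with an `.align` transcription file. Below is a minimal parsing sketch, assuming the common GRID format where each line reads `start end word` and the timestamps divide by 1,000 to give 25 fps frame indices; the function name and file path are illustrative, not part of this repo:

```python
# Sketch: parse a GRID .align file into (start_frame, end_frame, word) tuples.
# Assumes each line is "start end word" with timestamps that divide by 1000
# to yield 25 fps frame indices; "sil" entries mark silence.

def parse_align(path):  # illustrative helper, not part of this repo
    entries = []
    with open(path) as f:
        for line in f:
            start, end, word = line.split()
            if word == "sil":  # skip silence segments
                continue
            entries.append((int(start) // 1000, int(end) // 1000, word))
    return entries

# e.g. parse_align("align/swwp2s.align")
```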

## Technologies and frameworks

- TensorFlow 1.2.1
- Keras
- OpenCV 3
- Python 3.6

## Preprocess the dataset

```
python Videoprocess.py id2_vcd_swwp2s.mpg
```

A dlib shape-predictor model is used to locate the 68 facial landmarks; it is in the `predictor` directory as `predictor/shape_predictor_68_face_landmarks.dat.bz2` (decompress the `.bz2` before loading).

The `MouthExtract` folder contains the preprocessed dataset of cropped mouth regions.
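
As an illustration of that preprocessing step (not the exact logic of `Videoprocess.py`), here is a minimal sketch that crops the mouth region with dlib's 68-point predictor, whose points 48-67 outline the mouth; the function name and padding are assumptions:

```python
# Sketch: crop the mouth region from each frame of a video using dlib's
# 68-point landmark predictor (points 48-67 cover the mouth).
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("predictor/shape_predictor_68_face_landmarks.dat")

def extract_mouth_frames(video_path, pad=10):  # illustrative helper
    """Yield padded mouth crops (BGR images) for each frame with a detected face."""
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector(gray)
        if not faces:
            continue
        shape = predictor(gray, faces[0])
        xs = [shape.part(i).x for i in range(48, 68)]
        ys = [shape.part(i).y for i in range(48, 68)]
        x1, y1 = max(min(xs) - pad, 0), max(min(ys) - pad, 0)
        x2, y2 = max(xs) + pad, max(ys) + pad
        yield frame[y1:y2, x1:x2]
    cap.release()
```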

## Prediction

```
python predict.py <path to the video>
```

Example:

```
python predict.py PredictVideo/patrick.m4v
```

## Important

Please note that the video must be at 25 fps for the model to work.
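
A quick way to guard against this is to check the frame rate with OpenCV before calling `predict.py`; the helper below is a sketch (its name and tolerance are illustrative), and a mismatched video can be re-encoded first, e.g. with ffmpeg's `-r 25` option:

```python
# Sketch: verify a video's frame rate before prediction, since the model
# expects 25 fps input. Function name and tolerance are illustrative.
import cv2

def check_fps(path, expected=25.0, tol=0.01):
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    cap.release()
    if abs(fps - expected) > tol:
        raise ValueError(f"{path} is {fps:.2f} fps; re-encode to {expected:.0f} fps first")
    return fps

check_fps("PredictVideo/patrick.m4v")
```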