This lip reader project uses a neural network to generate captions for videos based solely on lip movements. The trained model infers spoken language from the lip movements visible in video frames. It has been trained on a large dataset of videos with corresponding captions (alignments), enabling it to recognize and interpret lip movements accurately.
- Python 3.x
- TensorFlow (version 2.x)
- OpenCV
- NumPy
- Pandas
To use the lip reader model for generating captions on new videos:
- Clone this repository to your local machine.
- Prepare your video files and corresponding alignments (captions).
- Customize the `load_video()` and `load_alignments()` functions in the `load_data()` method to read your data correctly.
- Update the path to your trained model in the `predict()` method.
- Utilize the provided code to preprocess video frames, perform inference, and generate captions (see the sketch after this list).
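For reference, here is a minimal sketch of what the prediction path might look like. The `load_video()` and `predict()` names come from the steps above, but the model path, mouth-crop coordinates, character vocabulary, and greedy CTC decoding shown here are assumptions for illustration; adapt them to match how your model was actually trained.

```python
import cv2
import numpy as np
import tensorflow as tf

# Hypothetical character set; replace with the vocabulary your model was trained on.
VOCAB = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")
num_to_char = tf.keras.layers.StringLookup(vocabulary=VOCAB, oov_token="", invert=True)

def load_video(path: str) -> tf.Tensor:
    """Read a video, convert frames to grayscale, crop the mouth region,
    and normalize. The crop coordinates are placeholders for illustration."""
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(gray[190:236, 80:220])  # example mouth crop; adjust to your data
    cap.release()
    video = tf.cast(tf.convert_to_tensor(np.array(frames))[..., tf.newaxis], tf.float32)
    mean = tf.math.reduce_mean(video)
    std = tf.math.reduce_std(video)
    return (video - mean) / std

def predict(video_path: str, model_path: str = "models/lip_reader.keras") -> str:
    """Run the trained model on a single video and decode the caption
    with greedy CTC decoding (assumed to match the training setup)."""
    model = tf.keras.models.load_model(model_path)
    frames = load_video(video_path)
    yhat = model.predict(tf.expand_dims(frames, axis=0))
    decoded = tf.keras.backend.ctc_decode(
        yhat, input_length=[yhat.shape[1]], greedy=True
    )[0][0]
    return tf.strings.reduce_join(num_to_char(decoded[0])).numpy().decode("utf-8")

if __name__ == "__main__":
    print(predict("data/sample_video.mpg"))
```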
Contributions to this project are welcome! If you encounter any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.