This repository implements an Automatic Speech Recognition (ASR) model that predicts the phonemes in a recording. An audio clip of a person saying a word passes through the network, which outputs the phonetic transcription. The model uses RNNs together with the Connectionist Temporal Classification (CTC) dynamic programming algorithm to generate labels.
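For orientation, here is a minimal sketch of how a CTC loss is typically applied to RNN outputs in PyTorch. The framework, shapes, and names below are illustrative assumptions, not taken from this repo:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: T time steps, N recordings per batch, C phoneme classes (incl. blank).
T, N, C = 100, 4, 42

# CTC expects per-frame log-probabilities shaped (T, N, C),
# e.g. RNN outputs passed through log_softmax.
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)

# Target phoneme indices; index 0 is reserved for the CTC blank here.
targets = torch.randint(1, C, (N, 30), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 30, dtype=torch.long)

# CTCLoss runs the dynamic-programming alignment over all valid label paths.
criterion = nn.CTCLoss(blank=0)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```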
The model consists of the following components:
- 1-D CNN layer: 1-D CNNs capture the structural dependence between adjacent vectors in the input.
- Bidirectional LSTM layers (Bi-LSTMs): capture long-term contextual dependencies.
- Pyramidal Bi-LSTMs (pBLSTMs): reduce the time resolution of the input by a factor of 2 (see the sketch after this list).
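As a rough sketch of the time-downsampling idea behind the pBLSTM, assuming PyTorch (the class below is illustrative, not the repo's implementation):

```python
import torch
import torch.nn as nn

class pBLSTM(nn.Module):
    """Pyramidal Bi-LSTM: concatenates adjacent timesteps, halving time resolution."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # Input features are doubled because two adjacent frames are concatenated.
        self.blstm = nn.LSTM(input_dim * 2, hidden_dim,
                             bidirectional=True, batch_first=True)

    def forward(self, x):
        batch, time, feat = x.shape
        # Drop the last frame if the sequence length is odd.
        x = x[:, : time - (time % 2), :]
        # Reshape (B, T, F) -> (B, T//2, 2*F): each new step sees two original frames.
        x = x.reshape(batch, time // 2, feat * 2)
        out, _ = self.blstm(x)
        return out
```

Stacking three such layers reduces the time resolution by a factor of 8, which is the usual motivation in LAS-style encoders.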
This folder contains all the training and evaluation scripts as well as the utility functions:
- main.py: runs training, validation, and inference.
- model.py: contains the LAS model, consisting of the Listener, Attention, and Speller modules.
- train_test.py: contains the train, validation, and test code.
- dataloader.py: contains the code to load data for training the model (see the sketch after this list).
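The exact interface of dataloader.py is not documented here; the following is only a generic sketch of what a speech dataloader for this kind of setup commonly looks like in PyTorch (all names are hypothetical):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PhonemeDataset(Dataset):
    """Illustrative dataset pairing utterance features with phoneme label sequences."""

    def __init__(self, features, labels):
        # features: list of (T_i, F) float arrays; labels: list of phoneme index sequences.
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        x = torch.as_tensor(self.features[idx], dtype=torch.float32)
        y = torch.as_tensor(self.labels[idx], dtype=torch.long)
        return x, y

def collate_fn(batch):
    # Pad variable-length utterances and keep their true lengths for the loss.
    xs, ys = zip(*batch)
    x_lens = torch.tensor([len(x) for x in xs])
    y_lens = torch.tensor([len(y) for y in ys])
    xs = torch.nn.utils.rnn.pad_sequence(xs, batch_first=True)
    ys = torch.nn.utils.rnn.pad_sequence(ys, batch_first=True)
    return xs, ys, x_lens, y_lens

# loader = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=collate_fn)
```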
This folder contains the config files that define the hyperparameters and other variables used by the training and evaluation scripts:
- config.yaml: contains the model and dataset configurations (see the loading sketch after this list).
- notebook.ipynb: notebook for running the model. The data should be placed in a data folder with splits for training, validation, and testing.
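The keys inside config.yaml are not listed here; as a minimal sketch of how such a file is commonly read, assuming PyYAML is available (the key names below are placeholders, not the repo's actual parameters):

```python
import yaml

# Load hyperparameters; the actual keys depend on config.yaml in this repo.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Hypothetical usage: these key names are illustrative only.
hidden_dim = config.get("hidden_dim", 256)
batch_size = config.get("batch_size", 32)
```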
All scripts should be executed from the parent folder LAS.
Download and unzip the dataset into a data folder.
Add a wandb API key to track the model with Weights & Biases (wandb).
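A minimal way to authenticate from Python, using the standard wandb client (the project and run names below are placeholders, not the repo's settings):

```python
import wandb

# Authenticate with your API key (or run `wandb login` once from the shell).
wandb.login(key="YOUR_WANDB_API_KEY")

# Placeholder project/run names for illustration.
wandb.init(project="LAS", name="baseline-run")
```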
Use python main.py to run the model. Use the config.yaml file to change model parameters.