NeedleASR is an enhanced version of Needle (Necessary Elements of Deep Learning), a deep learning framework developed as part of CMU's course 10-714 Deep Learning Systems: Algorithms and Implementation [Link]. It extends Needle to support Automatic Speech Recognition (ASR) based on Connectionist Temporal Classification (CTC).
-
NDArray Backend:
NeedleASR abstracts array data in NDArray (N-Dimensional Array), which is similar to NumPy's ndarray. NDArray supports array operations on both CPU and CUDA backends, including dimension permutation, reshaping, and broadcasting. These operations are driven by three fields (strides, shape, and offset), which allow an array to be manipulated efficiently without changing its underlying compact layout. In addition, the NDArray backend accelerates matrix multiplication through tiling and vectorization.
-
Automatic Differentiation:
NeedleASR includes support for the Tensor data structure, which extends NDArray by integrating operations and automatic differentiation capabilities. The Tensor in NeedleASR is designed to build and manage computational graphs dynamically, enabling the framework to automatically compute gradients along these graphs.
-
Neural Network Modules:
NeedleASR provides comprehensive support for modules commonly used in deep neural networks, including:
-
Basic: Linear, Flatten, Residual, Dropout.
-
Activation: ReLU, Tanh, Sigmoid.
-
Normalization: Batch Normalization, Layer Normalization.
-
Convolution: 2D Convolution.
-
Sequence: Embedding, RNN Cell, RNN, LSTM Cell, LSTM.
-
Transformer: Transformer Encoder/Decoder, Multihead Attention Layer.
Beyond neural network layers, NeedleASR offers a range of features for model training:
-
Initialization: Xavier Uniform/Normal Initialization, Kaiming Uniform/Normal Initialization.
-
Optimization: Stochastic Gradient Descent, Adam.
-
Data: Dataset, DataLoader, Data Transforms.
-
CTC-Based ASR:
A key enhancement of NeedleASR over Needle is its support for CTC-based ASR with beam search decoding. NeedleASR implements the CTC loss and its backpropagation, enabling alignment-free training between speech features and text transcripts. The CTC loss is computed in the log domain for numerical stability. At inference time, NeedleASR supports beam search decoding.
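To illustrate what computing the CTC loss in the log domain involves, here is a minimal NumPy sketch of the standard CTC forward recursion for a single utterance. It is not NeedleASR's implementation: the function names, the `blank=0` default, and the use of NumPy rather than NDArray are assumptions made for illustration.

```python
import numpy as np

def logsumexp(*values):
    """Numerically stable log(sum(exp(v))) over scalar log-values."""
    m = max(values)
    if m == -np.inf:  # every term is log(0)
        return -np.inf
    return m + np.log(sum(np.exp(v - m) for v in values))

def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """CTC loss for one utterance (hypothetical helper, not the NeedleASR API).

    log_probs: (T, V) array of per-frame log-softmax outputs.
    target:    sequence of label ids without blanks.
    """
    T = log_probs.shape[0]
    # Interleave blanks: [blank, y1, blank, y2, ..., blank],
    # which has length 2 * len(target) + 1.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # alpha[t, s] = log-probability of all partial alignments that
    # end in extended symbol s at frame t.
    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            terms = [alpha[t - 1, s]]               # stay on the same symbol
            if s >= 1:
                terms.append(alpha[t - 1, s - 1])   # advance by one symbol
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[t - 1, s - 2])
            alpha[t, s] = logsumexp(*terms) + log_probs[t, ext[s]]

    # Valid alignments end in the last label or in the trailing blank.
    if S > 1:
        total = logsumexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
    else:
        total = alpha[T - 1, S - 1]
    return -total
```

Working with sums of log-probabilities instead of products of probabilities avoids underflow on long utterances, where the probability of any single alignment quickly becomes smaller than floating point can represent.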
We provide a detailed example in Report for training a CTC-based Transformer ASR system. To train deep learning models with NeedleASR, follow these steps:
-
Installation:
Run the following commands to set up NeedleASR:

    git clone https://github.com/Kelton8Z/needleASR.git
    pip3 install pybind11 Levenshtein librosa soundfile
    cd /content/needleASR
    rm -rf build/
    make clean
    make

-
Create a Training Script:
We provide an example training script for ASR systems in apps/train_DEBUG.py. In this script, users should:
- Set the Device and Backend: NeedleASR supports multiple backends, including NDArray on CPU or CUDA GPU, as well as numpy.ndarray on CPU. The device can be defined as needle.cpu() for NDArray on CPU, needle.cuda() for NDArray on GPU, or needle.cpu_numpy() for NumPy on CPU.
- Define the Dataset, DataLoader, and Model: Prepare the dataset and dataloader, and define the ASR model architecture.
- Define the Training and Evaluation Processes: Implement the training loop, evaluation logic, and necessary metrics for the model.
-
Visualization with TensorBoard:
The example script in apps/train_DEBUG.py also includes support for visualizing results using TensorBoard, making it easy to track training progress and performance metrics.
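For the evaluation step above, ASR systems are commonly scored with word error rate (WER) or character error rate (CER), both based on edit distance. The sketch below is a pure-Python illustration of that idea. It is not taken from apps/train_DEBUG.py, and all function names are hypothetical. The Levenshtein package from the installation step could replace the hand-written distance.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a two-row DP table."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rate(ref, hyp):
    """Edit distance between ref and hyp, normalized by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)

def word_error_rate(ref_text, hyp_text):
    """WER: compare whitespace-separated tokens."""
    return error_rate(ref_text.split(), hyp_text.split())

def char_error_rate(ref_text, hyp_text):
    """CER: compare individual characters, including spaces."""
    return error_rate(list(ref_text), list(hyp_text))
```

Because the distance is normalized by the reference length, the rate can exceed 1.0 when the hypothesis contains many insertions.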
If you want to build additional features on top of NeedleASR, keep the following notes in mind:
-
Iterating Over NDArray:
Direct iteration over an NDArray is not supported:

    # This will not work:
    for a in A:  # A: NDArray
        ...

Use iteration with an index range instead:

    # Supported:
    for i in range(len(A)):  # A: NDArray
        ...

-
Shape Requirements for New NDArray:
The shape must always be a tuple, even for a 1D vector.

    # Correct (note the comma, which makes the shape a 1-tuple):
    extended_symbols = array_api.full((2 * len(target) + 1,), self.blank, device=target.device)
    # Incorrect (missing tuple):
    array_api.full(2 * len(target) + 1, self.blank, device=target.device)

-
Behavior of NDArray.__getitem__:
The __getitem__ operation creates an NDArray with the same number of dimensions as the original.

    A = NDArray(shape=(3, 4))
    sliced = A[1, :3]  # Resulting shape: (1, 3), not (3,)

-
Compact Operation Restrictions:
The compact() operation only supports positive offsets.
-
Comparing NDArray with Scalars:
Direct comparison between an NDArray and a scalar is not supported:

    a = NDArray([1.0])
    if a == 1.0:  # This will not work
        ...

To compare, first convert the NDArray to a Python scalar via numpy():

    if float(a.numpy()) == 1.0:  # Supported
        ...

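Several of the notes above follow from how a strided array is represented. The toy class below sketches how shape, strides, and offset can express permutation, broadcasting, and dimension-preserving indexing over one shared buffer. It is an illustration under assumed names (`StridedView` and its methods are hypothetical), not NeedleASR's NDArray code.

```python
from itertools import product

class StridedView:
    """Toy N-d view over a flat Python list (hypothetical, not NeedleASR's NDArray)."""

    def __init__(self, data, shape, strides=None, offset=0):
        self.data = data  # shared flat buffer; views never copy it
        self.shape = tuple(shape)
        self.strides = (tuple(strides) if strides is not None
                        else self.compact_strides(shape))
        self.offset = offset

    @staticmethod
    def compact_strides(shape):
        """Row-major strides, in elements, for a densely packed array."""
        strides, step = [], 1
        for dim in reversed(shape):
            strides.append(step)
            step *= dim
        return tuple(reversed(strides))

    def at(self, *index):
        """Read one element at offset + sum(index[i] * strides[i])."""
        return self.data[self.offset + sum(i * s for i, s in zip(index, self.strides))]

    def permute(self, axes):
        """Reorder dimensions by reordering shape and strides only."""
        return StridedView(self.data, [self.shape[a] for a in axes],
                           [self.strides[a] for a in axes], self.offset)

    def broadcast_to(self, shape):
        """Repeat size-1 dimensions by giving them stride 0."""
        assert len(shape) == len(self.shape)
        strides = []
        for old, new, s in zip(self.shape, shape, self.strides):
            assert old == new or old == 1
            strides.append(s if old == new else 0)
        return StridedView(self.data, shape, strides, self.offset)

    def row(self, i):
        """Select row i of a 2D view, keeping both dimensions: shape (1, ncols)."""
        return StridedView(self.data, (1, self.shape[1]), self.strides,
                           self.offset + i * self.strides[0])

    def to_list(self):
        """Elements in logical row-major order, as a compact copy would store them."""
        return [self.at(*idx) for idx in product(*(range(d) for d in self.shape))]


a = StridedView(list(range(12)), (3, 4))  # rows: 0..3, 4..7, 8..11
t = a.permute((1, 0))                     # transpose, shape (4, 3), no copy
r = a.row(1)                              # shape (1, 4), offset 4
b = r.broadcast_to((2, 4))                # row repeated twice via stride 0
```

In this sketch, only `to_list` walks the data. Every other operation returns a new view with adjusted metadata, which is why indexing a row naturally yields shape (1, 4) rather than (4,).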
The implementation of Needle started from the assignments designed by Professors Tianqi Chen and Zico Kolter as part of the course 10-714 Deep Learning Systems: Algorithms and Implementation [Link].
Qingzheng Wang provided the implementation of Needle and also collaborated with Kelton Zhang on the implementation of features for CTC-based ASR.
Qingzheng Wang is with CMU Information Networking Institute, and Kelton Zhang is with CMU Department of Electrical and Computer Engineering.