When a Masking layer is used for speech utterances of variable length, an input dimension mismatch error is thrown. The model below, edited from test_keras.py, reproduces the error.
I think the problem is that Keras internally tries to apply the mask to the loss, which no longer has a time dimension (CTC returns a shape of (batch, 1)). You could add a layer at the end of your network that removes the mask (a layer whose `compute_mask` returns `None`).
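A minimal sketch of such a mask-removing layer, written against modern tf.keras (the layer name `RemoveMask` is my own; the original issue used an older Theano-backed Keras):

```python
import tensorflow as tf

class RemoveMask(tf.keras.layers.Layer):
    """Identity layer that stops mask propagation."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True  # accept an incoming mask

    def call(self, inputs):
        return inputs  # pass activations through unchanged

    def compute_mask(self, inputs, mask=None):
        # Returning None strips the mask, so downstream ops
        # (e.g. the CTC loss) no longer try to apply it.
        return None
```

Placing this layer right before the loss keeps the mask available for everything upstream while preventing Keras from applying it to the (batch, 1)-shaped CTC output.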
Alternatively, implement the CTC loss as a layer and use the mask to compute the activation sequence lengths, while stripping the mask as described above.
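For this second approach, the key step is turning the propagated boolean mask into the per-utterance input lengths that CTC (e.g. `ctc_batch_cost`) expects. A framework-free sketch with NumPy (the helper name `mask_to_lengths` and the example batch are made up for illustration):

```python
import numpy as np

def mask_to_lengths(mask):
    """Given a boolean mask of shape (batch, time), return the
    per-utterance sequence lengths that CTC needs."""
    return mask.astype(np.int32).sum(axis=1)

# Hypothetical padded batch: two utterances of length 3 and 5,
# zero-padded on the right to frame_len = 5.
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]], dtype=bool)
print(mask_to_lengths(mask))  # → [3 5]
```

Inside a custom loss layer you would apply the same reduction to the mask tensor and feed the resulting lengths to the CTC cost, then return `None` from `compute_mask`.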
```python
model = Sequential()
model.add(Masking(mask_value=0., input_shape=(frame_len, nb_feat)))
model.add(LSTM(inner_dim, return_sequences=True))
model.add(BatchNormalization())
model.add(TimeDistributed(Dense(nb_output)))
```
```
ValueError: GpuElemwise. Input dimension mis-match. Input 1 (indices start at 0) has shape[1] == 80, but the output's size on that axis is 16.
```
Please suggest how a Masking layer can be used together with CTC loss in Keras.