Abstract:
This paper evaluates the potential of convolutional neural networks in classifying short audio clips of environmental sounds. A deep model consisting of 2 convolutional layers with max-pooling and 2 fully connected layers is trained on a low-level representation of audio data (segmented spectrograms) with deltas. The accuracy of the network is evaluated on 3 public datasets of environmental and urban recordings. The model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches.
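The input representation mentioned in the abstract (segmented spectrograms with deltas) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: the function name `segment_with_deltas`, the segment length, the hop size, the 60-band random input, and the use of `np.gradient` as the delta estimate are all hypothetical choices made for the example.

```python
import numpy as np

def segment_with_deltas(spec, seg_frames=41, hop=20):
    """Split a (bands, frames) spectrogram into overlapping segments and
    stack each segment with its time-delta as a second channel.

    seg_frames and hop are illustrative values, not taken from the source.
    """
    bands, frames = spec.shape
    segments = []
    for start in range(0, frames - seg_frames + 1, hop):
        seg = spec[:, start:start + seg_frames]
        # Assumed delta estimate: finite difference along the time axis.
        delta = np.gradient(seg, axis=1)
        segments.append(np.stack([seg, delta]))  # (2, bands, seg_frames)
    if not segments:
        # Clip shorter than one segment: return an empty batch.
        return np.empty((0, 2, bands, seg_frames), dtype=spec.dtype)
    return np.stack(segments)  # (n_segments, 2, bands, seg_frames)

# Stand-in for a log-scaled spectrogram of one clip (random data).
rng = np.random.default_rng(0)
spec = rng.standard_normal((60, 101))
batch = segment_with_deltas(spec)
print(batch.shape)
```

Each item in `batch` is a two-channel "image" (spectrogram and delta) of the kind a small convolutional network could take as input. Clip-level predictions would then need some aggregation over segments, which this sketch leaves out.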
Paper:
- Author version of the paper: Environmental Sound Classification with Convolutional Neural Networks.
Citing:
K. J. Piczak. Environmental Sound Classification with Convolutional Neural Networks. In Proceedings of the IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6, IEEE, 2015.
Supplementary materials:
Related work: