An Improved Acoustic Scene Classification Method Using Convolutional Neural Networks (CNNs)
Predicting the acoustic environment by analyzing and classifying sound recordings of a scene is an emerging research area. This paper presents and compares acoustic scene classification (ASC) methods for distinguishing between acoustic environments. In particular, two deep learning classification techniques, Deep Neural Network (DNN) and Convolutional Neural Network (CNN), are applied using a combination of Mel-Frequency Cepstral Coefficients (MFCCs) and log mel energies as features. DNNs and CNNs are state-of-the-art techniques widely used in speech recognition, computer vision, and natural language processing, and they have recently achieved great success in audio classification. Both techniques were implemented and tuned through a variety of experiments with different hyperparameters, numbers of hidden layers, and numbers of units on the public benchmark datasets provided in the DCASE 2017 challenge. The proposed method uses frame-level randomization of the combined acoustic features (MFCCs and log mel energies) when training the model to achieve higher accuracy with both DNN and CNN. The DNN achieved 83.45% accuracy and the CNN achieved 83.65%, both higher than previous work on the public benchmark datasets provided in the DCASE 2017 challenge.
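The abstract does not spell out how the frame-level randomization is carried out, so the following is only a minimal sketch of one plausible reading: per-frame MFCC and log mel features are concatenated along the feature axis, every frame inherits the scene label of its recording, and all frames are then shuffled across recordings before training. The function name `combine_and_shuffle_frames`, the feature dimensions, and the fixed seed are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def combine_and_shuffle_frames(mfcc_list, logmel_list, labels, seed=0):
    """Concatenate per-frame MFCC and log mel features, then shuffle frames.

    mfcc_list, logmel_list: one array per recording, shape (n_frames, n_features).
    labels: one integer scene label per recording.
    Returns (X, y): shuffled frame features and their matching frame labels.
    """
    frames, frame_labels = [], []
    for mfcc, logmel, label in zip(mfcc_list, logmel_list, labels):
        if mfcc.shape[0] != logmel.shape[0]:
            raise ValueError("MFCC and log mel features must have the same number of frames")
        # Combine the two feature types frame by frame (feature axis = 1).
        combined = np.concatenate([mfcc, logmel], axis=1)
        frames.append(combined)
        # Every frame carries the scene label of the recording it came from.
        frame_labels.append(np.full(combined.shape[0], label))

    X = np.concatenate(frames, axis=0)
    y = np.concatenate(frame_labels, axis=0)

    # Shuffle frames across all recordings; the same permutation is applied
    # to the labels so that each frame keeps its own label.
    perm = np.random.default_rng(seed).permutation(X.shape[0])
    return X[perm], y[perm]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-ins for two recordings (assumed sizes: 20 MFCCs, 40 mel bands).
    mfccs = [rng.normal(size=(100, 20)), rng.normal(size=(80, 20))]
    logmels = [rng.normal(size=(100, 40)), rng.normal(size=(80, 40))]
    X, y = combine_and_shuffle_frames(mfccs, logmels, labels=[0, 1])
    print(X.shape, y.shape)
```

In practice the synthetic arrays would be replaced by features extracted from the DCASE recordings, and the shuffled frames would be fed to the DNN or CNN in mini-batches.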