5 CONCLUSION
In this paper, we proposed a feature optimization
method called multi-scale channel feature fusion. It
extracts log-mel features in both the time and
frequency domains for sound classification. An
attention mechanism then fuses the original,
time-domain, and frequency-domain features across
channels, enabling efficient classification of
environmental sounds. We compared our method with
current state-of-the-art approaches in terms of
accuracy and parameter size to evaluate the
advantages and disadvantages of the proposed method.
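The fusion step summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the function name `channel_attention_fuse` is hypothetical, first differences along the frame and mel-band axes stand in for the time-domain and frequency-domain features, and the attention is a squeeze-and-excitation style gate with random, untrained weights. The only purpose is to show how three feature maps can be stacked as channels and reweighted per channel.

```python
import numpy as np


def channel_attention_fuse(log_mel, rng=None):
    """Stack three views of a log-mel spectrogram and reweight them per channel.

    Illustrative only: the derived features and the attention weights are
    assumptions, not the features or trained parameters from the paper.
    """
    rng = np.random.default_rng(0) if rng is None else rng

    # Hypothetical stand-ins for the time- and frequency-domain features:
    # first differences along the frame axis and along the mel-band axis.
    time_feat = np.diff(log_mel, axis=1, prepend=log_mel[:, :1])
    freq_feat = np.diff(log_mel, axis=0, prepend=log_mel[:1, :])

    # Treat the original and derived features as three channels.
    stacked = np.stack([log_mel, time_feat, freq_feat])  # (3, n_mels, n_frames)

    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = stacked.mean(axis=(1, 2))  # (3,)

    # Excite: two small dense layers with random placeholder weights.
    w1 = rng.standard_normal((3, 3))
    w2 = rng.standard_normal((3, 3))
    hidden = np.maximum(w1 @ squeezed, 0.0)          # ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate per channel

    # Fuse: scale every channel by its attention weight.
    fused = stacked * weights[:, None, None]
    return fused, weights


if __name__ == "__main__":
    # Random data in place of a real log-mel spectrogram (64 bands, 128 frames).
    log_mel = np.random.default_rng(1).standard_normal((64, 128))
    fused, weights = channel_attention_fuse(log_mel)
    print(fused.shape, weights.shape)
```

In a trained model the two dense layers would be learned jointly with the classifier, so the gate could emphasize whichever of the three channels is most informative for a given clip.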
ACKNOWLEDGMENTS
This work was financially supported by the Major
Scientific and Technological Innovation Project of
the Shandong Key R&D Plan, "Smart and Healthy Air
Industry Project Based on New Generation
Information Technology", and the Key R&D Projects
of Shandong Province (2020JMRH0201).
ANIT 2023 - The International Seminar on Artificial Intelligence, Networking and Information Technology