architectures, such as transformer-based models,
which have shown promise in handling complex
audio signals. Additionally, exploring data
augmentation techniques or fine-tuning YAMNet on
domain-specific urban sound datasets could further
improve the model's ability to distinguish between
acoustically similar classes.
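As a concrete illustration of these directions, the sketch below pairs a simple waveform-level augmentation (additive Gaussian noise) with a lightweight transfer-learning setup: rather than fully fine-tuning YAMNet, it trains a small classifier head on the model's 1024-dimensional embeddings, a common low-cost alternative. The noise level, class count, and the load_urban_clips helper are illustrative assumptions rather than part of this study's pipeline; 16 kHz mono float32 clips in [-1, 1], as YAMNet expects, are assumed as input.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Published YAMNet model; expects a 1-D float32 waveform at 16 kHz in [-1, 1].
yamnet = hub.load('https://tfhub.dev/google/yamnet/1')

def augment(waveform, noise_std=0.005):
    """Waveform-level augmentation: additive Gaussian noise.
    noise_std is an illustrative value, not a tuned hyperparameter."""
    noise = np.random.normal(0.0, noise_std, waveform.shape).astype(np.float32)
    return waveform + noise

def embed(waveform):
    """Clip-level feature: YAMNet's per-frame 1024-d embeddings, mean-pooled."""
    _, embeddings, _ = yamnet(waveform)        # embeddings: (num_frames, 1024)
    return tf.reduce_mean(embeddings, axis=0)  # (1024,)

NUM_CLASSES = 10  # e.g., the ten UrbanSound8K classes

# Small trainable head on top of the frozen YAMNet embeddings.
head = tf.keras.Sequential([
    tf.keras.Input(shape=(1024,)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES),
])
head.compile(optimizer='adam',
             loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
             metrics=['accuracy'])

# load_urban_clips is a hypothetical helper returning (waveforms, integer labels).
# clips, labels = load_urban_clips()
# features = np.stack([embed(augment(c)).numpy() for c in clips])
# head.fit(features, np.array(labels), epochs=20, validation_split=0.2)
```

Mean-pooling the frame embeddings yields one feature vector per clip; unfreezing YAMNet's own weights (true fine-tuning) would follow the same structure but at substantially higher computational cost.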
REFERENCES
Nogueira, A. F. R., Oliveira, H. S., Machado, J. J. M., &
Tavares, J. M. R. S. (2022). Sound classification and
processing of urban environments: A systematic
literature review. Sensors, 22(22), 8608.
https://doi.org/10.3390/s22228608
Salamon, J., & Bello, J. P. (2015). Unsupervised feature
learning for urban sound classification. In 2015 IEEE
International Conference on Acoustics, Speech and
Signal Processing (ICASSP) (pp. 171-175). IEEE.
https://doi.org/10.1109/ICASSP.2015.7177954
Heittola, T., Mesaros, A., Eronen, A. J., & Virtanen, T.
(2013). Context-dependent sound event detection.
EURASIP Journal on Audio, Speech, and Music
Processing, 2013(1), 1-15.
https://doi.org/10.1186/1687-4722-2013-1
Çakır, E., Heittola, T., Huttunen, H., & Virtanen, T. (2015).
Polyphonic sound event detection using multi-label
deep neural networks. In 2015 International Joint
Conference on Neural Networks (IJCNN) (pp. 1-7).
IEEE. https://doi.org/10.1109/IJCNN.2015.7280518
Mesaros, A., Heittola, T., & Virtanen, T. (2016). Metrics for
polyphonic sound event detection. Applied Sciences,
6(6), 162. https://doi.org/10.3390/app6060162
Parascandolo, G., Huttunen, H., & Virtanen, T. (2016).
Recurrent neural networks for polyphonic sound event
detection in real life recordings. arXiv preprint
arXiv:1604.00861.
https://doi.org/10.48550/arXiv.1604.00861
Çakır, E., Parascandolo, G., Heittola, T., Huttunen, H., &
Virtanen, T. (2017). Convolutional recurrent neural
networks for polyphonic sound event detection.
IEEE/ACM Transactions on Audio, Speech, and
Language Processing, 25(6), 1291-1303.
https://doi.org/10.1109/TASLP.2017.2690575
Xu, Y., Kong, Q., Wang, W., & Plumbley, M. D. (2017).
Large-scale weakly supervised audio classification
using gated convolutional neural network. arXiv
preprint arXiv:1705.02304.
https://doi.org/10.48550/arXiv.1705.02304
Adavanne, S., Politis, A., Nikunen, J., & Virtanen, T. (2018).
Sound event localization and detection of overlapping
sources using convolutional recurrent neural networks.
IEEE Journal of Selected Topics in Signal Processing,
13(1), 34-48.
https://doi.org/10.1109/JSTSP.2018.2885636
Turpault, N., Serizel, R., Shah, A., & Salamon, J. (2019).
Sound event detection in domestic environments with
weakly labeled data and soundscape synthesis. In
Proceedings of the Detection and Classification of
Acoustic Scenes and Events 2019 Workshop
(DCASE2019) (pp. 253-257).
http://dcase.community/documents/challenge2019/technical_reports/DCASE2019_Turpault_42.pdf
Salamon, J., Jacoby, C., & Bello, J. P. (2014). A dataset and
taxonomy for urban sound research. In Proceedings of the
22nd ACM International Conference on Multimedia (pp.
1041-1044). https://doi.org/10.1145/2647868.2655045
Tena, A., Claria, F., & Solsona, F. (2022). Automated
detection of COVID-19 cough. Biomedical Signal Processing and
Control, 71, 103175.
https://doi.org/10.1016/j.bspc.2021.103175
Xie, J., Chen, B., Gu, X., Liang, F., & Xu, X. (2019). Self-
attention-based BiLSTM model for short text fine-
grained sentiment classification. IEEE Access, 7,
180558-180570.