
national Conference on Acoustics, Speech and Signal
Processing (ICASSP), pages 776–780.
Gong, Y., Chung, Y.-A., and Glass, J. (2021). AST: Audio
spectrogram transformer.
Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang,
S., Wang, L., and Chen, W. (2021). LoRA: Low-rank
adaptation of large language models.
HuggingFace (2024a). PEFT documentation. Accessed:
2024-01-09.
HuggingFace (2024b). Transformers documentation. Ac-
cessed: 2024-01-09.
Jordal, I. (2024). Audiomentations documentation. Ac-
cessed: 2024-01-09.
Kümmritz, S. (2024). The sound of surveillance: Enhancing
machine learning-driven drone detection with advanced
acoustic augmentation. Drones, 8(3).
Librosa (2024). Librosa documentation. Accessed: 2024-
01-09.
LightningAI (2024). Torchmetrics documentation. Ac-
cessed: 2024-01-09.
Liu, H., Tam, D., Muqeeth, M., Mohta, J., Huang,
T., Bansal, M., and Raffel, C. (2022). Few-shot
parameter-efficient fine-tuning is better and cheaper
than in-context learning.
Matplotlib (2024). Matplotlib documentation. Accessed:
2024-01-09.
MIT (2024). AST finetuned AudioSet model. Accessed:
2024-01-09.
NumPy (2024). NumPy documentation. Accessed: 2024-
01-09.
Pangarkar, T. (2024). Drone analytics statistics 2024 — best
aerial technology. Accessed: 2025-01-01.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito,
Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
PyTorch: An imperative style, high-performance deep
learning library.
Piczak, K. J. (2015). ESC: Dataset for environmental sound
classification. In Proceedings of the 23rd ACM Inter-
national Conference on Multimedia, MM ’15, page
1015–1018, New York, NY, USA. Association for
Computing Machinery.
PyTorch (2024a). PyTorch documentation. Accessed: 2024-
01-09.
PyTorch (2024b). Torchaudio documentation. Accessed:
2024-01-09.
Qiu, Z., Liu, W., Feng, H., Xue, Y., Feng, Y., Liu, Z., Zhang,
D., Weller, A., and Schölkopf, B. (2024). Controlling
text-to-image diffusion by orthogonal finetuning.
Schmitt, M. and Schuller, B. (2019). End-to-end audio
classification with small datasets – making it work.
In 2019 27th European Signal Processing Conference
(EUSIPCO), pages 1–5.
Scikitlearn (2024). Scikit-learn documentation. Accessed:
2024-01-09.
Telebot (2024). Telebot documentation. Accessed: 2024-
01-09.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, L., and Polosukhin, I.
(2023). Attention is all you need.
Wang, M. (2023). A Large-Scale UAV Audio Dataset and
Audio-Based UAV Classification Using CNN. PhD
thesis, Purdue University.
Wang, Y., Chu, Z., Ku, I., Smith, E. C., and Matson, E. T.
(2022). A large-scale UAV audio dataset and audio-
based UAV classification using CNN. In 2022 Sixth
IEEE International Conference on Robotic Comput-
ing (IRC), pages 186–189.
Warden, P. (2018). Speech commands: A dataset for
limited-vocabulary speech recognition.
Weights&Biases (2024). Weights & biases documentation.
Accessed: 2024-01-09.
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C.,
Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz,
M., Davison, J., Shleifer, S., von Platen, P., Ma, C.,
Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S.,
Drame, M., Lhoest, Q., and Rush, A. (2020). Trans-
formers: State-of-the-art natural language processing.
In Liu, Q. and Schlangen, D., editors, Proceedings of
the 2020 Conference on Empirical Methods in Nat-
ural Language Processing: System Demonstrations,
pages 38–45, Online. Association for Computational
Linguistics.
Xu, L., Xie, H., Qin, S.-Z. J., Tao, X., and Wang, F. L.
(2023). Parameter-efficient fine-tuning methods for
pretrained language models: A critical review and as-
sessment.
Zaman, K., Sah, M., Direkoglu, C., and Unoki, M. (2023).
A survey of audio classification using deep learning.
IEEE Access, 11:106620–106649.
Zhang, Q., Chen, M., Bukharin, A., Karampatziakis, N.,
He, P., Cheng, Y., Chen, W., and Zhao, T. (2023).
AdaLoRA: Adaptive budget allocation for parameter-
efficient fine-tuning.
4,500 Seconds: Small Data Training Approaches for Deep UAV Audio Classification