One-Shot Learning, Video-to-Audio Commentary System for Football and/or Soccer Games
Khushi Mahajan, Reshma Merin Thomas, Sheiley Patel, Anvitha Reddy Thupally, Bonaventure Chidube Molokwu
2025
Abstract
Automated real-time sports commentary poses a considerable challenge at the intersection of Computer Vision and Natural Language Processing (NLP), especially in dynamic settings such as football. This research introduces a novel deep learning-based system that detects, tracks, and semantically interprets football match events, and generates natural language commentary with synchronized audio output. The proposed system uses YOLOv9 (You Only Look Once, version 9) for object detection, ByteTrack to maintain temporal identity, and a homography-based spatial transformer to map visual cues. A rule-based module using proximity and trajectory-transition logic identifies possession, passes, duels, and goals. Commentary is synthesized by a template-matching natural language generator and rendered as audio by the Google Text-to-Speech (gTTS) engine. The fundamental problem we address is the lack of modular, interpretable Artificial Intelligence (AI) systems that bridge visual perception with Natural Language Generation in sports broadcasting. Prior studies performed detection or classification with isolated Machine Learning (ML) models, whereas our work proposes a framework that is unified, explainable, and real-time. This research has implications for accessible broadcasting, performance analytics, and AI-powered sports media production.
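To make the rule-based and template stages of the abstract concrete, the following is a minimal sketch of proximity-based possession, possession-transition events, and template commentary. It is not the authors' implementation. It assumes that player and ball positions are already available in pitch coordinates (metres) with stable track identifiers. The `Player` type, the 2 m `POSSESSION_RADIUS_M` threshold, the "turnover" event label, and the template wording are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    track_id: int  # identity assumed to be kept stable by the tracker
    team: str
    x: float       # assumed pitch coordinates in metres
    y: float


POSSESSION_RADIUS_M = 2.0  # hypothetical proximity threshold


def possessor(players, ball_xy, radius=POSSESSION_RADIUS_M):
    """Return the player nearest the ball if within `radius`, else None."""
    if not players:
        return None
    nearest = min(players, key=lambda p: math.dist((p.x, p.y), ball_xy))
    if math.dist((nearest.x, nearest.y), ball_xy) <= radius:
        return nearest
    return None


def detect_events(frames):
    """Turn per-frame (players, ball_xy) pairs into possession-transition events."""
    events = []
    last = None
    for players, ball_xy in frames:
        current = possessor(players, ball_xy)
        if current is None:
            continue  # ball is loose or in flight; remember the last possessor
        if last is not None and current.track_id != last.track_id:
            # Same team: treat as a pass. Other team: illustrative "turnover".
            kind = "pass" if current.team == last.team else "turnover"
            events.append((kind, last, current))
        last = current
    return events


# Hypothetical commentary templates keyed by event type.
TEMPLATES = {
    "pass": "Player {src} passes to player {dst}.",
    "turnover": "Player {dst} wins the ball from player {src}.",
}


def commentary(events):
    """Fill the template for each event with the track ids involved."""
    return [
        TEMPLATES[kind].format(src=src.track_id, dst=dst.track_id)
        for kind, src, dst in events
    ]


if __name__ == "__main__":
    a = Player(7, "home", 10.0, 10.0)
    b = Player(9, "home", 30.0, 10.0)
    c = Player(4, "away", 31.0, 10.0)
    squad = [a, b, c]
    frames = [
        (squad, (10.5, 10.0)),  # ball at player 7's feet
        (squad, (20.0, 10.0)),  # ball in flight, nobody within radius
        (squad, (29.5, 10.0)),  # ball reaches player 9
        (squad, (31.2, 10.0)),  # ball taken by player 4
    ]
    for line in commentary(detect_events(frames)):
        print(line)
```

In a full pipeline, each generated sentence would then be passed to a text-to-speech engine such as gTTS. That step is omitted here so the sketch runs without network access.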
Paper Citation
in Harvard Style
Mahajan K., Thomas R., Patel S., Thupally A. and Molokwu B. (2025). One-Shot Learning, Video-to-Audio Commentary System for Football and/or Soccer Games. In Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-772-6, SciTePress, pages 335-342. DOI: 10.5220/0013692000003985
in Bibtex Style
@conference{webist25,
author={Khushi Mahajan and Reshma Thomas and Sheiley Patel and Anvitha Thupally and Bonaventure Molokwu},
title={One-Shot Learning, Video-to-Audio Commentary System for Football and/or Soccer Games},
booktitle={Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2025},
pages={335-342},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013692000003985},
isbn={978-989-758-772-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - One-Shot Learning, Video-to-Audio Commentary System for Football and/or Soccer Games
SN - 978-989-758-772-6
AU - Mahajan K.
AU - Thomas R.
AU - Patel S.
AU - Thupally A.
AU - Molokwu B.
PY - 2025
SP - 335
EP - 342
DO - 10.5220/0013692000003985
PB - SciTePress