Beyond Sight: VQA for Car Parking Detection Using YOLOv8

Sphooti Kulkarni

, Anushree Bashetti

, Samarth Nasabi

and Kaushik Mallibhat

School of Electronics and Communication, KLE Technological University, Hubballi, India

Keywords:

VQA, YOLO, Object Detection.

Abstract:

Visual Question Answering (VQA), is an interesting application area of Artiﬁcial Intelligence that can enable

the machines to understand the image content and answer questions about the image. VQA integrates vision-

based techniques with the Natural Language Processing techniques. The VQA model uses the visual elements

of the image and information from the question to generate the best possible answer. The paper demonstrates

the use of state of art, object detection algorithm -You Only Look Once (YOLO) for the identiﬁcation of free

and occupied car slots in a car parking system. Existing VQA systems for parking often struggle with some

limitations including real-time application in dynamic and varying lighting conditions, night or bad weather,

and cannot handle user queries related to parking availability, thus impacting their overall usability and effec-

tiveness in practical applications. In this paper, a car parking VQA model has been designed where both image

and question of the user feed as an input to the proposed system. The image is captured by a camera installed

in the parking lot on a real time basis, and users select a question from the provided question menu. The sys-

tem provides a user-friendly, menu-based question-answering system that allows users to select questions of

interest and receive relevant responses based on the detected parking slot information. The proposed approach

utilizes a YOLOv8 model, trained on the annotated PKLot dataset to detect and count both parked and vacant

slots in real-time. This detection is integrated with a menu-based question-answer system, allowing users

to interact with the model and receive accurate slot information based on their selected queries. The model

performs well in both day and night, even in low- light conditions, due to the diverse PKLot dataset used for

training, which covers various days and weather conditions. To improve efﬁciency and accuracy, some images

are converted to grayscale, and a custom dataset is created. This preprocessing optimizes performance for

nighttime monochromatic images, enhancing results under varying lighting. Trained on the PKLot dataset,

the model achieved a mean Average Precision (mAP) of 0.994 for vacant and 0.993 for parked slots. The

robust performance stems from the diversity of PKLot images and strategic preprocessing. In simpler terms,

To enhance the efﬁciency of parking, a novel approach has been adopted, which combines computer vision

with questionnaires presented in a user-friendly menu style, inspired by natural language processing (NLP).

This technique enables the model to accurately detect if parking spaces are available or occupied, ultimately

making the parking experience more convenient and hassle-free.

1 INTRODUCTION

The term ”Visual Question Answering” refers to the

process of observing an image or visual input and pro-

viding an appropriate response to a question related to

that visual content. It is important to consider how a

computer perceives an image or video through com-

puter vision, and then relates the question to the image

in order to provide an answer using natural language

https://orcid.org/0009-0006-5170-2880

https://orcid.org/0009-0009-2342-2306

https://orcid.org/0009-0003-0179-6493

https://orcid.org/0000-0002-3610-0332

processing. VQA has gained popularity in the ﬁelds

of natural language processing and computer vision

because humans have the ability to view objects in

images and comprehend their characteristics and be-

haviors. VQA is a multi-discipline research topic that

has grown in popularity in natural language process-

ing and computer vision because humans often see

objects in images and understand how they interact

with their qualities and states when they look at them

(Lu, Ding, et al. 2023). VQA has various practi-

cal applications, ranging from medical to fashion. It

tackles the complex task of combining image analysis

with question understanding to generate accurate re-

sponses. In this case, it’s used to determine the status

834

Kulkarni, S., Bashetti, A., Nasabi, S. and Mallibhat, K.

Beyond Sight: VQA for Car Parking Detection Using YOLOv8.

DOI: 10.5220/0013604100004664

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 3rd International Conference on Futuristic Technology (INCOFT 2025) - Volume 2, pages 834-843

ISBN: 978-989-758-763-4

of parking lots. As the number of cars grows, effec-

tive parking management becomes crucial for places

like universities, malls, and airports. The aim is to

develop a model that accurately detects vacant and

occupied parking spaces. In (Hela, Arakkal, et al.

2022), the slot detection exhibited inaccuracies and

errors when operating under low- light conditions.

The model is also trained to identify parking spaces

at night by analyzing black-and-white images (gray

scale images). Surprisingly, the research unveiled a

striking similarity between nighttime CCTV images

and their corresponding color images after being con-

verted to black and white. As a result, the author

can reliably detect and tally occupied and unoccupied

parking spaces both in daylight conditions, utilizing

RGB images captured by the camera, and at night,

using grayscale images.

The novelty of this work lies in assigning unique

numbers to each parking slot, allowing precise track-

ing of individ- ual slot occupancy. Additionally, the

model is designed to perform slot detection during

both day and night, using RGB and grayscale images,

ensuring continuous monitoring. This capability ad-

dresses a signiﬁcant challenge in parking lot manage-

ment by providing reliable performance throughout

the day and applicability of the system in real-world

parking lot management scenarios, providing contin-

uous and accurate detection in varying lighting condi-

tions.

1.1 Motivation

Parking-related issues have recently gained signiﬁ-

cant attention among the general public. In urban

cities, the need to address these problems has become

crucial. The rising population and vehicle numbers

have led to a shortage of parking spaces, creating sig-

niﬁcant challenges in ﬁnding suitable parking. This

issue is further worsened by the ineffective manage-

ment of existing parking facilities. This paper ad-

dresses parking as a management issue rather than

just a scarcity of spaces. The lack of information on

parking availability leads to resource loss and man-

agement challenges. Additionally, detecting empty

parking spaces at night is crucial. To address this

issue, researchers are working on a solution that in-

volves analyzing grayscale images of parking spots

captured by cameras. This data is then used as input

for their model. While ground sensors are commonly

used to differentiate between parking spaces, they re-

quire continuous maintenance and installation, which

can be expensive.

Figure 1: Car parking lot status determined using YOLOv8

2 LITERATURE SURVEY

Several literature surveys have explored the use of

machine learning and deep learning models in vari-

ous scenarios. In more extensive complexes where

self-assisted parking is not feasible and human assis-

tance is insufﬁcient, alterna- tive methods are neces-

sary. Some approaches involve ultrasonic, geomag-

netic, infrared, ground sensors, vision-based systems

that surpass traditional sensors, and other conven-

tional methods (Ding, and Yang, 2019). The study

of the outdoor parking lot parking space identiﬁcation

method is now one of the signiﬁcant research projects

utilizing deep learning to handle parking space detec-

tion (Li , 2022). In one study, researchers utilized

the Markov Random Field framework and support

vector machines to determine slot availability. In-

stead of employing image segmentation, they opted

for different machine learning algorithms. In (Zhang,

Zhao, et al. 2023), Parking spot detection is done us-

ing YOLOv5s on the PKLot dataset which achieved

99.6% accuracy. Also the custom dataset was cre-

ated using the data augmentation operations such as

changing bright- ness, contrast, standardization etc.

by collecting images from nine different surveillance

video clips and achieved accuracy of 80.38%. The

drawbacks of the paper is that the model was only

able to detect occupied slots and not vacant slots.

In (Sairam, et al. 2020), both the occupied and va-

cant slots in parking lots are detected using YOLO

and Mask R-CNN (Region-based Convolutional Neu-

ral Network) algorithm and achieved an accuracy of

94%. The improvements suggested to this paper were

that instead of using bounding boxes, masks can be

used for IoU (Intersection over Union) calculations as

they are more precise and easy way to examine the de-

gree of overlap/intersection between predicted object

and ground truth object. In (Singh, and Christoforou,

2021), detections were made using a public dataset

with images from surveillance cameras in various

conditions. Models like MobileNet and ResNet50

Beyond Sight: VQA for Car Parking Detection Using YOLOv8

835

were used for classiﬁcation. The focus was on im-

proving the ability to detect parking spaces without

relying on visible parking lines. The authors in the

paper, used VQA-based car parking occupancy de-

tection using YOLOv5 and intent classiﬁcation us-

ing RASA NLU. The model has also been tested in

low-light conditions, but the predictions were incor-

rect. In (Huang, He, et al. 2021), the model is speciﬁ-

cally trained to detune the car at night using Yolov3

and the MobileNet v2 network. In (Amato, Fabio, et

al. 2016), the authors employed the encoder-decoder

structure of SID deep learning approaches to fuse the

YOLO model, which was trained with night- time

data, with the SSD (Single-Slot Detector) model in

order to identify employing smart camera networks.

In (Duy-Linh, Xuan-Thuy, et al. 2023), it enhances

the backbone, neck, and head modules of the network

to improve YOLOv5 for parking lot identiﬁcation in

smart parking systems. These changes aim to sim-

plify processes, improve workﬂow, and detection per-

formance.

Recent advancements in automatic car parking

systems highlight the versatility of YOLOv8 across

diverse applica- tions. In one study, an ”Advanced

Car Parking System” integrated with Arduino Uno

and IoT demonstrated real-time parking availabil-

ity, enhancing user convenience while reducing traf-

ﬁc congestion (Sharmila, Rohinth, et al. 2024). A

comparative survey found YOLOv8 to be the most

efﬁcient for object detection in smart car parking

systems, particularly when combined with OpenCV

and EasyOCR for improved image enhancement and

number plate identiﬁcation (Naik, Borkar, et al.

2024). Additionally, research employing YOLOv5

and CNN for Automated Number Plate Recogni-

tion (ANPR) has improved efﬁciency in park- ing

management (Surve, Shirsat, et al. 2023). Other

studies have developed smart parking systems using

YOLOv5 and ResNet50, achieving high mean Av-

erage Precision (mAP) for parking space detection

on low-computation devices (Balusamy, Shanmugam,

et al. 2024). Notably, a novel approach utilizing

YOLOv8 for real-time identiﬁcation of vacant and oc-

cupied parking slots demonstrated signiﬁ- cant im-

provements in parking space management (Shankar,

Singh, et al. 2024). Moreover, the integration of

YOLOv8 with Optical Character Recognition (OCR)

technologies has shown substantial advancements in

character recognition accuracy for number plate de-

tection (Sarhan, Rahem, et al. 2024). Collectively,

these ﬁndings suggest that YOLOv8 not only excels

in parking-related tasks but also showcases its adapt-

ability and efﬁciency across various domains.

Object detection techniques are categorized into

traditional methods and deep learning-based ap-

proaches. Tradi- tional methods, such as Viola-Jones,

SIFT, and HOG, are slower due to computational lim-

itations. In contrast, deep learning mimics human

brain analysis, recognizing complex patterns in im-

ages and text for accurate predictions, and automating

tasks like image description and audio transcription.

1. Introduced in 2001 by Paul Viola and Michael

Jones, the Viola–Jones object detection framework

is a machine learning system primarily designed for

face detection but adaptable to various objects. While

it may not match the accuracy of convolutional neu-

ral networks, its efﬁciency and compact size make it

suitable for scenarios with limited computational re-

sources (Viola and Jones, 2001). Another approach

in computer vision is the HOG, which counts gra-

dient orientations in speciﬁc image areas, though

it faces challenges like slow processing speed and

limitations with scale and light variations in human

detection (Dalal and Triggs, 2005). Additionally,

David Lowe’s SIFT algorithm, dating back to 1999,

is widely used for identifying, characterizing, and

matching local features in images. Its applications

span object recognition, image stitching, 3D model-

ing, gesture recognition, video tracking, wildlife iden-

tiﬁcation, robotic mapping, and navigation, ensuring

reliable object identiﬁcation through feature descrip-

tions extracted from training images (Paolo, Andrea

Prati, et al 2012).

2. R-CNN (Regions with CNN Features) revolu-

tionized object detection by combining CNNs with

region pro- posals, signiﬁcantly improving accuracy

(Gandhi, 2018). The SSD excels in balancing speed

and efﬁciency for real-time object recognition, mak-

ing a notable impact on applications requiring quick

and precise identiﬁcation. YOLO is a real-time object

recognition technique known for efﬁciently locating

and identifying multiple objects within an image or

video frame, streamlining the process for enhanced

speed and effectiveness (Ding and Yang, 2019).

2.1 Objectives

The project focuses on the following objectives. Pro-

pose and implement a deep learning algorithm for

the detec- tion of vacant and occupied slots in a car

parking system. The developed solution should be

able to detect the slots both for daylight and low-

light scenes. Further employ a menu-based question-

answering system, facilitated by a questionnaire, to

understand the user requirements and answer accord-

ingly.

INCOFT 2025 - International Conference on Futuristic Technology

836

2.2 Datasets

Several available datasets include CNRPark+EXT,

CARPK, NDISPark, Parking Space Detection and

Classiﬁcation, and PKLot available on different web-

sites. Out of these datasets, the CNRPark+Ext dataset

contains car parking lot images captured by various

cameras with different perspectives and under differ-

ent weather conditions. Also, the Car Parking Lot

Dataset (CARPK) consists of approximately 90,000

parking lot car images collected by means of drones

in order to have clearly captured images. NDISPark

dataset includes 250 images of several parking areas

of different situations in real-time captured using dif-

ferent cameras under various weather conditions and

different angles. The Parking Space Detection and

Classiﬁcation dataset comprised images of parking lot

spaces along with corresponding bounding boxes and

annotations. For the model training, PKLot dataset

is selected, which includes 12,416 images sourced

from surveillance camera frames. These images cap-

ture different parking lots under varied weather con-

ditions, including sunny, cloudy, and rainy days. The

dataset categorizes images into either occupied or un-

occupied, which aids in training the model to differ-

entiate between these two classes effectively. To en-

hance the diversity of the training data, a subset of the

PKLot dataset is strategically employed. Out of the

9656 images selected for training, 1242 were trans-

formed to grayscale. This transformation emulates

night-time captures from CCTV cameras, further en-

riching the dataset with varied lighting conditions.

The remaining images retain their RGB format, con-

tributing to a well- balanced and representative train-

ing dataset. The dataset consisted of 6767 training

images, 1927 validation images, and 962 images for

testing purpose.

2.3 Proposed Pipeline

The ﬁrst step in the approach is to perform object

detection on images capturing car parking lots from

the collected PKLot dataset. This phase employs

YOLOv8, a state-of-the-art object detection model.

The YOLOv8 model is used for object detection due

to its efﬁciency and high accuracy in detecting ob-

jects in real-time. The table I presents a comparison

of different YOLO models. YOLOv8 is particularly

well-suited for parking slot detection, as it can rapidly

process images while maintaining high detection ac-

curacy. For the object detection process, two distinct

classes have been deﬁned: ”parked” to indicate oc-

cupied slots and ”vacant” for unoccupied slots. The

YOLOv8 model requires images and corresponding

Figure 2: RGB image from PKLot dataset and converted

grayscale image

annotation ﬁles in the YOLO format, detailing class

labels and bounding box coordinates (x, y, width,

height). This YOLO algorithm systematically divides

the input image into a grid of cells, generating pre-

cise bounding boxes for detected objects. The bound-

ing boxes are used to localize objects in the image.

Each predicted bounding box is associated with a con-

ﬁdence score, coordinates, and class predictions (va-

cant or parked). YOLOv8’s performance is evaluated

using the mAP metric, with results showing an mAP

of 0.993 for vacant slots and 0.988 for parked slots,

indicating high accuracy in detecting parking slots un-

der varied conditions. After the successful execution

of object detection, the methodology transitions to the

object counting phase. This step involves counting

the detected objects categorized as parked and vacant

slots separately. The counting process is crucial for

generating answers to user queries and provides key

insights into the current status of the parking lot.

With both object detection and counting achieved,

the subsequent step involves integrating these results

with a menu-based Question Answering system. To

facilitate user interaction, a user-friendly menu featur-

ing a set of ques- tions has been introduced. The users

can select a question of interest, prompting the system

to provide a relevant response. The range of ques-

tions includes various aspects related to parking, such

as the count of parked and vacant slots, and the over-

all status of the parking lot. This system is designed

to enhance the informativeness and accessibility of

the parking lot management system, making it eas-

ier for users to quickly retrieve essential information

Beyond Sight: VQA for Car Parking Detection Using YOLOv8

837

Figure 3: Proposed pipeline

about park- ing availability. The overall methodology

integrates object detection, counting, and question-

answering seamlessly. Once the YOLOv8 model de-

tects and classiﬁes parking slots, the results are passed

to the question answering system, which maps the

detected classes (vacant and parked) to the appropri-

ate answers. For example, if the user asks, ”How

many vacant slots are there?”, the system counts the

instances of vacant slots (class 0) and provides the

result. The Questions are displayed in Table II. To

make this interaction seamless, the system is designed

with a user-friendly menu, enhancing accessibility

and making the solution more intuitive. This inte-

gration of object detection, counting, and interactive

querying provides an effective and user-centric solu-

tion that facilitates real-time parking lot analysis and

decision-making.

3 IMPLEMENTATION AND

RESULTS

In order to train the model, the required dataset

must contain the images of parking lots depicting

clearly the parked and vacant slots. The downloaded

dataset named PKLot contains a separate set of im-

ages for training, validation and testing. The anno-

tation ﬁle (.txt ﬁle) contains [class, x center, y cen-

ter, width, height] which is in YOLO format and the

data is normalized. The dataset images and annota-

tion ﬁles are uploaded in ROBOFLOW in order to get

annotated dataset where image is annotated with its

corresponding annotations .txt ﬁle resulting in forma-

tion of bounding boxes along with classes shown in

Table 1: Comparison Table

Method Accuracy Ref

YOLOv5 90% 13

CCN 100% -

YOLOv5 94.31% 2

Custom-YOLOv5M mAP@0.5:

97%

YOLOv8 94.27% 16

YOLOv8 98.70% 15

YOLOv8 (Our

model)

mAP@0.5:99.3%-

Table 2: Menu of questions provided to the user

Sl.No. Questions

1 How many cars are parked?

2 How many vacant slots are available?

3 What is the status of the parking lot?

4 Is this slot parked or vacant?

5 Can I park my car?

6 Can I know which all slots are occu-

pied?

7 Can I know which all slots are va-

cant?

Fig. 4. ( class0 – vacant & class1 – parked)

The image shown in Fig. 4 is a sample image de-

picting how an image looks after it gets annotated in

the ROBOFLOW platform, where there are 37 parked

slots and 3 vacant slots. Similar annotations are done

for all 9656 input images. Now a code snippet is gen-

erated which will be compiled in google colab for fur-

ther training of the model. The annotated dataset is

INCOFT 2025 - International Conference on Futuristic Technology

838

Figure 4: Sample annotated image of PKLot Dataset from

Roboﬂow platform

executed in google colab environment using YOLOv8

with a code snippet and the model is now trained with

5 epochs and imagsz=700 which took around 0.707

hours resulting in a) GPU mem b) box loss c) cls loss

d) dﬂ loss e) instances and f) size as shown in Fig. 5

Fig. 5 provides an overview of the model’s per-

formance metrics after 5 epochs of training using

YOLOv8 on the parking slot dataset. The graphics

processing unit (GPU) memory usage during training

was 11.5 GB, indicating the computational resources

required for processing. The box loss (0.734) rep-

resents the error in predicting the bound- ing boxes

for the parked and vacant slots, while the class loss

(0.4071) captures the error in classifying the park-

ing spaces as either vacant or occupied. Additionally,

the distribution focal loss (DFL) (0.8912) corresponds

to the model’s conﬁdence in assigning appropriate

bounding boxes during object detection. These losses

indicate the training per- formance, with lower values

corresponding to better predictions. The model han-

dled a total of 318 instances across 1922 images, with

a resolution size of 704x1024 for the input images.

The model’s precision and recall values for detecting

both vacant (class 0) and parked (class 1) slots were

remarkably high, with a precision (P) of 0.99 and re-

call (R) of 0.993 for parked slots, and a precision of

0.993 and recall of 0.99 for vacant slots. The mAP50

(mean Average Precision at 50% IoU threshold) for

both classes was nearly perfect, reaching 0.993 and

0.994, respectively. For more stringent evaluation,

the mAP50-95 metric, which considers IoU thresh-

olds between 50% and 95%, showed a slightly lower

value of 0.88, which is still an excellent result, in-

dicating the model’s robustness in detecting parking

slots under varied conditions and overlapping bound-

ing boxes.

The line graphs shown in Fig. 6 depict the re-

sults of different parameters accessed for object de-

tection technique resulting in how the model is get-

ting trained in batch wise along with the annotated

set of images. The train box loss, train class loss,

and train DFL loss all show a steady decline as the

number of epochs increases, indicating that the model

improves its accuracy in predicting bounding boxes

and classifying parked or vacant slots over time. Sim-

ilarly, the validation box loss, validation class loss,

and validation DFL loss follow a similar downward

trend, suggesting that the model generalizes well to

unseen data. The performance metrics, such as preci-

sion and recall, also show an upward trend, stabilizing

as the model progresses through the epochs, reﬂect-

ing improvements in the model’s ability to correctly

detect and classify parking slots. The mAP50 and

mAP50-95 metrics demonstrate consistent improve-

ment, indicating that the model is becoming more

proﬁcient in handling varying degrees of overlap be-

tween predicted and actual bounding boxes. These re-

sults demonstrate the efﬁcacy of the YOLOv8 model

in parking slot detection, providing high accuracy

for both parked and vacant slot classiﬁcation, while

also maintaining strong performance across a range

of challenging IoU thresholds. The decreasing loss

metrics and increasing precision and recall highlight

the robustness of the model in detecting parking slots

in both training and validation phases.

To detect parking slots and count the number of

parked and vacant classes, simple YOLO command

is used i.e. results[0].boxes.cls which gives classes

(0 and 1 in our case) for each detected object us-

ing which the total parked and vacant slots can be

counted. After this part, there is a ﬁxed set of ques-

tions which the user is going to ask, the menu- based

question answer approach makes the user select the

question as per the requirement and our model pro-

cesses the image captured at that point of time and

generates the output. In order to make it user friendly,

GUI (Graphical User Interface) has been designed

and implemented using Visual Studio Code software

as shown in Fig. 8.

3.1 Measuring Outputs

The performance of the car parking slot detection

model is evaluated using a variety of metrics to ensure

a compre- hensive understanding of its capabilities.

Key performance metrics such as precision, recall,

and F1 score are utilized to assess the model’s perfor-

mance, and their respective curves (see Fig. 9) pro-

vide valuable insights into how well the model iden-

tiﬁes parking slots at different conﬁdence levels. The

Precision-Conﬁdence Curve plots the model’s preci-

sion against its conﬁdence score. Precision is the

ratio of true positive detections to the total number

of detec- tions (both true positives and false posi-

tives). As seen in the Fig. 9, the precision remains

high across most conﬁdence levels, indicating that

the model consistently detects parking slots with high

accuracy. A similar trend is observed in the Recall-

Conﬁdence Curve, which plots the recall against con-

Beyond Sight: VQA for Car Parking Detection Using YOLOv8

839

Figure 5: Model training with 5 epochs

Figure 6: Results during training phase

ﬁdence scores. Recall measures the model’s ability to

correctly identify all positive instances (in this case,

both vacant and parked slots) by considering the ratio

of true positives to the total number of actual posi-

tive instances. In this case, the recall is high, sug-

gesting that the model can accurately detect a large

proportion of the parking slots in the images. The F1-

Conﬁdence Curve is a combined measure of precision

and recall, offering a balanced evaluation by consider-

ing both false positives and false nega- tives. The F1

score provides an overall assessment of the model’s

capability to accurately predict parking slots under

varying conﬁdence levels. As seen in the plot, the

model achieves a high F1 score, demonstrating that it

maintains an optimal balance between precision and

recall. The Precision-Recall Curve further quantiﬁes

the trade-off between precision and recall, allowing

for a visual comparison of the model’s performance.

The curve clearly indicates that the model achieves a

high level of performance for both the ”parked” and

”vacant” classes, as indicated by the respective preci-

Figure 7: Results after training

sion and recall values. The mAP value suggests that

the model exhibits excellent precision and recall at

various conﬁdence thresholds, making it well-suited

for real-time car parking slot detection. Additionally,

the Confusion Ma- trix (see Fig. 10) provides a de-

tailed breakdown of the model’s classiﬁcation results,

showing the true positives, false positives, true neg-

atives, and false negatives. In the matrix, the rows

represent the actual classes (parked, vacant, and back-

ground), while the columns represent the predicted

classes. The values in the matrix show the propor-

tion of cor- rect and incorrect classiﬁcations made

by the model. The values on the diagonal indicate

correct classiﬁcations, with 99% accuracy for both

parked and vacant classes. The off-diagonal values

indicate misclassiﬁcations, with a small percentage of

false positives for ”vacant” slots and false negatives

INCOFT 2025 - International Conference on Futuristic Technology

840

Figure 8: Images of GUI for user interaction selecting dif-

ferent questions from the menu

for ”parked” slots. This detailed analysis helps in

identifying speciﬁc areas where the model can be im-

proved, such as minimizing misclassiﬁcations and en-

suring better handling of background noise. Together,

these evaluation metrics offer a comprehensive under-

standing of the model’s performance, ensuring that

the car parking slot detection system is both accurate

and reliable across various conditions.

Figure 9: Precision-Conﬁdence, F1-Conﬁdence, Recall-

Conﬁdence and Precision-Recall Curve

Figure 10: Confusion Matrix

3.2 Limitations

The model’s performance in detecting parking slots

raises several critical considerations, particularly in

environ- ments where the parking area is surrounded

by or covered by trees, as trees can obstruct cam-

era views, complicating overhead monitoring. Cam-

eras on tall buildings may struggle with clear images.

Proper camera placement is cru- cial, as low angles

can miss sections of the lot, while high angles may

hinder depth perception, making it difﬁcult to distin-

guish between closely parked cars and vacant slots.

These factors signiﬁcantly impact detection accuracy.

The challenge is illustrated in Fig. 11, where the

model struggles to accurately predict all parked and

vacant slots due to the dataset’s capture angle. The

camera’s position signiﬁcantly affects prediction ac-

curacy. To address this, training the model on a more

diverse dataset with images from various angles and

perspectives can enhance its generalization and ro-

bustness. Exposure to different viewpoints during

training will enable the model to adapt to variations

in camera positions, improving its ability to predict

parking slot occupancy in real-world scenarios. Ulti-

mately, diverse data collection is crucial for optimiz-

ing performance across varying conditions.

Figure 11: Image showing improper predictions

Beyond Sight: VQA for Car Parking Detection Using YOLOv8

841

4 CONCLUSION

In this study, we proposed a robust real-time VQA

model for car parking slot detection, utilizing the

state-of-the- art YOLOv8 object detection frame-

work. The model successfully detected and classiﬁed

parked and vacant parking slots with an impressive

mAP of 0.993 at an IoU of 0.5. Through the applica-

tion of the PKLot dataset, the research demonstrated

the critical importance of both high-quality and suf-

ﬁciently diverse datasets, alongside effective prepro-

cessing techniques, in achieving optimal model per-

formance. The integration of a menu-based question-

answering interface further enhanced the usability of

the system, allowing for interactive user queries. Our

results emphasize that YOLOv8 is capable of deliv-

ering accurate and reliable results even in scenarios

with limited data. The results under- score that both

the quality and quantity of the dataset, along with

the chosen processing techniques, are key factors for

the model’s success. These ﬁndings validate the ro-

bustness of the YOLOv8 model across varied condi-

tions, show- ing its capability to deliver precise and

reliable car parking analysis in real-time applications.

The metrics, including precision, recall, and F1-score,

collectively highlight the efﬁciency and potential of

this model for real-time parking lot monitoring. Fu-

ture work will aim at further enhancing model accu-

racy, exploring the integration of additional features,

and optimizing the system for scalability across larger

datasets and varied environmental conditions.

4.1 Future scope

For future work, the proposed car parking slot detec-

tion system could be expanded in several ways to en-

hance its applicability and performance. Firstly, the

system could be improved by integrating real-time

video processing to track vehicle movement dynam-

ically, allowing for real-time parking slot updates.

This would enable the system to monitor multiple

camera feeds simultaneously in large parking areas,

ensuring up-to-the-minute information on slot avail-

ability. Additionally, the model can be further op-

timized to handle more challenging conditions such

as adverse weather (rain, fog, snow) by incorporating

specialized datasets for these conditions. Advanced

techniques like multi- sensor fusion, which combines

data from cameras, LiDAR, or radar, can be explored

to enhance detection in areas where visibility is lim-

ited. The integration of cloud-based solutions for cen-

tralized parking lot management could also facilitate

easier scalability and allow for larger-scale deploy-

ment across various environments, including urban

and rural settings. Another advancement is creating

a model that can identify free parking spaces with-

out relying on visible parking lines. This capabil-

ity would enhance the system’s versatility, allowing

it to function in environments like regular streets or

unmarked lots, thereby expanding its applicability to

various real-world scenarios. Lastly, the system could

be adapted to incorporate payment and reservation

features for users, providing a seamless end-to-end

solution for parking lot management. The addition

of predictive analytics and AI-driven recommenda-

tions, which predict future slot availability based on

usage patterns, could also be an interesting avenue for

future research and development. This would make

the system more intelligent and user-friendly, offer-

ing a holistic approach to smart parking solutions. By

addressing these areas, the system could evolve into

a more robust, accurate, and adaptable solution for

smart parking management.

REFERENCES

Lu, Siyu & Ding, Yueming & Liu, Mingzhe & Yin, Zheng-

tong & Yin, Lirong & Zheng, Wenfeng. (2023). Multi-

scale Feature Extraction and Fusion of Image and Text

in VQA. International Journal of Computational Intel-

ligence Systems. 16. 10.1007/s44196-023- 00233-6.

Hela, Vikash & Arakkal, Lubna & Kalady, Saidalavi.

(2022). CarParkingVQA: Visual Question Answering

application on Car parking occupancy detection. 1-6.

10.1109/INDICON56171.2022.10039925

Ding, Xiangwu & Yang, Ruidi. (2019). Vehicle and Parking

Space Detection Based on Improved YOLO Network

Model. Journal of Physics: Conference Series. 1325.

012084. 10.1088/1742-6596/1325/1/012084

Li, Borui. (2022). Detection of Parking Spaces in Open

Environments with Low Light and Severe Weather.

10.2991/978-2-494069-31-2 354

Zhang, Xin, Wen Zhao, and Yueqiu Jiang. ”Study on

Parking Space Recognition Based on Improved Im-

age Equalization and YOLOv5.” Electronics 12.15

(2023): 3374.

Sairam, Bandi, et al. ”Automated vehicle parking slot detec-

tion system using deep learning.” 2020 Fourth inter-

national conference on computing methodologies and

communication (ICCMC). IEEE, 2020.

Singh, C., & Christoforou, C. (2021). Detection Of

Vacant Parking Spaces Through The Use Of

Convolutional Neural Network. The Interna-

tional FLAIRS Conference Proceedings, 34.

https://doi.org/10.32473/ﬂairs.v34i1.128470

Huang, Shan & he, ye & Chen, Xiao-an. (2021). M-YOLO:

A Nighttime Vehicle Detection Method Combining

Mobilenet v2 and YOLO v3. Journal of Physics:

Conference Series. 1883. 012094. 10.1088/1742-

6596/1883/1/012094

INCOFT 2025 - International Conference on Futuristic Technology

842

Amato, Giuseppe & Carrara, Fabio & Falchi, Fab-

rizio & Gennaro, Claudio & Vairo, Claudio.

(2016). Car parking occupancy detection using smart

camera networks and Deep Learning. 1212-1217.

10.1109/ISCC.2016.7543901

Nguyen, Duy-Linh & Vo, Xuan-Thuy & Priadana, Adri &

Jo, Kang-Hyun. (2023). YOLO5PKLot: A Parking

Lot Detection Network Based on Improved YOLOv5

for Smart Parking Management System. 10.1007/978-

981-99-4914-4 8

P. Sharmila, P. Rohinth, G. Sarvesh, P. Priyadhar-

shan, M. Balasakthishwaran and S. Gandhi, ”Next-

Gen Parking Solutions,” 2024 International Confer-

ence on Communication, Computing and Internet of

Things (IC3IoT), Chennai, India, 2024, pp. 1-6, doi:

10.1109/IC3IoT60841.2024.10550402.

V. Naik, V. Borkar, K. M. C. Kumar and P. Shetgaonkar,

”A Survey On Various Smart Car Parking Systems,”

2024 3rd International Conference for Innovation in

Technology (INOCON), Bangalore, India, 2024, pp.

1-6, doi: 10.1109/INOCON60754.2024.10511587.

D. P. Surve, K. Shirsat and S. Annappanavar, ”“Parkify”

Automated Parking System Using Machine Learn-

ing,” 2023 International Conference on Integrated

Intelligence and Communication Systems (ICIICS),

Kalaburagi, India, 2023, pp. 1-7, doi: 10.1109/ICI-

ICS59993.2023.10421723.

D. Balusamy, G. Shanmugam, B. Rameshkumar, A.

Palanisamy, G. Kingston and L. Ravi, ”Autonomous

Parking Space Detection for Electric Vehicles Based

on Advanced Custom YOLOv5,” 2024 3rd Interna-

tional Conference on Artiﬁcial Intelligence For Inter-

net of Things (AIIoT), Vellore, India, 2024, pp. 1-5,

doi: 10.1109/AIIoT58432.2024.10574648.

V. Shankar, V. Singh, V. Choudhary, S. Agrawal

and B. Awadhiya, ”An Enhanced Car Parking

Detection Using YOLOv8 Deep Learning Capa-

bilities,” 2024 First International Conference on

Electronics, Communication and Signal Processing

(ICECSP), New Delhi, India, 2024, pp. 1-5, doi:

10.1109/ICECSP61809.2024.10698155.

Sarhan, A., Abdel-Rahem, R., Darwish, B. et al. Egyptian

car plate recognition based on YOLOv8, Easy-OCR,

and CNN. Journal of Electrical Systems and Inf Tech-

nol 11, 32 (2024). https://doi.org/10.1186/s43067-

024-00156-y

P. Viola and M. Jones, ”Rapid object detection using

a boosted cascade of simple features,” Proceed-

ings of the 2001 IEEE Computer Society Confer-

ence on Computer Vision and Pattern Recognition.

CVPR 2001, Kauai, HI, USA, 2001, pp. I-I, doi:

10.1109/CVPR.2001.990517.

N. Dalal and B. Triggs, ”Histograms of oriented gradients

for human detection,” 2005 IEEE Computer Society

Conference on Computer Vision and Pattern Recog-

nition (CVPR’05), San Diego, CA, USA, 2005, pp.

886-893 vol. 1, doi: 10.1109/CVPR.2005.177.

Piccinini, Paolo, Andrea Prati, and Rita Cucchiara. ”Real-

time object detection and localization with SIFT-based

clustering.” Image and Vision Computing 30.8 (2012):

573-587.

R. Gandhi, ”R-CNN, Fast R-CNN, Faster R-CNN, YOLO

— Object Detection Algorithms,” 2018.

Xiangwu Ding and Ruidi Yang, ”Vehicle and Parking Space

Detection Based on Improved YOLO Network,” Conf.

Ser. 1325 012084

Beyond Sight: VQA for Car Parking Detection Using YOLOv8

843