many IoT frameworks. Nevertheless, these technologies require installing a variety of sensors in vehicles. Retrofitting existing cars with such sensors is expensive and energy-intensive, and because the sensors are susceptible to environmental noise, the resulting traffic density estimates are not very accurate.
This work proposed a new framework, the IoT-based Intelligent Urban Traffic System (I2UTS), to control traffic signals and overcome the problems mentioned above. The framework was built on the existing CCTV network, since CCTV footage has historically been both useful and fairly inexpensive.
Over the years, various studies have used CCTV camera networks as input sensors to address different traffic issues, such as predicting accidents (Balid, Tafish, & Refai, 2017), studying the spatiotemporal behaviour of pedestrian crossings, and detecting knives and firearms (Chen, Englund, & Papadimitratos, 2017). The I2UTS framework estimated traffic density from CCTV footage with the support of a state-of-the-art CNN; traffic density is one of the most important factors in controlling traffic signals. To address the privacy issue and keep computation local, we employed YOLOv3 with a Darknet backbone on an edge device, a Raspberry Pi, together with the proposed scheduling mechanism. Although its detection score of 68.10% placed I2UTS's vehicle detection performance among the best in class, its end-to-end convolutional neural network (CNN)-based processing of visible-light data demanded computational resources for traffic density (TD) and computer vision (CV) processing that were too large to support useful real-time operation of the traffic network.
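To make the edge-side pipeline concrete, the following is a minimal sketch of how per-frame traffic density could be estimated with YOLOv3 through OpenCV's DNN module on a CPU-only device such as a Raspberry Pi. The file names, the confidence threshold, and the use of a raw detection count as the density proxy are illustrative assumptions, not the exact I2UTS implementation.

```python
import cv2

# Hypothetical paths to a pretrained YOLOv3 Darknet config and weights.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)  # CPU-only edge device
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

VEHICLE_CLASS_IDS = {2, 3, 5, 7}  # COCO ids: car, motorbike, bus, truck

def estimate_density(frame, conf_thresh=0.5):
    """Count vehicle detections in a single CCTV frame as a density proxy."""
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    count = 0
    for out in net.forward(net.getUnconnectedOutLayersNames()):
        for det in out:                     # det = [cx, cy, w, h, obj, c0..c79]
            class_id = det[5:].argmax()
            confidence = det[4] * det[5 + class_id]  # objectness * class score
            if class_id in VEHICLE_CLASS_IDS and confidence > conf_thresh:
                count += 1
    return count
```

Non-maximum suppression is omitted for brevity; in practice overlapping boxes should be merged (e.g., with cv2.dnn.NMSBoxes) before counting.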
The first challenge for I2UTS concerned the vehicle database. The innovative feature was to introduce the presence of emergency vehicles, such as police cars, fire engines, and ambulances, into the traffic signal periods and to coordinate those periods through a comprehensive vehicle scheduling algorithm. Because the proposed CNNs (YOLOv3-EfficientNet) rely on labeled data, a labeled emergency vehicle dataset was required, and given how infrequently emergency vehicles appear in traffic, locating such a dataset is difficult. One line of research has proposed emergency vehicle datasets: in addition to manually annotating 1500 photographs, researchers have drawn on YouTube streams, Google searches, and manually filtered images from the Kaggle dataset. Because these datasets combine many image acquisition sources, they suffer from significant viewpoint variation, which makes them unsuitable for our Internet of Things system, whose input sensor, a CCTV camera, has a fixed viewpoint. Beyond viewpoint variance, weather variation is another issue for emergency vehicle detection: unfavorable weather conditions visible in the images frequently cause a large drop in detection accuracy. Non-emergency cars, whose speeds are relatively slow, appear far more often in CCTV images over any given period.
Together with the availability of a sizable collection of labeled images across various weather conditions as training input for YOLOv3, this makes non-emergency vehicles much easier to detect accurately. These issues make it abundantly clear why emergency vehicle detection cannot rely on RGB cameras alone, and they are a strong incentive to revisit our earlier work on emergency vehicle detection with I2UTS. Around the world, emergency vehicles use loud sirens to warn oncoming traffic. In this study, we therefore present a multi-modal distributed Internet of Things framework that combines image-based traffic density estimation with the ability to identify the distinctive sound characteristics of emergency vehicle sirens.
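As a concrete illustration of the acoustic modality, the following is a minimal sketch of siren recognition from short audio clips using MFCC features and a support vector machine. The clip file names, the feature summary, and the classifier choice are assumptions made for illustration; the framework's actual audio model may differ.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

def mfcc_features(path, sr=22050, n_mfcc=13):
    """Summarise a short audio clip as the mean and std of its MFCCs."""
    y, _ = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.hstack([m.mean(axis=1), m.std(axis=1)])

# Hypothetical labelled clips recorded at the intersection.
siren_clips = ["siren_01.wav", "siren_02.wav"]        # label 1: siren present
traffic_clips = ["traffic_01.wav", "traffic_02.wav"]  # label 0: ordinary noise
X = np.vstack([mfcc_features(p) for p in siren_clips + traffic_clips])
y = np.array([1] * len(siren_clips) + [0] * len(traffic_clips))
clf = SVC(kernel="rbf").fit(X, y)

def siren_detected(clip_path):
    """Classify a new clip from the roadside microphone."""
    return clf.predict(mfcc_features(clip_path).reshape(1, -1))[0] == 1
```

In a deployment, the microphone stream would be windowed into short clips and each window classified in turn, so that a positive detection can preempt the signal schedule.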
A second prominent issue with I2UTS is the high variance of mean average precision (mAP) across the detector's vehicle classes; the "van" class, which returned a mAP of 37.49%, was an outlier. The proposed YOLOv3-EfficientNet also shows a significant false positive (FP) count relative to true positives (TP) for "van", with a false discovery rate FDR = FP/(FP + TP) = 63.23% (not shown here). YOLO outperforms two-stage object detectors such as R-CNN in speed on edge devices. However, YOLO has drawbacks: it cannot estimate the optimal number of anchor clusters, and it struggles to localize small objects and vehicles. This is one of the primary reasons why I2UTS has a high false positive rate, for instance classifying bus stops as scooters given their similar characteristics. Two-stage detectors introduce an additional stage for region proposals, whereas single-stage detectors sample regions densely and classify and localize all objects in one pass.
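To illustrate the anchor clustering limitation, the following is a minimal sketch of the standard k-means anchor estimation with an IoU-based distance, extended with a simple sweep that also searches over the number of clusters k. The pick_k helper and its 0.6 mean-IoU target are hypothetical additions for illustration; stock YOLOv3 fixes k (nine anchors) in advance.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, assuming boxes share a common centre."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster ground-truth box shapes into k anchors (1 - IoU as distance)."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = iou_wh(boxes, anchors).argmax(axis=1)
        for j in range(k):
            if (assign == j).any():
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors

def pick_k(boxes, k_range=range(3, 10), target=0.6):
    """Return the smallest k whose anchors reach the target mean best-IoU."""
    boxes = np.asarray(boxes, dtype=float)
    for k in k_range:
        anchors = kmeans_anchors(boxes, k)
        if iou_wh(boxes, anchors).max(axis=1).mean() >= target:
            break
    return k, anchors

# Example: boxes is an (N, 2) array of labelled vehicle (width, height) pairs.
```

Searching over k in this way trades a one-off offline clustering cost for anchors better matched to the camera's actual box-size distribution, which is especially relevant for small, poorly localized vehicles.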
Although the region proposal stage improves object detector performance, it is computationally costly. However, a roadside CCTV camera is stationary, so the viewpoint of the images it captures rarely changes, and the road infrastructure itself changes little over time. In light of this, we propose a new two-stage detector, road-based YOLO ("R-YOLO"), that limits the vehicle search to the road region, achieving low computational resource consumption similar to single-stage detectors (like YOLO) together with accuracy comparable to two-stage object detectors (like R-CNN).
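The fixed viewpoint makes the road restriction cheap to implement: a road polygon can be traced once per camera and reused for every frame. The sketch below shows one way this could look; the polygon coordinates and helper names are hypothetical, and the exact R-YOLO region stage may differ.

```python
import cv2
import numpy as np

# Hypothetical road polygon, traced once per camera since the viewpoint is fixed.
ROAD_POLYGON = np.array([[100, 700], [600, 300], [700, 300], [1200, 700]],
                        dtype=np.int32)

def road_mask(frame_shape):
    """Binary mask of the road region for this fixed CCTV viewpoint."""
    mask = np.zeros(frame_shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [ROAD_POLYGON], 255)
    return mask

def restrict_to_road(frame, mask):
    """Region stage: blank out everything off the road, so the single-stage
    detector that follows only ever searches the road area."""
    return cv2.bitwise_and(frame, frame, mask=mask)

def on_road(box, mask):
    """Keep a detection only if its box centre lies on the road mask."""
    x, y, w, h = box
    return mask[int(y + h / 2), int(x + w / 2)] > 0
```

Because the mask is computed once rather than per frame, the region stage adds almost no runtime cost, unlike a learned region proposal network.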