Towards Efficient Driver Distraction Detection with DARTS-Optimized
Lightweight Models
Yassamine Lala Bouali 1,2 a, Olfa Ben Ahmed 1 b, Smaine Mazouzi 2 c and Abbas Bradai 3 d
1 XLIM Research Institute, UMR CNRS 7252, University of Poitiers, France
2 Computer Science Dept., University of 20 Aout 1955, Skikda, Algeria
3 LEAT, University Cote d'Azur, CNRS UMR 7248, Biot, France
a https://orcid.org/0000-0002-2133-6086, b https://orcid.org/0000-0002-6942-2493, c https://orcid.org/0000-0003-3587-7657, d https://orcid.org/0000-0002-6809-4897
Keywords:
Human-Computer Interaction, Emotional States, Driver Distraction, DAS, Deep Learning, NAS, DARTS.
Abstract:
Driver Distraction is, increasingly, one of the major causes of road accidents. Distractions can be caused
by activities that may shift the driver’s attention and potentially evoke negative emotional states. Recently,
there has been notable interest in Driver Assistance Systems (DAS) designed for Driver Distraction Detection
(DDD). These systems focus on improving both safety and driver comfort by issuing alerts for potential haz-
ards. Recent advancements in DAS have prominently incorporated deep learning techniques, showcasing a
shift towards sophisticated and intelligent approaches for enhanced performance and functionality. However,
model architecture design is mainly based on expert knowledge and empirical evaluations, which are time-
consuming and resource-intensive. Hence, it is hard to design a model that is both efficient and accurate at the
same time. This paper presents a Neural Architecture Search (NAS)-based approach for efficient deep CNN
design for DDD. The proposed approach leverages RGB images to train a lightweight model with few parame-
ters and high recognition accuracy. Experimental validation is performed on two driver distraction benchmark
datasets, demonstrating that the proposed model outperforms state-of-the-art models in terms of efficiency
while maintaining competitive accuracy. We report accuracies of 99.08% and 93.23% with only 0.10 and 0.14 million parameters on the SFD and AUC datasets, respectively. The obtained architectures are thus both accurate and lightweight for DDD.
1 INTRODUCTION
With the development of smart vehicles, Driver As-
sistance System (DAS) in human-centered transporta-
tion has attracted much attention in recent years (Xing
et al., 2021). Using an intuitive Human-Machine in-
terface, such systems aim to enhance driver comfort,
ensure safety, and assist drivers. Driver monitoring in terms of emotions, behaviors, and actions is a key DAS application for keeping track of the driver's mood and emotional state (McCall and Trivedi, 2006). In this con-
text, Affective Computing is revolutionizing the au-
tomotive industry by creating DAS capable of rec-
ognizing, interpreting, processing, and responding to
human emotions and behaviors (Nareshkumar et al.,
2023). Through the integration of sensors, cameras,
and AI algorithms, vehicles can detect signs of fa-
tigue, stress, or distraction. This prompts the vehi-
cle to issue alerts or take corrective actions, thereby
ensuring driver safety.
Driver distraction is a major cause of road acci-
dents. According to recently published statistics (National Center for Statistics and Analysis, 2023), eight percent of fatal
car accidents are due to distraction. Indeed, nowa-
days, drivers are continually bombarded with poten-
tial distractions due to the widespread use of smart-
phones, infotainment systems, and various other in-
car technologies. These circumstances can compro-
mise the driver’s attentiveness and disturb their over-
all mood, thereby impacting their ability to drive
safely. Driver distraction can be categorized into three
main types (Lee, 2005): 1) visual distraction, such
as diverting one’s gaze away from the roadway, 2)
cognitive distraction, which involves the mind being
diverted from the road, and 3) manual distraction,
including activities like responding to a ringing cell
phone. It is worth noting that distractions caused by
a driver’s activities can potentially lead to a shift in
emotional state. Indeed, recent psychological stud-
ies have shown that driver emotions can be aroused
and activated by driver activities, such as phone calls, texting, or listening to the radio (Fernández et al., 2016). Concretely, attentive drivers focus on the vehicle, the traffic, and the surroundings, enabling them to anticipate unforeseen dangers. A serious problem can arise when a driver loses attention and becomes focused on a secondary event (activity) that
affects his mental and emotional states. For exam-
ple, using a phone and having a nervous conversation
can transform the affective state of the driver and de-
crease driving performance and concentration. Such
an event or behavior redirects the driver’s attention
and makes driving difficult and unsafe.
Artificial Intelligence has significantly revolutionized DAS through the adoption of deep learning techniques. Deep learning, especially Convolutional Neural Networks (CNNs), has been widely used for driver behavior (Shahverdy et al., 2020) and emotion (Zepf et al., 2020) recognition. Existing models
are hand-designed, and obtained after several exper-
iments on different architectures and substantial pa-
rameters tuning. Despite good model performance,
the parameter size poses a significant challenge for
real-world applications due to the limitations of vehicle-
mounted computing equipment.
In this paper, we propose a gradient-based NAS
method for automatic deep neural network design in
the context of DDD. The proposed method is based
on a Differentiable Architecture Search (DARTS).
The latter is known for its reduced search cost, com-
pared to non-differentiable NAS, and flexibility for
searching for high-performance architectures. We use
RGB images to search for light models with few pa-
rameters and high recognition accuracy. We con-
duct experiments on two driver distraction benchmark
datasets, namely the State Farm Distracted Driver
Dataset (SFD) and the American University in Cairo
Distracted Driver Dataset (AUC). To the best of our
knowledge, our work is the first to investigate DARTS
for a real-world application, namely driver distrac-
tion detection. The rest of the paper is organized as
follows: Section 2 discusses recent works on deep
learning-based methods for DDD. Section 3 describes
the proposed method. Section 4 presents experiments
and results and finally, section 5 concludes the work
and opens new perspectives.
2 RELATED WORK
The driver distraction detection field has been notably influenced by the transformative capacity of deep learning, especially Convolutional Neural Networks (CNNs) (Li et al., 2021). Consequently, a variety of approaches using multiple data types and sensors have been proposed in the literature for DDD. For in-
stance, some works investigated multi-sensing data
(Nidamanuri et al., 2022) (Das et al., 2022) and bi-
ological signals (Chen et al., 2022) (Dolezalek et al.,
2021). However, fusing data from different sensors
is complex and requires the presence of all sensors
in the prediction phase. Moreover, leveraging physiological data to infer the cognitive and emotional states of drivers can be deemed invasive because of its reliance on on-body physiological measurements. Visual data, namely RGB images, have emerged as the most effective and affordable source of information due to their non-intrusive nature. This practical-
ity makes it suitable for real-world applications (Zeng
et al., 2022). In this context, CNNs have been exten-
sively trained on large-scale imaging datasets for the
DDD (Koay et al., 2022). For instance, (Ai et al.,
2019) proposed an attention-based CNN combined
with VGG16 and built an accurate model with 140M
parameters. (Dhakate and Dash, 2020) integrated fea-
tures extracted from ResNet, InceptionV3, Xception, and VGG networks, trained a second-level neural network, and achieved an accuracy of 92.20%
on the State Farm dataset (SFD) with 25.60 M param-
eters. Similarly, (Eraqi et al., 2019) utilized a genetic
algorithm to assign weights to a CNN ensemble and
achieved 94.29% accuracy with 62.00 M parameters
on the American University in Cairo dataset (AUC).
(Huang and Fu, 2022) proposed a deep 3D residual
network with an attention mechanism and encoder-
decoder for predicting the true driver’s focus of atten-
tion. (Wang and Wu, 2023) enhanced the generaliza-
tion of DDD using multi-scale feature learning and
domain adaptation, achieving an accuracy of 96.82%
on SFD with 23.67 M parameters.
However, the aforementioned models remain too
large. Indeed, the automotive context requires
lightweight solutions, and neglecting such constraints
may result in models that are accurate but ineffi-
cient. Addressing this challenge, recent works have
proposed hand-crafted lightweight models such as
MobileNetV2-tiny (Wang et al., 2022b) and MTNet
(Zhu et al., 2023). Nevertheless, manually designing CNNs is a time-consuming and iterative task that often requires a high level of expert knowledge. Moreover, the iterative nature of the design process implies training models until a satisfactory result is reached, leading to an excessive consumption of resources.

Figure 1: Illustration of the proposed approach.
Recently, Neural Architecture Search (NAS) has
emerged as a new paradigm to address this chal-
lenge by automating the design of neural architec-
tures. NAS has been widely used for computer vi-
sion applications (Kang et al., 2023). However, despite its potential, few works have explored NAS for detecting driver distraction. Recently,
(Seong et al., 2022) employed reinforcement learning
in conjunction with a weight-sharing method for real-
time recognition of driver behavior. They gathered
their own data and found that their proposed model
outperformed hand-crafted models. However, the
lack of testing on benchmark datasets makes a com-
prehensive evaluation impossible. Moreover, (Zaman
et al., 2022) integrated an enhanced faster R-CNN
with NasNet large CNN to identify driver emotions.
They also used a private dataset and compared their
model with static emotion recognition datasets. In ad-
dition, (Chen et al., 2021) fused data from multiple
sources and used NAS to generate a CNN architecture that identifies normal driving and distraction states.
However, the resulting CNN was large and did not
fully meet the specific requirements of its intended
use. Lastly, (Liu et al., 2023) presented a NAS-based
teacher-student model with knowledge distillation for
the same task. This study achieved a lightweight
model with 0.42 M parameters. To our knowledge, this is the only study that has used public benchmark datasets and NAS for DDD, making it the most pertinent reference for our work.
3 PROPOSED APPROACH
Considering the pivotal importance of detecting driver distraction, a detection model must not only be effective but also lightweight enough for practical deployment. In
this section, we describe the proposed approach to ef-
ficiently detect driver distraction. Figure 1 illustrates
the main steps of our approach: 1) Preprocessing, 2)
Architecture search, and 3) Model training.
First, we preprocess the input data to ensure it is in a suitable format for our model: we resize the images to 32x32 pixels. We consider this a crucial step that contributes to achieving a balance between accuracy and efficiency. Moreover,
using a smaller image size during the search process
can accelerate the exploration, as it reduces computa-
tional requirements. This facilitates a more efficient
and faster architecture search process, helping in the
discovery of lightweight yet effective model architec-
tures.
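As an illustration, the following minimal sketch shows this preprocessing step, assuming a PyTorch/torchvision pipeline (the framework is not named in the paper); only the 32x32 resize comes from the text, while the dataset path and tensor conversion are illustrative.

```python
# Minimal preprocessing sketch (assumed torchvision pipeline); only the
# 32x32 resize is stated in the paper, the rest is illustrative.
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((32, 32)),   # downscale RGB frames to 32x32 pixels
    transforms.ToTensor(),         # HWC uint8 image -> CHW float tensor in [0, 1]
])

# Hypothetical folder layout with one sub-directory per distraction class.
train_set = datasets.ImageFolder("data/sfd/train", transform=preprocess)
```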
Second, we perform a Differentiable Architecture
Search (DARTS) (Liu et al., 2019) to look for the best
network architecture for our DDD task. We inves-
tigate DARTS as a cutting-edge technique that auto-
mates the exploration of a diverse space of neural net-
work architectures. Therefore, DARTS facilitates the
search for architectures that excel in both efficiency
and accuracy.
The architecture search problem is formulated as
a bi-level optimization problem. As stated in Eq. 1,
in the upper level, DARTS searches for an architec-
ture by minimizing a validation loss using gradient
descent. Simultaneously, in the lower level, the algo-
rithm fine-tunes the neural network weights based on
the architecture identified in the upper-level optimiza-
tion.
\min_{\alpha} \; \mathcal{L}_{val}\big(w^{*}(\alpha), \alpha\big) \quad \text{s.t.} \quad w^{*}(\alpha) = \arg\min_{w} \mathcal{L}_{train}(w, \alpha)   (1)
DARTS achieves computational efficiency by rep-
resenting the architecture search space as a directed
acyclic graph (DAG) with N nodes. Each directed
edge (i, j) in the graph is associated with a set of candidate operations o^{(i,j)} transforming node x^{(j)}:

x^{(j)} = \sum_{i<j} o^{(i,j)}\big(x^{(i)}\big)   (2)
The continuous distribution of weights or proba-
bilities, modeled using the Softmax function, allows
for gradient-based optimization in the search space.
Therefore, the architecture gradient is approximated
as follows:
\nabla_{\alpha} \mathcal{L}_{val}\big(w - \xi \nabla_{w} \mathcal{L}_{train}(w, \alpha), \, \alpha\big)   (3)
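To make these mechanics concrete, the sketch below shows a minimal, single-edge version of the scheme in PyTorch: architecture parameters α weight the candidate operations through a Softmax, and two optimizers alternate between the upper-level (validation) and lower-level (training) losses of Eq. 1. It uses the first-order approximation (ξ = 0 in Eq. 3); the operation choices, channel counts, and the Adam settings for α are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal single-edge DARTS-style sketch (first-order approximation, xi = 0).
# Operation choices, channel counts, and the optimizer settings for alpha are
# illustrative assumptions; only the SGD learning rate 0.025 follows the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Softmax-weighted mixture of candidate operations on one edge."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),  # conv 3x3
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),  # conv 5x5
            nn.AvgPool2d(3, stride=1, padding=1),                     # avg pool 3x3
            nn.Identity(),                                            # skip connect
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture params

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)                  # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.ops))

class TinySupernet(nn.Module):
    def __init__(self, channels=8, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1, bias=False)
        self.edge = MixedOp(channels)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.edge(self.stem(x))
        return self.head(F.adaptive_avg_pool2d(x, 1).flatten(1))

model = TinySupernet()
arch_params = [p for n, p in model.named_parameters() if "alpha" in n]
net_params = [p for n, p in model.named_parameters() if "alpha" not in n]
opt_w = torch.optim.SGD(net_params, lr=0.025, momentum=0.9)
opt_a = torch.optim.Adam(arch_params, lr=3e-4)

def search_step(x_train, y_train, x_val, y_val):
    # Upper level (Eq. 1): update architecture parameters on the validation loss.
    opt_a.zero_grad()
    F.cross_entropy(model(x_val), y_val).backward()
    opt_a.step()
    # Lower level (Eq. 1): update network weights on the training loss.
    opt_w.zero_grad()
    F.cross_entropy(model(x_train), y_train).backward()
    opt_w.step()
```

After the search, the strongest operation on each edge (the argmax over the Softmax weights) is retained to form the final discrete cell.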
We apply the search space defined in DARTS (Liu
et al., 2019), i.e., a supermodel formed by repeatedly
stacking normal and reduction cells. Each cell is a
collection of nodes. The operations involved in DARTS
are max pooling 3x3, average pooling 3x3, skip con-
nect, separable convolution 3x3 and 5x5, and dilated
convolution 3x3 and 5x5.
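For reference, this candidate set can be expressed as a small configuration list; the string identifiers below follow the naming convention of the public DARTS code base and are given for illustration only.

```python
# Candidate operations available on every edge of the searched cells,
# mirroring the list above; identifiers follow the public DARTS naming.
PRIMITIVES = [
    "max_pool_3x3",
    "avg_pool_3x3",
    "skip_connect",
    "sep_conv_3x3",
    "sep_conv_5x5",
    "dil_conv_3x3",
    "dil_conv_5x5",
]
```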
Third, following the architecture search, we
meticulously train the resulting models on two dis-
tinct benchmark datasets. This approach allows us to
analyze the generalization capabilities of the architec-
tures and provide further insights into the proposed
methodology. Finally, we rigorously evaluate the per-
formance of our model on the testing dataset.
4 EXPERIMENTS AND RESULTS
4.1 Distracted Drivers Datasets
We evaluate our approach on two benchmark datasets
illustrated in Figure 2, namely, the State Farm Dis-
tracted Driver Dataset (SFD) (Anna Montoya, 2016)
and the American University in Cairo Distracted
Driver Dataset (AUC) (Eraqi et al., 2019).
Figure 2: Sample images from SFD and AUC.
State Farm Distracted Driver Dataset (SFD). is
made up of 22,424 images, all of which were taken
from video footage recorded with cameras positioned
on a car’s dashboard. Each image in the collection is
tagged with the specific activity the driver is engaged
in at the time the image was captured. These activi-
ties include safe driving (0), texting on the right (1),
talking on the phone-right (2), texting-left (3), talking
on the phone-left (4), operating the radio (5), drink-
ing (6), reaching behind (7), hair and makeup (8), and
talking with a passenger (9). This dataset has been ex-
tensively used in research and has contributed to the
creation of a variety of models for detecting driver
distraction. We split the dataset into three sets: 60% for training, 10% for validation, and 30% for testing.
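A possible realization of this split is sketched below; the 60/10/30 ratios come from the text, while the folder path, splitting utility, and fixed seed are assumptions.

```python
# Illustrative 60/10/30 split of SFD; the ratios come from the text, while the
# folder path and the fixed seed are assumptions.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

dataset = datasets.ImageFolder("data/sfd", transform=transforms.ToTensor())  # hypothetical path
n_total = len(dataset)                        # 22,424 images for SFD
n_train, n_val = int(0.6 * n_total), int(0.1 * n_total)
n_test = n_total - n_train - n_val            # remaining ~30%
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test],
    generator=torch.Generator().manual_seed(0))   # fixed seed for reproducibility
```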
American University in Cairo Distracted Driver
Dataset (AUC). is made up of video footage that
captures drivers engaging in various activities. The
videos were recorded from two distinct perspectives
and each video is approximately 10 minutes in dura-
tion. The dataset encompasses a total of 44 partic-
ipants, with 29 males and 15 females, and includes
over 17000 frames. The images are categorized into
the following classes: safe driving (0), texting left (1),
talking on the phone-left (2), texting right (3), talking
on the phone-right (4), adjust the radio (5), drinking
(6), reaching behind (7), hair and makeup (8), talking
to passenger (9). The dataset is already split into train
and test sets by the original authors. In addition, we
use 10% of the training data to perform validation.
4.2 Experimental Setups
Hyperparameters and Preprocessing: we em-
ployed the Cosine Annealing scheduler to dynami-
cally modify the learning rate with a lower limit of 1e-
3. The initial learning rate was fixed to 0.025. Cross-
validation and early stopping, with a patience of 10
iterations, were also incorporated during the training
phase of the final architecture. The number of epochs
varied, as the requirements for training and searching
differed for each dataset. Specifically, training was
conducted over a maximum of 60 epochs, each con-
sisting of 1900 steps, while the search process lasted
for 3 epochs of 1700 iterations each. We used the
cross-entropy loss and the SGD optimizer.
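The sketch below assembles this training setup, assuming PyTorch; the stated values (SGD, initial learning rate 0.025, cosine annealing down to 1e-3, cross-entropy, patience of 10) come from the text, while momentum, weight decay, and the early-stopping helper are illustrative.

```python
# Training setup sketch matching the stated hyperparameters; momentum,
# weight decay, and the early-stopping helper are illustrative assumptions.
import torch

def make_training_tools(model, epochs):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                                momentum=0.9, weight_decay=3e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=1e-3)   # learning-rate lower limit of 1e-3
    criterion = torch.nn.CrossEntropyLoss()
    return optimizer, scheduler, criterion

class EarlyStopping:
    """Stop training when the validation loss has not improved for `patience` epochs."""
    def __init__(self, patience=10):
        self.patience, self.best, self.counter = patience, float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.counter = val_loss, 0
        else:
            self.counter += 1
        return self.counter >= self.patience     # True -> stop training
```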
Architecture Search: to automatically find the
best-performing architecture, we conduct a differen-
tiable architecture search (DARTS) on both SFD and
AUC datasets. With regard to our specific task, i.e.,
DDD, we carefully initialize our supermodel with a
width of 8 and 8 stacked cells. The model’s com-
plexity and the search cost are significantly impacted
by two primary hyperparameters: the width and the
number of channels. The width, which refers to the
number of neurons in a layer, and the number of chan-
nels, indicating the depth of the feature maps, are both
experimentally set to eight in our supermodel. As a
result, our supermodel contains approximately 0.96 million parameters.

Figure 3: Cell structures of the final architecture on SFD.
Figure 4: Cell structures of the final architecture on AUC.
Figure 5: Training performance of both architectures on SFD (left column) and AUC (right column).
We then run the search for a total of 3 epochs with
1766 steps per epoch. Figure 3 illustrates the cell
structures of the final architectures. We notice that,
on SFD, the cells mostly contain separable convolu-
tion operations as well as a few dilated convolutions
and maxpooling. Similar operation choices are ob-
served on AUC dataset in Figure 4. This consistency
across different datasets may indicate that the discov-
ered architecture is potentially generalizable and not
overfitting to a specific dataset.
4.3 Model Training and Classification
Results
Following the architecture search, we then train the
resulting architectures on both SFD and AUC sepa-
rately. We evaluate the classification performances of
the models through various metrics, including: 1) validation loss and accuracy, 2) test accuracy, 3) precision, 4) recall, and 5) F1-score. In addition, we provide further insights into the models' efficiency by reporting the number of parameters, inference time, and search cost in Table 3.
Table 1: Achieved Recall, Precision, and F1 score for each class of SFD / AUC.
Precision Recall F1-score
Driver Activity SFD AUC SFD AUC SFD AUC
Safe Driving 0.98 0.93 0.99 0.91 0.99 0.92
Texting - Right 0.98 0.94 1.00 0.94 0.99 0.94
Talking on the phone - Right 0.98 0.95 1.00 0.94 0.99 0.95
Texting - Left 1.00 0.91 1.00 0.95 1.00 0.93
Talking on the phone - Left 1.00 0.96 0.99 0.94 1.00 0.95
Operating the radio 1.00 0.96 0.99 0.93 0.99 0.94
Drinking 1.00 0.92 0.99 0.94 0.99 0.93
Reaching behind 1.00 0.91 1.00 0.92 1.00 0.91
Hair and makeup 0.99 0.94 0.97 0.92 0.98 0.93
Talking to passenger 1.00 0.93 0.97 0.95 0.98 0.94
Figure 6: Confusion matrices of both architectures on (a) SFD and (b) AUC.
4.3.1 Performance Evaluation
The learning curves (accuracy and loss) on each
dataset are illustrated in Fig. 5 where we observe a
faster training on SFD, taking nearly half of the time
required for AUC. This can be attributed to the chal-
lenging nature of AUC, primarily due to imbalanced
data distribution.
Table 2: Classification results.
Dataset   L_val   Acc_val (%)   Acc_test (%)
SFD 0.03 99.11 99.08
AUC 0.25 93.53 93.23
For the same underlying reasons, the classification
results, reported in Table 2, demonstrate increased ac-
curacy on SFD. Indeed, we notice a test accuracy of
99.08% for SFD and 93.23% for AUC. Moreover, Figure 6 shows the confusion matrix of each dataset. We
notice that only a few images from SFD are wrongly
classified. For instance, six images of ”hair and
makeup” were classified as ”safe driving” likely due
to the similarity between these classes. This misclas-
sification may be attributed to the subtle similarity in
the head orientation. Hair and makeup involves mov-
ing hands while the posture of the head may remain
the same, i.e., focused on the road, which can be con-
sidered as safe. Similarly, on AUC, up to 32 images
from ”safe driving” were misclassified as talking to
passenger. Table 1 further emphasizes these results and presents precision, recall, and F1-scores. The ta-
ble clearly indicates a lower rate of false positives on
SFD as well as a better consistency across metrics.
4.3.2 Efficiency Evaluation
We also evaluate the efficiency of the resulting mod-
els in terms of number of parameters, inference time
and search cost. These metrics are of high importance
considering the resource-limited environment of DAS.
A model with fewer parameters is generally more ef-
ficient and easier to implement, making it a crucial
characteristic for driver distraction applications. Sim-
ilarly, low inference time, i.e., fast response, is a de-
sirable characteristic in such an environment. The re-
sults are reported in Table 3. Note that we implemented our approach on an Nvidia Tesla A100 32G GPU.
Table 3: Model computational metrics.
Dataset Search cost Params Inference
SFD 1h00 0.10 M 9 ms
AUC 1h30 0.14 M 10 ms
Firstly, we notice that the search process is remarkably efficient, completing in only 55 minutes to 1.5 hours. Secondly, our
resulting models are extremely lightweight with 0.10
M and 0.14 M parameters for SFD and AUC respec-
tively. Furthermore, the inference time on the GPU
is impressively fast, with 9 ms on SFD and 10 ms on AUC. Similar inference times are expected on in-car plat-
forms as studies have shown that lightweight archi-
tectures usually perform equal or better on CPUs than
GPUs (Li et al., 2023).
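For reproducibility, parameter counts and GPU inference times of this kind can be measured as sketched below, assuming a PyTorch model; the batch size of 1, the 32x32 input resolution, and the warm-up/repeat counts are illustrative choices rather than the paper's exact protocol.

```python
# Sketch of how efficiency figures like those in Table 3 can be measured:
# trainable parameter count from the model itself and GPU latency averaged
# over repeated forward passes; warm-up and repeat counts are illustrative.
import time
import torch

def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def gpu_inference_time_ms(model, input_size=(1, 3, 32, 32), warmup=10, repeats=100):
    device = torch.device("cuda")
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    for _ in range(warmup):                  # warm-up to exclude CUDA startup cost
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        model(x)
    torch.cuda.synchronize()                 # wait for all kernels to finish
    return (time.perf_counter() - start) * 1000 / repeats
```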
4.3.3 Comparison with State-of-the-Art
In evaluating our approach for DDD, we bench-
marked against existing state-of-the-art methods. As
a reminder, most of the studies using benchmark
datasets present hand-crafted approaches. Only a sin-
gle work by (Liu et al., 2023) uses non-differentiable
NAS. Notably, our work stands as the sole contributor
exploring DARTS in the context of DDD. We report
state-of-the-art on SFD in Table 4 and on AUC in Ta-
ble 5.
Table 4: Comparison with state of the art on SFD.
Work Acc (%) Params (M)
Hand-crafted
(Dhakate and Dash, 2020) 92.90 25.60
(Baheti et al., 2020) 99.75 2.20
(Qin et al., 2021) 99.82 0.76
(Hossain et al., 2022) 98.12 3.50
(Wang et al., 2022b) 99.88 2.78
(Wang et al., 2022a) 99.91 9.02
(Wang and Wu, 2023) 96.82 23.67
(Mittal and Verma, 2023) 99.50 8.50
NAS
(Liu et al., 2023) 99.87 0.42
Ours (2023) 99.08 0.10
Table 5: Comparison with state of the art on AUC.
Work Acc (%) Params (M)
Hand-crafted
(Eraqi et al., 2019) 94.29 62.00
(Ai et al., 2019) 87.74 140
(Baheti et al., 2020) 95.24 2.20
(Qin et al., 2021) 95.64 0.76
(Mittal and Verma, 2023) 95.59 8.50
NAS
(Liu et al., 2023) 96.78 0.42
Ours (2023) 93.23 0.14
We first assess hand-crafted methodologies on
SFD where traditional CNNs such as VGG16
(Dhakate and Dash, 2020) and Capsule Networks
(Mittal and Verma, 2023) are employed. Our ap-
proach achieves a comparable accuracy of 99.08%
while having x7 times fewer parameters than the
best performing study by (Qin et al., 2021) in terms
of efficiency. Additionally, the comparison includes
well-established architectures like MobileNet (Hos-
sain et al., 2022) which is surpassed by our model
both in terms of accuracy and efficiency. Similarly,
despite the challenges posed by the AUC dataset, our
model showcases competitive performance, achieving
an accuracy of 93.23% while reducing the number of
parameters to 0.14 M.
In the NAS category, our approach establishes its
efficiency by requiring approximately three times fewer parameters than the model reported by (Liu et al., 2023).
This emphasizes not only the accuracy but also the
computational efficiency of our method in compari-
son to the NAS counterpart.
Overall, our approach, guided by the innovative
application of DARTS, not only outperforms some
hand-crafted methodologies but also demonstrates ef-
ficiency gains, thereby contributing significantly to
the evolving landscape of DDD methodologies.
5 CONCLUSIONS
In this paper, we proposed an efficient Driver Distraction Detection approach with DARTS-optimized lightweight models. We perform, for the first time, a Differentiable Architecture Search to automatically find accurate yet efficient models for a real-world application. We
discuss the challenges of hand-designed models and
the motivation behind NAS. We demonstrate that the
obtained models are extremely lightweight with high
classification performance compared to the state-of-
the-art. The efficiency gains, evidenced by the re-
duction in the number of parameters by almost three-
fold compared to state-of-the-art models, further em-
phasize the practical viability of our approach. Fu-
ture works include a broader investigation of multiple
hardware as well as a more efficient search strategy.
ACKNOWLEDGEMENTS
This work was supported by the CHIST-ERA grant
SAMBAS (CHIST-ERA20-SICT-003), with funding
from FWO, ANR, NKFIH, and UKRI.
REFERENCES
Ai, Y., Xia, J., She, K., and Long, Q. (2019). Double at-
tention convolutional neural network for driver action
recognition. In (EITCE), pages 1515–1519.
Anna Montoya, Dan Holman, S. T. S. W. K. (2016). State
farm distracted driver detection.
Baheti, B., Talbar, S., and Gajre, S. (2020). Towards com-
putationally efficient and realtime distracted driver de-
tection with mobilevgg network. IEEE Transactions
on Intelligent Vehicles, 5(4):565–574.
Chen, J., Jiang, Y., Huang, Z., Guo, X., Wu, B., Sun, L.,
and Wu, T. (2021). Fine-grained detection of driver
distraction based on neural architecture search. IEEE
Transactions on Intelligent Transportation Systems,
22(9):5783–5801.
Chen, J., Li, H., Han, L., Wu, J., Azam, A., and Zhang,
Z. (2022). Driver vigilance detection for high-speed
rail using fusion of multiple physiological signals and
deep learning. Applied Soft Computing, 123:108982.
Das, K., Papakostas, M., Riani, K., Gasiorowski, A.,
Abouelenien, M., Burzo, M., and Mihalcea, R. (2022).
Detection and recognition of driver distraction using
multimodal signals. ACM Transactions on Interactive
Intelligent Systems, 12(4):1–28.
Dhakate, K. R. and Dash, R. (2020). Distracted driver de-
tection using stacking ensemble. In (SCEECS), pages
1–5. IEEE.
Dolezalek, E., Farnan, M., and Min, C.-H. (2021). Phys-
iological signal monitoring system to analyze driver
attentiveness. In (MWSCAS), pages 635–638. IEEE.
Eraqi, H. M., Abouelnaga, Y., Saad, M. H., and Moustafa,
M. N. (2019). Driver distraction identification with an
ensemble of convolutional neural networks. Journal
of Advanced Transportation, 2019.
Fernández, A., Usamentiaga, R., Carús, J. L., and Casado, R. (2016). Driver distraction using visual-based sensors and algorithms. Sensors, 16(11):1805.
National Center for Statistics and Analysis (2023). Distracted driving in 2021. Research Note DOT HS 813 443, National Highway Traffic Safety Administration.
Hossain, M. U., Rahman, M. A., Islam, M. M., Akhter, A.,
Uddin, M. A., and Paul, B. K. (2022). Automatic
driver distraction detection using deep convolutional
neural networks. Intelligent Systems with Applica-
tions, 14:200075.
Huang, T. and Fu, R. (2022). Driver distraction detection
based on the true driver’s focus of attention. IEEE
Transactions on Intelligent Transportation Systems,
23(10):19374–19386.
Kang, J.-S., Kang, J., Kim, J.-J., Jeon, K.-W., Chung,
H.-J., and Park, B.-H. (2023). Neural architecture
search survey: A computer vision perspective. Sen-
sors, 23(3):1713.
Koay, H. V., Chuah, J. H., Chow, C.-O., and Chang, Y.-L.
(2022). Detecting and recognizing driver distraction
through various data modality using machine learn-
ing: A review, recent advances, simplified framework
and open challenges (2014–2021). Engineering Ap-
plications of Artificial Intelligence, 115:105309.
Lee, J. D. (2005). Driving safety. Reviews of human factors
and ergonomics, 1(1):172–218.
Li, H., Wang, Z., Yue, X., Wang, W., Tomiyama, H., and
Meng, L. (2023). An architecture-level analysis on
deep learning models for low-impact computations.
Artificial Intelligence Review, 56(3):1971–2010.
Li, W., Huang, J., Xie, G., Karray, F., and Li, R. (2021).
A survey on vision-based driver distraction analysis.
Journal of Systems Architecture, 121:102319.
Liu, D., Yamasaki, T., Wang, Y., Mase, K., and Kato,
J. (2023). Toward extremely lightweight distracted
driver recognition with distillation-based neural archi-
tecture search and knowledge transfer. IEEE Transac-
tions on Intelligent Transportation Systems.
Liu, H., Simonyan, K., and Yang, Y. (2019). DARTS: Differentiable architecture search. In International Conference on Learning Representations (ICLR).
McCall, J. and Trivedi, M. M. (2006). Driver monitoring for
a human-centered driver assistance system. In Pro-
ceedings of the 1st ACM international workshop on
Human-centered multimedia, pages 115–122.
Mittal, H. and Verma, B. (2023). Cat-capsnet: A convo-
lutional and attention based capsule network to detect
the driver’s distraction. IEEE Transactions on Intelli-
gent Transportation Systems.
Nareshkumar, R., Suseela, G., Nimala, K., and Niranjana,
G. (2023). Feasibility and necessity of affective com-
puting in emotion sensing of drivers for improved
road safety. In Principles and Applications of Socio-
Cognitive and Affective Computing, pages 94–115.
IGI Global.
Nidamanuri, J., Mukherjee, P., Assfalg, R., and Venkatara-
man, H. (2022). Dual-v-sense-net (dvn): Multisensor
recommendation engine for distraction analysis and
chaotic driving conditions. IEEE Sensors Journal,
22(15):15353–15364.
Qin, B., Qian, J., Xin, Y., Liu, B., and Dong, Y. (2021).
Distracted driver detection based on a cnn with de-
creasing filter size. IEEE Transactions on Intelligent
Transportation Systems, 23(7):6922–6933.
Seong, J., Lee, C., and Han, D. S. (2022). Neural architec-
ture search for real-time driver behavior recognition.
In (ICAIIC), pages 104–108. IEEE.
Shahverdy, M., Fathy, M., Berangi, R., and Sabokrou, M.
(2020). Driver behavior detection and classification
using deep convolutional neural networks. Expert Sys-
tems with Applications, 149:113240.
Wang, H., Chen, J., Huang, Z., Li, B., Lv, J., Xi, J., Wu, B.,
Zhang, J., and Wu, Z. (2022a). Fpt: fine-grained de-
tection of driver distraction based on the feature pyra-
mid vision transformer. IEEE transactions on intelli-
gent transportation systems, 24(2):1594–1608.
Wang, J. and Wu, Z. (2023). Driver distraction detection
via multi-scale domain adaptation network. IET Intel-
ligent Transport Systems.
Wang, J., Wu, Z., et al. (2022b). Model lightweighting for
real-time distraction detection on resource-limited de-
vices. Computational Intelligence and Neuroscience,
2022.
Xing, Y., Lv, C., Cao, D., and Hang, P. (2021). To-
ward human-vehicle collaboration: Review and per-
spectives on human-centered collaborative automated
driving. Transportation research part C: emerging
technologies, 128:103199.
Zaman, K., Sun, Z., Shah, S. M., Shoaib, M., Pei, L.,
and Hussain, A. (2022). Driver emotions recognition
based on improved faster r-cnn and neural architec-
tural search network. Symmetry, 14(4):687.
Zeng, X., Wang, F., Wang, B., Wu, C., Liu, K. J. R., and Au,
O. C. (2022). In-vehicle sensing for smart cars. IEEE
Open Journal of Vehicular Technology, 3:221–242.
Zepf, S., Hernandez, J., Schmitt, A., Minker, W., and Pi-
card, R. W. (2020). Driver emotion recognition for
intelligent vehicles: A survey. ACM Computing Sur-
veys (CSUR), 53(3):1–30.
Zhu, Z., Wang, S., Gu, S., Li, Y., Li, J., Shuai, L.,
and Qi, G. (2023). Driver distraction detection
based on lightweight networks and tiny object de-
tection. Mathematical Biosciences and Engineering,
20(10):18248–18266.