Eye-Based Cognitive Overload Prediction in Human-Machine
Interaction via Machine Learning
Maria Trigka (https://orcid.org/0000-0001-7793-0407), Elias Dritsas (https://orcid.org/0000-0001-5647-2929)
and Phivos Mylonas (https://orcid.org/0000-0002-6916-3129)
Department of Informatics and Computer Engineering, University of West Attica, Greece
Keywords:
Human-Centered Computing, Supervised Learning, User Models, Eye Tracking.
Abstract:
Cognitive overload significantly affects human performance in complex interaction settings, making its early
detection essential for designing adaptive systems. This study investigated whether gaze-derived features can
reliably predict overload states using supervised machine learning (ML). The analysis was conducted on an
eye-tracking dataset from a cognitively demanding visual task that incorporated fixations, saccades, and pupil
diameter measurements. Five classifiers, namely Logistic Regression (LR), Naive Bayes (NB), Support Vector
Machine (SVM), XGBoost (XGB), and Multilayer Perceptron (MLP), were evaluated using stratified train/test
splits and 5-fold cross-validation. XGB achieved the best performance, with an accuracy of 0.902, a precision
of 0.958, a recall of 0.821, an F1 score of 0.884, and an area under the ROC curve (AUC) of 0.956. These
findings confirm that gaze-derived features alone can reliably distinguish cognitive overload from normal states.
The results also revealed trade-offs between simple models, which are easier to interpret but more conservative,
and complex models, such as XGB and MLP, which achieved stronger predictive performance. Future studies
should address subject-independent validation, incorporate temporal modeling of gaze dynamics, and explore
personalization and cross-task generalization to advance robust and adaptive cognitive monitoring.
1 INTRODUCTION
The ability to monitor users’ cognitive states during
task execution is increasingly essential in domains
such as human-computer interaction (HCI), educa-
tion, simulation training, and safety-critical opera-
tions. When cognitive demand surpasses an individ-
ual’s capacity, performance degradation is likely to
occur, a phenomenon referred to as cognitive over-
load. Detecting this overload in real time enables sys-
tems to adapt their complexity, pacing, or feedback,
thereby reducing user frustration and enhancing over-
all system usability (Kosch et al., 2023).
Recent advances in eye-tracking technology have
made it feasible to noninvasively capture detailed
gaze behavior, offering insights into attention, in-
formation processing, and cognitive effort. Com-
pared with physiological measures such as electroen-
cephalography (EEG) or functional near-infrared
spectroscopy (fNIRS), gaze-based features are easier
to integrate into practical environments and impose
minimal burden on users. Research has shown that
saccade patterns, fixation durations, pupil dilation,
and blink rates are modulated by cognitive load, mak-
ing them useful input signals for classification models
(Abbad-Andaloussi et al., 2022; Gorin et al., 2024).
ML has become the predominant approach for
modeling the relationship between gaze behavior and
the cognitive state. Classical models, such as SVM
and LR, as well as more recent deep learning and en-
semble methods, have been applied to various cogni-
tive estimation tasks. However, many studies rely on
multimodal inputs or domain-specific datasets, which
limit their generalizability (Aksu et al., 2024; Skaramagkas et al., 2023).
Despite the growing body of literature on cog-
nitive state monitoring, a gap remains in evaluating
how well ML models can generalize cognitive over-
load detection using only gaze data. Existing studies
often involve complex sensor setups or focus on spe-
cific environments (e.g., virtual reality (VR) or driv-
ing), which limits their applicability to more general
HCI scenarios (Ghosh et al., 2023).
This study was motivated by the need to support
the design of cognitively ergonomic interfaces for
professional human-machine interactions. Predict-
ing cognitive overload from eye-tracking data enables
system designers to better align interface complexity
with user capabilities, thereby minimizing cognitive
strain and preserving interaction fluency for optimal
performance. By identifying when users experience
mental overload, designers can proactively adjust the
information flow and visual load, preventing frus-
tration, reducing negative emotional responses, and
maintaining effective decision-making. Such predic-
tive insights are critical for ensuring that high-demand
operational environments remain user-centered, with-
out hindering human cognition or compromising task
performance.
A supervised learning framework was adopted to
infer whether cognitive overload occurred based on
gaze-derived features within a unified and effi-
cient pipeline. To position this work within the cur-
rent research landscape and clarify its methodological
scope, the key contributions are as follows:
- Investigation of cognitive overload prediction using gaze-derived features (fixations, saccades, pupil dynamics) in a visually demanding interaction task.
- Application of a structured preparation pipeline, including feature standardization for model training and statistical/visual analysis to confirm the relevance of gaze-based metrics.
- Comparative evaluation of five supervised learning models (LR, NB, SVM, XGB, MLP) with stratified validation and multiple performance metrics.
- Demonstration that gaze-only features can reliably predict overload, with XGB achieving high accuracy and AUC without requiring multimodal inputs.
The remainder of this paper is organized as fol-
lows. Section 2 reviews the relevant literature on
cognitive workload estimation using gaze-based fea-
tures and ML techniques. Section 3 details the pro-
posed methodology, including dataset overview, pre-
processing, feature analysis, model formulation, and
evaluation strategies. Section 4 presents and analyzes
the experimental results and provides a comparative
assessment of model performance across key metrics.
Section 5 discusses the limitations and future directions of this study. Finally, Section 6 summarizes the main findings and outlines key directions for future research.
2 RELATED WORKS
Recent research on cognitive workload estimation has
increasingly focused on gaze-based indicators owing
to their unobtrusive nature and applicability in real-
time applications. Several studies have utilized ML
to model the relationship between eye behavior and
cognitive demand in various domains, including VR,
driving simulations, and tasks that require attention.
A foundational dataset in this area is COLET,
which captures gaze behavior under multitasking and
time pressure across multiple task conditions (Ktis-
takis et al., 2022). By training classical classifiers
on fixation, saccade, and pupil-related features, the
authors reported classification accuracies of nearly
88%, validating gaze signals as effective predictors of
cognitive load. To advance generalization in uncon-
strained settings, the CLERA framework was intro-
duced as a unified deep model for eye-region track-
ing and load estimation (Ding et al., 2023). It in-
tegrates keypoint localization with workload regres-
sion in a single trainable architecture and outperforms
SVM-based approaches in naturalistic environments.
In the context of immersive training, cognitive load
was modeled during VR-based disassembly tasks us-
ing fixation duration and pupil dilation as inputs for
MLP classifiers (Nasri et al., 2024). The results indi-
cated a high F1 score, underscoring the discriminative value
of gaze dynamics as task complexity increased.
Multimodal approaches have also been explored
for this purpose. One study combined gaze fea-
tures with fNIRS signals and driving dynamics within
a Convolutional Neural Network (CNN) and Long
Short-Term Memory (LSTM) pipeline, achieving
near-perfect classification performance across n-back
difficulty levels (Khan et al., 2024). This integration
of physiological and behavioral data demonstrates the
benefits of signal fusion for robust load inference.
Gaze and pupillary data alone have proven suffi-
cient in low-latency contexts. A CNN-based model
was developed to detect stimulus onset using short
windows of pupil diameter and gaze vectors across
multiple cognitive domains (Dang et al., 2024). De-
spite domain variation, the models maintained re-
liable performance, especially for attention-oriented
tasks.
Workload prediction in gamified VR environ-
ments has also been examined using a combination of
ocular and biosignals, such as heart rate and galvanic
skin response (GSR) (Szczepaniak et al., 2024). Us-
ing SVM and Random Forest (RF) models, the study
reported F1 scores above 0.90, with interpretability analy-
sis highlighting pupil size and blink rate as dominant
predictors. Finally, a systematic benchmark evaluated
11 ML algorithms on gaze-derived features extracted
under dual-task and time-pressure conditions (Skara-
magkas et al., 2021). This study demonstrated that
lightweight models, such as RF, can match more com-
plex methods for both binary and multi-class cogni-
tive load classification.
A comparative summary of the aforementioned
studies is provided in Table 1, which outlines the
core elements, including the domain, modality, fea-
ture types, and model classes. As shown in the ta-
ble, most prior work emphasizes VR or driving con-
texts and relies on multimodal instrumentation, often
including biosignals such as EEG, fNIRS, or GSR.
Although such approaches can improve robustness,
they require additional sensors, calibration, and user
compliance, which limit their scalability and every-
day applicability. In contrast, this study demonstrates
that fine-grained gaze-only signals suffice for reliable
overload prediction, offering a lightweight and unob-
trusive alternative particularly suited to adaptive HCI
and web-based interaction scenarios.
3 METHODOLOGY
The methodology follows a structured pipeline (Fig-
ure 1) consisting of dataset description, preprocess-
ing, model training, and evaluation.
3.1 Dataset Overview
This study utilized a dataset collected using the Gaze-
point GP3 eye-tracking system (Mannaru et al., 2017)
during simulated print configuration tasks involving
complex visual interactions. The task required partic-
ipants to navigate menus, adjust parameters, and con-
firm settings under time constraints, thereby induc-
ing either a normal or an overloaded cognitive state.
The device recorded continuous streams of fixations,
saccades, pupil diameter, and gaze coordinates under
ecologically valid conditions. The participants under-
went individual calibration procedures to ensure the
spatial accuracy of gaze mapping.
The dataset included recordings from nine users
supplemented with demographic and interaction-
related metadata, such as age, professional experi-
ence, and task familiarity. The features of the dataset
are summarized in Table 2. The participant demo-
graphics are reported in Table 3, which lists each
user’s age, total and platform-specific experience (in
years), and the proportion of samples per user that
were labeled as cognitively overloaded. The dataset
comprised 2,510 overload samples (43.9%) and 3,207
normal samples (56.1%).
Class labels distinguishing overload from normal
states were assigned based on gaze dynamics, specifi-
cally fixation duration and saccade magnitude, which
are widely recognized in the literature as behavioral
indicators of cognitive load (Rayner, 1998; Holmqvist
et al., 2011; Duchowski, 2007). In this dataset,
the thresholds were defined relative to the empiri-
cal feature distributions, resulting in approximately
balanced class proportions. Although this procedure
provides consistent labeling across participants, it re-
mains heuristic and may limit generalizability.
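To make the labeling procedure concrete, the following minimal sketch illustrates distribution-relative labeling in Python. The quantile cutoffs and the conjunction of the two criteria are assumptions for illustration; the exact thresholds used for this dataset are not published.

```python
import pandas as pd

def label_overload(df: pd.DataFrame,
                   fix_q: float = 0.5, sac_q: float = 0.5) -> pd.Series:
    """Label a sample as overload (1) when fixations are long and saccades short.

    fix_q and sac_q are assumed quantile cutoffs; the thresholds were defined
    relative to the empirical feature distributions, but exact values are not
    reported here.
    """
    fix_thr = df["FPOGD"].quantile(fix_q)    # fixation-duration cutoff
    sac_thr = df["SAC_MAG"].quantile(sac_q)  # saccade-magnitude cutoff
    overload = (df["FPOGD"] > fix_thr) & (df["SAC_MAG"] < sac_thr)
    return overload.astype(int)
```

In practice, the quantiles would be tuned so that the resulting classes are approximately balanced, matching the 43.9%/56.1% split reported above.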
Data validity was ensured through standard pre-
processing, including the exclusion of missing or low-
confidence gaze samples (based on system confidence
scores and pupil validity flags), detection and re-
moval of blinks, reconstruction of fixation events, and
derivation of saccade magnitudes from inter-fixation
displacement. All timestamps were aligned with the
session start time to ensure a consistent temporal ref-
erence across participants.
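A minimal sketch of these cleaning steps is given below, assuming a raw Gazepoint export loaded as a pandas DataFrame with the column names of Table 2; the confidence cutoff and the BKID blink convention are assumptions rather than documented settings.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, cs_min: float = 0.8) -> pd.DataFrame:
    # 1. Exclude missing or low-confidence gaze samples
    #    (system confidence score and pupil validity flags).
    df = df.dropna(subset=["CX", "CY", "LPD", "RPD"])
    df = df[(df["CS"] >= cs_min) & (df["LPV"] == 1) & (df["RPV"] == 1)]
    # 2. Remove blink samples (assuming BKID > 0 marks samples inside a blink).
    df = df[df["BKID"] == 0]
    # 3. Derive saccade magnitude from inter-fixation displacement:
    #    the distance between centroids of consecutive fixation events.
    centroids = df.groupby("FPOGID")[["CX", "CY"]].mean()
    sac = np.hypot(centroids["CX"].diff(), centroids["CY"].diff())
    df = df.merge(sac.rename("SAC_MAG"), left_on="FPOGID", right_index=True)
    # 4. Align timestamps with the session start for a consistent
    #    temporal reference across participants.
    df["TIME"] = df["TIME"] - df["TIME"].min()
    return df
```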
In the next section, the distributions and statistical
properties of the gaze-derived features are examined
to assess their relationship with cognitive overload.
3.2 Exploratory Feature Analysis
We examined gaze-derived features related to fixation
duration, saccadic behavior, pupil size, and gaze dis-
tribution to explore the behavioral signatures of cog-
nitive overload. These features were
selected based on the literature and empirical variabil-
ity. Our aim was not to reduce dimensionality but to
evaluate the extent to which feature distributions dif-
fered across cognitive states in a statistically and be-
haviorally meaningful way.
Figure 2 shows the kernel density plots of the
distributions of each gaze-derived feature across the
normal and overload conditions. The fixation dura-
tion and saccade magnitude demonstrated the most
distinct separation, with overload samples character-
ized by longer fixation durations and reduced sac-
cade amplitudes. Pupil diameter measurements were
generally elevated under overload conditions, albeit
with a moderate overlap in distribution. Features such
as blink-constricted pupil size, pupil motion magni-
tudes, and gaze coordinates exhibited less separability
but still reflected subtle class-dependent shifts in the
data. These trends are consistent with findings linking
prolonged fixation, reduced eye movement, and pupil
dilation to cognitive load and sustained attention.
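Density plots in the style of Figure 2 can be generated with a few lines of seaborn; the sketch below assumes the cleaned DataFrame df from Section 3.1 with an added binary label column (0 = normal, 1 = overload) and an illustrative feature subset.

```python
import matplotlib.pyplot as plt
import seaborn as sns

features = ["FPOGD", "SAC_MAG", "LPD", "RPD"]
fig, axes = plt.subplots(1, len(features), figsize=(16, 3))
for ax, feat in zip(axes, features):
    # Per-class kernel density estimate; common_norm=False scales each
    # class separately so their shapes are directly comparable.
    sns.kdeplot(data=df, x=feat, hue="label", common_norm=False,
                fill=True, ax=ax)
    ax.set_title(feat)
fig.tight_layout()
plt.show()
```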
To assess the statistical separability of fea-
tures across cognitive states, we applied the
Mann–Whitney U test to all relevant features in the
dataset. This non-parametric test evaluates whether
Table 1: Comparative overview of related works on cognitive workload estimation.

Study | Domain | Input Modalities | Features Used | Model Type | Labels
(Ktistakis et al., 2022) | Visual Search | Eye Tracking | Fixations, Saccades, Pupil, Blinks | RF, SVM, XGB | NASA Task Load Index
(Ding et al., 2023) | Driving (natural) | Eye region video | Keypoints, Pupil, Blinks | Deep multitask CNN | Binary
(Nasri et al., 2024) | VR Training | Eye Tracking | Pupil Dilation, Fixation Duration | MLP, RF | NASA Task Load Index (Binary)
(Khan et al., 2024) | Driving Sim | Eye + fNIRS + Vehicle data | Gaze, HbO2, Vehicle Signals | CNN-LSTM | N-back levels
(Dang et al., 2024) | Multi-domain | Eye Tracking | Pupil, Gaze Vectors | Task-specific CNNs | Stimulus Onset
(Szczepaniak et al., 2024) | VR Game | Eye + GSR + Heart Rate | Saccades, Pupil, Heart Rate, Electrodermal Activity | SVM, RF | Perceived Load
(Skaramagkas et al., 2021) | Visual + Dual Task | Eye Tracking | 29 gaze metrics incl. Blink, Fixation | RF, Extra Trees | NASA Task Load Index (3-class)
This work | Visual Task | Eye Tracking only | Fixation, Saccade, Pupil | LR, NB, SVM, XGB, MLP | Cognitive Overload
Figure 1: Pipeline for eye gaze-based overload prediction (eye gaze features & user metadata → data pre-processing → model training with LR, NB, SVM, XGB, MLP → evaluation with accuracy, precision, recall, F1, AUC).
the values in the two classes originate from distinct
distributions, without assuming a normal distribution.
Features were then grouped based on whether they exhibited statistically significant differences at the p < 0.01 level (a minimal code sketch of this screening follows the list below):
- Significant Features (p < 0.01): Most gaze-derived features showed strong evidence of distributional divergence between cognitive states. These included FPOGD (fixation duration), SAC_MAG, LPD, RPD (left/right pupil diameter), LPMM, RPMM, BKDUR, BKPMIN, and the gaze position coordinates CX and CY. Other significant features were AGE, TOT_EXP, EXP_PLAT, TIME, CNT, FPOGS, FPOGID, BKID, and CS. These results are consistent with the literature linking such features to visual attention and cognitive load. The statistical significance of these variables supports their inclusion in subsequent analyses and model development.
- Non-Significant Features (p ≥ 0.01): A small number of features did not show statistically significant differences. These include i) the eye-specific gaze coordinates LPCX, LPCY, RPCX, RPCY, and ii) the pupil validity flags LPV, RPV. The limited separability of these features is likely due to their dependence on external factors, such as display layout or signal quality, rather than the internal cognitive state. While retained for modeling purposes, these features were excluded from interpretative and visual analyses because of their minimal relevance to behavior analysis.
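The screening described above reduces to a short loop over features; the sketch below assumes the same df and label column as before and mirrors the p < 0.01 criterion.

```python
from scipy.stats import mannwhitneyu

def screen_features(df, features, alpha=0.01):
    """Split features into significant / non-significant via Mann-Whitney U."""
    significant, non_significant = [], []
    for feat in features:
        normal = df.loc[df["label"] == 0, feat]
        overload = df.loc[df["label"] == 1, feat]
        _, p = mannwhitneyu(normal, overload, alternative="two-sided")
        (significant if p < alpha else non_significant).append((feat, p))
    return significant, non_significant
```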
Figure 3 presents the exploratory correlations
between the workload proportion and the selected
participant-level variables. A weak negative asso-
ciation was observed between overload status and
both total and platform-specific experience, suggest-
ing that greater familiarity with the task environment
may reduce cognitive strain. In contrast, pupil diame-
ter and fixation duration tended to increase with work-
load, which is consistent with the established psy-
chophysiological markers of increased mental effort.
Saccade magnitude showed an inverse trend, indicat-
ing more localized gaze behavior under higher cogni-
tive load.
Table 2: Structured summary of extracted features from eye-tracking data.

Feature(s) | Type | Description

Participant Metadata
UID | Nominal | Participant identifiers for grouping or stratified sampling, not predictive.
AGE | Numeric | Age of the participants in years.
TOT_EXP | Numeric | Total professional experience; reflects overall expertise.
EXP_PLAT | Numeric | Experience specific to the simulated platform.

Fixation and Saccade Features
CNT | Numeric | Frame/sample index; useful for computing fixation order or timing.
TIME | Numeric | Time elapsed since the session start; used for temporal analysis.
FPOGID | Nominal | Identifier for each fixation event.
FPOGS | Numeric | Fixation onset time, marking the start of fixation.
FPOGD | Numeric | Fixation duration (ms); key indicator of cognitive effort.
SAC_MAG | Numeric | Saccade magnitude; amplitude of movement between fixations.
SAC_DIR | Nominal | Saccade direction used in visual-scanning analysis.

Pupil Metrics and Motion
LPD, RPD | Numeric | Left and right pupil diameters.
LPV, RPV | Nominal | Validity flags for pupil diameter measurements.
LPMM, RPMM | Numeric | Eye motion magnitude; may reflect fatigue or stress.
LPMMV, RPMMV | Numeric | Pupil motion velocity; complementary to LPMM, RPMM.

Blink Features
BKID | Numeric | Blink ID grouping samples during the same blink.
BKDUR | Numeric | Blink duration (ms).
BKPMIN | Numeric | Minimum pupil diameter recorded during a blink.

Gaze Coordinates and Confidence
CX, CY | Numeric | Central gaze coordinates on screen.
CS | Numeric | System-provided confidence score for gaze sample validity.
LPCX, LPCY | Numeric | Left eye gaze X/Y screen coordinates.
RPCX, RPCY | Numeric | Right eye gaze X/Y screen coordinates.
BPOGX, BPOGY | Numeric | Raw base point of gaze coordinates; system-derived, not used directly.
Table 3: Participant demographics and overload proportion.

UID | Age | TotExp (yrs) | PlatExp (yrs) | Overload
1 | 45 | 21.0 | 15.00 | 0.44
2 | 46 | 20.0 | 14.00 | 0.37
3 | 24 | 0.0 | 0.67 | 0.45
4 | 32 | 10.0 | 0.00 | 0.39
5 | 45 | 22.0 | 15.00 | 0.41
6 | 21 | 0.0 | 0.67 | 0.42
7 | 27 | 0.2 | 1.00 | 0.46
8 | 59 | 30.0 | 16.00 | 0.09
9 | 42 | 18.0 | 12.00 | 0.26
3.3 Machine Learning Models
Let $x \in \mathbb{R}^d$ denote the feature vector, concatenating fixation, saccadic, and pupil metrics with participant metadata, and let $y \in \{0, 1\}$ represent the binary cog-
nitive state label. Five supervised classifiers were em-
ployed for overload detection: LR, SVM, NB, XGB,
and MLP. This configuration yields a compact repre-
sentation of gaze dynamics and user traits, enabling
comparative evaluation across diverse inductive bi-
ases without requiring multimodal inputs.
LR (Das, 2024) provides a linear and interpretable
baseline by mapping features to class probabilities.
SVMs (Pisner and Schnyer, 2020) identify the deci-
sion boundary that maximizes the separation between
classes and can capture nonlinear relationships using
kernel functions. NB (Chen et al., 2020b) offers a
probabilistic approach based on conditional indepen-
dence assumptions and remains efficient even when
these assumptions are moderately violated. XGB
(Chen et al., 2020a) is an ensemble method that se-
quentially combines decision trees, achieving strong
predictive accuracy while sacrificing interpretability.
Finally, the MLP (Cinar, 2020) represents a neu-
ral network–based model, where layered nonlinear
transformations enable the learning of complex in-
put–output mappings.
3.4 Pre-Processing, Training &
Evaluation
All numerical features were standardized using z-
score normalization to ensure optimization stability in
gradient-based models. To minimize sampling bias,
stratified 5-fold cross-validation was applied within
an initial 80/20 training/testing split, which preserved
class balance across the folds. The hyperparameters
were tuned using the training folds, and the final mod-
Figure 2: Distributions of 9 gaze-derived features across cognitive states (Normal vs. Overload), capturing fixation, saccade,
pupil, and spatial attention dynamics.
Figure 3: Participant-level correlations between overload proportion and gaze-derived features.
els were retrained on the full training subset before
being evaluated on a held-out test set.
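The evaluation protocol can be summarized in a few scikit-learn calls. The sketch below uses LR as a stand-in for any of the five models; X and y denote the feature matrix and binary labels, and the random_state values are assumptions rather than reported settings.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Initial 80/20 stratified hold-out split preserving class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# z-score normalization fitted inside the pipeline avoids test-set leakage.
model = make_pipeline(StandardScaler(), LogisticRegression())

# Stratified 5-fold cross-validation on the training portion.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")

# Final model: retrained on the full training subset, then evaluated
# once on the held-out test set.
model.fit(X_train, y_train)
```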
Each algorithm was trained with standard con-
figurations: LR using the L-BFGS optimizer with $\ell_2$-regularization, SVM with an RBF kernel (default regularization parameter $C = 1.0$ and $\gamma = \mathrm{scale}$),
NB with closed-form Gaussian estimates, XGB with
100 boosted trees (learning rate of 0.1), and MLP
with a single hidden layer (100 units, ReLU activa-
tions, and Adam optimizer). Performance was as-
sessed through accuracy, precision, recall, F1 score,
and AUC, defined in terms of the confusion matrix
$(TP, TN, FP, FN)$ as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN},$$

$$F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
The AUC was used to quantify threshold-independent
discrimination. The models were implemented in
scikit-learn 1.3.0 and XGBoost 1.7.6, executed on
Ubuntu 22.04 with an Intel i7 CPU and 32GB RAM.
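For reference, the stated configurations map onto scikit-learn and XGBoost as sketched below, reusing X_train, X_test, y_train, y_test from the protocol sketch above. probability=True on the SVM is an addition needed to obtain AUC scores; any parameter not mentioned in the text is left at its library default.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

models = {
    "LR": LogisticRegression(solver="lbfgs", penalty="l2"),
    "NB": GaussianNB(),
    "SVM": SVC(kernel="rbf", C=1.0, gamma="scale", probability=True),
    "XGB": XGBClassifier(n_estimators=100, learning_rate=0.1),
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                         solver="adam"),
}

for name, clf in models.items():
    clf.fit(X_train, y_train)              # features standardized beforehand
    y_pred = clf.predict(X_test)
    y_prob = clf.predict_proba(X_test)[:, 1]
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.3f} "
          f"prec={precision_score(y_test, y_pred):.3f} "
          f"rec={recall_score(y_test, y_pred):.3f} "
          f"f1={f1_score(y_test, y_pred):.3f} "
          f"auc={roc_auc_score(y_test, y_prob):.3f}")
```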
4 RESULTS ANALYSIS
The five ML models were evaluated on a stratified
20% test set using accuracy, precision, recall, F1, and
AUC. These metrics capture both overall performance
and sensitivity to cognitive overload. Table 4 summa-
rizes the performance of all evaluated models. XGB
consistently outperformed the other models across all
metrics, achieving the highest accuracy (0.902), F1
(0.884), and AUC (0.956), highlighting its ability to
effectively model non-linear dependencies. The MLP
also yielded strong results, particularly in terms of
recall and F1, indicating its capacity to learn com-
plex interactions among gaze-based features. LR and
SVM demonstrated comparable but more conserva-
tive behavior, with high precision but lower recall,
while NB trailed slightly due to its simplifying inde-
pendence assumptions.
These results illustrate the value of using complementary metrics beyond accuracy, as F1 and AUC capture the trade-offs between precision and recall under class imbalance and subtle cognitive effects. Such trade-offs are particularly important for deployment, where factors such as interpretability, responsiveness, and tolerance to false negatives must be considered: the conservative, high-precision behavior of LR and SVM limits false alarms, whereas XGB and MLP offer the strongest overall balance of sensitivity and precision.
Table 4: Experimental results on the test set.

Model | Accuracy | Precision | Recall | F1 | AUC
XGB | 0.902 | 0.958 | 0.821 | 0.884 | 0.956
MLP | 0.870 | 0.925 | 0.765 | 0.837 | 0.918
LR | 0.851 | 0.910 | 0.723 | 0.805 | 0.894
SVM | 0.846 | 0.902 | 0.714 | 0.797 | 0.889
NB | 0.828 | 0.872 | 0.690 | 0.770 | 0.871
5 DISCUSSION
The results demonstrate that gaze-derived features
alone can provide meaningful indicators of cogni-
tive overload, supporting lightweight and noninva-
sive monitoring for adaptive human–machine inter-
actions. By relying only on fixations, saccades, and
pupil dynamics, this study complements prior work
that depends on multimodal or domain-specific data
and shows that robust detection is feasible without in-
trusive instrumentation.
This study had some limitations. Although the
dataset contains thousands of samples, they origi-
nate from only nine participants. This yields suffi-
cient sample-level power but restricts generalizability
because the models may capture user-specific traits
rather than universal markers of overload. Moreover,
training and testing on the same participants increased
the risk of overfitting because cross-subject validation
was not performed. Although the dataset was approx-
imately balanced between the overload and normal
samples, the confidence intervals could not be reli-
ably estimated because of the limited number of par-
ticipants. This underscores the need for larger and
more diverse cohorts in future studies to enable robust
statistical inferences. Class labels were also derived
heuristically from fixation duration and saccade mag-
nitude distributions. While grounded in established
cognitive science, such labeling remains an indirect
proxy of cognitive state and may limit construct va-
lidity. Therefore, future studies should validate this
approach on datasets in which independent measures,
such as task performance and multimodal markers,
define the labels.
The present analysis relied on feature-based clas-
sifiers that ignored the sequential dynamics of gaze. Incorporating temporal models, such as recurrent or transformer architectures, could exploit this structure. Er-
ror analysis through confusion matrices would further
clarify systematic misclassifications, whereas bench-
marking against multimodal baselines would help
quantify the trade-offs of unimodal gaze input. As-
sessing latency and computational costs is also neces-
sary to establish the feasibility of real-time and adap-
tive interface systems. In addition, calibration analy-
sis, which evaluates how well the predicted probabili-
ties reflect the actual outcomes, would strengthen de-
ployment readiness in settings where system actions
depend on confidence thresholds. Overall, these lim-
itations open concrete directions for advancing gaze-
based cognitive state modeling in more realistic set-
tings.
6 CONCLUSIONS
This study examined cognitive overload detection us-
ing only gaze-derived features and applied five ML
models to data from a cognitively demanding visual
task. Among them, XGB delivered the best per-
formance, achieving an accuracy of 0.902, precision
of 0.958, recall of 0.821, F1 of 0.884, and AUC of
0.956. These results demonstrate that eye-based met-
rics, including fixations, saccades, and pupil diam-
eter, are sufficient for reliable binary classification,
thereby eliminating the need for multimodal inputs.
Beyond predictive performance, the findings high-
light the feasibility of deploying lightweight gaze-
based models in real-time HCI systems. Unlike mul-
timodal approaches, this method offers a focused and
interpretable solution based solely on ocular behavior.
As an exploratory study based on nine participants,
the findings provide initial evidence of the discrimi-
native power of gaze-only features. Larger and more
diverse datasets are necessary to confirm the gener-
alizability and establish statistical reliability. Future
work should also extend the evaluation to subject-
independent scenarios, incorporate temporal model-
ing of gaze dynamics, and explore personalization
and cross-task generalization to advance robust and
adaptive cognitive monitoring systems.
REFERENCES
Abbad-Andaloussi, A., Sorg, T., and Weber, B. (2022). Estimating developers' cognitive load at a fine-grained level using eye-tracking measures. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pages 111-121.

Aksu, Ş. H., Çakıt, E., and Dağdeviren, M. (2024). Mental workload assessment using machine learning techniques based on EEG and eye tracking data. Applied Sciences, 14(6):2282.

Chen, J., Zhao, F., Sun, Y., and Yin, Y. (2020a). Improved XGBoost model based on genetic algorithm. International Journal of Computer Applications in Technology, 62(3):240-245.

Chen, S., Webb, G. I., Liu, L., and Ma, X. (2020b). A novel selective naïve Bayes algorithm. Knowledge-Based Systems, 192:105361.

Cinar, A. C. (2020). Training feed-forward multi-layer perceptron artificial neural networks with a tree-seed algorithm. Arabian Journal for Science and Engineering, 45(12):10915-10938.

Dang, Q., Kucukosmanoglu, M., Anoruo, M., Kargosha, G., Conklin, S., and Brooks, J. (2024). Auto detecting cognitive events using machine learning on pupillary data. arXiv preprint arXiv:2410.14174.

Das, A. (2024). Logistic regression. In Encyclopedia of Quality of Life and Well-Being Research, pages 3985-3986. Springer.

Ding, L., Terwilliger, J., Parab, A., Wang, M., Fridman, L., Mehler, B., and Reimer, B. (2023). CLERA: A unified model for joint cognitive load and eye region analysis in the wild. ACM Transactions on Computer-Human Interaction, 30(6):1-23.

Duchowski, A. T. (2007). Eye Tracking Methodology: Theory and Practice. Springer Science & Business Media.

Ghosh, S., Dhall, A., Hayat, M., Knibbe, J., and Ji, Q. (2023). Automatic gaze analysis: A survey of deep learning based approaches. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):61-84.

Gorin, H., Patel, J., Qiu, Q., Merians, A., Adamovich, S., and Fluet, G. (2024). A review of the use of gaze and pupil metrics to assess mental workload in gamified and simulated sensorimotor tasks. Sensors, 24(6):1759.

Holmqvist, K., Nyström, M., et al. (2011). Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford University Press.

Khan, M. A., Asadi, H., Qazani, M. R. C., Lim, C. P., and Nahavandi, S. (2024). Functional near-infrared spectroscopy (fNIRS) and eye tracking for cognitive load classification in a driving simulator using deep learning. arXiv preprint arXiv:2408.06349.

Kosch, T., Karolus, J., Zagermann, J., Reiterer, H., Schmidt, A., and Woźniak, P. W. (2023). A survey on measuring cognitive workload in human-computer interaction. ACM Computing Surveys, 55(13s):1-39.

Ktistakis, E., Skaramagkas, V., Manousos, D., Tachos, N. S., Tripoliti, E., Fotiadis, D. I., and Tsiknakis, M. (2022). COLET: A dataset for cognitive workload estimation based on eye-tracking. Computer Methods and Programs in Biomedicine, 224:106989.

Mannaru, P., Balasingam, B., Pattipati, K., Sibley, C., and Coyne, J. T. (2017). Performance evaluation of the Gazepoint GP3 eye tracking device based on pupil dilation. In Augmented Cognition. Neurocognition and Machine Learning: 11th International Conference, AC 2017, Held as Part of HCI International 2017, Vancouver, BC, Canada, July 9-14, 2017, Proceedings, Part I 11, pages 166-175. Springer.

Nasri, M., Kosa, M., Chukoskie, L., Moghaddam, M., and Harteveld, C. (2024). Exploring eye tracking to detect cognitive load in complex virtual reality training. In 2024 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), pages 51-54. IEEE.

Pisner, D. A. and Schnyer, D. M. (2020). Support vector machine. In Machine Learning, pages 101-121. Elsevier.

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3):372-422.

Skaramagkas, V., Ktistakis, E., Manousos, D., Kazantzaki, E., Tachos, N. S., Tripoliti, E., Fotiadis, D. I., and Tsiknakis, M. (2023). eSEE-d: Emotional state estimation based on eye-tracking dataset. Brain Sciences, 13(4):589.

Skaramagkas, V., Ktistakis, E., Manousos, D., Tachos, N. S., Kazantzaki, E., Tripoliti, E. E., Fotiadis, D. I., and Tsiknakis, M. (2021). Cognitive workload level estimation based on eye tracking: A machine learning approach. In 2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE), pages 1-5. IEEE.

Szczepaniak, D., Harvey, M., and Deligianni, F. (2024). Predictive modelling of cognitive workload in VR: An eye-tracking approach. In Proceedings of the 2024 Symposium on Eye Tracking Research and Applications, pages 1-3.