Research on Multi-Modal Interaction in Aging Friendly Smart Home
to Compensate for Sensory Decline of Elderly Users
Ziyi Wang
https://orcid.org/0009-0005-4541-931X
College of Artificial Intelligence, Hebei Vocational University of Industry and Technology, Shijiazhuang, Hebei, China
Keywords: Aging-Friendly, Smart Home, Unimodal Interaction, Multimodal Interaction.
Abstract: As the global population ages, home-based elderly care has become the mainstream choice, and elderly
users' demand for and dependence on aging-friendly smart home technology is gradually increasing.
However, traditional unimodal interaction technology faces challenges such as insufficient sensory
adaptation, complex operation, and potential safety hazards, making it difficult to meet the diverse needs of
elderly users. This study provides a systematic review of the literature to comprehensively summarize and
deeply explore the inherent limitations of unimodal interaction, as well as the adaptability of multimodal
interaction technology in addressing sensory decline in the elderly. A comparative study of unimodal and
multimodal interaction reveals the mechanism by which multimodal fusion improves the smart home
experience of the elderly. Although multimodal interaction shows great potential, existing technologies still
face challenges that are difficult to overcome, such as multimodal data fusion and a lack of datasets. Future
research should focus on developing multimodal fusion frameworks and expanding multimodal databases.
The main significance of this article is to provide theoretical support for the design of aging-friendly smart
homes and to promote the development of multimodal fusion interaction technology.
1 INTRODUCTION
As of 2022, the global population aged 65 and above
totaled 771 million, projected to reach 994 million by
2030 and 1.6 billion by 2050. In 2022, people aged 65
or older made up approximately 10% of the global
population, a share projected to rise to nearly 16% by
2050 (Hong, Wang, & Cho, 2022). According to the results of
China's Seventh National Population Census, the
population aged 60 and above reached 264 million in
2020, accounting for 18.7% of the total population,
with the aging process gradually intensifying. It is
projected that this proportion will rise to 35% by 2050,
making China the country or region with the fastest
aging rate globally (Hou & Ren, 2024). Additionally,
90% of the elderly choose to age in place at home, and
over 30% experience sensory decline. This has led to
traditional elderly care models being unable to adapt
to the rapid growth of an aging population,
necessitating the development of aging-friendly
industries and significantly increasing demand for
aging-friendly smart home solutions (Hong, Wang, &
Cho, 2022; Gu et al., 2011). Meanwhile, the global
smart home market size surpassed 715.7 billion yuan
in 2024 and is projected to exceed 1.09 trillion yuan
by 2029. While single-modal interaction has made
significant progress, it still has certain limitations for
the elderly, and for those with sensory impairments, it
increases operational difficulty and safety risks. For
example, individuals with hearing impairments cannot
hear voice commands, and those with visual
impairments cannot view screens. These issues not
only reduce the usability of smart home technology for
the elderly but also threaten their ability to live
independently (Huang et al., 2023). Current research
primarily focuses on two dimensions: single-modal
optimization and multi-modal fusion. In the single-
modal domain, existing single-interaction methods
mitigate specific sensory decline through technical
optimization, parameter adjustments, and functional
simplification. However, single-modal interaction has
three main drawbacks: first, insufficient sensory
adaptability, as single-modal interaction cannot
address the diverse sensory declines experienced by
the elderly population (Tao et al., 2022); second, high
operational difficulty, with numerous cumbersome
steps and processes that may cause the elderly to
abandon smart home systems; third, safety hazards are
more likely to occur, as traditional single-modal
interaction devices often overlook the sensory decline
characteristics of the elderly, leading to increased
operational error rates, especially in emergency
scenarios where single-modal interaction cannot
provide timely feedback (Feng et al., 2022).
In contrast, multimodal interaction integrates
multi-channel input and feedback such as voice,
touch, and vision, significantly improving the
operational efficiency and emotional experience of the
elderly (Tao et al., 2022), and can dynamically adapt
to the various sensory impairments of the elderly. It is
also regarded as an important direction for breaking
through the bottleneck of aging-friendly smart homes.
Therefore, the demand for multimodal interaction
technology in aging-friendly smart homes is growing
increasingly strong. However, multimodal interaction
is not simply a matter of stacking functions.
Unoptimized designs may lead to information
overload, such as confusion caused by simultaneous
triggering of voice and vibration.
This study explores the topic from a dual
perspective of human-computer interaction
technology and adaptability to the physiological
characteristics of the elderly: first, it systematically
reviews the technical bottlenecks and limitations of
single-modal optimization strategies; second, based
on multi-modal fusion frameworks in the literature, it
demonstrates their superior performance and the
advantages of multi-modal dynamic adaptation over
single-modal approaches. The significance of this
study lies in promoting the development of current
smart home technology toward aging-friendly
applications, providing technical and theoretical
frameworks for aging-friendly renovations in the
smart home industry, and offering significant
academic and commercial value prospects for future
researchers.
2 AGING FRIENDLY SMART
HOME RESEARCH STATUS
2.1 Research on Aging Friendly Smart
Homes Based on Single Modality
Single-modal interaction refers to a mode in intelligent
interaction systems that relies solely on a single type
of sensor or interaction channel (such as voice,
gestures, vision, or touch) for information collection
and processing (Ismail et al., 2021; Ni, García
Hernando, & De la Cruz, 2015). Compared to multi-
modal fusion technology, its primary characteristic
lies in achieving specific functionalities within a
given scenario using a single data source. In the field
of aging-friendly smart home technology, single-
modal interaction technology refers to the use of a
single sensor or data source to monitor the behavior
of elderly users and control the environment. Due to
its lower deployment costs and manageable
operational complexity, single-modal technology can
serve as one of the solutions to lower the technical
barriers for elderly users. However, it has certain
limitations in terms of environmental adaptability and
scenario coverage.
The RaGeoSense system developed by Chen
Honghong's team at Northwest Normal University
utilizes millimeter-wave radar sparse point cloud
technology to overcome the limitations of traditional
gesture recognition in terms of light sensitivity,
spatial resolution, and non-contact scenarios (Chen
et al., 2025). The system significantly improves
environmental robustness through a three-level
adaptive noise reduction mechanism. Experiments
show that this design reduces the recognition
accuracy of the system by only 5.1% under pedestrian
interference and improves recognition efficiency by
more than 30% compared to traditional CNN/RNN
models. After validation with 7,000 samples from 10
participants, the system achieved average recognition
rates of 95.2% and 92.56% for eight single-arm
gestures in open environments (60 square meters) and
complex home scenarios (including obstacle
interference), respectively. The average response
time of 103 milliseconds meets the real-time
interaction requirements of smart homes. This
technology leverages the penetration and privacy
protection features of millimeter-wave radar to
provide a new contactless, low-learning-curve
interaction paradigm for elderly users and sensitive
scenarios such as bathrooms and bedrooms.
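To make the flavor of such point-cloud preprocessing concrete, the sketch below shows a minimal multi-stage denoising pipeline for sparse radar frames; the stages, thresholds, and function names are illustrative assumptions, not the published RaGeoSense algorithm.

```python
# Illustrative sketch only: a minimal multi-stage radar point-cloud
# denoising pipeline. Thresholds and stages are hypothetical.
import numpy as np
from sklearn.cluster import DBSCAN

def denoise_frame(points, min_snr=10.0, k=4, dist_factor=2.0):
    """points: (N, 4) array of [x, y, z, snr] from one radar frame."""
    # Stage 1: drop low-SNR returns (likely thermal noise).
    points = points[points[:, 3] >= min_snr]
    if len(points) <= k:
        return points
    # Stage 2: statistical outlier removal on mean k-NN distance.
    xyz = points[:, :3]
    d = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    points = points[knn_mean < knn_mean.mean() + dist_factor * knn_mean.std()]
    # Stage 3: keep only the densest cluster (the gesturing arm),
    # discarding sparse returns from pedestrians or furniture.
    labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(points[:, :3])
    if (labels >= 0).any():
        main = np.bincount(labels[labels >= 0]).argmax()
        points = points[labels == main]
    return points
```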
In recent years, research on voice modalities in the
field of aging-friendly smart homes has exhibited
multi-dimensional characteristics. The Mittal team
developed a smart home automation system based on
dedicated hardware modules and Arduino
microcontrollers (Mittal et al., 2015), achieving
cross-dialect control of home appliances by
establishing an accent-independent speech
recognition model. The system adopts a two-stage
processing architecture: the front end uses noise
reduction algorithms to extract speech features, while
the back end employs dynamic time warping
algorithms to match command libraries. In micro-
home prototype testing, the system achieved a
command recognition accuracy rate of 94.3%,
validating the feasibility of low-cost voice control
systems and providing elderly users with a touchless
operation solution.
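The dynamic time warping matching stage can be illustrated with a minimal sketch: an incoming feature sequence (e.g., MFCCs produced by the noise-reduction front end) is compared against a small command library and the nearest template wins. The command names and feature shapes below are hypothetical.

```python
# Minimal sketch of template matching with dynamic time warping (DTW).
# Feature extraction is assumed to happen upstream.
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two (T, D) feature sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

def match_command(features, library):
    """library: dict mapping command name -> template feature sequence."""
    return min(library, key=lambda name: dtw_distance(features, library[name]))
```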
Zhong et al. approached the issue from a human-
computer interaction perspective (Zhong et al., 2022),
conducting a stratified sample survey of 471 Chinese
users using a 27-dimensional questionnaire to reveal
the differentiated impact mechanisms of age
stratification on the acceptance of voice assistants.
The six-factor regression model they constructed
showed that the elderly group's demand for trust and
emotional connection was significantly higher than
that of the middle-aged and young groups, while the
decline in acceptance due to operational cognitive
load was 2.3 times higher for the elderly group than
for the young group. This finding explains the
underlying reasons for the limited application of
traditional voice interaction design in the elderly
market and provides a basis for developing age-
friendly voice interfaces.
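For readers who wish to reproduce this kind of analysis, a sketch of a factor regression with an age-group interaction is shown below; the column names and data file are invented and do not correspond to Zhong et al.'s actual questionnaire.

```python
# Hypothetical sketch of an acceptance regression with an age-group
# interaction term, exposing effects such as a stronger cognitive-load
# penalty among older users. All column names are invented.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # hypothetical questionnaire export
model = smf.ols(
    "acceptance ~ trust + emotional_connection + usefulness + ease_of_use"
    " + privacy_concern + cognitive_load * C(age_group)",
    data=df,
).fit()
print(model.summary())
```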
In terms of constructing safety-enhanced systems,
Feng's research team integrated the LD3320 voice
recognition module with sensors to design an
intelligent hub with environmental perception
capabilities (Feng & Xie, 2020). The system uses an
STC89C52 microcontroller to integrate voice
commands, temperature and humidity data, and
smoke concentration data in real time. When
environmental parameters exceed thresholds, it
prioritizes safety protocols—for example, when
detecting a gas leak, it will trigger an audio-visual
alarm even if it receives a voice command to turn on
the gas stove. Following stress testing, the system
maintained a 91.2% speech recognition rate in an
85 dB background noise environment, with a false
alarm rate controlled at 0.7 times per thousand hours,
significantly enhancing safety in elderly lone-living
scenarios.
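The safety-priority arbitration logic can be summarized in a short sketch: sensor readings are checked before any voice command executes, and unsafe commands are overridden by an alarm. Thresholds, command names, and helper functions here are hypothetical, not the published firmware.

```python
# Sketch of safety-first arbitration: environmental readings are checked
# before any voice command executes. Thresholds are illustrative.
GAS_PPM_LIMIT = 50.0
TEMP_C_LIMIT = 60.0

def trigger_audio_visual_alarm():
    print("ALARM: buzzer + flashing light")  # hypothetical stub

def execute(command):
    print(f"executing {command}")  # hypothetical stub
    return "ok"

def handle_command(command, sensors):
    """sensors: dict with 'gas_ppm' and 'temp_c' readings."""
    if sensors["gas_ppm"] > GAS_PPM_LIMIT or sensors["temp_c"] > TEMP_C_LIMIT:
        # Safety protocol takes priority over any user command.
        trigger_audio_visual_alarm()
        if command == "turn_on_stove":
            return "rejected: unsafe environment"
    return execute(command)
```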
2.2 Research on Multi-Modal Aging
Adaptive Smart Home
In the field of aging-friendly smart home research,
multimodal interaction technology integrates
multiple interaction channels such as voice, vision,
touch, hearing, and electromyographic signals to
provide a systematic solution to the issue of sensory
function decline in the elderly, as well as to meet their
needs and adapt to specific scenarios. Compared to
single-modal interaction, its core advantages lie in
dynamic adaptation to the sensory capabilities of
elderly users and enhanced scenario adaptability: on
one hand, multimodal interaction can verify and
compensate for errors caused by environmental
factors or sensory decline in a single modality
through the alternation of multiple modalities; on the
other hand, the multimodal combination mechanism
allows users to select the optimal interaction method
based on their physical capabilities or operational
habits. For example, EMG gesture recognition can
assist individuals with mobility impairments in
executing complex command inputs, while voice
interaction provides an accessible communication
channel for those with impaired vision (Bian et al.,
2024). This technical feature effectively aligns with
the physiological characteristics of aging users,
making it a core breakthrough in aging-friendly smart
home research.
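A minimal sketch of this capability-driven modality selection might rank channels by a user's sensory profile so that an impaired channel is never primary; the profile fields and scores below are illustrative assumptions.

```python
# Sketch: rank interaction channels by a user's sensory profile.
# Profile fields and weighting are illustrative, not a published scheme.
def rank_modalities(profile):
    """profile: dict of sensory ability scores in [0, 1]."""
    scores = {
        "voice": profile["hearing"] * profile["speech"],
        "touch_screen": profile["vision"] * profile["motor"],
        "emg_gesture": profile["motor"],
        "visual_gesture": profile["motor"] * 0.8,  # camera needs light
    }
    return sorted(scores, key=scores.get, reverse=True)

# A user with low vision but good hearing gets voice ranked first:
print(rank_modalities({"hearing": 0.9, "speech": 0.8,
                       "vision": 0.2, "motor": 0.7}))
```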
Gu Xuejing et al. constructed an Avatar-based
voice-gaze dual-channel interaction system (Gu et al.,
2011), using dynamic grammar rule loading and
PCCR gaze tracking algorithms, allowing users to
complete operations through gaze area positioning
and voice commands, and achieving semantic
compensation for non-precise input. Experimental
data shows that the system's operational satisfaction
and interaction experience are significantly higher
than traditional touch interfaces, effectively
alleviating single-sensory load.
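The semantic-compensation idea can be illustrated with a small sketch in which an under-specified voice command is resolved by the gaze region; the region layout and command grammar are invented for the example and do not reflect Gu et al.'s implementation.

```python
# Sketch of dual-channel semantic compensation: a deictic voice command
# ("turn that on") is resolved by the device under the user's gaze.
GAZE_REGIONS = {  # gaze region -> device (hypothetical layout)
    "upper_left": "ceiling_lamp",
    "lower_right": "television",
}

def fuse(voice_command, gaze_region):
    if "that" in voice_command:  # deictic reference needs gaze
        device = GAZE_REGIONS.get(gaze_region)
        if device is None:
            return "please look at the device you mean"
        voice_command = voice_command.replace("that", device)
    return voice_command

print(fuse("turn that on", "lower_right"))  # -> "turn television on"
```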
Qin's team broke through traditional modality
limitations through a five-modality fusion
architecture (Qin et al., 2023). Their system design
integrates touch-based graphical interfaces, voice
interaction, electromyography gestures, visual
gestures, and haptic controllers, and innovatively
introduces a VR pre-training mechanism to optimize
algorithm transfer. Feasibility tests show that in
environments with light interference, the intent
recognition accuracy of the multi-modal system is
32.7% higher than that of a single-modal system,
confirming the necessity of modality fusion in smart
home environments. In nursing home tests, the
system helped disabled elderly people improve the
efficiency of expressing daily life commands by 3.2
times, with intent recognition stability reaching
93.62% in environments with interference.
Additionally, surveys showed high satisfaction.
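Decision-level fusion of this kind can be sketched as context-weighted voting over per-modality intent probabilities, with a degraded channel (e.g., camera-based gestures under poor lighting) down-weighted before combination; the weights and intents below are illustrative, not Qin et al.'s algorithm.

```python
# Sketch of decision-level fusion: each recognizer emits intent
# probabilities, scaled by a context-dependent reliability weight.
def fuse_intents(per_modality_probs, weights):
    """per_modality_probs: dict modality -> dict intent -> prob."""
    combined = {}
    for modality, probs in per_modality_probs.items():
        w = weights.get(modality, 1.0)
        for intent, p in probs.items():
            combined[intent] = combined.get(intent, 0.0) + w * p
    total = sum(combined.values())
    return {i: p / total for i, p in combined.items()}

probs = {
    "voice": {"lights_on": 0.7, "tv_on": 0.3},
    "visual_gesture": {"lights_on": 0.4, "tv_on": 0.6},
}
# Dim room: trust the camera less.
print(fuse_intents(probs, {"voice": 1.0, "visual_gesture": 0.3}))
```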
According to Zhou's bibliometric research and
trends in scholarly studies (Zhou et al., 2024), the
number of papers in this field grew at an average
annual rate of 17.3% between 2014 and 2022, with
smart homes and the Internet of Things (IoT)
emerging as core application scenarios, accounting
for 38.6% of the total. The current research focus has
expanded to cross-modal semantic understanding and
privacy protection mechanisms. For example, the hybrid
fusion framework proposed by Tao et al. achieves
a 91.4% utilization rate of multimodal data through
dual optimization at the feature layer and decision
layer, offering new insights for real-time response in
aging-friendly systems (Tao et al., 2022). These
advancements signify that research on aging-friendly
smart home systems is entering a core critical phase,
with the rapid development of multimodal technology
driving substantial improvements in the quality of life
for the elderly population.
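As a schematic illustration of hybrid fusion, the PyTorch sketch below mixes a feature-level branch (a classifier over concatenated modality features) with a decision-level branch (averaged per-modality classifier outputs) through a learnable weight; the dimensions and mixing scheme are assumptions for illustration, not Tao et al.'s published architecture.

```python
# Schematic hybrid fusion: feature-level and decision-level branches
# combined by a learnable mixing weight. Dimensions are illustrative.
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    def __init__(self, dims, n_intents):
        super().__init__()
        # One small classifier per modality (decision layer).
        self.heads = nn.ModuleList(nn.Linear(d, n_intents) for d in dims)
        # Joint classifier on concatenated features (feature layer).
        self.joint = nn.Linear(sum(dims), n_intents)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learnable mix

    def forward(self, feats):  # feats: list of (B, d_i) tensors
        decision = torch.stack(
            [h(f) for h, f in zip(self.heads, feats)]).mean(dim=0)
        feature = self.joint(torch.cat(feats, dim=-1))
        return self.alpha * feature + (1 - self.alpha) * decision

model = HybridFusion(dims=[64, 32], n_intents=5)  # e.g. voice + gesture
logits = model([torch.randn(8, 64), torch.randn(8, 32)])
```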
3 CURRENT LIMITATIONS AND
FUTURE PROSPECTS
Current research in the field of aging-friendly smart
home interaction technology still has significant
limitations. At present, while single-modal
interaction technology demonstrates advantages in
operational efficiency, its sensory adaptation
capabilities are insufficient to address the complex
decline in vision, hearing, and touch among the
elderly. Additionally, multi-modal data fusion
remains superficial: existing research primarily
focuses on optimizing single modalities, lacking
cross-modal collaborative decision-making
mechanisms. Moreover, there is a lack of multi-
modal datasets and databases, and related
technologies have not been fully implemented (Bian
et al., 2024). Furthermore, the current technical
capabilities are insufficient to support multi-modal
fusion at this stage. Elderly individuals also place a
high priority on privacy and security, as multi-modal
interaction technologies involve the collection of data
from multiple modalities, which may pose risks of
data leakage once uploaded for processing.
Future research can explore the following
directions: first, developing a multi-modal fusion
model with dynamically allocated weights, which
adjusts the dominant interaction modality in real-time
based on the environment and user sensory input,
with a focus on practical applications and addressing
technical challenges; second, developing more
comprehensive and adaptable multi-modal databases,
establishing relevant policies and standards, and
promoting the vigorous development of the aging-
friendly smart home sector.
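A dynamically weighted fusion model of the kind proposed here could, for example, map live reliability signals (ambient noise, lighting, recent input errors) to normalized fusion weights so that the dominant modality shifts with the environment; the signals and mappings in the sketch below are hypothetical.

```python
# Sketch of dynamic weight allocation: per-modality reliability signals
# are mapped to normalized fusion weights at runtime. Mappings are
# invented for illustration.
def dynamic_weights(context):
    raw = {
        "voice": max(0.0, 1.0 - context["noise_db"] / 90.0),
        "visual_gesture": context["lux"] / (context["lux"] + 50.0),
        "touch": 1.0 - context["tremor_rate"],
    }
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}

# Noisy kitchen at night: voice is down-weighted, touch dominates.
print(dynamic_weights({"noise_db": 80, "lux": 10, "tremor_rate": 0.1}))
```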
4 CONCLUSIONS
This study examines the aging-friendly requirements
for smart homes in the context of an aging society. By
reviewing and summarizing relevant literature on
aging-friendly smart homes in the areas of single-
modal and multi-modal approaches, the study has
gained an understanding of the current state of
research and technological developments in this field.
Additionally, the study reviews the application of
single-modal and multi-modal technologies in aging-
friendly smart homes, as well as new technologies. It
also identifies the limitations of single-modal
approaches and highlights the core value and practical
pathways of multi-modal interaction technology in
the field of aging-friendly smart homes. The necessity
of multi-modal interaction technology stems from the
fact that older adults experience varying degrees of
sensory decline, and a single interaction channel
cannot fully address complex scenarios. Multi-modal
fusion interaction technology, which integrates
visual, auditory, tactile, voice, and gesture inputs, can
enhance the user experience for older adults through
redundant channel design and dynamic adaptation.
Only through interdisciplinary collaboration and
technological innovation can smart home technology
transition from “usable” to “user-friendly” and
ultimately to “enjoyable,” driving silver-tech
innovation at the industrial level, accelerating the
development of smart aging standards at the policy
level, and ultimately benefiting the independent
living of the elderly while addressing future
population aging challenges.
REFERENCES
Bian, K., Han, D., Li, S., et al. (2024). Research progress of
multimodal human–computer interaction design.
Mechanical Design, 41(11), 199–204.
Chen, H., Wang, X., Hao, Z., et al. (2025). RaGeoSense for
smart home gesture recognition using sparse millimeter
wave radar point clouds. Scientific Reports, 15, 15267.
Feng, C., & Xie, H. (2020). The Smart Home System Based
on Voice Control. In Smart Innovation, Systems and
Technologies (pp. 383–392).
Feng, Z., et al. (2022). HMMCF: A human-computer
collaboration algorithm based on multimodal intention
of reverse active fusion. International Journal of
Human–Computer Studies, 158, 102735.
Gu, X., Wang, Z., He, J., Zheng, S., & Wang, W. (2011).
Research on multimodal interaction system for
elderly‑oriented smart home. Computer Science, 38(11),
216–219.
Hong, Y.-K., Wang, Z.-Y., & Cho, J. Y. (2022). Global
research trends on smart homes for older adults:
Bibliometric and scientometric analyses. International
Journal of Environmental Research and Public Health,
19(22), 14821.
Hou, Y., & Ren, X. (2024). Influencing factors of the
elderly’s intention to use smart home–based elderly
care systems. Design, 9(5), 38–53.
Huang, L., He, Y., Ma, Z., et al. (2023). Research trend
analysis of smart homes from a human–computer
interaction perspective. Journal of Computer-Aided
Design and Computer Graphics, 35(2), 165–184.
Ismail, N. A., Ab Majid, N. A., Abdul Wahab, N. H., &
Mohamed, F. (2021). A comparative study of unimodal
and multimodal interactions for digital TV remote
control mobile application among elderly. International
Journal of Advanced Computer Science and
Applications (IJACSA), 12(7).
Mittal, Y., Toshniwal, P., Sharma, S., Singhal, D., Gupta,
R., & Mittal, V. K. (2015). A voice-controlled
multi‑functional smart home automation system. In
2015 Annual IEEE India Conference (INDICON) (pp.
1–6).
Ni, Q., García Hernando, A. B., & De la Cruz, I. P. (2015).
The elderly’s independent living in smart homes: A
characterization of activities and sensing infrastructure
survey to facilitate services development. Sensors,
15(5), 11312–11362.
Qin, C., Song, A., Wei, L., & Zhao, Y. (2023). A
multimodal domestic service robot interaction system
for people with declined abilities to express themselves.
Intelligent Service Robotics, 16(3), 373–392.
Tao, J., Wu, Y., Yu, C., et al. (2022). A survey on
multimodal human–computer interaction. Journal of
Image and Graphics, 27(6), 1956–1987.
Zhong, R., Ma, M., Zhou, Y., Lin, Q., Li, L., & Zhang, N.
(2022). User acceptance of smart home voice assistant:
A comparison among younger, middle-aged, and older
adults. Universal Access in the Information Society,
23(1), 275–292.
Zhou, C., Zhang, Z., Huang, T., Gu, W., & Kaner, J. (2024).
A bibliometric analysis of interaction interface aging
research: From 2003 to 2022. SAGE Open, 14(3).