smart homes and the Internet of Things (IoT)
emerging as core application scenarios, accounting
for 38.6% of the total. The current research focus has
expanded to cross-modal semantic understanding and
privacy protection mechanisms. For example, the
hybrid fusion framework proposed by Tao Jianhua
achieves a 91.4% utilization rate of multimodal data
through dual optimization at the feature and decision
layers, offering new insights for real-time response in
aging-friendly systems (Tao et al., 2022). These
advancements signal that research on aging-friendly
smart home systems is entering a critical phase,
with the rapid development of multimodal technology
driving substantial improvements in the quality of life
for the elderly population.
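The dual optimization described above combines two classic fusion levels: feature-level fusion (joining per-modality feature vectors before classification) and decision-level fusion (combining per-modality predictions). The sketch below illustrates only these two generic mechanisms; it is not Tao et al.'s actual framework, and all function names, weights, and inputs are illustrative assumptions.

```python
import numpy as np

def feature_level_fusion(features: dict[str, np.ndarray]) -> np.ndarray:
    """Feature-level fusion: concatenate per-modality feature vectors
    (in a fixed, sorted modality order) into one joint representation."""
    return np.concatenate([features[m] for m in sorted(features)])

def decision_level_fusion(scores: dict[str, np.ndarray],
                          weights: dict[str, float]) -> np.ndarray:
    """Decision-level fusion: weighted average of per-modality
    class-score vectors, normalized by the total weight."""
    total = sum(weights.values())
    return sum(weights[m] * scores[m] for m in scores) / total

# Illustrative inputs: two-class scores from a voice and a gesture recognizer.
scores = {"voice": np.array([0.7, 0.3]), "gesture": np.array([0.4, 0.6])}
fused = decision_level_fusion(scores, {"voice": 0.6, "gesture": 0.4})
```

A hybrid framework would typically apply both levels, e.g. training a classifier on the concatenated features while also combining per-modality decisions, so that each level compensates for the other's failure modes.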
3 CURRENT LIMITATIONS AND
FUTURE PROSPECTS
Current research in the field of aging-friendly smart
home interaction technology still has significant
limitations. At present, while single-modal
interaction technology demonstrates advantages in
operational efficiency, its sensory adaptation
capabilities are insufficient to address the complex
decline in vision, hearing, and touch among the
elderly. Additionally, multi-modal data fusion
remains superficial: existing research primarily
focuses on optimizing single modalities and lacks
cross-modal collaborative decision-making
mechanisms. Moreover, multi-modal datasets and
databases are scarce, and the related technologies
have not been fully deployed (Bian et al., 2024), so
current technical capabilities cannot yet support deep
multi-modal fusion. Elderly users also place a high
priority on privacy and security: because multi-modal
interaction collects data from multiple modalities, it
may pose risks of data leakage when that data is
uploaded.
Future research can explore the following
directions: first, developing a multi-modal fusion
model with dynamically allocated weights that
adjusts the dominant interaction modality in real time
according to the environment and the user's sensory
capabilities, with a focus on practical applications and
outstanding technical challenges; second, building
more comprehensive and adaptable multi-modal
databases and establishing relevant policies and
standards to promote the vigorous development of the
aging-friendly smart home sector.
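One way to realize the dynamically weighted fusion model proposed above is to scale each modality's runtime recognition confidence by the user's residual sensory ability on that channel. The sketch below is a minimal illustration of this idea; the weighting scheme, modality names, and sensory-profile scores are assumptions for demonstration, not an established method.

```python
def dynamic_weights(confidences: dict[str, float],
                    sensory_profile: dict[str, float]) -> dict[str, float]:
    """Scale each modality's runtime recognition confidence by the user's
    residual sensory ability on that channel, then normalize so the
    weights sum to 1. Unprofiled modalities default to full ability."""
    raw = {m: c * sensory_profile.get(m, 1.0) for m, c in confidences.items()}
    total = sum(raw.values()) or 1.0
    return {m: v / total for m, v in raw.items()}

# A user with reduced hearing (ability 0.3): the gesture channel becomes
# dominant even though the voice recognizer reports higher raw confidence.
w = dynamic_weights({"voice": 0.8, "gesture": 0.6},
                    {"voice": 0.3, "gesture": 1.0})
dominant = max(w, key=w.get)
```

In a deployed system, the sensory profile could be updated from an assessment questionnaire or from observed interaction failures, so the dominant modality tracks both the environment and the user's changing abilities.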
4 CONCLUSIONS
This study examines the aging-friendly requirements
for smart homes in the context of an aging society. By
reviewing and summarizing the literature on single-
modal and multi-modal approaches to aging-friendly
smart homes, the study surveys the current state of
research and technological development in this field.
It examines the application of single-modal and
multi-modal technologies, along with emerging
technologies, in aging-friendly smart homes;
identifies the limitations of single-modal approaches;
and highlights the core value and practical pathways
of multi-modal interaction technology in this
domain. The necessity
of multi-modal interaction technology stems from the
fact that older adults experience varying degrees of
sensory decline, and a single interaction channel
cannot fully address complex scenarios. Multi-modal
fusion interaction technology, which integrates
visual, auditory, tactile, voice, and gesture inputs, can
enhance the user experience for older adults through
redundant channel design and dynamic adaptation.
Only through interdisciplinary collaboration and
technological innovation can smart home technology
transition from “usable” to “user-friendly” and
ultimately to “enjoyable,” driving silver-tech
innovation at the industrial level, accelerating the
development of smart aging standards at the policy
level, and ultimately benefiting the independent
living of the elderly while addressing future
population aging challenges.
REFERENCES
Bian, K., Han, D., Li, S., et al. (2024). Research progress of
multimodal human–computer interaction design.
Mechanical Design, 41(11), 199–204.
Chen, H., Wang, X., Hao, Z., et al. (2025). RaGeoSense for
smart home gesture recognition using sparse millimeter
wave radar point clouds. Scientific Reports, 15, 15267.
Feng, C., & Xie, H. (2020). The smart home system based
on voice control. In Smart Innovation, Systems and
Technologies (pp. 383–392).
Feng, Z., et al. (2022). HMMCF: A human-computer
collaboration algorithm based on multimodal intention
of reverse active fusion. International Journal of
Human–Computer Studies, 158, 102735.
Gu, X., Wang, Z., He, J., Zheng, S., & Wang, W. (2011).
Research on multimodal interaction system for
elderly-oriented smart home. Computer Science, 38(11),
216–219.