Research on the Application of Multimodal Technology in the
Development of Second-Language Oral Communication Ability
Kairui Wang
School of Computer Science, Hubei University of Education, Wuhan, Hubei, 430205, China
Keywords: Multimodal Discourse, Second-Language Oral Ability, Integration of Multimodal Technology.
Abstract: This research presents an in-depth literature review of the application of multimodal technology in the development of second-language oral communication ability. It explains how constructing multimodal discourse supports the development of second-language oral ability, and analyzes the roles that media-driven resources, learners' autonomous learning behaviors, and modal switching play in multimodal second-language oral learning. It also identifies existing deficiencies, including the lack of micro-level research, personalized teaching research, and long-term follow-up research. It further identifies disputes over how to weight multimodal technology against traditional teaching methods and over evaluation criteria for learning resources. Finally, it proposes directions for future research: strengthening research on micro-level teaching strategies, developing personalized programs, conducting long-term follow-up studies, and establishing a unified evaluation system for multimodal learning resources.
1 INTRODUCTION
1.1 Research Background
As globalization accelerates, second-language oral communication ability has become a core element of cross-cultural communication. With the rapid development of information technology, multimodal technology has emerged and brought unprecedented opportunities to change the traditional model of second-language oral teaching. Traditional second-language oral teaching has long suffered from a serious shortage of authentic interaction scenarios: learners find it difficult to obtain enough opportunities for natural communication in the classroom, which greatly limits the improvement of their oral ability. In addition, interference from mother-tongue thinking often leads learners to rely unconsciously on the structures and thinking patterns of their first language when speaking the second language, which makes authentic and fluent oral output difficult to achieve. Multimodal technology, which integrates multiple channels of information such as text, images, audio, and video, offers a possible way to overcome these limitations of traditional teaching. An in-depth exploration of its application in second-language oral teaching is therefore of practical significance, both for raising learners' oral proficiency and for promoting innovation across the field of second-language teaching.
1.2 Research Objectives
This research aims to systematically review existing research on multimodal technology in the development of second-language oral communication ability, and to analyze its current state, the factors that influence it, and the problems that arise in practical teaching. The review is intended to lay a theoretical foundation for subsequent research and, at the same time, to offer practical guidance for second-language oral teaching, helping teachers use multimodal technology to improve teaching effectiveness and to develop learners' oral ability.
100
Wang, K.
Research on the Application of Multimodal Technology in the Development of Second-Language Oral Communication Ability.
DOI: 10.5220/0013965700004912
Paper published under CC license (CC BY-NC-ND 4.0)
In Proceedings of the 1st International Conference on Innovative Education and Social Development (IESD 2025), pages 100-106
ISBN: 978-989-758-779-5
Proceedings Copyright © 2025 by SCITEPRESS Science and Technology Publications, Lda.
1.3 Research Questions
1. How does the construction of multimodal discourse
enhance second-language oral ability and create a
favorable learning environment through modal
synergy?
2. What roles do media development, user behavior, and modal switching play in multimodal second-language oral learning? How do media resources affect learners' choice of learning strategies? What unique contribution does learners' autonomous learning make? And how does modal switching promote the development of oral ability?
3. What are the deficiencies in the integration of multimodal technology and second-language oral teaching? How should weight be allocated between technology and traditional teaching, and what are the disputes on this point? How can a unified evaluation system for multimodal learning resources be constructed?
2 RESEARCH SCOPE AND METHODS
2.1 Literature Sources
To ensure comprehensive and authoritative coverage, this study searches several well-known academic databases: CNKI, a comprehensive Chinese database covering research in many academic fields; Web of Science, an authoritative international database that indexes leading research across disciplines; and ERIC, a specialized database of education research literature. In addition, influential core journals in foreign language education, such as Foreign Language Teaching and Research and Modern Foreign Languages, are reviewed in depth. These journals have long published cutting-edge research on foreign language teaching and provide rich academic material for this research.
2.2 Retrieval Strategies
A multi-keyword combination retrieval method is adopted to improve the accuracy and relevance of the search results. Specifically, keywords closely related to the research topic, such as "multimodal technology", "second-language oral", "multimodal discourse", and "modal switching", are searched in combination. To capture recent research trends and results in the field, the search is limited to literature published within the past fifteen years. This time limit keeps the review focused on the contemporary status and development of multimodal technology in second-language oral teaching and ensures that the material is timely and relevant.
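The keyword-combination strategy and the fifteen-year window described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the query procedure actually used in this study: it assumes that every combination of two or more keywords is searched with a generic quoted-phrase `AND` syntax (CNKI, Web of Science, and ERIC each have their own query languages), and that "the past fifteen years" means fifteen calendar years ending with a chosen reference year. The helper names `build_queries` and `year_window` are hypothetical.

```python
from itertools import combinations

# Keywords listed in Section 2.2.
KEYWORDS = [
    "multimodal technology",
    "second-language oral",
    "multimodal discourse",
    "modal switching",
]


def build_queries(keywords, min_terms=2):
    """Return one AND query per combination of at least `min_terms` keywords.

    Assumption: a generic quoted-phrase AND syntax; real databases
    (CNKI, Web of Science, ERIC) each use their own query language.
    """
    queries = []
    for size in range(min_terms, len(keywords) + 1):
        for combo in combinations(keywords, size):
            queries.append(" AND ".join(f'"{term}"' for term in combo))
    return queries


def year_window(reference_year, span=15):
    """Assumed reading of 'the past fifteen years': the `span` calendar
    years ending with (and including) `reference_year`."""
    return reference_year - span + 1, reference_year


if __name__ == "__main__":
    for query in build_queries(KEYWORDS):
        print(query)
    # Hypothetical reference year: the year of the conference.
    start, end = year_window(2025)
    print(f"Publication years: {start}-{end}")
```

With four keywords the sketch produces 11 queries (6 pairs, 4 triples, and 1 query using all four terms). Because each query joins terms with `AND`, combining keywords narrows the results toward studies that sit at the intersection of the topics rather than touching only one of them.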
2.3 Inclusion and Exclusion Criteria
This research applies strict inclusion and exclusion criteria to ensure that the selected literature serves the research purpose. The inclusion criteria specify that only empirical and theoretical articles directly related to the application of multimodal technology in second-language oral teaching or learning are selected. These articles must examine the mechanisms, influencing factors, or teaching strategies of multimodal technology in the development of second-language oral ability, or analyze and construct its theoretical basis. The exclusion criteria rule out non-academic articles, such as news reports and blog posts, because they generally lack rigorous research methods and in-depth theoretical analysis and therefore do not meet the academic requirements of this research. Literature with very low relevance to the topic is also excluded: even if it touches on aspects of multimodal technology or second-language oral teaching, it is excluded if it does not address the core issues of this research. Finally, to prevent duplicate studies from distorting the analysis, duplicate publications are removed from the literature collection.
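The screening rules above can be expressed as a simple filter. The sketch below is hypothetical and assumes that each retrieved record has already been tagged with a document type and a reviewer-assigned relevance score; the `Record` fields, the 0.5 relevance cut-off, and the title-based duplicate check are illustrative choices, not the procedure reported in this study. The year bounds stand in for the retrieval window described in Section 2.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record produced by the database search."""
    title: str
    year: int
    doc_type: str      # e.g. "empirical", "theoretical", "news", "blog"
    relevance: float   # hypothetical reviewer rating from 0 (unrelated) to 1


INCLUDED_TYPES = {"empirical", "theoretical"}  # inclusion criterion
RELEVANCE_THRESHOLD = 0.5                      # hypothetical cut-off for "very low relevance"


def normalize_title(title):
    """Crude duplicate key: lower-case letters and digits only."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def screen(records, start_year, end_year):
    """Apply the inclusion/exclusion rules and drop duplicate publications."""
    kept, seen = [], set()
    for rec in records:
        if rec.doc_type not in INCLUDED_TYPES:
            continue  # non-academic items such as news reports and blog posts
        if not start_year <= rec.year <= end_year:
            continue  # outside the retrieval time window
        if rec.relevance < RELEVANCE_THRESHOLD:
            continue  # does not address the core research questions
        key = normalize_title(rec.title)
        if key in seen:
            continue  # duplicate publication of an already-kept study
        seen.add(key)
        kept.append(rec)
    return kept


if __name__ == "__main__":
    sample = [
        Record("Multimodal feedback and EFL speaking", 2022, "empirical", 0.9),
        Record("Multimodal Feedback and EFL Speaking!", 2022, "empirical", 0.9),
        Record("VR arrives in language classrooms", 2023, "news", 0.8),
    ]
    print([r.title for r in screen(sample, 2011, 2025)])
```

Matching on normalized titles is only a rough proxy for duplicate detection; identifiers such as DOIs, or a manual check, would usually be more reliable.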
3 RELATED THEORIES AND CONCEPTS
3.1 Overview of Core Theories
The comprehensive theoretical framework of multimodal discourse analysis proposed by Zhang Delu plays an important role in research on the application of multimodal technology in second-language oral teaching. This framework emphasizes the synergy of multiple modalities in the construction of discourse meaning. In second-language oral teaching, this means that integrating multimodal elements such as images and text can create a richer and more varied learning environment for learners and thereby facilitate language learning. For example, in a multimodal language-learning video, visual elements (the scene, the characters' movements and facial expressions) work together with textual and audio elements (subtitles and voice-overs). This combination helps learners understand vocabulary, grammatical structures, and pragmatic rules more intuitively and deeply, and thus supports the improvement of their oral ability.
3.2 Definition of Key Concepts
3.2.1 Multimodal Technology
Multimodal technology integrates multiple symbol systems, such as text, images, audio, and video. These systems interact and collaborate to create a rich, multidimensional information space that supports language learning in all its aspects.
3.2.2 Second-Language Oral Communication Ability
Second-language oral communication ability is a broad construct that covers accuracy, fluency, appropriateness of expression, and the ability to use communication strategies. Learners need to reach a certain standard in each of these aspects to communicate and interact effectively across cultures.
3.2.3 Multimodal Discourse
Multimodal discourse is discourse constructed from multiple modal semiotic resources, in which the modalities collaborate with and complement one another. For example, in a language-learning video, pictures, sounds, and text combine into an organic whole that presents a complete, vivid, and meaningful learning context.
3.2.4 Modal Switching
Modal switching refers to learners' ability to move between, and connect, information presented in different modalities, as well as the process of doing so. For example, after learning from a video, learners may retell its content orally. This requires flexible thinking: learners must integrate the information and transform it into oral expression, which enhances the creativity and flexibility of their speech.
4 COMPREHENSIVE LITERATURE ANALYSIS
4.1 Main Research Schools and Views
4.1.1 Multimodal Discourse Promotion School
Scholars of this school firmly support the crucial role of multimodal discourse in second-language oral learning, arguing that it provides learners with rich context and diverse input and thereby promotes both oral comprehension and expression. For example, Liang and Hwang (2023) proposed a robot-based digital storytelling approach that combines multimodal elements such as animation and voice interaction to create an immersive language-learning environment, which strongly promoted the development of learners' oral ability.
4.1.2 Media-User Interaction School
Scholars of this school emphasize the strong influence that arises from combining diverse media resources with learners' autonomous learning behaviors. For example, Kim (2021) conducted a case study on the use of virtual reality in multimodal computer-assisted language learning. Learners entered virtual scenes through devices to communicate interactively and independently selected their learning content and methods, and the oral ability of learners who actively participated in the interactive activities improved significantly. This finding supports the school's view.
4.1.3 Modal Synergy School
Scholars of this school explore the mechanisms and effects of switching between, and combining, different modalities, with the aim of enhancing the creativity and flexibility of oral expression. For example, Lan et al. (2022) examined the impact of multimodal feedback on learners' oral performance and motivation. Learners integrated and analyzed voice, text, and video feedback and then transformed it into oral output, which stimulated innovative thinking and improved the quality of their oral expression.
4.1.4 Resource Development and Evaluation School
Scholars of this school focus on the development of multimodal learning resources and the construction of evaluation criteria. In *Digital Multimodalities in Language Learning*, Baranova, Zaitsev, and Makarova (2022) discussed the various forms that digital multimodal resources take in language learning, providing a theoretical basis and practical cases for resource development. Views on evaluation criteria, however, differ. In *Multimodal Communication in the Language Classroom*, Dofs (2023) emphasized the teaching adaptability and interactivity of resources, arguing that resources should be adjusted flexibly to teaching goals and learners' needs and should promote interaction between teachers and students and among students. By contrast, scholars who study virtual reality and augmented reality in multimodal learning resources focus on technological innovation, arguing that novel technologies can enhance the learning experience and its outcomes.
4.1.5 Teaching Strategy Research School
Scholars of this school seek to verify the effectiveness of multimodal teaching strategies. Guichon and Cohen (2023) reviewed the role of multimodal input in second-language oral development and found that different input sequences (such as pictures before text, or audio before video) affect the improvement of learners' oral ability differently. Tseng and Sheu (2023) studied the effect of multimodal annotations on second-language incidental vocabulary learning and speaking performance. Their results showed that appropriate multimodal annotations (such as picture and audio annotations) help learners understand and remember vocabulary, which in turn improves the accuracy and richness of their oral expression. Yuan and Wang (2024) used meta-analysis to evaluate the effectiveness of multimodal instruction on second-language speaking skills and found that combining multimodal input with output is significantly effective in improving oral fluency, providing data to support the selection of teaching strategies.
4.1.6 Technology Integration and Innovation School
Scholars of this school explore ways to integrate new technologies with multimodal oral teaching. Zhang and Zhang (2024) conducted a case study on integrating multimodal technology with task-based language teaching, incorporating virtual reality and artificial intelligence into task-based instruction. In realistic task scenarios (such as simulated business negotiations and travel conversations), learners used multimodal resources to communicate and cooperate while completing the tasks, and their oral communication ability improved significantly. Zhao and Liu (2024) conducted a qualitative study of learners' perceptions and experiences of multimodal learning in second-language oral classrooms. They found that learners generally accept the integration of new technologies into multimodal teaching, but that some learners encounter difficulties in using the technology and adapting to it because of individual differences. These findings point to directions for further optimizing technology integration.
4.2 Research Development Context
Research on multimodal technology in second-language teaching has developed in phases. In the early stage, research on this newly emerging technology focused on the feasibility and potential impact of integrating it with the traditional teaching model.
As information technology and educational concepts changed, the focus shifted to multimodal discourse design, the impact of media on learning behaviors, and the mechanisms of modal switching. Research on discourse design examines how modal elements should be selected and combined according to teaching goals and learner characteristics. Research on media effects analyzes how different media influence learners' attention, interest, and strategy selection. Research on modal switching explores the psychological and cognitive processes by which information is converted and connected across modalities.
In recent years, as the demand for personalized education has grown, the research focus has expanded to individual differences and personalized teaching. Researchers examine how learners differ in their performance in multimodal environments and explore how to tailor teaching programs to individual learners in order to teach more precisely and efficiently.
4.3 Research Hotspots and Trends
Current research on multimodal technology in second-language oral teaching concentrates on three hotspots. The first is the development and evaluation of multimodal learning resources. Technological innovation makes it possible to develop targeted, high-quality resources, such as immersive scenarios and interactive software built with virtual reality. At the same time, a unified evaluation standard is needed, because existing resources vary widely in quality and effectiveness, and the absence of standards leaves teachers and learners uncertain when choosing resources. The second is the verification of the effectiveness of multimodal teaching strategies. Empirical methods such as comparative experiments and case analyses are used to test how multimodal input, modal switching training, multimodal interaction, and other strategies improve oral ability, and to identify the most effective combination of strategies. The third is the innovative application of new technologies in multimodal oral teaching: virtual reality creates a sense of immersion, and artificial intelligence provides personalized suggestions and guidance. These technologies need to be combined with multimodal teaching concepts to create new teaching models.
Looking ahead, three research trends can be identified. The first is to construct a theoretical system for multimodal teaching that integrates existing results and refines the theory so that it can guide practice. The second is to explore precise personalized teaching models that tailor programs to learners' differences in order to improve effectiveness. The third is to evaluate multimodal learning effects across multiple dimensions so as to understand fully their impact on the development of second-language oral communication ability.
5 ADVANTAGES, DISADVANTAGES, AND DISPUTES OF THE RESEARCH
5.1 Advantages of Existing Research
5.1.1 Theoretical Framework Construction
A preliminary theoretical framework for the application of multimodal technology in second-language oral teaching has been constructed. For example, the comprehensive theoretical framework of multimodal discourse analysis clarifies how multiple modalities collaborate, laying a foundation for subsequent research. Many studies have built on this framework to explore multimodal discourse design, modal switching, and related topics, enriching and refining the theoretical system.
5.1.2 Empirical Research Results
Numerous empirical studies have provided evidence of the positive role of multimodal technology in improving oral ability, which supports teaching practice. Experiments and other scientific methods have verified the effects of multimodal teaching on aspects such as oral accuracy. For example, comparative studies have found that learners' oral performance improves significantly under multimodal teaching, which encourages teachers to adopt it and to innovate in their teaching.
5.1.3 Diversity of Research Methods
Existing research combines a variety of methods. Experimental research strictly controls variables to test the effectiveness of strategies; case analysis examines specific teaching cases; and survey research collects information such as learners' experiences. Together, these methods analyze the application of multimodal technology in teaching from multiple perspectives and provide a basis for optimizing teaching.
5.2 Disadvantages of Existing Research
5.2.1 Lack of Micro-Level Research
Most research focuses on macro-level theoretical discussion and effect verification, while research on micro-level strategies for precisely combining multimodal elements is seriously lacking. For example, questions such as which combination of images, audio, and text suits specific learners, and how modalities should be adjusted dynamically, have not been explored in depth, which leaves teachers without operational guidelines.
5.2.2 Insufficient Research on Personalized Teaching
Research on personalized multimodal teaching for different learners is insufficient. Individual differences significantly affect how learners accept and process multimodal information, but existing research has not fully taken these differences into account. For example, low-proficiency and high-proficiency learners have different needs, yet targeted design and analysis for each group are lacking.
5.2.3 Lack of Long-Term Follow-Up Research
There is a lack of long-term follow-up empirical research to determine whether the benefits of multimodal technology for oral ability persist over time. Most studies examine only short-term effects, such as changes in learners' oral ability within one semester or a few months. Language learning, however, is a long-term process, and the sustained impact of multimodal technology over extended learning, learners' retention, and its far-reaching influence on their overall language literacy still require in-depth investigation.
5.3 Disputes in the Research
5.3.1 Dispute over the Weight of Teaching Methods
There is a dispute over how to weight multimodal and traditional teaching methods. Some scholars argue that multimodal teaching should be dominant, because its rich resources stimulate students' interest and increase their participation; they therefore suggest reducing reliance on traditional teaching. Opponents argue that traditional teaching has distinct advantages in explaining grammar and transmitting knowledge systematically, and that overusing multimodal teaching may lead students to neglect fundamental knowledge. Moreover, multimodal teaching may be difficult to implement in resource-scarce environments. These scholars therefore recommend a cautious combination of the two approaches.
5.3.2 Differences in Resource Evaluation Criteria
The evaluation criteria for multimodal learning resources have not been unified. Some studies focus on technological innovation, considering factors such as the fidelity of virtual reality scenarios and the functionality of interactive software. Others emphasize the accuracy and completeness of teaching content, such as the coverage of language knowledge points and the introduction of cultural background. Still others prioritize user experience, such as the usability of the interface and how engaging the learning tasks are. Without unified standards, the quality of multimodal learning resources on the market varies widely, and teachers and learners often struggle to judge the true value and applicability of a resource. This has hindered the effective adoption and precise application of multimodal technology in teaching.
6 CONCLUSIONS
6.1 Research Summary
Multimodal technology has shown significant value in developing second-language oral communication ability. Multimodal discourse promotes learners' oral comprehension and expression by providing rich contextual information. The diverse resources brought by media development, combined with learners' autonomous learning behaviors, create more opportunities and motivation for oral learning. Cultivating modal switching ability helps enhance the creativity and flexibility of oral expression. However, existing research still has many deficiencies and unresolved disputes. Further exploration is needed in micro-level teaching strategies, personalized teaching, long-term effect evaluation, the balance between teaching methods, and resource evaluation.
6.2 Research Prospects
Future research should strengthen the study of multimodal teaching strategies at the micro level, analyzing the combined effects and dynamic adjustment mechanisms of different modal elements and providing detailed operational guidelines for teachers. It should also focus on developing personalized teaching programs that fully consider individual differences such as learners' language proficiency, cultural backgrounds, and learning styles, and on building precisely adapted multimodal teaching models. Long-term follow-up research should be carried out, using methods such as follow-up surveys and return visits, to determine the long-term impact and transfer effects of multimodal learning on oral ability and other language skills. Finally, a unified evaluation system for multimodal learning resources should be established. Such a system should consider technology, teaching content, and user experience together, define scientifically grounded evaluation indicators and weights, and provide a reliable basis for screening and optimizing multimodal learning resources, thereby promoting the effective application and sustainable development of multimodal technology in second-language oral teaching.
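To illustrate what combining evaluation indicators and weights could mean in practice, the sketch below computes a weighted composite score across the three groups of criteria named above: technology, teaching content, and user experience. It is a hypothetical example only; the dimension keys, the weights, and the 1-5 rating scale are placeholders, not values proposed in this paper or in the reviewed literature.

```python
# Hypothetical weights for the three criterion groups; they sum to 1.
WEIGHTS = {
    "technology": 0.3,        # e.g. VR scene fidelity, software functionality
    "content": 0.4,           # e.g. accuracy and coverage of language points
    "user_experience": 0.3,   # e.g. interface usability, task engagement
}

SCALE_MIN, SCALE_MAX = 1, 5  # assumed rating scale for each dimension


def composite_score(ratings, weights=WEIGHTS):
    """Weighted mean of per-dimension ratings for one learning resource."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted dimensions")
    for name, value in ratings.items():
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{name} rating {value} is outside {SCALE_MIN}-{SCALE_MAX}")
    # Dividing by the weight total keeps the result on the rating scale
    # even if the weights do not sum exactly to 1.
    total_weight = sum(weights.values())
    return sum(weights[d] * ratings[d] for d in weights) / total_weight


if __name__ == "__main__":
    vr_app = {"technology": 5, "content": 3, "user_experience": 4}
    textbook_video = {"technology": 2, "content": 5, "user_experience": 3}
    for name, ratings in [("VR app", vr_app), ("textbook video", textbook_video)]:
        print(f"{name}: {composite_score(ratings):.2f}")
```

With these placeholder weights, ratings of 4, 5, and 3 for technology, content, and user experience give a composite of about 4.1. Shifting weight toward one dimension can reverse the ranking of two resources, which illustrates why agreed-upon indicators and weights matter for comparing resources consistently.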
REFERENCES
Baranova, E. G., Zaitsev, D. A., & Makarova, E. V. 2022. Digital Multimodalities in Language Learning. Springer.
Dofs, K. 2023. Multimodal Communication in the Language Classroom. Multilingual Matters.
Guichon, N., & Cohen, A. D. 2023. The role of multimodal input in L2 speaking development: A review. Language Teaching Research, 27(4): 431-450.
Kim, S. 2021. Multimodal CALL for enhancing speaking skills: A case study of using virtual reality in EFL education. Computer Assisted Language Learning, 34(5): 465-482.
Lan, Y. J., Liu, H. M., & Huang, H. C. 2022. The impact of multimodal feedback on EFL learners’ oral performance and motivation. System, 105: 102807.
Liang, J. Q., & Hwang, G. J. 2023. A robot-based digital storytelling approach to enhancing EFL learners’ multimodal storytelling ability and narrative engagement. Computers & Education, 201: 104827.
Tseng, Y. H., & Sheu, H. C. 2023. Investigating the effects of multimodal glosses on L2 incidental vocabulary learning and speaking performance. ReCALL, 35(2): 185-206.
Yuan, Y., & Wang, L. 2024. A meta-analysis of the effectiveness of multimodal instruction on L2 speaking skills. Studies in Second Language Acquisition, 46(1): 121-146.
Zhang, D., & Zhang, X. 2024. The integration of multimodal technology and task-based language teaching: A case study. System, 115: 103348.
Zhao, Y., & Liu, X. 2024. Learners’ perceptions and experiences of multimodal learning in L2 speaking classrooms: A qualitative study. Computer Assisted Language Learning, 37(2): 153-172.