sentiment dictionary of the film industry, the
emotional tendency of film reviews can be obtained
more accurately, and some emotional tendency
patterns with statistical validity can be obtained by
analyzing a large number of comments in the film
review area. Specifically, when a film review is input,
the model can quickly analyze the vocabulary,
sentence structure, and context, thereby determining
the emotional attitude expressed by the author when
writing the review, whether it is praise, criticism, or
neutral evaluation of the film (Ullah et al., 2023). In
addition, due to the openness of review data, a large
number of public user reviews can be easily obtained
from domestic film review platforms. The openness
and accessibility of this data provide conditions for an
in-depth understanding of user feedback and
emotional dynamics in the film market and also
provide a strong reference for the development and
optimization of the film industry.
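As a minimal illustration of the dictionary-based idea described above, the following Python sketch labels a review as praise, criticism, or neutral by counting matches against a small lexicon. The word lists, the `classify_review` function, and the simple regular-expression tokenization are assumptions made for illustration only. They are not the model or dictionary used in the cited work, and Chinese reviews would first require word segmentation.

```python
import re

# Hypothetical toy lexicon for illustration; a real film-domain dictionary
# would be far larger and would usually carry per-word weights.
POSITIVE = {"brilliant", "moving", "masterpiece", "gripping"}
NEGATIVE = {"boring", "clumsy", "predictable", "disappointing"}


def classify_review(text: str) -> str:
    """Return 'praise', 'criticism', or 'neutral' from lexicon hit counts."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Score = number of positive hits minus number of negative hits.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "praise"
    if score < 0:
        return "criticism"
    return "neutral"
```

A pure hit count ignores sentence structure and context (negation, sarcasm), which is why the approaches discussed here combine dictionaries with trained models rather than relying on a lexicon alone.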
In recent years, with the development of natural
language processing technology, research combining
film reviews with sentiment analysis has mainly used
machine learning methods to extract useful content
from the words and sentences of emotionally extreme
film reviews and to improve the resulting analysis.
Lee et al. use sentiment dictionaries to assist in
analyzing the sentiment of film reviews, which
effectively improves model accuracy in the field of
film review analysis
(Lee et al., 2016). Because there is a correlation
between the emotionality of film reviews and the
effectiveness of the content, sentiment analysis can be
used to classify film reviews to find more high-
quality film reviews that are worth referring to.
Kumar et al. combine sentiment analysis with
traditional recommendation systems, filtering and
classifying emotional reviews in order to recommend
reviews and content that are closer to the user's
emotional tendencies (Kumar et al., 2020; Soubraylu
& Rajalakshmi, 2021). Specific methods for improving
the accuracy of sentiment analysis include hybrid
models, pre-screening and classification based on text
length, and dictionaries for recognizing extreme
words (Peng & Cheng, 2023). At
present, relatively mature commercial analysis
software can monitor the rise and decline of hot
topics and even assess group polarization on social
media, including the factors that act as
"accelerators" and jointly promote group polarization
(Rabiee et al., 2024; Utmhikari, 2017).
However, there is still a lack of research from
a specific perspective on the correlation between
audience emotional polarization and film-related
factors.
The core question of this study is which film-related
factors are mainly associated with the emergence of
emotional polarization in audience evaluations in the
comment area. Unlike existing film review sentiment
analysis research, this study focuses on the audience
emotional polarization phenomenon itself. Studying
this phenomenon is valuable: if the type of film or
other specific factors can be shown to correlate with,
or even cause, audience emotional polarization, the
finding can inform filmmakers' marketing strategies,
helping them to target potential audiences accurately
and improve the success rate of marketing activities.
Specifically, this study aims to
define and classify the emergence of audience
emotional polarization and analyze whether there is a
correlation between the type of film, box office,
ratings, and audience emotional polarization to
provide possible new indicators for the analysis of
film reviews.
2 METHOD
The data for this study comes from the public film
review data uploaded to Kaggle in 2017, which is
sourced from the Douban platform (Yuan et al.,
2020). The data mainly includes the Chinese name,
English name, review content, rating, number of
likes, and other information about the movie. The
earliest movie in the data is Iron Man 1 in 2008, and
the rest of the movies are from 2012 and later. Using
this existing public dataset avoids the copyright and
technical difficulties that self-collected data may
face and quickly provides a large amount of real,
public film review data.
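Before any per-movie analysis, it can help to check how many reviews each movie has. The sketch below shows one way to do this with the Python standard library. The column names (`movie_name_cn`, `movie_name_en`, `comment`, `star`, `like`), the helper `reviews_per_movie`, and the three sample rows are hypothetical placeholders chosen to mirror the fields listed above; the actual Kaggle file may use different headers.

```python
import csv
import io
from collections import Counter

# Invented sample rows with assumed column names, for illustration only.
SAMPLE_CSV = """movie_name_cn,movie_name_en,comment,star,like
钢铁侠1,Iron Man 1,Great start to the series,5,120
钢铁侠1,Iron Man 1,Too loud for me,2,3
示例电影,Example Film B,Charming and clever,5,87
"""


def reviews_per_movie(csv_file) -> Counter:
    """Count how many review rows each movie (by English name) has."""
    reader = csv.DictReader(csv_file)
    return Counter(row["movie_name_en"] for row in reader)


# In practice csv_file would be an open file handle on the downloaded data.
counts = reviews_per_movie(io.StringIO(SAMPLE_CSV))
for title, n in counts.most_common():
    print(title, n)
```

Counting by movie first makes it easy to confirm that every movie has enough reviews for whatever sample size is used later.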
2.1 Sentiment Tendency Analysis
This study aims to explore whether the type of movie
affects the audience's emotional polarization. The
analysis model selected is the Baidu voice sentiment
tendency analysis model, which has been put into
commercial use and is relatively stable. From the 28
movies in the database, 2000 reviews of each movie
were randomly selected for analysis. The comments
were first filtered by confidence (confidence > 0.5),
sentiment analysis was then performed on each
comment, and finally statistical methods were used to
analyze the sentiment distribution and any
polarization it exhibits, yielding the
correlation analysis between the audience's sentiment