that the price of the reviewer's mobile phone had a
positive effect on consumers’ purchase intention.
Researchers have also examined how to build more
accurate price models to better predict and analyse
mobile phone prices.
Chen (2024) improved the hedonic price index
model by incorporating LASSO regression and the
RYGEKS index method, effectively addressing
multicollinearity issues among variables. Xu (2022)
adopted multiple machine learning classification
algorithms for mobile phone price categorization and
prediction, identifying logistic regression as the
optimal model. Han, Li, and Du (2022) proposed an
adaptive price adjustment method based on a Dual
Deep Fuzzy Network (DDFN) for the second-hand
mobile phone market, ensuring accuracy and
reliability in recycled device price adjustments.
Existing studies demonstrate that mobile phone
pricing is influenced by complex multidimensional
factors, necessitating the adoption of advanced
modeling approaches to enhance price prediction
accuracy.
This study focuses on the pricing mechanism of
smartphones and examines the key factors affecting
smartphone prices, drawing on industry reports and
multi-dimensional data. To address the insufficient
model accuracy and time lag found in existing
research on price-influencing factors, this study
constructs a more comprehensive linear regression
model from the relevant pricing factors. The study
aims to provide a reference for both consumers and
enterprises and to promote the healthy development
of the smartphone market.
2 LINEAR REGRESSION METHODS AND DATA SOURCES AND PRE-PROCESSING
Linear regression methods are mainly used to study
the linear relationships between variables and to
model them for prediction and data analysis (Maulud
and Abdulazeez, 2020). This study constructs a
multiple linear regression model with smartphone
price as the response variable and the key factors
affecting mobile phone price as the predictors, and
analyses the relationship between these factors and
smartphone price.
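For reference, a multiple linear regression model of this kind can be written in the generic form below. This is a standard textbook formulation, not an equation taken from the study: the symbols (n observations, p predictors, x_{ij} for the j-th predictor of the i-th smartphone, and the error term \varepsilon_i) are notation assumed here for illustration.

```latex
\[
\mathrm{Price}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n
\]
```

Here \beta_0 is the intercept, and each coefficient \beta_j is the expected change in price for a one-unit change in predictor j with the other predictors held fixed.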
This study uses a comprehensive dataset from the
Kaggle website that collects information on the
latest smartphones available in the market and can
be used for in-depth analysis of the factors affecting
smartphone pricing (Kaggle, 2023). The dataset was
created by Abhijit Dahatonde and contains basic
attributes of 980 different smartphones, covering a
wide range of information such as brand, model,
and configuration (Kaggle, 2023).
Before the collected dataset was analysed, it was
preprocessed to improve data quality and ensure
analytical reliability. First, missing values were
filled by mean or median imputation, depending on
the distribution of each variable. Next, price
values were converted from Indian Rupees (INR) to
US Dollars (USD) using the exchange rate, so that
the numerical variables were standardized for
quantitative analysis. Finally, outliers were
identified through statistical inspection, and two
anomalous rows were removed to mitigate their
distorting effects. This preprocessing resulted in
a refined dataset containing 978 validated
observations for each attribute.
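The three preprocessing steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: the exchange rate INR_PER_USD is a hypothetical placeholder (the rate used is not given in the text), the 1.5 x IQR rule stands in for the unspecified statistical inspection of outliers, and all function names are invented for this example.

```python
import statistics

# Hypothetical exchange rate; the study does not state the value it used.
INR_PER_USD = 83.0


def impute(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def inr_to_usd(prices_inr, rate=INR_PER_USD):
    """Convert prices from Indian Rupees to US Dollars at a fixed rate."""
    return [p / rate for p in prices_inr]


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumption made for this sketch; the text only
    says that outliers were found by statistical inspection.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def drop_flagged(values, mask):
    """Keep only the entries whose mask flag is False."""
    return [v for v, flagged in zip(values, mask) if not flagged]
```

A typical use would impute each numeric column, convert the price column with inr_to_usd, and then drop the rows flagged by iqr_outlier_mask. The median is the safer imputation choice for skewed variables such as price, while the mean suits roughly symmetric ones.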
3 INDICATOR SELECTION
The dependent variable of the multiple linear
regression model constructed in this study is the
price of smartphones. The predictor variables include
the average customer rating, the number of processor
cores, and the processing speed of the processor,
among others. Table 1 lists the names and
explanations of the variables:
Table 1: Names and explanations of variables
Variables Ex