development costs and rapid updates in battery
technology, experience greater price fluctuations
(Zhao, 2023). Moreover, the type of transmission also
affects car prices. Cars with automatic transmissions
are generally more expensive than those with manual
transmissions because automatic transmissions better
suit the driving habits of most consumers (Zhang,
2022).
Although previous studies have identified several
factors that influence car prices, they generally rely
on a single data source and a limited sample size, so
their coverage of market scenarios is incomplete
(Chen, 2023). Moreover, most existing analyses are
limited to simple correlation analysis and do not
explore the interactions among factors, leaving room
for further research.
To address these limitations, this paper uses
correlation analysis and multiple linear regression
(MLR) to examine how various factors influence car
prices, reveal the relationships among them, and
provide an evidence base for decision-making by all
parties in the automotive market, thereby supporting
its healthy development.
2 METHODS
2.1 Data Sources and Description
This paper is based on a car price prediction dataset
containing 10,000 entries from the Kaggle website,
which covers various aspects of information such as
brand, model, production year, engine size, fuel type,
transmission type, mileage, number of doors, and the
number of previous owners (Mustafa, 2025; Wang,
2023). Before the analysis, the data were cleaned and
preprocessed to ensure reliability. All entries are
used in the subsequent analysis.
2.2 Selection and Explanation of Indicators
Table 1 lists the key indicators selected as factors
affecting car prices. It names five variables,
including year and engine size, and briefly explains
how each one affects car prices.
Table 1: Key Indicators Selection and Explanation for Factors Affecting Automobile Prices
Number  Variable Name  Brief Description
x1      Year           New cars are usually more expensive
x2      Engine Size    Large engines are costly, making the car price higher
x3      Mileage        The higher the mileage, the lower the price usually is
x4      Doors          Different door numbers represent different models, and the prices vary
x5      Owner Count    Frequent changes of ownership lead to a lower price
2.3 Method Introduction
This study employs descriptive statistics, correlation
analysis, and MLR models for analysis. Descriptive
statistics are used to summarize the characteristics of
the data, revealing the central tendency and
dispersion of variables through the calculation of
means, standard deviations, and other indicators,
providing a foundational framework for subsequent
analysis. Correlation analysis, based on Pearson's
correlation coefficient, quantifies the strength and
direction of linear relationships between variables,
helping to identify those significantly related to car
prices and select core independent variables for
model construction. The MLR model is constructed
with car price as the dependent variable and x1, x2,
x3, x4, and x5 as independent variables, with the
formula as follows:
$$\mathrm{Price} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon \tag{1}$$
Here β0 is the intercept, β1 through β5 are the
regression coefficients of x1 through x5, and ε is
the random error term. Because the other variables
are held constant, each coefficient estimates the
marginal effect of its independent variable on the
price. The model is also highly interpretable:
standardized coefficients allow the relative influence
of the variables to be compared directly. MLR is
therefore well suited to quantifying the combined
influence of multiple variables, which single-variable
analysis cannot capture.
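
The three steps described in this section (descriptive statistics, Pearson correlation, and an MLR fit with standardized coefficients) could be sketched as follows, using only the Python standard library. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the data are synthetic, the generating coefficients are invented for the demonstration, and names such as `make_synthetic`, `pearson`, and `fit_mlr` are hypothetical.

```python
import random
import statistics

NAMES = ["year", "engine_size", "mileage", "doors", "owner_count"]  # x1..x5


def make_synthetic(n=500, seed=0):
    """Generate SYNTHETIC rows; the price coefficients are invented for the demo."""
    rng = random.Random(seed)
    cols = {name: [] for name in NAMES}
    price = []
    for _ in range(n):
        row = {
            "year": rng.randint(2000, 2023),
            "engine_size": rng.uniform(1.0, 5.0),
            "mileage": rng.uniform(0, 300_000),
            "doors": rng.randint(2, 5),
            "owner_count": rng.randint(1, 5),
        }
        for name in NAMES:
            cols[name].append(float(row[name]))
        # Assumed linear relation plus Gaussian noise (doors and owners have no effect).
        price.append(1000 + 300 * (row["year"] - 2000) + 1000 * row["engine_size"]
                     - 0.02 * row["mileage"] + rng.gauss(0, 50))
    return cols, price


def zscores(values):
    """Standardize to mean 0 and sample standard deviation 1."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    zx, zy = zscores(xs), zscores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(cols, price):
    """OLS on z-scored data; returns (standardized betas, raw-unit slopes, intercept)."""
    z = [zscores(cols[name]) for name in NAMES]
    zy = zscores(price)
    k = len(NAMES)
    # Normal equations (Z'Z) beta = Z'y. No intercept column is needed because
    # every z-scored variable has mean zero.
    ztz = [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(k)] for i in range(k)]
    zty = [sum(a * b for a, b in zip(z[i], zy)) for i in range(k)]
    std_beta = solve(ztz, zty)
    # Convert standardized coefficients back to the original units.
    sd_y = statistics.stdev(price)
    slopes = [b * sd_y / statistics.stdev(cols[name]) for b, name in zip(std_beta, NAMES)]
    intercept = statistics.fmean(price) - sum(
        s * statistics.fmean(cols[name]) for s, name in zip(slopes, NAMES))
    return dict(zip(NAMES, std_beta)), dict(zip(NAMES, slopes)), intercept


cols, price = make_synthetic()
for name in NAMES:
    # Descriptive statistics (mean, standard deviation) and correlation with price.
    print(name, round(statistics.fmean(cols[name]), 2),
          round(statistics.stdev(cols[name]), 2), round(pearson(cols[name], price), 3))
std_beta, slopes, intercept = fit_mlr(cols, price)
print(std_beta)
```

Fitting on z-scored variables yields the standardized coefficients directly and keeps the normal equations well conditioned even though year and mileage are on very different scales. The raw-unit slopes and the intercept are then recovered by rescaling with the sample standard deviations and means.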