of historical data for training, thereby improving the accuracy of predictions. In addition, machine learning models, especially complex ones such as decision trees, random forests, and neural networks, can capture nonlinear relationships and intricate patterns in data, which gives them an advantage over traditional statistical methods in predicting trends and changes in the growth of wealth among the rich (Dhall, 2020). Machine learning can also extract important information from a large number of features, such as country, industry, and age, helping to identify the key factors that affect wealth growth. This study uses machine learning to explore the relationship between the rich list and country, industry, and gender, and from this infers how personal wealth relates to these three factors.
The advantage of visualization is that it presents complex results in a simple, intuitive form that non-technical readers can understand. This study uses pie charts, bar charts, trend lines, and other charts to clearly reveal the distribution of wealth and how it changes.
2 METHOD
2.1 Data Acquisition and Processing
The dataset used in this study is the annual rich list from 1997 to 2024. The data for each year form a two-dimensional table whose attributes include ranking, name, net worth, country, industry, age, gender, position, whether the wealth is self-made, and ranking change. During initial data processing, the data from 1997 to 2014 were discarded: they are relatively old, the records from 1997 to 2006 are incomplete, and because the rankings change quickly, these years contribute little to the prediction. The data from 2015 to 2024 are therefore used for prediction, with the attributes ranking, name, net worth, country, industry, gender, and year (Guillem, 2024).
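A minimal sketch of this filtering step in Python, assuming the yearly tables have already been concatenated into a single pandas DataFrame; the file name and column names (year, rank, name, net_worth, country, industry, gender) are illustrative, not the study's actual identifiers:

import pandas as pd

# Hypothetical combined file of all yearly rich lists.
df = pd.read_csv("rich_list_1997_2024.csv")

# Keep only the years used for prediction (2015-2024).
df = df[(df["year"] >= 2015) & (df["year"] <= 2024)]

# Keep only the attributes used in this study and drop incomplete rows.
cols = ["rank", "name", "net_worth", "country", "industry", "gender", "year"]
df = df[cols].dropna(subset=cols)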
2.2 Machine Learning Models
This study used three machine learning models: a linear regression model, a polynomial regression model, and a random forest model. In the linear regression approach, a separate model is built for each person on the list by regressing that person's net worth in previous years against the year (Su, 2012). The year 2025 is then substituted into each model to predict that person's net worth in 2025, and the predictions are sorted to obtain the forecast ranking.
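A minimal sketch of this per-person regression, reusing the preprocessed DataFrame df from the sketch in Section 2.1 (column names remain illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

predictions = {}
for name, person in df.groupby("name"):
    if len(person) < 2:   # need at least two years to fit a line
        continue
    X = person[["year"]].to_numpy()
    y = person["net_worth"].to_numpy()
    model = LinearRegression().fit(X, y)
    predictions[name] = float(model.predict(np.array([[2025]]))[0])

# Sort the predicted 2025 net worths to obtain the forecast ranking.
ranking_2025 = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)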
Polynomial regression is applied in two ways in this experiment (Heiberger, 2009): the first builds a single polynomial model over the entire dataset, while the second builds a polynomial model for each individual. Both use ranking, country, industry, gender, and year as features and net worth as the output, and the predictions are then sorted. In addition, the second method is also tried with assets alone as the feature for the polynomial fit, so that its results can be compared with those obtained when more factors are added. Linear regression models are further used to analyze the factors behind changes in the net worth of the rich, with the fitted coefficients indicating how strongly each feature influences changes in net worth. The random forest model first measures the importance of the features country, industry, gender, ranking, and year, and then combines these features to produce a prediction for each individual (Rigatti, 2017).
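A sketch of the whole-dataset polynomial model and of the random forest with its feature-importance step, reusing df from above; the one-hot encoding, polynomial degree, and forest size are illustrative assumptions, not the exact configuration used in the study:

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures

features = ["rank", "country", "industry", "gender", "year"]
X, y = df[features], df["net_worth"]

def make_encoder():
    # One-hot encode the categorical features; rank and year pass through.
    return ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"),
          ["country", "industry", "gender"])],
        remainder="passthrough",
    )

# Method one: a single polynomial model over the entire dataset.
poly_model = Pipeline([
    ("encode", make_encoder()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("reg", LinearRegression()),
]).fit(X, y)

# Random forest on the same features; importances are reported per encoded
# column and can be summed back to the original five features.
rf = Pipeline([
    ("encode", make_encoder()),
    ("reg", RandomForestRegressor(n_estimators=200, random_state=0)),
]).fit(X, y)
importances = rf.named_steps["reg"].feature_importances_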
2.3 Evaluation Indicators
This study mainly used two evaluation indicators:
Mean Square Error (MSE) and Coefficient of
Determination (R²) (Ozer, 1985).
MSE quantifies the prediction error of the model: it is the average of the squared errors between the predicted values and the true values. The smaller the MSE, the lower the prediction error of the model.
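In standard notation, with y_i the true net worth, ŷ_i the predicted value, and n the number of samples:

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²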
R² measures the explanatory power of the model: it is the proportion of the variance in the true values that is explained by the predictions. Its value is at most 1, and for a reasonable model it falls in [0, 1]. The closer R² is to 1, the stronger the explanatory power of the model.
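Using the same notation, with ȳ the mean of the true values:

R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²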
The models are tested with K-fold cross-validation, a commonly used technique for assessing the performance and generalization ability of machine learning models. It yields a more reliable evaluation by dividing the dataset into multiple subsets.
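A minimal sketch of this evaluation in scikit-learn, reusing the rf pipeline and X, y defined in the sketch in Section 2.2 (the splitting procedure itself is described in the next paragraph); the 5-fold setting matches one of the fold counts used below:

from sklearn.model_selection import KFold, cross_validate

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(
    rf, X, y, cv=cv,
    scoring={"mse": "neg_mean_squared_error", "r2": "r2"},
)
mean_mse = -scores["test_mse"].mean()   # average MSE over the 5 folds
mean_r2 = scores["test_r2"].mean()      # average R² over the 5 folds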
In this method, the dataset is first divided into K subsets, each called a fold. In each iteration, one fold is used as the validation set and the remaining K-1 folds are used as the training set; over the K iterations, a different fold serves as the validation set each time to evaluate the model's performance. The average of the K evaluation results is taken as the final performance indicator of the model. In this test, the dataset is divided into 5 or 10 folds to increase the reliability of the validation results. The advantage of K-fold cross-validation is