7 CONCLUSION
This study uses web crawler technology to obtain a
dataset of historical housing transactions in Beijing.
After pre-processing, the dataset contains 56,793
records with 23 features. Random Forest, XGBoost,
LightGBM and an ensemble model are trained with grid
search and 5-fold cross-validation to predict housing
prices. Each model is then evaluated with quantitative
indicators and visualization. Ranked from best to worst
performance, the models are: ensemble model, XGBoost,
Random Forest and LightGBM. These results indicate
that the ensemble method can outperform a single model
in complex real estate prediction tasks. This study
offers a comparison of various models and highlights
the strengths of ensemble approaches.
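The workflow summarized above (grid search with 5-fold cross-validation per model, followed by an ensemble of the tuned models) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic, scikit-learn's GradientBoostingRegressor stands in for XGBoost and LightGBM so the example needs only one library, the parameter grids are hypothetical, and simple prediction averaging (VotingRegressor) is assumed as the combination scheme.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data; only the feature count (23) mirrors the study.
X, y = make_regression(
    n_samples=400, n_features=23, n_informative=10, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical, deliberately small grids (assumption: not the study's grids).
candidates = {
    "random_forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 8]},
    ),
    # Stand-in for XGBoost / LightGBM (assumption made for portability).
    "gradient_boosting": (
        GradientBoostingRegressor(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}

# Tune each base model with grid search and 5-fold cross-validation.
tuned = {}
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    tuned[name] = search.best_estimator_

# Combine the tuned models by averaging their predictions.
ensemble = VotingRegressor(estimators=list(tuned.items()))
ensemble.fit(X_train, y_train)

# Compare every model on the held-out split.
scores = {name: r2_score(y_test, m.predict(X_test)) for name, m in tuned.items()}
scores["ensemble"] = r2_score(y_test, ensemble.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")
```

On synthetic data the resulting ranking carries no meaning for housing prices; the sketch only shows how the tuning, combination and comparison steps fit together.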
This study provides a practical machine learning
approach to real estate market prediction and may
offer useful insight for real estate analysts and
policymakers. Moreover, it contributes to the
growing body of research applying ensemble
methods to China’s housing prices.
Although the results are encouraging, this study has a
limitation: inflation effects over the time span of the
dataset are not modeled, which may limit the model’s
ability to capture real estate market trends in later
years. Future studies can explore time-series models
such as ARIMA in more depth and incorporate them into
an ensemble model to improve the long-term stability of
model outcomes.
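One common way to account for inflation, which this study does not apply, is to deflate nominal prices with a price index before training. The sketch below illustrates only the arithmetic: the function name `to_real_price`, the index values and the sale prices are all hypothetical and are not taken from the dataset or from any published index.

```python
def to_real_price(nominal_price, index_at_sale, index_at_base):
    """Express a nominal price in base-period money using a price index."""
    if index_at_sale <= 0 or index_at_base <= 0:
        raise ValueError("price index values must be positive")
    return nominal_price * index_at_base / index_at_sale


# Hypothetical index values for illustration only (not real CPI figures).
price_index = {2015: 100.0, 2016: 102.0, 2017: 104.0}

# Hypothetical (year, nominal price) pairs.
sales = [(2015, 40000.0), (2016, 51000.0), (2017, 62400.0)]

base_year = 2015
real_prices = [
    to_real_price(price, price_index[year], price_index[base_year])
    for year, price in sales
]

for (year, nominal), real in zip(sales, real_prices):
    print(f"{year}: nominal = {nominal:.0f}, real ({base_year} money) = {real:.0f}")
```

Training on deflated prices would let a model compare transactions from different years on a common scale, although the choice of index and base period is itself a modeling decision.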