Comparison of Tree-Based Learning Methods for Fraud Detection in Motor Insurance
David Suda, Mark Caruana, Lorin Grima
2025
Abstract
Fraud detection in motor insurance is investigated by implementing and comparing various tree-based learning methods under different data-balancing approaches, using a dataset obtained from the insurance industry. The focus is on decision trees, random forests, gradient boosting machines, light gradient boosting machines and XGBoost. Because the dataset is highly imbalanced, synthetic minority oversampling and cost-sensitive learning are used to address the imbalance. A study comparing these two data-balancing approaches is novel in the literature, and this study concludes that cost-sensitive learning is superior overall for this application. The light gradient boosting machine with cost-sensitive learning is the most effective method, achieving a balanced accuracy of 81% and correctly identifying 83% of fraudulent cases. For this most successful approach, the main insights into the most important features are provided. The findings offer a useful evaluation of the suitability of tree-based learners for insurance fraud detection, and contribute to the ongoing development of tools for correct classification and to the identification of important features to be addressed.
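The abstract reports balanced accuracy rather than plain accuracy and contrasts cost-sensitive learning with oversampling. The following is a minimal sketch of why that metric matters on imbalanced data and of one common way to derive class weights for cost-sensitive training. The inverse-frequency weighting rule, the helper names and the toy labels are all assumptions made for illustration; the abstract does not state the cost scheme or the data used by the authors.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Hypothetical helper; one common heuristic, not necessarily the paper's scheme.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of recall on the positive class and recall on the negative class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (recall + specificity) / 2


# Toy labels invented for illustration: 95 legitimate claims (0), 5 fraudulent (1).
y_true = [0] * 95 + [1] * 5
weights = balanced_class_weights(y_true)

# A degenerate classifier that never flags fraud.
y_all_legit = [0] * 100
plain_acc = sum(t == p for t, p in zip(y_true, y_all_legit)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_all_legit)

print(weights)             # fraud class receives the larger weight
print(plain_acc, bal_acc)  # plain accuracy looks high, balanced accuracy does not
```

On this toy data the never-flag classifier scores 0.95 on plain accuracy but only 0.5 on balanced accuracy, which is why the latter is the more informative figure here. If the 83% reported in the abstract is recall on the same evaluation set as the 81% balanced accuracy, the definition implies a specificity of roughly 2(0.81) - 0.83 = 0.79; this is an inference from rounded figures, not a number stated in the abstract.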
Paper Citation
in Harvard Style
Suda D., Caruana M. and Grima L. (2025). Comparison of Tree-Based Learning Methods for Fraud Detection in Motor Insurance. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0, SciTePress, pages 390-397. DOI: 10.5220/0013513900003967
in BibTeX Style
@conference{data25,
author={David Suda and Mark Caruana and Lorin Grima},
title={Comparison of Tree-Based Learning Methods for Fraud Detection in Motor Insurance},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={390-397},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013513900003967},
isbn={978-989-758-758-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Comparison of Tree-Based Learning Methods for Fraud Detection in Motor Insurance
SN - 978-989-758-758-0
AU - Suda D.
AU - Caruana M.
AU - Grima L.
PY - 2025
SP - 390
EP - 397
DO - 10.5220/0013513900003967
PB - SciTePress