Comparison of Tree-Based Learning Methods for Fraud Detection in Motor Insurance

David Suda, Mark Caruana, Lorin Grima

2025

Abstract

Fraud detection in motor insurance is investigated by implementing and comparing several tree-based learning methods under different data-balancing approaches, using a dataset obtained from the insurance industry. The methods considered are decision trees, random forests, gradient boosting machines, light gradient boosting machines and XGBoost. Because the dataset is highly imbalanced, synthetic minority oversampling and cost-sensitive learning are used to address the imbalance. A comparison of these two data-balancing approaches is novel in the literature, and the study concludes that cost-sensitive learning is superior overall for this application. The light gradient boosting machine with cost-sensitive learning is the most effective method, achieving a balanced accuracy of 81% and correctly identifying 83% of fraudulent cases. For this most successful approach, the most important features are identified. The findings provide a useful evaluation of the suitability of tree-based learners for insurance fraud detection, and contribute to the development of tools for correct classification and to the identification of the important features to be addressed.
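To make the two headline figures easier to relate, the following is a minimal sketch that assumes the standard binary definition of balanced accuracy, namely the mean of sensitivity (the share of fraudulent cases identified) and specificity (the share of legitimate cases correctly cleared). Under that assumption, the reported 81% and 83% would imply a specificity of roughly 79%. The abstract does not state this value, the reported figures are rounded, and the confusion-matrix counts below are hypothetical numbers chosen only to reproduce the reported rates.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (fraud recall) and specificity (non-fraud recall)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def implied_specificity(balanced_acc, sensitivity):
    """Solve balanced_acc = (sensitivity + specificity) / 2 for specificity."""
    return 2 * balanced_acc - sensitivity


# Hypothetical counts (not from the paper), picked to match the reported rates.
tp, fn = 83, 17      # 83 of 100 fraudulent claims flagged
tn, fp = 790, 210    # 790 of 1000 legitimate claims cleared

print(round(balanced_accuracy(tp, fn, tn, fp), 2))
print(round(implied_specificity(0.81, 0.83), 2))
```

Because balanced accuracy weights both classes equally, it is not inflated by the large majority of legitimate claims in the way plain accuracy would be on an imbalanced dataset.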


Paper Citation


in Harvard Style

Suda D., Caruana M. and Grima L. (2025). Comparison of Tree-Based Learning Methods for Fraud Detection in Motor Insurance. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0, SciTePress, pages 390-397. DOI: 10.5220/0013513900003967


in Bibtex Style

@conference{data25,
author={David Suda and Mark Caruana and Lorin Grima},
title={Comparison of Tree-Based Learning Methods for Fraud Detection in Motor Insurance},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={390-397},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013513900003967},
isbn={978-989-758-758-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Comparison of Tree-Based Learning Methods for Fraud Detection in Motor Insurance
SN - 978-989-758-758-0
AU - Suda D.
AU - Caruana M.
AU - Grima L.
PY - 2025
SP - 390
EP - 397
DO - 10.5220/0013513900003967
PB - SciTePress