Gaussian Process for Regression in Business Intelligence: A Fraud Detection Application

Bruno H. A. Pilon, Juan J. Murillo-Fuentes, João Paulo C. L. da Costa, Rafael T. de Sousa Júnior, Antonio M. R. Serrano

2015

Abstract

Business Intelligence (BI) systems are designed to provide information that supports the decision-making process in companies and governmental institutions. In this scenario, future events depend on both the decisions taken and the previous events. Therefore, the mathematical analysis of past data can be an important tool for supporting decisions and detecting anomalies. Depending on the amount and type of data to be analyzed, techniques from statistics, Machine Learning (ML), data mining and signal processing can be used to automate all or part of the system. In this paper, we propose to incorporate Gaussian Process for Regression (GPR) into BI systems in order to predict the data. As presented in this work, fraud detection is one important application of BI systems. We show that such an application is possible with the use of GPR in the predictive stage, since GPR natively returns a full statistical description of the estimated variable, which can be used as a trigger measure to classify data as trusted or untrusted. We validate our proposal with real-world BI data provided by the Brazilian Federal Patrimony Department (SPU), regarding the monthly collection of federal taxes. In order to take into account the multidimensional structure of this specific data, we propose a pre-processing stage that reshapes the original time series into a bidimensional structure. The resulting algorithm, with GPR at its core, outperforms classical predictive schemes such as Artificial Neural Networks (ANN).
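The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the (year, month) reshaping is only one plausible reading of the "bidimensional structure", the squared-exponential kernel and its hand-picked hyperparameters are assumptions, the three-standard-deviation trigger is an illustrative choice, and the data are synthetic rather than SPU records. All function names are hypothetical.

```python
import numpy as np

def reshape_monthly(series, months_per_year=12):
    """Map a 1-D monthly series to 2-D inputs (year index, month index).
    Assumption: one plausible reading of the 'bidimensional structure'."""
    idx = np.arange(len(series))
    X = np.column_stack((idx // months_per_year, idx % months_per_year))
    return X.astype(float), np.asarray(series, dtype=float)

def rbf_kernel(A, B, length_scales, signal_var):
    """Squared-exponential kernel with one length scale per input dimension."""
    diff = (A[:, None, :] - B[None, :, :]) / length_scales
    return signal_var * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def gpr_predict(X_train, y_train, X_test, length_scales, signal_var, noise_var):
    """Standard GP regression: predictive mean and variance of noisy targets."""
    mean_y = y_train.mean()  # constant mean, removed before fitting
    K = rbf_kernel(X_train, X_train, length_scales, signal_var)
    K += noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - mean_y))
    K_s = rbf_kernel(X_train, X_test, length_scales, signal_var)
    mu = mean_y + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var + noise_var - np.sum(v ** 2, axis=0)
    return mu, var

def flag_untrusted(observed, mu, var, k=3.0):
    """Trigger: flag values more than k predictive standard deviations away."""
    return np.abs(observed - mu) > k * np.sqrt(var)

# Synthetic monthly series (illustration only): trend + seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(12 * 5)
series = (100 + 2 * (t // 12) + 10 * np.sin(2 * np.pi * (t % 12) / 12)
          + rng.normal(0.0, 1.0, size=t.size))

X, y = reshape_monthly(series)
train = X[:, 0] < 4                      # first four years for training
X_train, y_train = X[train], y[train]
X_test, observed = X[~train], y[~train].copy()
observed[6] += 100.0                     # inject one artificial anomaly

# Hyperparameters are hand-picked here, not fitted to the data.
mu, var = gpr_predict(X_train, y_train, X_test,
                      length_scales=np.array([3.0, 2.0]),
                      signal_var=100.0, noise_var=1.0)
flags = flag_untrusted(observed, mu, var)
print(np.flatnonzero(flags))
```

In practice the hyperparameters would be learned from the training data instead of being fixed by hand, and the threshold `k` would be tuned to the acceptable rate of false alarms.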


Paper Citation


in Harvard Style

Pilon, B. H. A., Murillo-Fuentes, J. J., da Costa, J. P. C. L., de Sousa Júnior, R. T. and Serrano, A. M. R. (2015). Gaussian Process for Regression in Business Intelligence: A Fraud Detection Application. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS, (IC3K 2015) ISBN 978-989-758-158-8, pages 39-49. DOI: 10.5220/0005593000390049


in Bibtex Style

@conference{kmis15,
author={Bruno H. A. Pilon and Juan J. Murillo-Fuentes and João Paulo C. L. da Costa and Rafael T. de Sousa Júnior and Antonio M. R. Serrano},
title={Gaussian Process for Regression in Business Intelligence: A Fraud Detection Application},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS, (IC3K 2015)},
year={2015},
pages={39-49},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005593000390049},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KMIS, (IC3K 2015)
TI - Gaussian Process for Regression in Business Intelligence: A Fraud Detection Application
SN - 978-989-758-158-8
AU - Pilon, B. H. A.
AU - Murillo-Fuentes, J. J.
AU - da Costa, J. P. C. L.
AU - de Sousa Júnior, R. T.
AU - Serrano, A. M. R.
PY - 2015
SP - 39
EP - 49
DO - 10.5220/0005593000390049