Prediction for Disease Risk and Medical Cost using Time Series Healthcare Data

Masatoshi Nagata, Kazunori Matsumoto, Masayuki Hashimoto

Abstract

Forecasting medical expenditure benefits both insurance companies and individuals. In this paper, we propose a new methodology for predicting disease risk and medical cost. Building on sequential latent Dirichlet allocation (SeqLDA), which classifies hierarchical sequential data into segments of topics, we predict the number of people with diseases and the one-year cost of lifestyle-related diseases. Using three years of health checkup information and medical claims for 6500 people, we obtained a lower prediction error than conventional LDA and an AUC above 0.71. The results suggest that the SeqLDA method can predict the number of people with diseases and the related medical costs from time series healthcare data.
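
For readers unfamiliar with the AUC figure quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted risk scores. It uses the standard pairwise (Mann-Whitney) formulation and is not the authors' evaluation code. The function name `auc` and the sample labels and scores are hypothetical and chosen only for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for a person who developed the disease, 0 otherwise.
    scores: predicted risk, higher meaning more likely to be a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical example data, not taken from the paper.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.4]
print(auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from non-cases, so a value above 0.71 means a randomly chosen case is scored above a randomly chosen non-case in more than 71% of pairs.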



Paper Citation


in Harvard Style

Nagata M., Matsumoto K. and Hashimoto M. (2016). Prediction for Disease Risk and Medical Cost using Time Series Healthcare Data. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016) ISBN 978-989-758-170-0, pages 517-522. DOI: 10.5220/0005827405170522


in Bibtex Style

@conference{healthinf16,
author={Masatoshi Nagata and Kazunori Matsumoto and Masayuki Hashimoto},
title={Prediction for Disease Risk and Medical Cost using Time Series Healthcare Data},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016)},
year={2016},
pages={517-522},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005827405170522},
isbn={978-989-758-170-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, (BIOSTEC 2016)
TI - Prediction for Disease Risk and Medical Cost using Time Series Healthcare Data
SN - 978-989-758-170-0
AU - Nagata M.
AU - Matsumoto K.
AU - Hashimoto M.
PY - 2016
SP - 517
EP - 522
DO - 10.5220/0005827405170522