Controlling the Cost of Prediction in using a Cascade of Reject Classifiers for Personalized Medicine

Blaise Hanczar, Avner Bar-Hen

Abstract

Supervised learning is a major tool in bioinformatics for diagnosing a disease, identifying the best therapeutic strategy, or establishing a prognosis. The main objective in classifier construction is to maximize accuracy in order to obtain a reliable prediction system. However, a second objective is to minimize the cost of using the classifier on new patients. Although controlling the classification cost is highly important in the medical domain, it has received very little study. We point out that some patients are easy to predict: only a small subset of medical variables is needed to obtain a reliable prediction for them, so predicting these patients can be cheaper than predicting the others. Based on this idea, we propose a cascade approach that decreases the classification cost of the basic classifiers without reducing their accuracy. Our cascade system is a sequence of classifiers of increasing cost, each with a reject option. At each stage, a classifier receives all patients rejected by the previous classifier, makes a prediction for each of them, and passes the patients whose prediction has low confidence on to the next classifier. The performance of our method is evaluated on four real medical problems.
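The cascade described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: each stage is a hypothetical probabilistic classifier over a growing set of variables, the confidence of a prediction is taken as max(p, 1 - p), a patient is rejected when this confidence falls below a stage-specific threshold, and a patient accepted at stage k pays the acquisition cost of all variables introduced in stages 1 to k. The names `Stage`, `prob_positive`, `extra_cost`, `threshold`, `x1` and `x2` are illustrative, and the thresholds are set by hand rather than learned.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


# Hypothetical stage of the cascade (an assumption, not the paper's API):
# a probabilistic classifier, the extra cost of the variables it adds,
# and the confidence threshold below which a patient is rejected.
@dataclass
class Stage:
    prob_positive: Callable[[dict], float]  # returns P(class = 1 | patient)
    extra_cost: float                       # cost of the variables added at this stage
    threshold: float                        # minimum confidence to accept a prediction


def cascade_predict(stages: Sequence[Stage],
                    patients: List[dict]) -> List[Tuple[int, float, int]]:
    """Return (predicted label, total cost paid, accepting stage) per patient."""
    results = []
    for patient in patients:
        cost = 0.0
        for k, stage in enumerate(stages):
            cost += stage.extra_cost            # acquire the variables of stage k
            p = stage.prob_positive(patient)
            confidence = max(p, 1.0 - p)        # confidence of the predicted class
            is_last = k == len(stages) - 1
            if confidence >= stage.threshold or is_last:
                # Accept; the last stage has no reject option.
                results.append((int(p >= 0.5), cost, k))
                break
            # Otherwise the patient is rejected to the next, costlier stage.
    return results


# Toy two-stage cascade: a cheap variable x1, then an expensive variable x2.
stages = [
    Stage(lambda x: 0.95 if x["x1"] > 1 else 0.5, extra_cost=1.0, threshold=0.9),
    Stage(lambda x: 0.9 if x["x2"] > 0 else 0.1, extra_cost=10.0, threshold=0.0),
]
patients = [{"x1": 2, "x2": -1}, {"x1": 0, "x2": 1}, {"x1": 0, "x2": -1}]

results = cascade_predict(stages, patients)
mean_cost = sum(cost for _, cost, _ in results) / len(results)
for label, cost, stage in results:
    print(f"label={label} cost={cost} accepted at stage {stage}")
print(f"mean cost per patient: {mean_cost:.2f}")
```

In this toy run the first patient is accepted by the cheap first stage at a cost of 1, while the two ambiguous patients are rejected to the second stage and pay 11. The average cost is therefore about 7.67 instead of 11 when every patient receives all variables, which illustrates the kind of saving the cascade aims for.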



Paper Citation


in Harvard Style

Hanczar B. and Bar-Hen A. (2016). Controlling the Cost of Prediction in using a Cascade of Reject Classifiers for Personalized Medicine. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016), ISBN 978-989-758-170-0, pages 42-50. DOI: 10.5220/0005685500420050


in Bibtex Style

@conference{bioinformatics16,
author={Blaise Hanczar and Avner Bar-Hen},
title={Controlling the Cost of Prediction in using a Cascade of Reject Classifiers for Personalized Medicine},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)},
year={2016},
pages={42-50},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005685500420050},
isbn={978-989-758-170-0},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2016)
TI  - Controlling the Cost of Prediction in using a Cascade of Reject Classifiers for Personalized Medicine
SN  - 978-989-758-170-0
AU  - Hanczar B.
AU  - Bar-Hen A.
PY  - 2016
SP  - 42
EP  - 50
DO  - 10.5220/0005685500420050
ER  - 