Comparative Study on Data Mining Techniques Applied to Breast Cancer Gene Expression Profiles

Sérgio Mosquim Júnior, Juliana de Oliveira

Abstract

Breast cancer has the second highest incidence among all cancer types and is the fifth leading cause of cancer-related death among women. In Brazil, breast cancer mortality rates have been rising. Cancer classification is intricate, particularly when differentiating subtypes. In this context, data mining becomes a fundamental tool for analyzing genotypic data, with the potential to improve diagnostics, treatment, and patient care. Because the high dimensionality of the data is problematic, methods to reduce it must be applied. The present study therefore analyzes two data mining methods: decision trees and artificial neural networks. Weka® and MATLAB® were used to implement them. The decision trees identified genes that are important for the classification. The optimal artificial neural network architecture consists of two layers, one with 99 neurons and the other with 5. Both data mining techniques classified the data with high accuracy.
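
To make concrete how a decision tree can point to important genes, the following minimal Python sketch ranks genes by the information gain of their best single-threshold split, which is the criterion a tree uses when choosing a node. It is an illustration under stated assumptions, not the authors' Weka® or MATLAB® workflow: the gene names (GENE_A, GENE_B, GENE_C), the expression values, and the helper functions are all hypothetical and invented for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def best_threshold_gain(values, labels):
    """Return (information gain, threshold) of the best binary split on one gene."""
    base = entropy(labels)
    best_gain, best_threshold = 0.0, None
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_gain, best_threshold


def rank_genes(expression, labels):
    """Rank genes by the information gain of their best single split."""
    scores = {gene: best_threshold_gain(vals, labels) for gene, vals in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1][0], reverse=True)


# Toy data, invented for illustration: 6 samples, 3 hypothetical genes.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "GENE_A": [9.1, 8.7, 9.4, 2.1, 2.5, 1.9],  # separates the classes cleanly
    "GENE_B": [5.0, 2.0, 6.1, 5.2, 2.2, 6.0],  # only weakly informative
    "GENE_C": [7.0, 6.5, 3.0, 3.2, 2.8, 3.1],  # partially informative
}

ranking = rank_genes(expression, labels)
for gene, (gain, threshold) in ranking:
    print(gene, round(gain, 3), threshold)
```

On this toy data the gene that separates the two classes cleanly ranks first. A full decision tree applies the same selection recursively to each resulting subset, so the genes chosen near the root are the ones the tree treats as most discriminative.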



Paper Citation


in Harvard Style

Mosquim Júnior S. and de Oliveira J. (2017). Comparative Study on Data Mining Techniques Applied to Breast Cancer Gene Expression Profiles. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017) ISBN 978-989-758-214-1, pages 168-175. DOI: 10.5220/0006170201680175


in Bibtex Style

@conference{bioinformatics17,
author={Sérgio Mosquim Júnior and Juliana de Oliveira},
title={Comparative Study on Data Mining Techniques Applied to Breast Cancer Gene Expression Profiles},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)},
year={2017},
pages={168-175},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006170201680175},
isbn={978-989-758-214-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS, (BIOSTEC 2017)
TI - Comparative Study on Data Mining Techniques Applied to Breast Cancer Gene Expression Profiles
SN - 978-989-758-214-1
AU - Mosquim Júnior S.
AU - de Oliveira J.
PY - 2017
SP - 168
EP - 175
DO - 10.5220/0006170201680175