Tackling the Problem of Data Imbalancing for Melanoma Classification

Mojdeh Rastgoo, Guillaume Lemaitre, Joan Massich, Olivier Morel, Franck Marzani, Rafael Garcia, Fabrice Meriaudeau


Malignant melanoma is the most dangerous type of skin cancer, yet it is the most treatable kind of cancer when diagnosed at an early stage. In this regard, Computer-Aided Diagnosis systems based on machine learning have been developed to discern melanoma lesions from benign and dysplastic nevi in dermoscopic images. Like many real-world machine learning applications, melanoma classification faces the challenge of imbalanced data: melanoma cases are far less frequent than benign and dysplastic cases. This article analyzes the impact of data balancing strategies at the training step. Over-Sampling (OS) and Under-Sampling (US) are extensively compared in both feature and data space, revealing that NearMiss-2 (NM2) outperforms the other methods, achieving a Sensitivity (SE) and Specificity (SP) of 91.2% and 81.7%, respectively. More generally, the reported results highlight that methods based on US, or on a combination of OS and US, in feature space outperform the others.
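To make the abstract's terms concrete, the following is a minimal sketch, not the authors' implementation, of the NearMiss-2 selection rule as commonly described (keep the majority samples whose mean distance to their k farthest minority samples is smallest), together with the SE and SP metrics. The function names, the value k=3, and the toy 2-D points are assumptions chosen for illustration only; they do not come from the paper.

```python
import math


def near_miss_2(majority, minority, n_keep, k=3):
    """Under-sample the majority class with the NearMiss-2 rule.

    Each majority sample is scored by its mean distance to its k
    farthest minority samples; the n_keep lowest-scoring samples
    are kept. (Hypothetical helper, written for illustration.)
    """
    k = min(k, len(minority))
    scored = []
    for idx, x in enumerate(majority):
        # Distances to every minority sample, farthest first.
        dists = sorted((math.dist(x, m) for m in minority), reverse=True)
        scored.append((sum(dists[:k]) / k, idx))
    scored.sort()
    return [majority[idx] for _, idx in scored[:n_keep]]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SE, SP); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Toy feature vectors (made up): 3 minority ("melanoma") samples
# and 5 majority ("benign/dysplastic") samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(0.5, 0.5), (5.0, 5.0), (1.0, 1.0), (10.0, 10.0), (-4.0, -4.0)]

# Balance the training set by keeping as many majority samples
# as there are minority samples.
kept = near_miss_2(majority, minority, n_keep=len(minority))

# Toy labels and predictions (made up) to exercise the metrics.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
se, sp = sensitivity_specificity(y_true, y_pred)
print(kept, se, sp)
```

Balancing is applied to the training data only, as the abstract states; SE and SP would then be computed on held-out samples that keep their original class distribution.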



Paper Citation

in Harvard Style

Rastgoo M., Lemaitre G., Massich J., Morel O., Marzani F., Garcia R. and Meriaudeau F. (2016). Tackling the Problem of Data Imbalancing for Melanoma Classification. In Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: BIOIMAGING, (BIOSTEC 2016), ISBN 978-989-758-170-0, pages 32-39. DOI: 10.5220/0005703400320039

in Bibtex Style

author={Mojdeh Rastgoo and Guillaume Lemaitre and Joan Massich and Olivier Morel and Franck Marzani and Rafael Garcia and Fabrice Meriaudeau},
title={Tackling the Problem of Data Imbalancing for Melanoma Classification},
booktitle={Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: BIOIMAGING, (BIOSTEC 2016)},

in EndNote Style

JO - Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: BIOIMAGING, (BIOSTEC 2016)
TI - Tackling the Problem of Data Imbalancing for Melanoma Classification
SN - 978-989-758-170-0
AU - Rastgoo M.
AU - Lemaitre G.
AU - Massich J.
AU - Morel O.
AU - Marzani F.
AU - Garcia R.
AU - Meriaudeau F.
PY - 2016
SP - 32
EP - 39
DO - 10.5220/0005703400320039