Stream-based Active Learning in the Presence of Label Noise

Mohamed-Rafik Bouguelia, Yolande Belaïd, Abdel Belaïd

2015

Abstract

Mislabelling is a critical problem for stream-based active learning methods because it not only degrades classification accuracy but also diverts the active learner from querying informative data. Most existing active learning methods do not deal with label noise. We address this issue and propose an efficient method to identify and mitigate mislabelling errors for active learning in the streaming setting. We first propose a mislabelling likelihood measure to characterize the potentially mislabelled instances. This measure is based on the degree of disagreement between the predicted class label and the queried class label (given by the labeller). Then, we derive a measure of informativeness that expresses how much the label of an instance needs to be corrected by an expert labeller. Specifically, an instance is worth relabelling if it shows highly conflicting information between the predicted and the queried labels. We show that filtering instances with a high mislabelling likelihood, and correcting only those filtered instances with highly conflicting information, greatly improves the performance of the active learner. Experiments on several real-world datasets demonstrate the effectiveness of the proposed method in terms of filtering efficiency and classification accuracy of the stream-based active learner.
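
The filter-then-correct idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption: the disagreement score (probability of the predicted class minus the probability assigned to the queried label), the use of that single score for both filtering and relabelling, the threshold values, and the names `mislabelling_likelihood`, `triage` and `set_aside` are all hypothetical and are not the authors' method.

```python
import numpy as np


def mislabelling_likelihood(class_probs, queried_label):
    """Assumed disagreement score in [0, 1], not the paper's formula.

    It is the gap between the probability of the predicted class and the
    probability the model assigns to the label given by the labeller.
    """
    probs = np.asarray(class_probs, dtype=float)
    predicted_label = int(np.argmax(probs))
    return float(probs[predicted_label] - probs[queried_label])


def triage(class_probs, queried_label, filter_threshold=0.3, relabel_threshold=0.6):
    """Decide what to do with one queried instance from the stream.

    Both thresholds are hypothetical values chosen for illustration.
    """
    score = mislabelling_likelihood(class_probs, queried_label)
    if score < filter_threshold:
        return "accept"      # labels agree well enough: learn from the instance
    if score >= relabel_threshold:
        return "relabel"     # strong conflict: ask the expert labeller
    return "set_aside"       # filtered as suspicious, but not sent to the expert


# Class probabilities from any probabilistic classifier (made-up numbers).
probs = [0.9, 0.05, 0.05]
print(triage(probs, queried_label=0))
print(triage(probs, queried_label=1))
```

In this sketch the score is zero whenever the queried label matches the predicted class, so only disagreeing instances can be filtered, and only the most strongly conflicting ones cost an expert query.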


Paper Citation


in Harvard Style

Bouguelia M., Belaïd Y. and Belaïd A. (2015). Stream-based Active Learning in the Presence of Label Noise. In Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-076-5, pages 25-34. DOI: 10.5220/0005178900250034


in Bibtex Style

@conference{icpram15,
author={Mohamed-Rafik Bouguelia and Yolande Belaïd and Abdel Belaïd},
title={Stream-based Active Learning in the Presence of Label Noise},
booktitle={Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2015},
pages={25-34},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005178900250034},
isbn={978-989-758-076-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Stream-based Active Learning in the Presence of Label Noise
SN - 978-989-758-076-5
AU - Bouguelia M.
AU - Belaïd Y.
AU - Belaïd A.
PY - 2015
SP - 25
EP - 34
DO - 10.5220/0005178900250034