Semi-supervised Distributed Clustering for Bioinformatics - Comparison Study

Huayiing Li, Aleksandar Jeremic

Abstract

Clustering analysis is a widely used technique in bioinformatics and biochemistry for a variety of applications, such as the detection of new cell types and the evaluation of drug response. Different applications and cell types may require different clustering algorithms. For that reason, distributed clustering, which combines multiple clustering results into a consensus clustering, is a popular and efficient way to improve the quality of clustering analysis. Existing solutions are commonly based on unsupervised techniques, which do not require any a priori knowledge. In certain cases, however, a priori information on particular labelings may be available. In these cases, performance is expected to improve when this prior information is used. To this end, we propose two semi-supervised distributed clustering algorithms and evaluate their performance for different base clusterings.
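The abstract does not describe the two proposed algorithms. The following is therefore only a minimal illustrative sketch, not the authors' method.

It assumes one common way of building a consensus: a co-association matrix that records how often each pair of samples is grouped together across the base clusterings.

- An unsupervised consensus is read off by thresholding that matrix.
- A semi-supervised variant, in the spirit of seeded clustering, uses samples whose labels are known a priori as seeds.

The function names, the 0.5 threshold, and the mean-co-association assignment rule are hypothetical choices made for illustration.

```python
from itertools import combinations

# Illustrative sketch only; NOT the algorithms proposed in the paper.
# All function names and parameters below are hypothetical.


def co_association(base_clusterings):
    """Fraction of base clusterings that place samples i and j in the same cluster."""
    m = len(base_clusterings)
    n = len(base_clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    C = [[counts[i][j] / m for j in range(n)] for i in range(n)]
    for i in range(n):
        C[i][i] = 1.0
    return C


def threshold_consensus(C, threshold=0.5):
    """Unsupervised consensus: link pairs whose co-association exceeds
    `threshold` and label the connected components (a single-linkage cut)."""
    n = len(C)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if C[i][j] > threshold:
            parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def seeded_consensus(C, seeds):
    """Semi-supervised consensus: `seeds` maps sample index -> label known a priori.
    Seeds keep their label; every other sample takes the label whose seeds
    have the highest mean co-association with it (assumes each class has a seed)."""
    by_label = {}
    for idx, label in seeds.items():
        by_label.setdefault(label, []).append(idx)
    result = []
    for i in range(len(C)):
        if i in seeds:
            result.append(seeds[i])
            continue
        scores = {lab: sum(C[i][s] for s in idxs) / len(idxs)
                  for lab, idxs in by_label.items()}
        result.append(max(sorted(scores), key=scores.get))
    return result


# Toy data: three base clusterings of six samples that disagree about sample 2.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
]
C = co_association(base)
print(threshold_consensus(C))                          # [0, 0, 1, 1, 1, 1]
print(seeded_consensus(C, {0: "a", 2: "a", 5: "b"}))   # ['a', 'a', 'a', 'b', 'b', 'b']
```

In this toy data, two of the three base clusterings group sample 2 with samples 3 to 5, so the unsupervised consensus does the same. When sample 2 is labeled a priori as belonging with sample 0, the seeded consensus follows the prior instead.

The sketch has two known limitations:

- The threshold cut behaves like single linkage, so long chains of moderately similar samples can merge.
- The seeded rule cannot discover classes that have no seed.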



Paper Citation


in Harvard Style

Li H. and Jeremic A. (2017). Semi-supervised Distributed Clustering for Bioinformatics - Comparison Study. In Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOSIGNALS, (BIOSTEC 2017) ISBN 978-989-758-212-7, pages 259-264. DOI: 10.5220/0006253502590264


in Bibtex Style

@conference{biosignals17,
author={Huayiing Li and Aleksandar Jeremic},
title={Semi-supervised Distributed Clustering for Bioinformatics - Comparison Study},
booktitle={Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOSIGNALS, (BIOSTEC 2017)},
year={2017},
pages={259-264},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006253502590264},
isbn={978-989-758-212-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 4: BIOSIGNALS, (BIOSTEC 2017)
TI - Semi-supervised Distributed Clustering for Bioinformatics - Comparison Study
SN - 978-989-758-212-7
AU - Li H.
AU - Jeremic A.
PY - 2017
SP - 259
EP - 264
DO - 10.5220/0006253502590264
ER -