Subspace Clustering and Visualization of Data Streams

Ibrahim Louhi, Lydia Boudjeloud-Assala, Thomas Tamisier

Abstract

In this paper, we propose a visual subspace clustering approach for data streams that allows the user to visually track the behavior of the stream. Instead of detecting changes in individual elements, the approach shows the impact of the variables on the stream's evolution by visualizing the subspace clustering at different levels in real time. First, we cluster the set of variables to obtain subspaces, each consisting of a subset of homogeneous variables. Then, we cluster the elements within each subspace. The visualization helps to show the originality of the approach and its usefulness in data stream processing.
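
The two-level scheme described in the abstract (variables first, then elements within each subspace) can be illustrated with a minimal sketch for a single window of the stream. The abstract does not say which clustering algorithms are used at either level, so everything here is an assumption made for illustration: variables are grouped by a threshold on absolute Pearson correlation, elements are grouped by a simple leader clustering, and the names `cluster_variables`, `cluster_elements`, `subspace_clustering`, `min_corr` and `radius` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster_variables(window, min_corr=0.8):
    """Level 1 (assumed stand-in): group column indices into subspaces.

    A variable joins the first subspace whose first variable it correlates
    with (in absolute value) at least `min_corr`; otherwise it starts a new one.
    """
    columns = list(zip(*window))
    subspaces = []
    for j, col in enumerate(columns):
        for group in subspaces:
            if abs(pearson(col, columns[group[0]])) >= min_corr:
                group.append(j)
                break
        else:
            subspaces.append([j])
    return subspaces


def cluster_elements(window, subspace, radius=1.0):
    """Level 2 (assumed stand-in): leader clustering restricted to one subspace.

    Each element is assigned to the first leader within `radius` (Euclidean
    distance on the subspace's variables); otherwise it becomes a new leader.
    """
    leaders, labels = [], []
    for row in window:
        point = [row[j] for j in subspace]
        for k, leader in enumerate(leaders):
            if sqrt(sum((p - q) ** 2 for p, q in zip(point, leader))) <= radius:
                labels.append(k)
                break
        else:
            leaders.append(point)
            labels.append(len(leaders) - 1)
    return labels


def subspace_clustering(window, min_corr=0.8, radius=1.0):
    """Map each subspace (tuple of variable indices) to its element labels."""
    return {
        tuple(s): cluster_elements(window, s, radius)
        for s in cluster_variables(window, min_corr)
    }


# Toy window: 6 elements, 4 variables (made-up values, not from the paper).
window = [
    [1.0, 2.0, 5.0, 0.5],
    [1.2, 2.4, 5.1, 0.4],
    [0.9, 1.8, 0.2, 5.0],
    [5.0, 10.0, 0.1, 5.2],
    [5.1, 10.2, 5.2, 0.3],
    [4.9, 9.8, 0.3, 4.9],
]
result = subspace_clustering(window)
print(result)
# {(0, 1): [0, 0, 0, 1, 1, 1], (2, 3): [0, 0, 1, 1, 0, 1]}
```

In this toy window the same six elements are partitioned differently in the two subspaces, which is the kind of per-subspace view the approach aims to visualize. Re-running the function on successive windows would give a rough, non-incremental approximation of tracking the stream over time.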



Paper Citation


in Harvard Style

Louhi I., Boudjeloud-Assala L. and Tamisier T. (2017). Subspace Clustering and Visualization of Data Streams. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: IVAPP, (VISIGRAPP 2017) ISBN 978-989-758-228-8, pages 259-265. DOI: 10.5220/0006169702590265


in Bibtex Style

@conference{ivapp17,
author={Ibrahim Louhi and Lydia Boudjeloud-Assala and Thomas Tamisier},
title={Subspace Clustering and Visualization of Data Streams},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: IVAPP, (VISIGRAPP 2017)},
year={2017},
pages={259-265},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006169702590265},
isbn={978-989-758-228-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: IVAPP, (VISIGRAPP 2017)
TI - Subspace Clustering and Visualization of Data Streams
SN - 978-989-758-228-8
AU - Louhi I.
AU - Boudjeloud-Assala L.
AU - Tamisier T.
PY - 2017
SP - 259
EP - 265
DO - 10.5220/0006169702590265