A New Tool for Textual Aggregation In Information Retrieval

Mustapha Bouakkaz, Sabine Loudcher, Youcef Ouinten

Abstract

We present in this paper a system for textual aggregation of scientific documents in the online analytical processing (OLAP) context. The system automatically extracts keywords from a set of documents according to the keyword lists compiled on the Microsoft Academic Search website. It lets users choose their aggregation method among the implemented ones, namely TOP-Keywords, TOPIC, TUBE, TAG, BienCube and GOTA. The performance of the chosen methods, in terms of recall, precision, F-measure and runtime, is investigated on two real corpora, ITINNOVATION and OHSUMED, which contain 600 and 13,000 scientific articles respectively; users can also integrate other corpora into the system.
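To make the extraction, aggregation and evaluation steps more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a hypothetical reference keyword list standing in for the lists compiled on Microsoft Academic Search, keeps only keywords from that list that occur in the documents, aggregates a document set into its k most frequent keywords (a simplified, frequency-only take on a TOP-k keyword aggregation), and scores the result with precision, recall and F-measure against a hypothetical gold keyword set. The function names `extract_keywords`, `top_k_aggregate` and `evaluate` are illustrative.

```python
import re
from collections import Counter


def extract_keywords(text: str, reference_keywords: set[str]) -> list[str]:
    """Return one entry per occurrence of a reference keyword in the text.

    Hypothetical stand-in for the extraction step: only keywords from the
    reference list are kept, matched case-insensitively as whole phrases.
    """
    lowered = text.lower()
    found = []
    for kw in reference_keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        found.extend([kw] * len(re.findall(pattern, lowered)))
    return found


def top_k_aggregate(documents: list[str], reference_keywords: set[str], k: int) -> list[str]:
    """Aggregate a set of documents into its k most frequent reference keywords."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_keywords(doc, reference_keywords))
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in ranked[:k]]


def evaluate(predicted, gold) -> tuple[float, float, float]:
    """Precision, recall and F-measure (F1) of predicted keywords against a gold set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # keywords that are both predicted and relevant
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    docs = [
        "OLAP cubes support data warehouse analysis.",
        "Text mining and OLAP: a data warehouse for documents.",
        "Information retrieval with text mining.",
    ]
    reference = {"olap", "data warehouse", "text mining", "information retrieval", "clustering"}
    top = top_k_aggregate(docs, reference, k=2)
    print(top)  # ['data warehouse', 'olap']
    print(evaluate(top, {"olap", "text mining"}))  # (0.5, 0.5, 0.5)
```

In the toy run, three keywords tie at two occurrences each, so the alphabetical tie-break decides which two are returned. The `\b` word-boundary matching is a simplifying assumption and would miss keywords that end in non-word characters such as "C++".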

References

  1. Archetti, F. and Campanelli, P. (2006). A hierarchical document clustering environment based on the induced bisecting k-means. International Conference on Database and Expert Systems Applications, pages 257-269.
  2. Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022.
  3. Bouakkaz, M., Loudcher, S., and Ouinten, Y. (2014). Automatic textual aggregation approach of scientific articles in OLAP context. 10th International Conference on Innovations in Information Technology.
  4. Bouakkaz, M., Loudcher, S., and Ouinten, Y. (2015). GOTA: Using the Google similarity distance for OLAP textual aggregation. 17th International Conference on Enterprise Information Systems (ICEIS).
  5. Bringay, S., Laurent, A., and Poncelet, P. (2011). Towards an on-line analysis of tweets processing. Database and Expert Systems Applications, pages 154-161.
  6. Cilibrasi, R. and Vitanyi, P. (2007). The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, pages 370-383.
  7. Lauw, H. W., Lim, E.-P., and Pang, H. (2007). TUBE (TextCube) for discovering documentary evidence of associations among entities. Symposium on Applied Computing, pages 824-828.
  8. Kohomban, U. and Lee, W. S. (2007). Optimizing classifier performance in word sense disambiguation by redefining sense classes. International Joint Conference on Artificial Intelligence, pages 1635-1640.
  9. Mihalcea, R. and Tarau, P. (2004). TextRank: Bringing order into texts. Empirical Methods in Natural Language Processing, pages 26-31.
  10. Moschitti, A. (2003). Natural language processing and text categorization: a study on the reciprocal beneficial interactions. PhD thesis, University of Rome Tor Vergata, Rome, Italy, pages 34-47.
  11. Moschitti, A. and Basili, R. (2004). Complex linguistic features for text classification: a comprehensive study. The 26th European Conference on Information Retrieval Research, pages 34-47.
  12. Oukid, L., Asfari, O., and Bentayeb, F. (2013). CXT-Cube: Contextual text cube model and aggregation operator for text OLAP. International Workshop on Data Warehousing and OLAP, pages 56-61.
  13. Poudat, C., Cleuziou, G., and Clavier, V. (2006). Catégorisation de textes en domaines et genres. Complémentarité des indexations lexicale et morphosyntaxique. Lexique et morphosyntaxe en RI, 9:61-76.
  14. Ravat, F., Teste, O., and Tournier, R. (2007). OLAP aggregation function for textual data warehouse. International Conference on Enterprise Information Systems, pages 151-156.
  15. Ravat, F., Teste, O., and Tournier, R. (2008). Top keyword extraction method for OLAP document. International Conference on Data Warehousing and Knowledge Discovery, pages 257-269.
  16. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1-47.
  17. Wartena, C. and Brussee, R. (2008). Topic detection by clustering keywords. International Conference on Database and Expert Systems Applications, pages 54-58.
  18. Zhang, D., Zhai, C., and Han, J. (2009). Topic cube: Topic modeling for OLAP on multidimensional text databases. International Conference on Data Mining, pages 1124-1135.


Paper Citation


in Harvard Style

Bouakkaz M., Loudcher S. and Ouinten Y. (2016). A New Tool for Textual Aggregation In Information Retrieval. In Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-187-8, pages 232-237. DOI: 10.5220/0005879702320237


in Bibtex Style

@conference{iceis16,
author={Mustapha Bouakkaz and Sabine Loudcher and Youcef Ouinten},
title={A New Tool for Textual Aggregation In Information Retrieval},
booktitle={Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2016},
pages={232-237},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005879702320237},
isbn={978-989-758-187-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 18th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - A New Tool for Textual Aggregation In Information Retrieval
SN  - 978-989-758-187-8
AU  - Bouakkaz M.
AU  - Loudcher S.
AU  - Ouinten Y.
PY  - 2016
SP  - 232
EP  - 237
DO  - 10.5220/0005879702320237
ER  - 