Retrieval, Visualization and Validation of Affinities between Documents

Luis Trigo, Martin Víta, Rui Sarmento, Pavel Brazdil

2015

Abstract

We present an Information Retrieval tool that helps users search for information of interest to them. Our system processes a given set of documents to produce a graph in which nodes represent documents and links represent the similarities between them. The aim is to let the user navigate this space easily, for example by collapsing and expanding nodes. Our case study identifies affinity groups based on similarities in the texts produced by researchers. These groups go beyond the already established communities revealed by co-authorship. The system characterizes the activity of each author by a set of automatically generated keywords and by membership in a particular affinity group. The importance of each author is shown visually by the size of the corresponding node, which reflects the number of publications and different measures of centrality. To validate the method, we analyse how different combinations of titles, abstracts and keywords affect the captured similarity between researchers.
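The abstract does not state how document similarity is computed or how affinity groups are found. The sketch below shows one common way to build such a graph, assuming bag-of-words TF-IDF vectors and cosine similarity (an assumption, not the authors' confirmed method). Links below a similarity threshold are dropped. Node importance is approximated by degree centrality, and affinity groups by connected components of the thresholded graph. Connected components are a deliberate simplification; a proper community-detection algorithm would be a closer fit. All function names, the stop-word list and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase, keep runs of Unicode letters, drop stop words (no stemming)."""
    return [t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (term -> weight) per document, idf = ln(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_graph(docs, threshold=0.1):
    """Adjacency dict: node -> {neighbour: similarity}, keeping links >= threshold."""
    vecs = tfidf_vectors(docs)
    graph = {i: {} for i in range(len(docs))}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            s = cosine(vecs[i], vecs[j])
            if s > 0 and s >= threshold:
                graph[i][j] = graph[j][i] = s
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes each node is linked to."""
    n = len(graph)
    return {v: (len(nb) / (n - 1) if n > 1 else 0.0) for v, nb in graph.items()}


def connected_groups(graph):
    """Connected components, used here as a crude stand-in for affinity groups."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            v = stack.pop()
            group.append(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    docs = [
        "fuzzy modeling of uncertain systems",
        "fuzzy logic and fuzzy modeling",
        "graph mining of citation networks",
        "community detection in citation networks",
    ]
    g = similarity_graph(docs, threshold=0.1)
    print(connected_groups(g))  # [[0, 1], [2, 3]]
    print(degree_centrality(g))
```

With these toy inputs, the two fuzzy-modelling titles and the two citation-network titles form separate groups. Feeding the same function titles only, titles plus abstracts, or titles plus keywords is one way to reproduce, in miniature, the kind of comparison the abstract describes for validation.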



Paper Citation


in Harvard Style

Trigo L., Víta M., Sarmento R. and Brazdil P. (2015). Retrieval, Visualization and Validation of Affinities between Documents. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KITA, (IC3K 2015), ISBN 978-989-758-158-8, pages 452-459. DOI: 10.5220/0005662904520459


in BibTeX Style

@conference{kita15,
author={Luis Trigo and Martin Víta and Rui Sarmento and Pavel Brazdil},
title={Retrieval, Visualization and Validation of Affinities between Documents},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KITA, (IC3K 2015)},
year={2015},
pages={452-459},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005662904520459},
isbn={978-989-758-158-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 3: KITA, (IC3K 2015)
TI  - Retrieval, Visualization and Validation of Affinities between Documents
SN  - 978-989-758-158-8
AU  - Trigo L.
AU  - Víta M.
AU  - Sarmento R.
AU  - Brazdil P.
PY  - 2015
SP  - 452
EP  - 459
DO  - 10.5220/0005662904520459
ER  -