Stanley Loh, Fabiana Lorenzi, Roger Granada, Daniel Lichtnow, Leandro Krug Wives, José Palazzo Moreira de Oliveira



This paper investigates how to represent users' profiles with information extracted from their scientific publications. The work assumes that the scientific papers written by users can represent their interests or expertise, and that these representations can be used to find similar users. The goal is to support similarity evaluations between users in a model-based collaborative recommender. Representing users by their publications can help minimize the new user problem, because it avoids the need to ask users to evaluate a set of items or to provide information about their preferences. In scientific communities, particularly in digital libraries and systems focused on the retrieval of scientific papers, this is an interesting feature. We conducted experiments comparing different parts of the papers used to represent them (title, keywords, abstract and full text) and two kinds of text index: terms and concepts. Furthermore, two similarity functions (Jaccard and a fuzzy function) were applied to these representations and compared with respect to finding similar users.
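The abstract compares a crisp Jaccard coefficient with a fuzzy similarity function over term-based user profiles. The Python sketch that follows illustrates that kind of pipeline under stated assumptions, not the authors' implementation. Each user is represented only by the text of their papers (here, hypothetical titles). The tokenizer and stopword list are toy choices. Term weights are frequencies normalized to [0, 1]. The fuzzy function is the common min/max (fuzzy Jaccard) measure, which stands in for, and may differ from, the fuzzy function evaluated in the paper. Concept-based indexing is not shown; it would replace `terms` with a mapping from terms to concepts.

```python
import re
from collections import Counter
from typing import Callable

# Toy stopword list (assumption): the paper's actual preprocessing is not described here.
STOPWORDS = {"a", "an", "the", "of", "in", "for", "and", "to", "on", "with", "by"}

Profile = dict[str, float]


def terms(text: str) -> list[str]:
    """Lowercase, split on non-letters, drop stopwords and tokens shorter than 3 chars."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def build_profile(papers: list[str]) -> Profile:
    """Term-index profile: term counts over all of a user's papers,
    divided by the largest count so the most frequent term has weight 1.0."""
    counts = Counter(t for paper in papers for t in terms(paper))
    if not counts:
        return {}
    top = max(counts.values())
    return {t: c / top for t, c in counts.items()}


def jaccard(p: Profile, q: Profile) -> float:
    """Crisp Jaccard |A & B| / |A | B| over the term sets; weights are ignored."""
    a, b = set(p), set(q)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_jaccard(p: Profile, q: Profile) -> float:
    """Fuzzy-set similarity: sum of min memberships / sum of max memberships.
    A stand-in fuzzy function (assumption), not necessarily the one in the paper."""
    keys = set(p) | set(q)
    num = sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)
    den = sum(max(p.get(k, 0.0), q.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def most_similar(target: str, profiles: dict[str, Profile],
                 sim: Callable[[Profile, Profile], float]) -> list[tuple[str, float]]:
    """Rank all other users by similarity to `target`, highest first."""
    scores = [(u, sim(profiles[target], prof))
              for u, prof in profiles.items() if u != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical users, each represented only by the titles of their papers.
publications = {
    "alice": ["Collaborative filtering for recommender systems",
              "Cold start in recommender systems"],
    "bob": ["Recommender systems for digital libraries"],
    "carol": ["Fuzzy neural networks"],
}
profiles = {user: build_profile(papers) for user, papers in publications.items()}

print(most_similar("alice", profiles, jaccard))        # [('bob', 0.25), ('carol', 0.0)]
print(most_similar("alice", profiles, fuzzy_jaccard))  # [('bob', 0.3333333333333333), ('carol', 0.0)]
```

Jaccard only counts shared terms, while the fuzzy variant lets each term's weight in both profiles determine how much it contributes, so with real publication data the two functions can rank neighbours differently.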

Paper Citation

in Harvard Style

Loh S., Lorenzi F., Granada R., Lichtnow D., Krug Wives L. and Palazzo Moreira de Oliveira J. (2009). Identifying Similar Users by Their Scientific Publications to Reduce Cold Start in Recommender Systems. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-8111-81-4, pages 589-596. DOI: 10.5220/0001823405890596

in Bibtex Style

author={Stanley Loh and Fabiana Lorenzi and Roger Granada and Daniel Lichtnow and Leandro Krug Wives and José Palazzo Moreira de Oliveira},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},

in EndNote Style

JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
SN - 978-989-8111-81-4
AU - Loh S.
AU - Lorenzi F.
AU - Granada R.
AU - Lichtnow D.
AU - Krug Wives L.
AU - Palazzo Moreira de Oliveira J.
PY - 2009
SP - 589
EP - 596
DO - 10.5220/0001823405890596