PEOPLE RETRIEVAL LEVERAGING TEXTUAL AND SOCIAL DATA

Amin Mantrach, Jean-Michel Renders

Abstract

The growing importance of social media and heterogeneous relational data underscores the fundamental problem of efficiently combining different sources of evidence (or modes). In this work, we consider the problem of people retrieval, in which the requested items are persons rather than documents. Queries generally contain both textual keywords and social links, while the target collection consists of a set of documents with social metadata. Traditional approaches tackle this problem by early or late fusion where, typically, a person is represented by two sets of features: a word profile and a contact/link profile. Inspired by cross-modal similarity measures initially designed to combine image and text, we propose new ways of combining social and content aspects for retrieving people from a collection of documents with social metadata. To this end, we define a set of multimodal similarity measures between socially-labelled documents and queries, which can then be aggregated at the person level to provide a final relevance score for the general people retrieval problem. We then examine particular instances of this problem: author retrieval, recipient recommendation and alias detection. Experiments conducted on the ENRON email collection show the benefits of the proposed approach with respect to more standard fusion and aggregation methods.
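
As a rough illustration of the baseline that the abstract calls late fusion followed by person-level aggregation, the sketch below scores each document by a weighted sum of a textual and a social cosine similarity, then sums document scores per author. It is not the cross-modal measure proposed in the paper. The function names, the `alpha` weight, the sum aggregation and the toy data are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def late_fusion_scores(query, docs, alpha=0.5):
    """Score each document as a convex combination of the two modes.

    alpha weights the textual mode and (1 - alpha) the social mode;
    the value 0.5 is an arbitrary assumption, not taken from the paper.
    """
    return [
        alpha * cosine(query["words"], d["words"])
        + (1 - alpha) * cosine(query["people"], d["people"])
        for d in docs
    ]


def rank_people(query, docs, alpha=0.5):
    """Aggregate document scores per author by summation, then rank."""
    totals = defaultdict(float)
    for d, s in zip(docs, late_fusion_scores(query, docs, alpha)):
        totals[d["author"]] += s
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy collection: each document has an author, a bag of
# words, and a bag of linked people (for example, email recipients).
docs = [
    {"author": "alice", "words": {"gas": 1.0, "contract": 1.0}, "people": {"bob": 1.0}},
    {"author": "carol", "words": {"lunch": 1.0}, "people": {"dave": 1.0}},
    {"author": "alice", "words": {"gas": 1.0}, "people": {"bob": 1.0, "dave": 1.0}},
]
query = {"words": {"gas": 1.0}, "people": {"bob": 1.0}}

ranking = rank_people(query, docs)
print(ranking)
```

With this toy data, "alice" ranks first because both of her documents match the query in both modes, while "carol" shares neither words nor contacts with it.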



Paper Citation


in Harvard Style

Mantrach A. and Renders J. (2011). PEOPLE RETRIEVAL LEVERAGING TEXTUAL AND SOCIAL DATA. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011) ISBN 978-989-8425-79-9, pages 325-333. DOI: 10.5220/0003669203330341


in Bibtex Style

@conference{kdir11,
author={Amin Mantrach and Jean-Michel Renders},
title={PEOPLE RETRIEVAL LEVERAGING TEXTUAL AND SOCIAL DATA},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)},
year={2011},
pages={325-333},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003669203330341},
isbn={978-989-8425-79-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2011)
TI - PEOPLE RETRIEVAL LEVERAGING TEXTUAL AND SOCIAL DATA
SN - 978-989-8425-79-9
AU - Mantrach A.
AU - Renders J.
PY - 2011
SP - 325
EP - 333
DO - 10.5220/0003669203330341