RANKING CLASSES OF SEARCH ENGINE RESULTS

Zheng Zhu, Mark Levene, Ingemar Cox

2010

Abstract

Ranking search results is an ongoing research topic in information retrieval. The traditional models are the vector space, probabilistic and language models, and more recently machine learning has been deployed in an effort to learn how to rank search results. Categorization of search results has also been studied as a means to organize the results, and hence to improve users' search experience. However, there is little research to date on ranking categories of results in comparison to ranking the results themselves. In this paper, we propose a probabilistic ranking model that includes categories in addition to a ranked results list, and derive six ranking methods from the model. These ranking methods utilize the following features: the class probability distribution based on query classification, the lowest-ranked document within each class, and the class size. An empirical study was carried out to compare these methods with the traditional ranked-list approach in terms of the rank positions of click-through documents, and the experimental results show that there is no simple winner in all cases. Better performance is attained by using the class size or a combination of the class probability distribution of the query and the lowest list rank of any document within the class.
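As a rough illustration of the features named in the abstract, the following Python sketch ranks result classes over a flat ranked list, assuming each result carries a class label and a query classifier supplies P(class | query). The function name, the particular combination rule and the example data are illustrative assumptions, not the paper's exact formulation of its six methods.

from collections import defaultdict

def rank_classes(ranked_results, class_probs, method="prob_and_min_rank"):
    """ranked_results: list of (doc_id, class_label) in list-rank order (best first).
    class_probs: dict mapping class_label -> P(class | query) from query classification.
    Returns class labels ordered from most to least promising."""
    min_rank = {}            # best (lowest) list rank of any document in the class
    size = defaultdict(int)  # number of results falling into the class
    for rank, (_doc, cls) in enumerate(ranked_results, start=1):
        min_rank.setdefault(cls, rank)   # first occurrence is the best rank
        size[cls] += 1

    if method == "class_size":
        key = lambda c: -size[c]                     # larger classes first
    elif method == "min_rank":
        key = lambda c: min_rank[c]                  # classes with a highly ranked member first
    elif method == "class_prob":
        key = lambda c: -class_probs.get(c, 0.0)     # more probable classes first
    else:
        # illustrative combination: favour probable classes, break ties by best-ranked member
        key = lambda c: (-class_probs.get(c, 0.0), min_rank[c])
    return sorted(min_rank, key=key)

# Hypothetical example: three classes over a ten-result list
results = [(1, "sports"), (2, "news"), (3, "sports"), (4, "shopping"),
           (5, "news"), (6, "news"), (7, "sports"), (8, "shopping"),
           (9, "news"), (10, "sports")]
probs = {"news": 0.6, "sports": 0.3, "shopping": 0.1}
print(rank_classes(results, probs))                 # ['news', 'sports', 'shopping']
print(rank_classes(results, probs, "class_size"))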



Paper Citation


in Harvard Style

Zhu Z., Levene M. and Cox I. (2010). RANKING CLASSES OF SEARCH ENGINE RESULTS. In Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010), ISBN 978-989-8425-28-7, pages 294-301. DOI: 10.5220/0003100902940301


in Bibtex Style

@conference{kdir10,
author={Zheng Zhu and Mark Levene and Ingemar Cox},
title={RANKING CLASSES OF SEARCH ENGINE RESULTS},
booktitle={Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)},
year={2010},
pages={294-301},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003100902940301},
isbn={978-989-8425-28-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Knowledge Discovery and Information Retrieval - Volume 1: KDIR, (IC3K 2010)
TI - RANKING CLASSES OF SEARCH ENGINE RESULTS
SN - 978-989-8425-28-7
AU - Zhu Z.
AU - Levene M.
AU - Cox I.
PY - 2010
SP - 294
EP - 301
DO - 10.5220/0003100902940301