Result Diversity for RDF Search

Hiba Arnaout, Shady Elbassuoni

Abstract

RDF repositories are typically searched using triple-pattern queries. Such queries often return too many results, making it difficult for users to find the most relevant ones. To remedy this, recent work has proposed relevance-based ranking models for triple-pattern queries. However, the top-ranked results are often homogeneous. In this paper, we propose a framework to diversify the results of triple-pattern queries over RDF datasets. We first define different notions of result diversity in the setting of RDF. We then develop an approach for result diversity based on Maximal Marginal Relevance. Finally, we develop a diversity-aware evaluation metric based on Discounted Cumulative Gain and use it on a benchmark of 100 queries over DBpedia.
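For readers unfamiliar with Maximal Marginal Relevance (MMR), the following is a minimal sketch of the generic greedy re-ranking scheme of Carbonell and Goldstein (1998), applied to toy triple results. It is not the paper's method: the abstract does not specify the diversity notions or similarity measure used, so the Jaccard similarity over triple terms, the relevance scores, the example triples, and the names `mmr_rerank`, `jaccard`, and `lam` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def jaccard(a: Triple, b: Triple) -> float:
    """Hypothetical similarity: term overlap between two triples."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def mmr_rerank(
    candidates: List[Triple],
    relevance: Dict[Triple, float],
    similarity: Callable[[Triple, Triple], float],
    k: int,
    lam: float = 0.5,
) -> List[Triple]:
    """Greedily pick k results, trading relevance against redundancy.

    Each step selects the candidate maximising
        lam * relevance - (1 - lam) * max similarity to already-selected results.
    lam = 1.0 reduces to plain relevance ranking.
    """
    selected: List[Triple] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(c: Triple) -> float:
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented for illustration, not taken from the paper's benchmark).
r1 = ("Inception", "director", "Nolan")
r2 = ("Interstellar", "director", "Nolan")
r3 = ("Titanic", "director", "Cameron")
toy_relevance = {r1: 0.9, r2: 0.85, r3: 0.6}

diverse_top2 = mmr_rerank([r1, r2, r3], toy_relevance, jaccard, k=2, lam=0.5)
relevance_top2 = mmr_rerank([r1, r2, r3], toy_relevance, jaccard, k=2, lam=1.0)
```

With `lam=0.5`, the second slot goes to the less relevant but dissimilar result `r3` rather than the near-duplicate `r2`; with `lam=1.0` the ordering falls back to relevance alone.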


Paper Citation


in Harvard Style

Arnaout H. and Elbassuoni S. (2016). Result Diversity for RDF Search. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016) ISBN 978-989-758-203-5, pages 249-256. DOI: 10.5220/0006046402490256


in Bibtex Style

@conference{kdir16,
author={Hiba Arnaout and Shady Elbassuoni},
title={Result Diversity for RDF Search},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={249-256},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006046402490256},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - Result Diversity for RDF Search
SN - 978-989-758-203-5
AU - Arnaout H.
AU - Elbassuoni S.
PY - 2016
SP - 249
EP - 256
DO - 10.5220/0006046402490256