Pseudo Relevance Feedback Technique and Semantic Similarity for Corpus-based Expansion

Masnizah Mohd, Jaffar Atwan, Kiyoaki Shirai

2015

Abstract

Adapting a Query Expansion (QE) approach to Arabic documents may produce worse rankings or irrelevant results. We therefore introduce a technique that uses the Arabic WordNet at both the corpus and query expansion levels. A corpus-based Point-wise Mutual Information (PMI) measure is used to semantically select synonyms from the WordNet. In addition, Automatic Query Expansion (AQE) and Pseudo Relevance Feedback (PRF) methods were explored to improve the performance of the Arabic information retrieval (AIR) system. The experimental results show that using the Arabic WordNet at the corpus and query levels together with AQE, and adapting PMI in the expansion process, reduces ambiguity because these techniques select the most appropriate synonym. This enhances knowledge discovery by addressing the relevance of the retrieved results. The techniques also improved Mean Average Precision by 49% and recall by 7.3% compared with the baseline.
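As a rough illustration of the PMI-based synonym selection described above, the following Python sketch estimates PMI from document-level co-occurrence counts and picks the candidate synonym that is most strongly associated with the remaining query terms. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`build_pmi`, `best_synonym`), the document-level co-occurrence window, the averaging over context terms, and the toy English corpus are all hypothetical. In practice the candidates would come from an Arabic WordNet lookup, which is not shown here.

```python
import math
from collections import Counter
from itertools import combinations


def build_pmi(documents):
    """Build a pmi(x, y) function from document-level co-occurrence.

    Assumption: two terms co-occur when they appear in the same document.
    PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ).
    """
    n_docs = len(documents)
    term_df = Counter()   # number of documents containing each term
    pair_df = Counter()   # number of documents containing both terms
    for doc in documents:
        terms = set(doc)
        term_df.update(terms)
        pair_df.update(frozenset(p) for p in combinations(sorted(terms), 2))

    def pmi(x, y):
        joint = pair_df[frozenset((x, y))]
        if joint == 0:
            # The terms never co-occur, so treat them as unrelated.
            return float("-inf")
        return math.log2(joint * n_docs / (term_df[x] * term_df[y]))

    return pmi


def best_synonym(context_terms, candidates, pmi):
    """Return the candidate with the highest mean PMI against the context.

    Assumption: averaging PMI over the other query terms is one simple
    way to rank candidates; the paper may combine scores differently.
    """
    if not candidates or not context_terms:
        return None

    def score(candidate):
        return sum(pmi(candidate, t) for t in context_terms) / len(context_terms)

    return max(candidates, key=score)


# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["river", "shore", "water"],
    ["river", "shore", "boat"],
    ["money", "lender", "loan"],
    ["money", "lender", "account"],
    ["river", "water", "fish"],
]
pmi = build_pmi(docs)

# Hypothetical WordNet candidates for an ambiguous query term.
chosen = best_synonym(["river", "water"], ["shore", "lender"], pmi)
print(chosen)
```

Here the candidate that co-occurs with the other query terms is preferred, while a candidate that never co-occurs with them scores negative infinity and is effectively excluded from the expansion.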



Paper Citation


in Harvard Style

Mohd M., Atwan J. and Shirai K. (2015). Pseudo Relevance Feedback Technique and Semantic Similarity for Corpus-based Expansion. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015) ISBN 978-989-758-158-8, pages 445-450. DOI: 10.5220/0005626904450450


in Bibtex Style

@conference{kdir15,
author={Masnizah Mohd and Jaffar Atwan and Kiyoaki Shirai},
title={Pseudo Relevance Feedback Technique and Semantic Similarity for Corpus-based Expansion},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)},
year={2015},
pages={445-450},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005626904450450},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2015)
TI - Pseudo Relevance Feedback Technique and Semantic Similarity for Corpus-based Expansion
SN - 978-989-758-158-8
AU - Mohd M.
AU - Atwan J.
AU - Shirai K.
PY - 2015
SP - 445
EP - 450
DO - 10.5220/0005626904450450