A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS

Fabio Clarizia, Luca Greco, Paolo Napoletano

2010

Abstract

In this paper we present a new technique for retrieving relevant web pages from the results of informational queries. The proposed technique, based on a probabilistic model of language, is embedded in a traditional web search engine. The relevance of a web page was obtained from human judges, who assigned a degree of importance on a continuous scale to each of the analyzed websites. To validate the proposed method, we compare it with a classic search engine, using Precision and Recall and a measure of distance from the relevance judgments given by the human judges.
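
The evaluation named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the distance measure, so mean absolute distance is used here as a stand-in, and the page identifiers, scores, and the 0.5 relevance threshold are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a retrieved list (assumed duplicate-free)
    against the set of pages judged relevant."""
    hits = sum(1 for page in retrieved if page in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_absolute_distance(system: Dict[str, float], human: Dict[str, float]) -> float:
    """Mean absolute gap between system scores and human judgments,
    computed over the pages both dictionaries contain.
    Assumption: the paper's actual distance measure may differ."""
    shared = system.keys() & human.keys()
    if not shared:
        raise ValueError("no pages in common")
    return sum(abs(system[p] - human[p]) for p in shared) / len(shared)


# Hypothetical human judgments on a continuous [0, 1] scale.
human_scores = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.1}
# Hypothetical scores produced by a search engine for the same pages.
system_scores = {"a": 0.8, "b": 0.4, "c": 0.5, "d": 0.1}

# Assumed threshold that turns continuous judgments into a relevant set.
relevant_pages = {p for p, s in human_scores.items() if s >= 0.5}
retrieved_pages = ["a", "b", "c"]

precision, recall = precision_recall(retrieved_pages, relevant_pages)
distance = mean_absolute_distance(system_scores, human_scores)
print(f"precision={precision:.3f} recall={recall:.3f} distance={distance:.3f}")
```

Precision and Recall need a binary notion of relevance, which is why the sketch thresholds the human scores, while the distance works directly on the continuous judgments.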



Paper Citation


in Harvard Style

Clarizia F., Greco L. and Napoletano P. (2010). A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS, ISBN 978-989-8425-06-5, pages 70-79. DOI: 10.5220/0002903100700079


in Bibtex Style

@conference{iceis10,
author={Fabio Clarizia and Luca Greco and Paolo Napoletano},
title={A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS},
year={2010},
pages={70-79},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002903100700079},
isbn={978-989-8425-06-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS
TI - A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS
SN - 978-989-8425-06-5
AU - Clarizia F.
AU - Greco L.
AU - Napoletano P.
PY - 2010
SP - 70
EP - 79
DO - 10.5220/0002903100700079