ExtraWeb - An Extrinsic Task-oriented Evaluation of Webpage Extracts

Patrick Pedreira Silva, Lucia Helena Machado Rino

2013

Abstract

This paper examines how useful extracts of webpages in Brazilian Portuguese are as a means of filtering information, so that Web users can judge the relevance of search engine results quickly and consistently. ExtraWeb, an ontology- and HTML-based summarizer, was built as an alternative to the query-biased extracts typically offered by Web search engines. ExtraWeb was evaluated extrinsically in a controlled experiment in which webpages in Portuguese were retrieved and a set of extracts was generated for an Internet user to evaluate. Only the relevance judgment of the extracts was assessed. The results suggest that the system is promising in helping users filter relevant webpages.


Paper Citation


in Harvard Style

Pedreira Silva P. and Machado Rino L. (2013). ExtraWeb - An Extrinsic Task-oriented Evaluation of Webpage Extracts. In Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-59-4, pages 467-474. DOI: 10.5220/0004449104670474


in Bibtex Style

@conference{iceis13,
author={Patrick Pedreira Silva and Lucia Helena Machado Rino},
title={ExtraWeb - An Extrinsic Task-oriented Evaluation of Webpage Extracts},
booktitle={Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2013},
pages={467-474},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004449104670474},
isbn={978-989-8565-59-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 15th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - ExtraWeb - An Extrinsic Task-oriented Evaluation of Webpage Extracts
SN - 978-989-8565-59-4
AU - Pedreira Silva P.
AU - Machado Rino L.
PY - 2013
SP - 467
EP - 474
DO - 10.5220/0004449104670474