USING ONTOLOGIES TO PROSPECT OFFERS ON THE WEB

Rafael Cunha Cardoso, Fernando da Fonseca de Souza, Ana Carolina Salgado

Abstract

Information retrieval and extraction systems play an important role in obtaining relevant information from the World Wide Web (WWW). The Semantic Web, which can be seen as the Web’s future, introduces a set of concepts and tools that are being used to add “intelligence” to the content of the current WWW. Among these concepts, ontologies play a fundamental role. Through ontologies, software agents can traverse the Web while “understanding” the meaning of its content, which lets them carry out more complex and useful tasks. This work presents an architecture that combines Semantic Web concepts with regular expressions (regex) to build a tool that retrieves and extracts domain-specific information from the Web (HTML documents). The prototype developed on this architecture uses ontologies and regex to collect data about offers advertised on supermarket Web sites.
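
To make the idea in the abstract concrete, the following is a minimal Python sketch of concept-driven extraction from HTML. It is not the authors' prototype: the `OFFER_CONCEPTS` dictionary is a hypothetical stand-in for an ontology, and the patterns, the `extract_offers` function, and the sample markup are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-"ontology": each concept of the offer domain is paired
# with a regular expression that recognises its values in plain text.
# Concept names and patterns are illustrative assumptions, not from the paper.
OFFER_CONCEPTS = {
    "price": re.compile(r"(?:R\$|US\$|\$)\s*(\d+(?:[.,]\d{2})?)"),
    "quantity": re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|ml|l)\b", re.IGNORECASE),
}

TAG = re.compile(r"<[^>]+>")  # any HTML tag
ITEM = re.compile(r"<li[^>]*>(.*?)</li>", re.IGNORECASE | re.DOTALL)


def extract_offers(html):
    """Return one dict per list item that contains a recognisable price."""
    offers = []
    for raw in ITEM.findall(html):
        # Strip tags and collapse whitespace to get the visible text.
        text = " ".join(TAG.sub(" ", raw).split())
        price = OFFER_CONCEPTS["price"].search(text)
        if not price:
            continue  # no price means the item is not treated as an offer
        qty = OFFER_CONCEPTS["quantity"].search(text)
        # Assume the product name is the text before the first matched concept.
        cut = min(m.start() for m in (price, qty) if m)
        offers.append({
            "product": text[:cut].strip(" -:"),
            "quantity": qty.group(0) if qty else None,
            "price": float(price.group(1).replace(",", ".")),
        })
    return offers


# Made-up markup resembling a supermarket offers page.
SAMPLE = """
<ul>
  <li><b>Rice</b> 5 kg - R$ 12,90</li>
  <li>Orange juice 1 l <span>$ 3.49</span></li>
  <li>Store opening hours</li>
</ul>
"""

offers = extract_offers(SAMPLE)
for offer in offers:
    print(offer)
```

In this sketch the dictionary only names concepts and attaches a recogniser to each one. A real ontology would also describe relations between concepts, and a production extractor would normally use an HTML parser rather than regular expressions to locate the items.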

Paper Citation


in Harvard Style

Cunha Cardoso R., da Fonseca de Souza F. and Carolina Salgado A. (2005). USING ONTOLOGIES TO PROSPECT OFFERS ON THE WEB. In Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 4: ICEIS, ISBN 972-8865-19-8, pages 200-207. DOI: 10.5220/0002555902000207


in Bibtex Style

@conference{iceis05,
author={Rafael Cunha Cardoso and Fernando da Fonseca de Souza and Ana Carolina Salgado},
title={USING ONTOLOGIES TO PROSPECT OFFERS ON THE WEB},
booktitle={Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 4: ICEIS},
year={2005},
pages={200-207},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002555902000207},
isbn={972-8865-19-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Seventh International Conference on Enterprise Information Systems - Volume 4: ICEIS
TI - USING ONTOLOGIES TO PROSPECT OFFERS ON THE WEB
SN - 972-8865-19-8
AU - Cunha Cardoso R.
AU - da Fonseca de Souza F.
AU - Carolina Salgado A.
PY - 2005
SP - 200
EP - 207
DO - 10.5220/0002555902000207