A XML-BASED BOOTSTRAPPING METHOD FOR PATTERN ACQUISITION

Xingjie Zeng, Fang Li, Dongmo Zhang, Athena I. Vakali

2004

Abstract

Extensible Markup Language (XML) is widely used as middleware because of its flexibility. Dependence on a fixed domain is one of the bottlenecks of Information Extraction (IE) technologies. In this paper we present an XML-based, domain-adaptable bootstrapping method for pattern acquisition that focuses on minimizing the cost of domain migration. The approach starts from a seed corpus with a set of seed patterns, extends that corpus with documents gathered from the Internet, and acquires new patterns from the extended corpus. Positive and negative examples classified from the training corpus are used to evaluate the acquired patterns. The results show that the method is a practical approach to pattern acquisition.
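
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the XML representation and the Internet-based corpus extension are left out, patterns are assumed to be plain regular expressions, and the names `score_pattern`, `candidate_patterns`, `bootstrap`, the word-pair generalisation and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import re


def score_pattern(pattern, positives, negatives):
    """Share of matched examples that are positive (0.0 if nothing matches)."""
    rx = re.compile(pattern, re.IGNORECASE)
    pos = sum(1 for s in positives if rx.search(s))
    neg = sum(1 for s in negatives if rx.search(s))
    total = pos + neg
    return pos / total if total else 0.0


def candidate_patterns(sentence):
    """Hypothetical generalisation step: each adjacent word pair is a candidate."""
    words = re.findall(r"\w+", sentence.lower())
    return {rf"\b{a} {b}\b" for a, b in zip(words, words[1:])}


def bootstrap(seed_patterns, extended_corpus, positives, negatives,
              threshold=0.8, rounds=3):
    """Grow the pattern set from sentences that the current patterns match."""
    accepted = set(seed_patterns)
    for _ in range(rounds):
        matched = [s for s in extended_corpus
                   if any(re.search(p, s, re.IGNORECASE) for p in accepted)]
        new = set()
        for sentence in matched:
            for cand in candidate_patterns(sentence):
                if cand in accepted:
                    continue
                # Keep a candidate only if it mostly matches positive examples.
                if score_pattern(cand, positives, negatives) >= threshold:
                    new.add(cand)
        if not new:  # nothing learned this round, so stop early
            break
        accepted |= new
    return accepted


# Toy data, invented for this sketch.
seeds = {r"\bacquired by\b"}
corpus = [
    "Acme was acquired by Globex in a takeover deal",
    "The takeover deal closed on Monday",
    "Rain fell on Monday",
]
positives = ["Globex announced a takeover deal", "Initech was acquired by Hooli"]
negatives = ["Rain fell on Monday", "Acme was founded in a garage",
             "The weather was mild"]

learned = bootstrap(seeds, corpus, positives, negatives)
print(sorted(learned))
```

On this toy data the seed pattern pulls in the first sentence, whose word pairs are then filtered by the positive and negative examples; pairs that match only negative examples or no examples at all are discarded.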

Paper Citation


in Harvard Style

Zeng X., Li F., Zhang D. and Vakali A. I. (2004). A XML-BASED BOOTSTRAPPING METHOD FOR PATTERN ACQUISITION. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 972-8865-00-7, pages 303-308. DOI: 10.5220/0002607803030308


in Bibtex Style

@conference{iceis04,
author={Xingjie Zeng and Fang Li and Dongmo Zhang and Athena I. Vakali},
title={A XML-BASED BOOTSTRAPPING METHOD FOR PATTERN ACQUISITION},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2004},
pages={303-308},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002607803030308},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - A XML-BASED BOOTSTRAPPING METHOD FOR PATTERN ACQUISITION
SN - 972-8865-00-7
AU - Zeng X.
AU - Li F.
AU - Zhang D.
AU - Vakali A. I.
PY - 2004
SP - 303
EP - 308
DO - 10.5220/0002607803030308