Page Analysis by 2D Conditional Random Fields

Atsuhiro Takasu

2013

Abstract

This paper applies two-dimensional conditional random fields (2D CRFs) to page analysis and information extraction, and discusses the features and labels used for information extraction with a 2D CRF. We evaluated the method on the problem of extracting bibliographic components from scanned title pages of academic papers. The experimental results show that the 2D CRF improves information extraction performance compared to a chain-model CRF.



Paper Citation


in Harvard Style

Takasu A. (2013). Page Analysis by 2D Conditional Random Fields. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 564-567. DOI: 10.5220/0004266505640567


in Bibtex Style

@conference{icpram13,
author={Atsuhiro Takasu},
title={Page Analysis by 2D Conditional Random Fields},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={564-567},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004266505640567},
isbn={978-989-8565-41-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Page Analysis by 2D Conditional Random Fields
SN - 978-989-8565-41-9
AU - Takasu A.
PY - 2013
SP - 564
EP - 567
DO - 10.5220/0004266505640567