High Performance Layout Analysis of Medieval European Document Images

Syed Saqib Bukhari, Ashutosh Gupta, Anil Kumar Tiwari, Andreas Dengel

Abstract

Layout analysis, which mainly includes binarization and page segmentation, is one of the most important performance-determining steps of an OCR system for complex medieval document images, which contain noise, distortions and irregular layouts. In this paper, we present high performance page segmentation techniques for medieval European document images. These include a novel main-body and side-notes segregation method and an improved version of OCRopus-based text line extraction (OCRopus, ). To complete the high performance layout analysis pipeline, we also present the application of the percentile-based binarization (Afzal et al., 2014) and the multiresolution morphology-based text and non-text segmentation (Bukhari et al., 2011) methods to historical document images. The presented layout analysis techniques are applied to a collection of 15th-century Latin document images and achieve more than 90% accuracy with each of the segmentation techniques.
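The abstract names percentile-based binarization (Afzal et al., 2014) without describing it. The general idea behind percentile-filter binarization can be illustrated with a minimal Python sketch, assuming NumPy 1.20 or later: a high-percentile sliding window estimates the paper background, the page is flattened against that estimate, and a global threshold separates ink from paper. The function names `percentile_filter` and `percentile_binarize`, the subtraction-based flattening and all parameter values (80th percentile, 15 x 15 window, threshold 0.5) are hypothetical choices for illustration, not the exact procedure or settings of the cited method.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def percentile_filter(img, pct, size):
    """Brute-force sliding-window percentile over size x size windows (edge-padded)."""
    assert size % 2 == 1, "window size must be odd so the output keeps the input shape"
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return np.percentile(windows, pct, axis=(-2, -1))


def percentile_binarize(gray, bg_pct=80, size=15, thresh=0.5):
    """Return a boolean ink mask for a grayscale page in [0, 1] (0 = dark ink).

    Hypothetical sketch: the defaults are illustrative assumptions only.
    """
    gray = np.asarray(gray, dtype=float)
    # 1. Estimate the (possibly uneven) paper background: inside each window
    #    most pixels are paper, so a high percentile ignores thin ink strokes.
    background = percentile_filter(gray, bg_pct, size)
    # 2. Flatten illumination: paper maps to ~1.0, ink stays well below it.
    flat = np.clip(gray - background + 1.0, 0.0, 1.0)
    # 3. Global threshold on the flattened image; True marks foreground ink.
    return flat < thresh


# Usage: a synthetic page with a brightness gradient and one dark stroke.
page = np.full((40, 40), 0.9) + np.linspace(0.0, 0.1, 40)[None, :]
page[18:22, 5:35] = 0.1
mask = percentile_binarize(page)
print(mask.sum())  # 120 ink pixels (the 4 x 30 stroke)
```

The window must be large enough that the chosen percentile still falls on paper pixels wherever text is dense or strokes are thick; larger windows are more robust in that respect but make this brute-force filter slower.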


Paper Citation


in Harvard Style

Bukhari S., Gupta A., Tiwari A. and Dengel A. (2018). High Performance Layout Analysis of Medieval European Document Images. In Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-276-9, pages 324-331. DOI: 10.5220/0006574603240331


in Bibtex Style

@conference{icpram18,
author={Syed Saqib Bukhari and Ashutosh Gupta and Anil Kumar Tiwari and Andreas Dengel},
title={High Performance Layout Analysis of Medieval European Document Images},
booktitle={Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2018},
pages={324-331},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006574603240331},
isbn={978-989-758-276-9},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI  - High Performance Layout Analysis of Medieval European Document Images
SN  - 978-989-758-276-9
AU  - Bukhari S.
AU  - Gupta A.
AU  - Tiwari A.
AU  - Dengel A.
PY  - 2018
SP  - 324
EP  - 331
DO  - 10.5220/0006574603240331
ER  - 