Improving Projection Profile for Segmenting Characters from Javanese Manuscripts

Aditya W Mahastama, Lucia D Krisnawati

2019

Abstract

The inclusion of non-Latin scripts in the Unicode character set has opened up the possibility of performing Optical Character Recognition (OCR) on manuscripts written in non-alphabetic scripts. Javanese is one of the Southeast Asian languages with vast collections of manuscripts. Unfortunately, these manuscripts are prone to damage due to a lack of maintenance, which makes digitising them through OCR the most obvious option. This research focuses on the segmentation stage of our OCR project, which implements Projection-Profile Cutting (PPC), chosen because it is well known for its low computational cost. As the object of segmentation, we sampled 72 scanned pages of Serat Mangkunegara IV, Wulang Maca, and Kitab Rum. Our preliminary evaluation showed that PPC alone yields unsatisfactory results. Hence, we refined it by applying a statistical analysis to separate lines of characters that are spaced too closely. The proposed algorithm produces 19,112 segments. To evaluate the system outputs, we conducted evaluation at two levels: line segmentation and character segmentation. The refinement of PPC increased line segmentation accuracy by 32.84%. To evaluate character segmentation, we collaborated with the Javanese Wikipedia Community, who verified the segments manually in four batches. Of the 15,386 segments verified, 73.59% (11,322) are correctly segmented, 22.5% (3,464) are over-segmented, 1.3% (206) are under-segmented, and the remaining segments have not been labelled as any of these three categories.
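The abstract names projection-profile cutting and a statistical refinement for closely spaced lines but does not spell out the refinement rule. The following is a minimal sketch, assuming a binarised page (1 = ink) stored as nested lists. Plain PPC cuts the page at rows whose horizontal profile is empty, then cuts each line band at empty columns. The refinement shown here is a hypothetical stand-in, not the authors' method: any band taller than mean + k·std of all band heights is assumed to contain two touching lines and is split once at its interior row with the least ink. All function names and the parameter `k` are illustrative.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Span = Tuple[int, int]  # half-open interval [start, end)


def to_binary(rows: List[str]) -> List[List[int]]:
    """Convert '#'/'.' strings into a 0/1 image (1 = ink)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def row_profile(img: List[List[int]]) -> List[int]:
    """Horizontal projection profile: ink pixels per row."""
    return [sum(row) for row in img]


def col_profile(img: List[List[int]]) -> List[int]:
    """Vertical projection profile: ink pixels per column."""
    return [sum(col) for col in zip(*img)]


def runs(profile: List[int], threshold: int = 0) -> List[Span]:
    """Return the maximal spans where the profile exceeds the threshold."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: List[List[int]]) -> List[Span]:
    """Plain PPC: cut the page at empty rows."""
    return runs(row_profile(img))


def refine_lines(img: List[List[int]], lines: List[Span], k: float = 1.0) -> List[Span]:
    """Hypothetical refinement: split unusually tall bands once at the
    interior row with the least ink; the cut row stays with the lower band."""
    if len(lines) < 2:
        return list(lines)
    prof = row_profile(img)
    heights = [e - s for s, e in lines]
    limit = mean(heights) + k * pstdev(heights)
    refined: List[Span] = []
    for s, e in lines:
        if e - s > limit and e - s >= 3:
            cut = min(range(s + 1, e - 1), key=lambda r: prof[r])
            refined.extend([(s, cut), (cut, e)])
        else:
            refined.append((s, e))
    return refined


def segment_chars(img: List[List[int]], line: Span) -> List[Span]:
    """PPC inside one line band: cut at empty columns."""
    s, e = line
    return runs(col_profile(img[s:e]))


# Toy page: the middle band holds two lines joined by a one-pixel stroke.
PAGE = to_binary([
    "##..##..",
    "##..##..",
    "##..##..",
    "........",
    "###.....",
    "###.....",
    "###.....",
    "..#.....",
    "..#.###.",
    "....###.",
    "....###.",
    "........",
    "#.#.#...",
    "#.#.#...",
    "#.#.#...",
])

if __name__ == "__main__":
    for band in refine_lines(PAGE, segment_lines(PAGE)):
        print(band, segment_chars(PAGE, band))
```

On the toy page, plain PPC returns three bands, because no row between the joined lines is empty; the hypothetical refinement splits the tall middle band at row 7, the row with the least ink. The split also cuts through the connecting stroke, which illustrates how projection cuts can over-segment characters that touch.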



Paper Citation


in Harvard Style

Mahastama A. and Krisnawati L. (2019). Improving Projection Profile for Segmenting Characters from Javanese Manuscripts. In Proceedings of the 1st International Conference on Intermedia Arts and Creative Technology - Volume 1: CREATIVEARTS, ISBN 978-989-758-430-5, pages 77-82. DOI: 10.5220/0008526900770082


in Bibtex Style

@conference{creativearts19,
author={Aditya W Mahastama and Lucia D Krisnawati},
title={Improving Projection Profile for Segmenting Characters from Javanese Manuscripts},
booktitle={Proceedings of the 1st International Conference on Intermedia Arts and Creative Technology - Volume 1: CREATIVEARTS},
year={2019},
pages={77-82},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008526900770082},
isbn={978-989-758-430-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Intermedia Arts and Creative Technology - Volume 1: CREATIVEARTS
TI - Improving Projection Profile for Segmenting Characters from Javanese Manuscripts
SN - 978-989-758-430-5
AU - Mahastama A.
AU - Krisnawati L.
PY - 2019
SP - 77
EP - 82
DO - 10.5220/0008526900770082
ER -