Authors: Hien Thi Ha 1 ; Aleš Horák 1 and Minh Tuan Bui 2

Affiliations: 1 NLP Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic ; 2 Le Quy Don Technical University, Vietnam

Keyword(s): Information Extraction, Scanned Documents, Document Metadata, Contract Metadata Extraction, Czech.

Abstract: Although born-digital documents are now prevalent, business documents are still often exchanged as scanned images, a general human-readable format with one-to-one correspondence to the paper originals. Bulk processing of such scanned documents then requires human intervention to extract and enter the main document metadata. In this paper, we present the design and evaluation of a contract processing module in the OCRMiner system. The information extraction process combines layout properties with text analysis as input to a rule-based extraction with confidence score propagation. The first results, evaluated on public Czech contract documents, reach an item extraction accuracy of almost 88%.

CC BY-NC-ND 4.0

Paper citation in several formats:
Ha, H.; Horák, A. and Bui, M. (2021). Contract Metadata Identification in Czech Scanned Documents. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-484-8; ISSN 2184-433X, SciTePress, pages 795-802. DOI: 10.5220/0010243807950802

@conference{icaart21,
author={Hien Thi Ha and Aleš Horák and Minh Tuan Bui},
title={Contract Metadata Identification in Czech Scanned Documents},
booktitle={Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2021},
pages={795-802},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010243807950802},
isbn={978-989-758-484-8},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Contract Metadata Identification in Czech Scanned Documents
SN - 978-989-758-484-8
IS - 2184-433X
AU - Ha, H.
AU - Horák, A.
AU - Bui, M.
PY - 2021
SP - 795
EP - 802
DO - 10.5220/0010243807950802
PB - SciTePress