Historical Document Processing: A Survey of Techniques, Tools, and Trends

James Philips; Nasseh Tabrizi

doi:10.5220/0010177403410349

2020

Abstract

Historical Document Processing (HDP) is the process of digitizing written material from the past for future use by historians and other scholars. It incorporates algorithms and software tools from computer vision, document analysis and recognition, natural language processing, and machine learning to convert images of ancient manuscripts and early printed texts into a digital format usable in data mining and information retrieval systems. As libraries and other cultural heritage institutions have scanned their historical document archives, the need to transcribe the full text from these collections has become acute. Since HDP encompasses multiple sub-domains of computer science, knowledge relevant to its purpose is scattered across numerous journals and conference proceedings. This paper surveys the major phases of HDP, discusses standard algorithms, tools, and datasets, and finally suggests directions for further research.

Paper Citation

in Harvard Style

Philips J. and Tabrizi N. (2020). Historical Document Processing: A Survey of Techniques, Tools, and Trends. In Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR; ISBN 978-989-758-474-9, SciTePress, pages 341-349. DOI: 10.5220/0010177403410349

in Bibtex Style

@conference{kdir20,
author={James Philips and Nasseh Tabrizi},
title={Historical Document Processing: A Survey of Techniques, Tools, and Trends},
booktitle={Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR},
year={2020},
pages={341-349},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010177403410349},
isbn={978-989-758-474-9},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR
TI - Historical Document Processing: A Survey of Techniques, Tools, and Trends
SN - 978-989-758-474-9
AU - Philips J.
AU - Tabrizi N.
PY - 2020
SP - 341
EP - 349
DO - 10.5220/0010177403410349
PB - SciTePress