Paper

Authors: Michael Sildatke 1 ; Hendrik Karwanni 1 ; Bodo Kraft 1 and Albert Zündorf 2

Affiliations: 1 FH Aachen, University of Applied Sciences, Germany ; 2 University of Kassel, Germany

Keyword(s): Modelling of Distributed Systems, Model Driven Architectures and Engineering, Software Metrics and Measurement, Agile Methodologies and Applications, Domain Specific and Multi-aspect IS Engineering.

Abstract: Companies often have to extract information from PDF documents by hand since these documents are only human-readable. To gain business value, companies attempt to automate these processes by using the newest technologies from research. In the field of table analysis alone, for example, several hundred approaches were introduced in 2019. The formats of those PDF documents vary enormously and may change over time. As a result, different and highly adjustable extraction strategies are necessary to process the documents automatically, even though specific steps recur across strategies. Thus, we provide an architectural pattern that ensures the modularization of strategies through microservices composed into pipelines. Crucial factors for success are identifying the most suitable pipeline and the reliability of its results. Therefore, the automated quality determination of pipelines creates two fundamental benefits. First, the provided system automatically identifies the best strategy for each input document at runtime. Second, the provided system automatically integrates new microservices into pipelines as soon as they increase overall quality. Hence, the pattern enables fast prototyping of the newest approaches from research while ensuring that they achieve the required quality to gain business value.
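
The pattern described in the abstract (interchangeable steps composed into pipelines, an automated quality measure, and integration of a new pipeline only when it improves quality) can be illustrated with a minimal in-process sketch. This is not the authors' implementation: plain Python callables stand in for microservices, and every name here (`Step`, `Pipeline`, `split_on`, `quality`, `best_pipeline`, `integrate`, `GOLD_SET`) as well as the exact-match quality measure over a small labelled sample are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: a "step" stands in for one microservice. It receives the current
# document state and returns a new state.
Step = Callable[[dict], dict]
GoldSet = Sequence[Tuple[dict, str]]  # (input document, expected extracted value)


@dataclass(frozen=True)
class Pipeline:
    """An ordered composition of steps (hypothetical stand-in for a service pipeline)."""
    name: str
    steps: Tuple[Step, ...]

    def run(self, document: dict) -> dict:
        state = dict(document)
        for step in self.steps:
            state = step(state)
        return state


def split_on(separator: str) -> Step:
    """Toy extraction strategy: take the text after the first separator."""
    def step(state: dict) -> dict:
        if state.get("value") is not None:
            return state  # an earlier step already extracted a value
        text = state["text"]
        value: Optional[str] = (
            text.split(separator, 1)[1].strip() if separator in text else None
        )
        return {**state, "value": value}
    return step


def quality(pipeline: Pipeline, gold_set: GoldSet) -> float:
    """Assumed quality measure: share of labelled documents extracted exactly."""
    if not gold_set:
        return 0.0
    hits = sum(1 for doc, expected in gold_set
               if pipeline.run(doc).get("value") == expected)
    return hits / len(gold_set)


def best_pipeline(pipelines: Sequence[Pipeline], gold_set: GoldSet) -> Pipeline:
    """Pick the pipeline with the highest measured quality."""
    return max(pipelines, key=lambda p: quality(p, gold_set))


def integrate(registry: List[Pipeline], candidate: Pipeline, gold_set: GoldSet) -> bool:
    """Register a candidate only if it beats the best quality measured so far."""
    current_best = max((quality(p, gold_set) for p in registry), default=0.0)
    if quality(candidate, gold_set) > current_best:
        registry.append(candidate)
        return True
    return False


# Hypothetical labelled sample and pipelines, for illustration only.
GOLD_SET: GoldSet = [
    ({"text": "total: 42"}, "42"),
    ({"text": "total: 7"}, "7"),
    ({"text": "total=9"}, "9"),
]
COLON = Pipeline("colon", (split_on(":"),))
EQUALS = Pipeline("equals", (split_on("="),))
BOTH = Pipeline("both", (split_on(":"), split_on("=")))
```

With these toy definitions, `best_pipeline([COLON, EQUALS], GOLD_SET)` selects `COLON`, and `integrate([COLON, EQUALS], BOTH, GOLD_SET)` accepts `BOTH` because it handles all three sample documents. In a distributed setting each step would be a network call to a separate service, and the quality measure would depend on the extraction task.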

CC BY-NC-ND 4.0

Paper citation in several formats:
Sildatke, M.; Karwanni, H.; Kraft, B. and Zündorf, A. (2022). ARTIFACT: Architecture for Automated Generation of Distributed Information Extraction Pipelines. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS; ISBN 978-989-758-569-2; ISSN 2184-4992, SciTePress, pages 17-28. DOI: 10.5220/0010987000003179

@conference{iceis22,
author={Michael Sildatke and Hendrik Karwanni and Bodo Kraft and Albert Zündorf},
title={ARTIFACT: Architecture for Automated Generation of Distributed Information Extraction Pipelines},
booktitle={Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2022},
pages={17-28},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010987000003179},
isbn={978-989-758-569-2},
issn={2184-4992},
}

TY - CONF
JO - Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - ARTIFACT: Architecture for Automated Generation of Distributed Information Extraction Pipelines
SN - 978-989-758-569-2
IS - 2184-4992
AU - Sildatke, M.
AU - Karwanni, H.
AU - Kraft, B.
AU - Zündorf, A.
PY - 2022
SP - 17
EP - 28
DO - 10.5220/0010987000003179
PB - SciTePress