ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools

Laura Pandolfo, Luca Pulina

2021

Abstract

The amount of data available on the Web has grown significantly in recent years, increasing the need for efficient techniques that can retrieve information from data and discover valuable, relevant knowledge. Over the last decade, the intersection of the Information Extraction and Semantic Web areas has provided new opportunities for improving ontology-based information extraction tools. However, a critical aspect in the development and evaluation of such systems is the limited availability of annotated documents, especially in domains such as the historical one. In this paper we present the current state of our work in building a large, real-world RDF dataset intended to support the development of ontology-based extraction tools. The dataset is the result of efforts made within the ARKIVO project and contains about 300 thousand triples, which are the outcome of a manual annotation process carried out by domain experts. The ARKIVO dataset is freely available and can be used as a benchmark for evaluating systems that automatically annotate and extract entities from documents.


Paper Citation


in Harvard Style

Pandolfo L. and Pulina L. (2021). ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools. In Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-536-4, pages 341-345. DOI: 10.5220/0010677000003058


in Bibtex Style

@conference{webist21,
author={Laura Pandolfo and Luca Pulina},
title={ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools},
booktitle={Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2021},
pages={341-345},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010677000003058},
isbn={978-989-758-536-4},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - ARKIVO Dataset: A Benchmark for Ontology-based Extraction Tools
SN - 978-989-758-536-4
AU - Pandolfo L.
AU - Pulina L.
PY - 2021
SP - 341
EP - 345
DO - 10.5220/0010677000003058