Moving towards a General Metadata Extraction Solution for Research Data with State-of-the-Art Methods

Benedikt Heinrichs, Marius Politze


Many research data management processes, especially those defined by the FAIR Guiding Principles, rely on metadata to make research data findable and re-usable. Most metadata workflows, however, require researchers to describe their data manually, a tedious process that is one reason the description is sometimes skipped. Automatic solutions are therefore needed to ensure findability and re-usability. Current solutions focus on, and are only effective at, extracting metadata within single disciplines using domain knowledge. This paper therefore aims to identify the gaps in current metadata extraction processes and to define a model for a general extraction pipeline for research data. The results of implementing such a model are discussed, and a proof of concept is shown for video-based data. The model serves as a basis for future research, acting as a testbed for building and evaluating discipline-specific automatic metadata extraction workflows.

