Authors: Johannes Theissen-Lipp 1,2 ; Niklas Schäfer 3 ; Max Kocher 2 ; Philipp Hochmann 2 ; Michael Riesener 3 and Stefan Decker 2,1

Affiliations: 1 Fraunhofer Institute for Applied Information Technology FIT, Sankt Augustin, Germany ; 2 Chair of Databases and Information Systems, RWTH Aachen University, Aachen, Germany ; 3 Laboratory for Machine Tools and Production Engineering (WZL), RWTH Aachen University, Aachen, Germany

Keyword(s): RML, RDF Mappings, Heterogeneous Formats, Domain-Specific Language, IFC, UML.

Abstract: Across many domains, the growing amount of data presents a challenge in extracting meaningful insights. A significant hurdle is the accurate interpretation and integration of data from diverse sources, often dictated by their specific applications. The RDF Mapping Language (RML), based on the W3C recommendation R2RML, can be used to transform data in heterogeneous formats to RDF using defined mappings. However, existing RML implementations only support a limited set of (semi-)structured data sources such as CSV, SQL, XML, and JSON, neglecting numerous use cases that rely on other formats. This work overcomes this limitation by proposing a methodology to flexibly extend RML to support additional source formats. We systematically analyze RML and its implementations to derive a generic concept for the extension of RML. Our contributions include a general workflow for extending RML with new formats and demonstrative implementations of the RML Mapper for two examples from Building Information Modeling (BIM) and UML class diagrams. Leveraging open-source code forks and a demonstrative domain-specific language ensures easy portability to any other source format. The evaluation covers authoring of mappings, runtime performance, and practical applicability. The results affirm the effectiveness of our generic methodology for extending RML mappings to include additional source formats.


