Graph-based ETL Processes for Warehousing Statistical Open Data

Alain Berro, Imen Megdiche, Olivier Teste

2015

Abstract

Warehousing is a promising means to cross-reference and analyse Statistical Open Data (SOD). However, extracting structures from the many scattered, heterogeneous tables found in SOD, integrating them, and defining a multidimensional schema are major problems that challenge traditional ETL (Extract-Transform-Load) processes. In this paper, we present a three-step ETL process that relies on RDF graphs to address these problems. In the first step, we automatically extract table structures and values using a table anatomy ontology. This step converts structurally heterogeneous tables into a unified RDF graph representation. The second step performs a holistic integration of several semantically heterogeneous RDF graphs; the optimal integration is computed by an Integer Linear Program (ILP). In the third step, the system interacts with users to incrementally transform the integrated RDF graph into a multidimensional schema.


Paper Citation


in Harvard Style

Berro A., Megdiche I. and Teste O. (2015). Graph-based ETL Processes for Warehousing Statistical Open Data. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-096-3, pages 271-278. DOI: 10.5220/0005363302710278


in Bibtex Style

@conference{iceis15,
author={Alain Berro and Imen Megdiche and Olivier Teste},
title={Graph-based ETL Processes for Warehousing Statistical Open Data},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2015},
pages={271-278},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005363302710278},
isbn={978-989-758-096-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Graph-based ETL Processes for Warehousing Statistical Open Data
SN - 978-989-758-096-3
AU - Berro A.
AU - Megdiche I.
AU - Teste O.
PY - 2015
SP - 271
EP - 278
DO - 10.5220/0005363302710278