databases in a Data Lake. The paper (Diamantini et
al., 2018) proposes an approach to structure the data
of a Data Lake by linking the data sources in the form
of a graph composed of keywords. Other works
propose a metamodel unifying several data source
metamodels, thus allowing the unification of data
from multiple sources. The authors in (Candel, Ruiz,
& García-Molina, 2021) proposed a metamodel
unifying the logical schemas of the four most
common types of NoSQL systems and relational
systems. Our solution is based on extracting data from
a Data Lake and loading it into a Data Warehouse. To
do so, we first proposed two metamodels representing
the physical models of the databases involved: the first
metamodel describes the relational databases used as
sources, and the second describes the document-oriented
NoSQL database that is the target of our solution.
We used EMF as our metamodeling tool. EMF
allowed us to formalize the transformation rules from
the source metamodel to the target metamodel, and we
relied on the QVT standard to express these rules.
The solution we propose allows the data contained in a
Data Lake to be queried through the creation of a Data
Warehouse.
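
To make the idea of a metamodel-to-metamodel rule concrete, the following minimal Python sketch maps one relational table to one document-oriented collection. It is only an illustration under stated assumptions: the classes Table, Column, Collection and Field, the TYPE_MAP dictionary and the function table_to_collection are hypothetical simplifications introduced here, not the EMF metamodels or QVT rules of our implementation.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, heavily simplified source metamodel (relational side).
@dataclass
class Column:
    name: str
    sql_type: str


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


# Hypothetical, heavily simplified target metamodel (document-oriented side).
@dataclass
class Field:
    name: str
    doc_type: str


@dataclass
class Collection:
    name: str
    fields: List[Field] = field(default_factory=list)


# Assumed type correspondence; unknown SQL types fall back to "string".
TYPE_MAP = {"INTEGER": "integer", "VARCHAR": "string", "DATE": "date"}


def table_to_collection(table: Table) -> Collection:
    """Illustrative rule: one table becomes one collection, one column one field."""
    return Collection(
        name=table.name,
        fields=[
            Field(c.name, TYPE_MAP.get(c.sql_type.upper(), "string"))
            for c in table.columns
        ],
    )


# Usage with a made-up table.
patient = Table("Patient", [Column("id", "INTEGER"), Column("name", "VARCHAR")])
collection = table_to_collection(patient)
```

In a model-driven setting, such a rule would be declared once at the metamodel level and applied to every conforming source model, which is the role QVT plays in our approach.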
9 CONCLUSION AND PERSPECTIVES
This paper proposed a process to ingest data from a
Data Lake into a Data Warehouse. The Data Warehouse
consists of a single NoSQL database, whereas the Data
Lake contains several databases. We have limited the
content of the Data Lake to relational databases. Three
modules ensure the ingestion of the data. The CreateDW
module transforms each relational database into this
single NoSQL database by applying MDA rules.
This mechanism will be used and extended to
transform other types of databases in the Data Lake.
The ConvertLinks module translates relational links
(keys) into references in accordance with the
principles of object databases supported by the
OrientDB system. Finally, the MergeClasses module
merges semantically equivalent classes from different
Data Lake databases; this merge is based on an
ontology provided by business experts.
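
The behaviour of the last two modules can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not our implementation: the functions convert_links and merge_classes, the dictionary-based records, the "rid" field (placeholder identifiers written in the style of OrientDB record IDs) and the ontology represented as a plain class-name-to-concept mapping are all hypothetical.

```python
def convert_links(records, targets, fk_field, target_key, ref_field):
    """Replace a foreign-key value by a reference to the target record's id."""
    # Index the target records by their key value.
    index = {t[target_key]: t["rid"] for t in targets}
    converted = []
    for rec in records:
        new_rec = {k: v for k, v in rec.items() if k != fk_field}
        # A dangling key yields None instead of a reference.
        new_rec[ref_field] = index.get(rec[fk_field])
        converted.append(new_rec)
    return converted


def merge_classes(classes, ontology):
    """Group records of classes that the ontology maps to the same concept."""
    merged = {}
    for class_name, records in classes.items():
        # Classes unknown to the ontology keep their own name.
        concept = ontology.get(class_name, class_name)
        merged.setdefault(concept, []).extend(records)
    return merged


# Made-up data: a visit points to a doctor through a foreign key.
doctors = [{"rid": "#10:0", "doctor_id": 7, "name": "Dr. A"}]
visits = [{"rid": "#11:0", "visit_id": 1, "doctor_id": 7}]
linked = convert_links(visits, doctors, "doctor_id", "doctor_id", "doctor")

# Made-up ontology: "Patients" and "Patient" denote the same concept.
classes = {
    "Patient": [{"name": "Ann"}],
    "Patients": [{"name": "Bob"}],
    "Visit": linked,
}
ontology = {"Patient": "Patient", "Patients": "Patient"}
merged = merge_classes(classes, ontology)
```

Here the ontology is reduced to a lookup table; an ontology provided by business experts would typically carry richer semantic relations than simple name equivalence.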
We are currently continuing this work with the
ingestion of other types of data sources from a Data
Lake, since the Data Lake of our medical case study
contains various database types.
 
 
 
REFERENCES 
Alotaibi, R., Cautis, B., Deutsch, A., Latrache, M.,
Manolescu, I., & Yang, Y. (2020). ESTOCADA:
Towards scalable polystore systems. Proceedings of the
VLDB Endowment, 13(12), 2949‑2952.
Bruel, J., Combemale, B., Guerra, E., Jézéquel, J., Kienzle, 
J., Lara, J., Mussbacher, G., et al. (2019). Comparing
and classifying model transformation reuse approaches 
across metamodels. Software and Systems Modeling. 
Candel, C. J. F., Ruiz, D. S., & García-Molina, J. J. (2021).
A Unified Metamodel for NoSQL and Relational
Databases. arXiv:2105.06494 [cs].
Chickerur, S., Goudar, A., & Kinnerkar, A. (2015).
Comparison of Relational Database with
Document-Oriented Database (MongoDB) for Big Data
Applications. 2015 8th International Conference on
Advanced Software Engineering & Its Applications
(ASEA) (p. 41‑47).
Diamantini, C., Lo Giudice, P., Musarella, L., Potena, D.,
Storti, E., & Ursino, D. (2018). A New Metadata Model
to Uniformly Handle Heterogeneous Data Lake
Sources. In ADBIS 2018 Short Papers and Workshops,
AI*QA, BIGPMED, CSACDB, M2U, BigDataMAPS,
ISTREND, DC, Budapest, Hungary, September 2-5,
2018, Proceedings (p. 165‑177).
Duggan, J., Kepner, J., Elmore, A. J., & Madden, S. (2015).
The BigDAWG Polystore System. SIGMOD Record,
44(2), 6.
El Malki, M., Kopliku, A., Sabir, E., & Teste, O. (2018).
Benchmarking Big Data OLAP NoSQL Databases. In
N. Boudriga, M.-S. Alouini, S. Rekhis, E. Sabir, & S.
Pollin (Eds.), Ubiquitous Networking, Lecture Notes in
Computer Science (Vol. 11277, p. 82‑94). Cham:
Springer International Publishing.
Erraissi, A., & Banane, M. (2020). Managing Big Data
using Model Driven Engineering: From Big Data
Meta-model to Cloudera PSM meta-model (p. 1235‑1239).
Hanine, M., Bendarag, A., & Boutkhoum, O. (2015). Data
Migration Methodology from Relational to NoSQL
Databases, 9(12), 6.
Khine, P. P., & Wang, Z. S. (2018). Data lake: A new
ideology in big data era. ITM Web of Conferences, 17.
Liyanaarachchi, G., Kasun, L., Nimesha, M., Lahiru, K., &
Karunasena, A. (2016). MigDB - relational to NoSQL
mapper. 2016 IEEE International Conference on
Information and Automation for Sustainability (ICIAfS)
(p. 1‑6).
Mahmood, A. A. (2018). Automated Algorithm for Data
Migration from Relational to NoSQL Databases.
Al-Nahrain Journal for Engineering Sciences, 21(1), 60.
Meehan, J., Tatbul, N., Aslantas, C., & Zdonik, S. (n.d.).
Data Ingestion for the Connected World, 11.
Nargesian, F., Zhu, E., Miller, R. J., Pu, K. Q., & Arocena,
P. C. (2019). Data lake management: Challenges and
opportunities. Proceedings of the VLDB Endowment,
12(12), 1986‑1989.
Stanescu, L., Brezovan, M., & Burdescu, D. D. (2016).
Automatic Mapping of MySQL Databases to NoSQL
MongoDB. Federated Conference on Computer Science
and Information Systems (p. 837‑840).