5.4  Validation 
For schema-less NoSQL databases, our approach presents
the developer with a conceptual model and a physical
model side by side: the first to understand the
semantics of the database, the second to write queries.
To evaluate the relevance of our approach, our prototype
(Section 4) was used by three developers at Trimane, a
digital services company specialized in business
intelligence and Big Data. The three experienced
developers (IT consulting engineers) were tasked with
maintaining three separate applications. None of them
had prior knowledge of the data models of these
applications. For each application, each developer
wrote ten queries of increasing complexity in three
different cases: (1) without any data model, (2) with
the physical data model only, or (3) with both the
conceptual and physical models. Figures 7(a) and 7(b)
show, respectively, an example of the conceptual and
physical models corresponding to one of the three
applications. Due to lack of space, we present the data
models (conceptual and physical) of only one
application.
We should also highlight that, for reasons of
legibility, the models are displayed to the user on the
same screen, each in an appropriate format: JSON for the
physical model and a graphical format for the conceptual
one. Clicking on a class in the conceptual model
highlights its equivalent in the physical model. For
example, the part of the physical model written in bold
corresponds to the selected class (Trials).
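The link between the two views can be pictured with a small sketch. The following Python fragment is illustrative only and is not the prototype's code: it assumes that the physical model is available as a JSON-like dictionary keyed by collection name, and that each conceptual class records the collection it was derived from. The names `physical_model`, `class_to_collection` and `highlight`, the `Patients` collection and all field names are hypothetical.

```python
import json

# Hypothetical physical model: one entry per collection, mapping
# field names to their types. Field names are made up for illustration.
physical_model = {
    "Trials": {"_id": "ObjectId", "title": "String", "start_date": "Date"},
    "Patients": {"_id": "ObjectId", "name": "String"},
}

# Hypothetical mapping from conceptual classes to collections.
class_to_collection = {"Trials": "Trials", "Patients": "Patients"}


def highlight(selected_class, model, mapping):
    """Return the JSON text of the physical model, with the fragment
    matching the selected class wrapped in ** markers (standing in
    for the bold rendering)."""
    target = mapping[selected_class]
    parts = []
    for collection, fields in model.items():
        fragment = json.dumps({collection: fields}, indent=2)
        if collection == target:
            fragment = "**" + fragment + "**"
        parts.append(fragment)
    return "\n".join(parts)


print(highlight("Trials", physical_model, class_to_collection))
```

Keeping the class-to-collection mapping separate from both models is one possible design; it lets either view be rendered on its own while still supporting the click-through described above.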
Each database is associated with a set of queries whose
natural-language statements are provided to the three
developers. Table 3 reports the average time the three
developers took to write the queries in each situation:
(1) without any data model, (2) with the physical data
model only, or (3) with both the conceptual and physical
models.
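The averaging behind such a table can be sketched as follows. This is a minimal Python illustration under stated assumptions: the record layout `(developer, situation, duration)` and the function name `average_time_per_situation` are hypothetical, the time unit is left unspecified, and the sample durations are placeholders, not the measurements reported in Table 3.

```python
from collections import defaultdict
from statistics import mean


def average_time_per_situation(records):
    """Average query-writing time per situation.

    `records` is an iterable of (developer, situation, duration)
    tuples, a hypothetical layout chosen for this sketch.
    """
    durations = defaultdict(list)
    for _developer, situation, duration in records:
        durations[situation].append(duration)
    return {s: mean(values) for s, values in durations.items()}


# Placeholder durations for illustration only (not the values of Table 3).
sample = [
    ("dev1", "no model", 30.0),
    ("dev2", "no model", 40.0),
    ("dev1", "physical only", 20.0),
    ("dev2", "physical only", 22.0),
]

for situation, average in average_time_per_situation(sample).items():
    print(situation, average)
```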
Our initial hypothesis was verified in the situations
considered. This indicates that knowledge of the
semantics and structure of the data allows the developer
to write queries faster on a schema-less NoSQL database.
The small difference observed between using the physical
model alone and using both models (conceptual and
physical) is probably due to the experience of the three
developers.
 
6  CONCLUSION AND FUTURE WORK
Our work lies in the area of Big Data databases. It
currently addresses reverse-engineering mechanisms for
schema-less NoSQL databases, with the aim of providing
users with models for manipulating these databases.
In this article, we have proposed an automatic process,
ToConceptualModel, which transforms a physical model
into a conceptual model represented as a UML class
diagram by applying a set of rules. The resulting
conceptual model makes it easier for developers and
decision-makers to understand the database and write
queries. To formalize and automate our process, we use
the Model Driven Architecture (MDA) proposed by the OMG,
which provides a formal framework for automating model
transformations.
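To give a flavour of such a rule-based transformation, here is a deliberately simplified Python sketch. It is not the ToConceptualModel implementation, which is specified as MDA model transformations. The three rules it applies (a collection becomes a class, an atomic field becomes an attribute, a nested object becomes a structured attribute), the input layout and all names are assumptions made for this illustration.

```python
def to_attributes(fields):
    """Turn {field: type-or-nested-dict} into a list of attribute records."""
    attributes = []
    for name, field_type in fields.items():
        if isinstance(field_type, dict):
            # Assumed rule: a nested object becomes a structured attribute.
            attributes.append({"name": name, "parts": to_attributes(field_type)})
        else:
            # Assumed rule: an atomic field becomes a plain attribute.
            attributes.append({"name": name, "type": field_type})
    return attributes


def to_conceptual_model(physical_model):
    """Assumed rule: each collection becomes one class."""
    return [
        {"class": collection, "attributes": to_attributes(fields)}
        for collection, fields in physical_model.items()
    ]


# Hypothetical physical model with one nested field.
physical = {
    "Patients": {
        "name": "String",
        "address": {"city": "String", "zip": "String"},
    }
}

for cls in to_conceptual_model(physical):
    print(cls)
```

Associations, compositions and association classes are left out of this sketch, since detecting them depends on conventions in the physical model that are not restated here.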
The major contribution of our solution is that it takes
into account structured attributes, association
relationships, composition relationships and association
classes. We have tested our process on a medical
application concerning scientific programs for the
follow-up of pathologies; its database is stored in a
document-oriented NoSQL database.
As future work, we plan to extend our transformation
process to capture more semantics in the conceptual
model by considering other types of links, such as
inheritance, aggregation and n-ary associations.