A Novel Approach for Real-time Extracting Data From NoSQL
Embedded Data Bases
Afef Gueidi (1,2), Hamza Gharsellaoui (1,3,4) and Samir Ben Ahmed (1,2)
(1) LISI Laboratory, National Institute of Applied Sciences and Technology (INSAT), Carthage University, Tunis, Tunisia
(2) Faculty of Mathematical, Physical and Natural Sciences of Tunis (FST), Tunis El Manar University, Tunis, Tunisia
(3) National Engineering School of Carthage (ENIC), Carthage University, Tunis, Tunisia
(4) Al Jouf College of Technology, TVTC, Sakaka, K.S.A.
Keywords: NoSQL Database, Embedded Databases, Optimization Problems, Real-time Interrogation.
Abstract: In today's industry, embedded databases that combine multimedia signals and big data with modal interfaces need to be highly reconfigurable in order to meet real-time constraints, to optimize data storage, and to achieve high scalability and availability. This paper deals with a multi-objective problem of extracting, managing and interrogating data in reconfigurable real-time embedded databases. New methods are required for this problem, and NoSQL databases were created to address the sub-problems mentioned above: storing big data effectively and meeting the demand for high performance when extracting, managing and interrogating these embedded databases. We also discuss the advantages and disadvantages of a NoSQL approach.
1 INTRODUCTION
Nowadays, due to the growing class of portable systems, such as personal computing and communication devices, embedded and real-time systems contain new software whose complexity increases over time. This complexity is growing because many available software development models do not take into account the specific needs of embedded systems development. Software engineering principles for embedded systems should address specific constraints such as hard timing constraints, limited memory and power use, predefined hardware platform technology, and hardware costs. On the other hand, new generations of embedded control systems are addressing new criteria such as flexibility and agility (Gharsellaoui, 2012). For these reasons, there is a need to develop tools and methodologies for embedded software engineering and dynamically reconfigurable embedded control systems as an independent discipline. Each system is composed of a set of tasks. Each task $i$ is characterized by its worst-case execution time (WCET) $C_i^{p,\psi_h}$, an offset (release time) $a_i^{p,\psi_h}$, a period $T_i^{p,\psi_h}$ and a deadline $D_i^{p,\psi_h}$ for each reconfiguration scenario $\psi_h$ ($h \in 1..M$; we assume that there are $M$ reconfiguration scenarios) and on each processor $p$ ($p \in 1..K$; we assume that there are $K$ identical processors numbered from 1 to $K$). The $n$ real-time tasks, numbered from 1 to $n$, compose a feasible subset of tasks denoted $\xi_{old}$ and need to be scheduled (Gharsellaoui, 2012).
The general goal is to be assured that any reconfiguration scenario $\psi_h$ changing the implementation of the embedded system does not violate real-time constraints; i.e., the system remains feasible and meets its real-time constraints even if its implementation changes after any reconfiguration scenario. In (Gharsellaoui, 2013), in order to obtain this optimization, the authors propose an intelligent agent-based architecture in which a software agent is deployed to dynamically adapt the system to its environment by applying reconfiguration scenarios. A reconfiguration scenario $\psi_h$ means the addition, removal or update of tasks in order to save the whole system on the occurrence of hardware/software faults, or to improve its performance when random disturbances happen at run-time. In this paper, a software agent in the intelligent agent-based architecture interrogates a NoSQL embedded database, which ensures predictive and organized software reuse in order to achieve high scalability and availability and to optimize response time. This process is proven for solving the sub-problems mentioned above, namely storing big data effectively and meeting the demand for high performance when extracting, managing and interrogating these embedded databases. Our study
Gueidi, A., Gharsellaoui, H. and Ahmed, S.
A Novel Approach for Real-time Extracting Data From NoSQL Embedded Data Bases.
DOI: 10.5220/0005987502260231
In Proceedings of the 11th International Joint Conference on Software Technologies (ICSOFT 2016) - Volume 1: ICSOFT-EA, pages 226-231
ISBN: 978-989-758-194-6
Copyright © 2016 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
concerns embedded databases that combine multimedia signals and big data with modal interfaces and that need to be highly reconfigurable in order to meet real-time constraints, to optimize data storage, and to achieve high scalability and availability. Indeed, embedded databases are currently omnipresent in electronic devices and embedded technologies in all industrial areas, such as telecommunications, avionics, automotive and medicine. On the one hand, the volume of data in these embedded technologies is increasing at an enormous rate; on the other hand, the cost of scaling relational DBMSs (RDBMSs) has become very high. Moreover, industry, and especially embedded systems, now has to deal with faster and faster evolution of user requirements and must be able to adapt system behavior while meeting real-time constraints. As a consequence, embedded databases have to evolve to become more reconfigurable, so that they satisfy the needs of embedded systems while offering a fast response time and a low power cost (Gajendran, 2012).
In contrast, NoSQL data stores are designed to scale well horizontally and to run on commodity hardware. The term "NoSQL" was first coined in 1998 by Carlo Strozzi to distinguish his solution from other RDBMS solutions that use SQL (Strozzi's NoSQL still adheres to the relational model). He used the term NoSQL simply because his database did not expose an SQL interface. More recently, the term NoSQL (meaning "not only SQL") has come to describe a large class of databases that do not have the properties of traditional relational databases and that are generally not queried with SQL (Structured Query Language) (Gajendran, 2012). The term was revived in recent times when big companies such as Google and Amazon began using their own data stores to store and process the huge amounts of data arising in their applications, inspiring other vendors as well (Gajendran, 2012).
The ability to optimize resource allocation in a distributed environment by managing the expansion and contraction of available tasks is an important feature of a NoSQL DBMS. As the set of available embedded systems changes, it is possible to automatically redistribute the data or to maintain different shards of data. This ability is important for the performance of the database because it influences the latency of the system. NoSQL databases were designed to overcome the limitations of relational databases in supporting distributed processing of data. For this reason, and in order to optimize reconfigurable real-time embedded systems as a whole, we adopt in this work a NoSQL database for the multi-objective optimization of extracting, managing and interrogating reconfigurable real-time embedded databases, so as to meet real-time constraints and to reduce storage memory and response time in critical applications.
The paper is organized as follows. In the next section, we describe NoSQL databases and their four categories. In Section 3, we present the literature review and give the advantages and disadvantages of NoSQL databases. In Section 4, we present our original NoSQL-based approach for real-time management of embedded databases. Finally, we present conclusions and perspectives for our work in Section 5.
2 NoSQL DATABASES
Generally, NoSQL is not relational; it is designed for distributed data stores with very large-scale data needs (e.g. Facebook or Twitter accumulate terabits of data every day for millions of their users), and there is no fixed schema and there are no joins. Relational database management systems (RDBMSs) "scale up" by getting faster and faster hardware and adding memory. NoSQL, on the other hand, can take advantage of "scaling out", which means spreading the load over many commodity systems (Mikayel, 2011). The term NoSQL was coined in 1998, and while many think NoSQL is a derogatory term created to poke fun at SQL, in reality it means "Not Only SQL" rather than "No SQL at all". The idea is that both technologies (NoSQL and RDBMSs) can co-exist and that each has its place. Companies such as Facebook, Twitter, Digg, Amazon, LinkedIn and Google all use NoSQL in some way, so the term has been in the news often over the past few years.
2.1 Services Model
The processing to be performed as part of activity management can be grouped into five broad categories: computing services, analysis, archive services, display, and CRUD services. The CRUD services (Create, Read, Update and Delete) are low-level services that enable document management. RDBMSs, for their part, have limitations, including the following three problems:
1. RDBMSs use a table-based normalization ap-
proach to data, and that’s a limited model. Cer-
tain data structures cannot be represented without
tampering with the data, programs, or both.
2. They allow versioning or activities like: Create, Read, Update and Delete. For databases, updates should never be allowed, because they destroy information. Rather, when data changes, the database should just add another record and duly note the previous value for that record.
3. Performance falls off as RDBMSs normalize data. The reason: normalization requires more tables, table joins, keys and indexes, and thus more internal database operations to implement queries. Pretty soon, the database grows into the terabytes, and that is when things slow down (Mikayel, 2011).
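The second point above (adding a new record instead of updating in place) can be illustrated with a minimal Python sketch. The class name `AppendOnlyStore` and its methods are hypothetical names chosen for this illustration; they are not part of the paper or of any particular NoSQL product.

```python
from typing import Any, Dict, List, Optional


class AppendOnlyStore:
    """Hypothetical store that never overwrites: each write appends a new version."""

    def __init__(self) -> None:
        # key -> list of all values ever written, oldest first
        self._versions: Dict[str, List[Any]] = {}

    def put(self, key: str, value: Any) -> int:
        """Append a new version and return its 1-based version number."""
        history = self._versions.setdefault(key, [])
        history.append(value)
        return len(history)

    def get(self, key: str, version: Optional[int] = None) -> Any:
        """Return the latest version, or a specific earlier one if requested."""
        history = self._versions[key]
        return history[-1] if version is None else history[version - 1]

    def history(self, key: str) -> List[Any]:
        """Return a copy of every version recorded for the key."""
        return list(self._versions[key])
```

With this design a "change" to a task's period is recorded as a second version, and the previous value stays readable through `get(key, version=1)`.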
2.2 The Four Categories of NoSQL
In this subsection, we will describe the four categories
of NoSQL databases.
2.2.1 Key-value Stores
The main idea here is to use a hash table in which there is a unique key and a pointer to a particular item of data. The key/value model is the simplest and easiest to implement, but it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages (Mikayel, 2011).
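A minimal Python sketch can make the partial-update inefficiency concrete. It assumes, for illustration only, that values are opaque byte strings holding JSON; `KeyValueStore` and `update_field` are hypothetical names, not an existing API.

```python
import json
from typing import Any, Dict


class KeyValueStore:
    """Hypothetical in-memory key-value store backed by a hash table (dict)."""

    def __init__(self) -> None:
        self._table: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._table[key] = value

    def get(self, key: str) -> bytes:
        return self._table[key]


def update_field(store: KeyValueStore, key: str, field: str, new_value: Any) -> None:
    """Change one field of a stored record.

    The store only sees opaque bytes, so the whole value must be read,
    decoded, modified and written back, even for a one-field change.
    """
    record = json.loads(store.get(key))
    record[field] = new_value
    store.put(key, json.dumps(record).encode("utf-8"))
```

Lookups by key are a single hash-table access, while any operation on part of a value pays for the full read and rewrite shown in `update_field`.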
2.2.2 Column Family Stores
These were created to store and process very large
amounts of data distributed over many machines.
There are still keys but they point to multiple
columns. The columns are arranged by column fam-
ily (Mikayel, 2011).
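As a rough illustration of keys pointing to multiple columns grouped by family, the following Python sketch models a row as a mapping from column family to columns. The class `ColumnFamilyStore` and the family names used in the test are assumptions made for this example only.

```python
from typing import Any, Dict


class ColumnFamilyStore:
    """Hypothetical store: row key -> column family -> column -> value."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        """Write one column of one family for the given row key."""
        families = self._rows.setdefault(row_key, {})
        families.setdefault(family, {})[column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, Any]:
        """Read all columns of a single family without touching other families."""
        return dict(self._rows.get(row_key, {}).get(family, {}))
```

Reading one family returns only the columns grouped under it, which is the access pattern this model is organized around.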
2.2.3 Document Databases
These were inspired by Lotus Notes and are similar
to key-value stores. The model is basically versioned
documents that are collections of other key-value col-
lections. The semi-structured documents are stored
in formats like JSON. Document databases are essen-
tially the next level of Key/value, allowing nested val-
ues associated with each key. Document databases
support querying more efficiently (Mikayel, 2011).
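The nested key-value structure of a document, and the ability to query inside it, can be sketched in Python as follows. The helper names `get_path` and `find`, and the dotted-path convention, are assumptions for this illustration rather than the query interface of a specific document database.

```python
from typing import Any, Dict, List


def get_path(document: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'timing.period' through nested dictionaries.

    Returns None when any part of the path is missing.
    """
    current: Any = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: List[Dict[str, Any]], path: str, expected: Any) -> List[Dict[str, Any]]:
    """Return the documents whose nested field at 'path' equals 'expected'."""
    return [doc for doc in collection if get_path(doc, path) == expected]
```

Unlike a plain key-value store, the query can select documents by a field nested inside the value, without the caller decoding each value by hand.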
2.2.4 Graph Databases
Instead of tables of rows and columns and the rigid
structure of SQL, a flexible graph model is used
which, again, can scale across multiple machines.
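As a rough illustration of a graph model, the following Python sketch stores labeled edges in an adjacency map and answers a simple reachability query. The class `Graph` and the edge labels in the test are hypothetical; distribution across machines is not modeled here.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


class Graph:
    """Hypothetical in-memory graph: node -> list of (edge label, target node)."""

    def __init__(self) -> None:
        self._edges: Dict[str, List[Tuple[str, str]]] = {}

    def add_edge(self, source: str, label: str, target: str) -> None:
        """Add a directed, labeled edge; both endpoints become known nodes."""
        self._edges.setdefault(source, []).append((label, target))
        self._edges.setdefault(target, [])

    def reachable(self, start: str) -> Set[str]:
        """Breadth-first traversal returning every node reachable from 'start'."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for _label, neighbour in self._edges.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen
```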
NoSQL databases do not provide a high-level declarative query language like SQL, in order to avoid processing overhead. Rather, querying these databases is data-model specific. Many NoSQL platforms offer RESTful interfaces to the data, while others offer query APIs. Generally, the best places to use NoSQL technology are where the data model is simple; where flexibility is more important than strict control over defined data structures; where high performance is a must; where strict data consistency is not required; and where it is easy to map complex values to known keys (Mikayel, 2011).
2.3 Some Examples of When to Use
NoSQL
The following are some examples of when to use a NoSQL database:
Logging/Archiving.
Log-mining tools are handy because they can access
logs across servers, relate them and analyze them.
Social Computing Insight.
Many enterprises today have provided their users
with the ability to do social computing through
message forums, blogs etc.
External Data Feed Integration.
Many companies need to integrate data coming from
business partners. Even if the two parties conduct
numerous discussions and negotiations, enterprises
have little control over the format of the data coming
to them. Also, there are many situations where
those formats change very frequently - based on the
changes in the business needs of partners.
Front-end Order Processing Systems.
Today, the volume of orders, applications and service requests flowing through different channels to retailers, bankers, insurance providers, entertainment service providers, logistics providers, etc. is enormous. These requests need to be captured without any interruption whenever an end user makes a transaction from anywhere in the world. Afterwards, a reconciliation system typically propagates them to back-end systems and updates the end user on his/her order status.
Real-time Stats/Analytics.
Sometimes, it is necessary to use the database as
a way to track real-time performance metrics for
websites (page views, unique visits, etc.). Tools
like Google Analytics are great but not real-time -
sometimes it is useful to build a secondary system
that provides basic real-time stats. Other alternatives,
such as 24/7 monitoring of web traffic, are a good
way to go, too.
ICSOFT-EA 2016 - 11th International Conference on Software Engineering and Applications
3 LITERATURE REVIEW
Nowadays, there are so many NoSQL systems that it is hard to get a quick overview of the major trade-offs involved when evaluating relational and non-relational systems in non-single-server environments. In this section, we give some preliminaries about NoSQL databases and their characteristics.
3.1 Background
NoSQL databases are gaining significance in big data analytics, real-time applications and Online Transaction Processing (OLTP). Most RDBMSs, for their part, guarantee so-called ACID transactions. ACID is an acronym for four properties, which are, in order (Ullman, 2008):
Atomicity: Transactions are atomic. That is, a
transaction is executed either in its entirety or not
at all.
Consistency: Every transaction takes the
database from one valid state to another.
Isolation: A running transaction will not interfere
with another transaction.
Durability: The effect of a transaction must per-
sist and thus never be lost.
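Atomicity, the first of the four properties, can be observed with the SQLite engine bundled in the Python standard library. The sketch below is only an illustration: the `account` table, the `CHECK` constraint and the function names are assumptions made for the example and do not come from the paper.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with two accounts (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both updates are committed, or neither is."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balance(conn: sqlite3.Connection, name: str) -> int:
    """Read the current balance of one account."""
    return conn.execute("SELECT balance FROM account WHERE name = ?", (name,)).fetchone()[0]
```

When the debit would make a balance negative, the constraint violation aborts the transaction and the credit that was already applied is rolled back with it.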
Most NoSQL databases share a common set of char-
acteristics (Strauch, 2012). Naturally, since NoSQL
is a broad concept, more or less all of these character-
istics have exceptions. Still, they can serve to give a
general idea about what NoSQL databases are:
Distributed: NoSQL databases are often dis-
tributed systems where several machines cooper-
ate in clusters to provide clients with data. Each
piece of data is commonly replicated over several
machines for redundancy and high availability.
Horizontal Scalability: Nodes can often be dy-
namically added to (or removed from) a cluster
without any downtime, giving linear effects on
storage and overall processing capacities. Usu-
ally, there is no (realistic) upper bound on the
number of machines that can be added.
Built for Large Volumes: Many NoSQL systems
were built to be able to store and process enor-
mous amounts of data quickly.
BASE instead of ACID: Brewer's CAP theorem (Brewer, 2012) states that a distributed system can have at most two of the three properties Consistency, Availability and Partition tolerance. For a growing number of applications, having the last two is most important. Building a database with these two properties while also providing ACID properties is difficult, which is why Consistency and Isolation are often forfeited, resulting in the so-called BASE approach (Brewer, 2012).
SQL is unsupported: While most RDBMSs support some dialect of SQL, NoSQL variants generally do not. Instead, each individual system has its own query interface. Recently, the Unstructured Data Query Language was proposed as an attempt to unify the query interfaces of NoSQL databases (Jackson, 2012).
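The "Distributed" and "Horizontal Scalability" characteristics above can be illustrated with a minimal Python sketch of hash-based data placement with replication. This is a simplified assumption for illustration: the function names are hypothetical, and plain modulo placement is used for brevity, whereas production systems often use consistent hashing to limit data movement when nodes join or leave.

```python
import hashlib
from typing import List


def stable_hash(key: str) -> int:
    """Deterministic hash of a key (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(key: str, nodes: List[str], replication_factor: int = 2) -> List[str]:
    """Choose the nodes that hold a key: a primary plus the following nodes in ring order."""
    if not nodes:
        raise ValueError("at least one node is required")
    start = stable_hash(key) % len(nodes)
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]
```

Adding a node to the `nodes` list changes the placement of some keys, which corresponds to the automatic redistribution of data discussed earlier; each key is still stored on `replication_factor` distinct machines for redundancy.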
3.2 CAP theorem and NoSQL Database
Classification
In 2000, Professor Eric Brewer put forward the famous CAP theorem, where CAP stands for Consistency, Availability and tolerance of network Partition. The core idea of the CAP theorem is that a distributed system cannot satisfy these three distinct needs simultaneously, but can satisfy at most two of them. According to the CAP theorem and the different concerns of NoSQL databases, a preliminary classification of NoSQL databases is as follows (Nathan, 2010):
Concerned about Consistency and Availability (CA): These databases are not concerned with partition tolerance and mainly use a replication approach to ensure data consistency and availability. Systems concerned with CA include traditional relational databases, Vertica (Column-oriented), Aster Data (Relational), Greenplum (Relational) and so on.
Concerned about Consistency and Partition Tolerance (CP): Such database systems store data on distributed nodes and ensure the consistency of these data, but their support for availability is not good enough. The main CP systems are BigTable (Column-oriented), Hypertable (Column-oriented), HBase (Column-oriented), MongoDB (Document), Terrastore (Document), Redis (Key-value), Scalaris (Key-value), MemcacheDB (Key-value) and Berkeley DB (Key-value).
Concerned about Availability and Partition Tolerance (AP): Such systems ensure availability and partition tolerance primarily by relaxing strong consistency. AP systems include Voldemort (Key-value), Tokyo Cabinet (Key-value), KAI (Key-value), CouchDB (Document-oriented), SimpleDB (Document-oriented) and Riak (Document-oriented).
In our work, we adopt the last class, availability and partition tolerance (AP), in order to solve a multi-objective problem of extracting, managing and interrogating data in reconfigurable real-time embedded databases and to achieve high scalability and availability. In this case, a NoSQL database is used to solve the mentioned problem, to store big data effectively, and to meet the demand for high performance when extracting, managing and interrogating these embedded databases.
Figure 1: CAP Theorem.
4 PROPOSED APPROACH
In order to solve the described problem and to achieve our objective, we provide in the following an optimization algorithm that aims at a shorter interrogation response time and at storing big data in embedded databases using NoSQL. NoSQL databases were designed to handle large volumes of data by removing some of the features that exist in RDBMSs; one of these features is the ad-hoc query. Our proposed algorithm is described as follows:
4.1 Proposed Algorithm
This subsection presents the dynamic, NoSQL-database-based guarantee test, expressed in terms of residual time, which is a convenient parameter for dealing with both normal and overload conditions in embedded databases.
Algorithm GUARANTEE(ξ; σ_a)
begin
    t = get_current_time();
    R_0 = 0;
    d_0 = t;
    Insert σ_a in the ordered task linked list;
    ξ' = ξ ∪ {σ_a};
    k = position of σ_a in the task set ξ';
    for each task σ'_i such that i ≥ k do {
        R_i = R_{i-1} + (d_i - d_{i-1}) - c_i;
        if (R_i < 0) then return ("Not Guaranteed");
    }
    return ("Guaranteed");
end
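The guarantee test above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Task` class and `guarantee` function are hypothetical names, the task list is assumed to be ordered by absolute deadline, $c_i$ is taken as the remaining worst-case execution time, and the residual times are recomputed from the first task rather than resumed from position $k$ (which gives the same verdict when the existing set is already feasible).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str
    deadline: float   # absolute deadline d_i
    remaining: float  # remaining worst-case execution time c_i


def guarantee(tasks: List[Task], new_task: Task, now: float) -> bool:
    """Return True if all tasks still meet their deadlines once new_task is admitted."""
    # xi' = xi U {sigma_a}, kept in order of increasing absolute deadline (assumption).
    ordered = sorted(tasks + [new_task], key=lambda task: task.deadline)
    residual = 0.0        # R_0 = 0
    prev_deadline = now   # d_0 = t
    for task in ordered:
        # R_i = R_{i-1} + (d_i - d_{i-1}) - c_i
        residual += (task.deadline - prev_deadline) - task.remaining
        if residual < 0:
            return False  # "Not Guaranteed"
        prev_deadline = task.deadline
    return True           # "Guaranteed"
```

A negative residual time at any position means that the processor time available before that deadline is smaller than the work that must finish by then, so the arriving task is rejected.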
4.2 Design Elements Extraction and
SPL Design Extraction
In order to position our proposed approach in a unified framework that fulfills the real-time constraints, we propose some characteristics to be respected. The organization and structure of this unified framework are derived from the following construction rules:
R1: Each real-time embedded system is geo-localized, either to an industry or to mobile devices. Thus, it is necessary to physically move resources around to operate the real-time embedded systems.
R2: A real-time embedded system is composed of tasks. A task $i$ is characterized by its worst-case execution time (WCET) $C_i^{p,\psi_h}$, an offset (release time) $a_i^{p,\psi_h}$, a period $T_i^{p,\psi_h}$ and a deadline $D_i^{p,\psi_h}$. The task must respect these temporal characteristics with more or less regular repetitions.
R3: The tasks to be scheduled are predefined and listed in a NoSQL database, ordered according to these criteria. For each task, we must define the types of its parameters. This NoSQL database must be available and regularly maintained and updated in order to be ready for any embedded system reconfiguration, interrogation, task modification or data extraction.
R4: Continually assess tasks and allow for modi-
fications concerning the evolution of the real-time
system at run-time providing intervention and re-
configuration in order to save the whole system
on the occurrence of hardware/software faults, or
also to improve its performance when random dis-
turbances happen at run-time.
R5: A random disturbance is defined as any external event, such as the addition or removal of tasks or simply the modification of task parameters, in order to optimize the response time and the storage capacity of the embedded system database.
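Rules R2, R3 and R5 can be sketched in Python as a small task catalog. This is an illustrative sketch only: the document field names, the `_id` convention, the choice of the deadline as the ordering criterion and the function names are all assumptions, and plain dictionaries stand in for documents of a NoSQL database.

```python
from typing import Any, Dict, List

TaskDoc = Dict[str, Any]


def make_task(task_id: str, processor: int, scenario: int,
              wcet: float, offset: float, period: float, deadline: float) -> TaskDoc:
    """Build one task document with the temporal parameters of rule R2."""
    if not 0 < wcet <= deadline:
        raise ValueError("WCET must be positive and no larger than the deadline")
    return {"_id": task_id, "processor": processor, "scenario": scenario,
            "wcet": wcet, "offset": offset, "period": period, "deadline": deadline}


def ordered_by_deadline(catalog: List[TaskDoc]) -> List[TaskDoc]:
    """Rule R3: list the tasks in a fixed order (here by deadline, then by id)."""
    return sorted(catalog, key=lambda doc: (doc["deadline"], doc["_id"]))


def apply_disturbance(catalog: List[TaskDoc], kind: str, task: TaskDoc) -> List[TaskDoc]:
    """Rule R5: add, remove or update a task, returning a new catalog."""
    remaining = [doc for doc in catalog if doc["_id"] != task["_id"]]
    if kind == "remove":
        return remaining
    if kind in ("add", "update"):
        return remaining + [task]
    raise ValueError(f"unknown disturbance kind: {kind}")
```

After each disturbance, the reordered catalog can be fed to a feasibility test so that the reconfigured system is checked against its real-time constraints.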
4.3 Discussion of NoSQL Proposed
Approach
This proposed process permits us, first, to derive a design enriched with optional and mandatory real-time constraints, one that is highly reconfigurable in order to meet real-time constraints, to optimize data storage, and to achieve high scalability and availability.
To the authors' knowledge, no unified approach exists to handle similar situations for different real-time embedded systems. This work allows service providers to manage geo-localized (distributed) tasks based on the NoSQL database regardless of the nature of their services. Finally, we propose to implement the design of our NoSQL database using MongoDB, with the UML MARTE profile for the conceptual model, enriched with the information and real-time constraints needed to re-establish the feasibility of the embedded systems under study. This NoSQL approach presents several advantages over other technologies: it is easier to structure the data for specific real-time needs; the design is closer to the implementation, so development is very simple; and it is easier to modify processing and evolve data structures. Nonetheless, our proposed approach suffers from some drawbacks, including difficulty in expressing real-time integrity constraints in embedded databases, installation difficulty, and a lack of real-time control of transactions. We will address these difficulties in future work.
5 CONCLUSION
This paper described the background of NoSQL. After presenting the four categories of NoSQL, we analyzed the advantages and disadvantages of NoSQL databases. After a brief introduction to the CAP theorem, a classification of NoSQL databases was given. Finally, our proposed NoSQL-based approach was presented in such a way that it will help users choose a NoSQL database in practice. However, NoSQL databases still have various limitations; for example, the use of NoSQL in cloud computing and the Internet of Things is not yet clear enough, and we will strengthen our future research work in these two areas.
REFERENCES
Brewer, E. (2012). Towards robust distributed systems. Accessed January 25, 2012. July 2000. url: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.
Gajendran, S. K. (2012). A survey on NoSQL databases. masters.dgtu.donetsk.ua.
Jackson, J. (2012). Couchbase, SQLite launch unified NoSQL query language. Accessed January 25, 2012. July 2011. url: http://www.arnnet.com.au/article/395469/couchbase_sqlite_launch_unified_nosql_query_language.
Strauch, C. (2012). NoSQL Databases. Accessed January 25, 2012. Feb. 2011. url: http://www.christof-strauch.de/nosqldbs.pdf.
Nathan, H. (2010). Visual Guide to NoSQL Systems. http://blog.nahurst.com/visual-guide-to-NoSQL-systems.
Ullman, J. D., Garcia-Molina, H., and Widom, J. (2008). Database Systems: The Complete Book. Prentice Hall.
Mikayel, V. (2011). Picking the Right NoSQL Database Tool. http://www.monitis.com/blog/2011/05/22/picking-the-right-nosql-database-tool/.
Gharsellaoui, H., Khalgui, M., and Ben Ahmed, S. (2012). New Optimal Solutions for Real-Time Reconfigurable Periodic Asynchronous Operating System Tasks with Minimizations of Response Time. IJSDA 1.4 (2012): 88-131. Web. 2 Feb. 2016. doi:10.4018/ijsda.2012100105.
Gharsellaoui, H., Khalgui, M., and Ben Ahmed, S. (2013). An EDF-based Scheduling Algorithm for Real-time Reconfigurable Sporadic Tasks. Proceedings of the 8th International Joint Conference on Software Technologies, ICSOFT 2013: 377-388, Reykjavik, Iceland.