main purpose is to find an executor for a query and 
route the query to it as fast as possible. It makes this 
decision based on the map of the cluster. 
 
 
Figure 1: Multi-tenant database cluster architecture. 
It is important to note that a query routing server 
has little choice of executors for each query. If the 
query modifies data, there is no alternative but to 
route it to the tenant's master database, because data 
modification is permitted only there. If the query is 
read-only, it can also be routed to a slave server, but 
in the general case there would be just one or two 
slaves for a given master, so even in this case the 
choice is very limited. 
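The routing rule described above can be illustrated with a minimal sketch. The names `TenantPlacement`, `route_query`, and `cluster_map`, as well as the uniform random choice among read replicas, are assumptions made for illustration; the source does not specify how the router chooses between a master and its slaves.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TenantPlacement:
    """Hypothetical cluster-map entry: where one tenant's data lives."""
    master: str
    slaves: list[str] = field(default_factory=list)


def route_query(cluster_map, tenant_id, is_read_only, rng=random):
    """Pick an executor for a query using only the cluster map."""
    placement = cluster_map[tenant_id]
    if not is_read_only:
        # Data modification is permitted only on the tenant's master.
        return placement.master
    # Read-only queries may go to the master or to any of its slaves.
    # Uniform random choice is an assumption, not the paper's policy.
    return rng.choice([placement.master, *placement.slaves])


cluster_map = {"tenant-a": TenantPlacement("db1", ["db2", "db3"])}
write_target = route_query(cluster_map, "tenant-a", is_read_only=False)
read_target = route_query(cluster_map, "tenant-a", is_read_only=True)
```

With one master and at most two slaves per tenant, the candidate list never holds more than three servers, which is why the router itself has little room for load balancing.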
The data distribution and load balancing server is 
the most important and complicated component of 
the system. Its main functions are: 
• initial distribution of tenants' data among the 
servers of a cluster during system deployment or 
the addition of new servers or tenants; 
• management of tenant data distribution, based on 
the collected statistics, including the creation of 
additional data copies and moving data to 
another server; 
• diagnosing whether the system needs new 
computing nodes and storage devices; 
• managing the replication. 
This component is the most valuable part of the 
system, since the performance of an application 
depends on how well it works. 
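As an illustration of the first function only, the sketch below places tenants on servers with a simple greedy heuristic (largest tenant first, onto the least-loaded server). This heuristic is a stand-in chosen for brevity and is not the distribution strategy of the paper; the function name `distribute_tenants` and the sample sizes are hypothetical.

```python
import heapq


def distribute_tenants(tenant_sizes, servers):
    """Assign each tenant to the currently least-loaded server (greedy)."""
    heap = [(0.0, s) for s in servers]  # (total size placed, server name)
    heapq.heapify(heap)
    placement = {}
    # Placing larger tenants first tends to give a more even result.
    for tenant, size in sorted(tenant_sizes.items(), key=lambda kv: -kv[1]):
        load, server = heapq.heappop(heap)
        placement[tenant] = server
        heapq.heappush(heap, (load + size, server))
    return placement


# Hypothetical tenant data sizes (arbitrary units) and server names.
placement = distribute_tenants(
    {"t1": 50, "t2": 30, "t3": 20, "t4": 10}, ["db1", "db2"]
)
```

A real balancer would also have to account for query intensity, replicas, and data movement cost, which this sketch ignores.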
3  ANALYSIS OF EXISTING 
APPLICATION 
Analysis of existing applications and their mode of 
operation is the first step in designing a simulation 
model. For a multi-tenant cluster, the most 
interesting question is the characteristics of the 
query flow, since this component has the greatest 
impact on the modelling results. As the multi-tenant 
cluster is a queuing system, the Poisson flow of 
events is a good basic model of a query flow. The 
key points to explore are: 
1.  intensity distribution of incoming query flows 
among clients; 
2.  presence or absence of dependency between an 
average time of query execution and 
characteristics of the client which this query 
belongs to; 
3.  characteristics of a customer base; 
4.  characteristics of customer base changes over 
time. 
Questions 1 and 2 are very important because they 
have a significant impact on the distribution of 
queries between servers and thus make a decisive 
contribution to the assessment of load-balancing 
efficiency across the cluster as a whole. The answer 
to the fourth question will allow us to adequately 
simulate the dynamism inherent in all cloud systems 
and therefore to offer an effective long-term data 
management strategy. 
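To make the Poisson model of a query flow concrete, the following sketch generates arrival times for each client by summing exponentially distributed inter-arrival gaps, which is the standard way to sample a homogeneous Poisson process. The client names, intensities, and time horizon are hypothetical values chosen for illustration.

```python
import random


def poisson_arrivals(rate, horizon, rng):
    """Return arrival times of a Poisson flow with the given rate on [0, horizon)."""
    times = []
    # Inter-arrival gaps of a Poisson flow are exponential with mean 1/rate.
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(42)
# Hypothetical per-client intensities, in queries per second.
client_rates = {"client-a": 0.5, "client-b": 2.0}
flows = {
    client: poisson_arrivals(rate, horizon=1000.0, rng=rng)
    for client, rate in client_rates.items()
}
```

Giving each client its own rate is what allows question 1 (the distribution of intensities among clients) to be studied separately from the shape of each individual flow.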
Many factors can potentially affect the parameters 
of a client query flow. At the initial stage of the 
study it was decided to take the size of the data that 
the client stores in the cloud as its key 
characteristic. The relationship between this 
parameter and the intensity of the query flow, and 
between this parameter and the average query 
execution time, has been studied. The following 
assumptions seemed reasonable: 
1.  most client schemas are approximately the same 
size, but there are also significant (but rare) 
variations in both directions; 
2.  client query flow intensity depends directly 
on the size of client data (the more data the 
client has, the more often they are accessed); 
3.  the query execution time depends directly on 
the size of client data (the more data the client 
has, the more data are accessed by the average 
query, thus its execution time increases); 
4.  client data size and activity change smoothly 
over time. 
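Assumptions 1 to 3 can be encoded as a small synthetic client generator, sketched below. The log-normal size distribution, the strict proportionality, and every constant are illustrative choices that do not come from the source. They show only one possible reading of the assumptions, which are hypotheses to be checked against real data and not established facts. Assumption 4 (smooth change over time) is not modelled here.

```python
import random
from dataclasses import dataclass


@dataclass
class Client:
    size_mb: float         # amount of data stored in the cloud
    query_rate: float      # query flow intensity, queries per second
    mean_exec_time: float  # average query execution time, seconds


def make_client(rng, median_size_mb=100.0, sigma=0.5,
                rate_per_mb=0.01, time_per_mb=0.0005):
    """Generate one synthetic client; all constants are illustrative."""
    # Assumption 1: sizes cluster around a typical value, with rare large
    # deviations in both directions (log-normal is one possible choice).
    size = median_size_mb * rng.lognormvariate(0.0, sigma)
    # Assumptions 2 and 3: intensity and execution time grow with data
    # size, modelled here as simple proportionality.
    return Client(size, rate_per_mb * size, time_per_mb * size)


rng = random.Random(7)
clients = [make_client(rng) for _ in range(1000)]
```

Comparing such synthetic clients with measured sizes, intensities, and execution times is one way to see whether the assumptions hold for a real application.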
These assumptions were verified against the 
statistics and logs of an existing multi-tenant cloud 
application. This application is an online service 
that provides electronic document flow and 
accounting. The diversity of the services offered 
leads to a diversity of possible scenarios of 
interaction between a client and the application, 
which makes the query flow complicated. The 
application uses a PostgreSQL server as its primary 
data storage. All management tasks are performed 
by a set of specialized services and routers. 
Currently, the cluster consists of about 
Third International Symposium on Business Modeling and Software Design