Implementation of Heuristic Principles for Index Recommendations
using Java in the MariaDB Database
Radityo P. Wibowo
1
, Risa Perdana Sujanawati
1
, Faizal Mahananto
1
and Prasasti Karunia Farista
Ananto
1
1
Information System Departement, Faculty of Information and Communication Technology, Institut Teknologi Sepuluh
Nopember (ITS)
Keywords:
Index, Database Performance, Java, MariaDB.
Abstract:
Almost all fields in life today use database and demanding the most optimal performance from a database.
Therefore, a database must have the ability to optimize performance that can be obtained one of them by
choosing an optimal index because the index plays an important role in determining the performance of queries
that are executed against the system. The right index can increase speed in data processing. According to
previous research, Index can improve database performance, namely the number of transactions per minute
which increases three times and the cost is reduced by about 50% to 90%. That way index selection is a
dilemma for database users. The output of the following research is a Java-based software that can be used in
the MariaDB Database to obtain recommended indexes based on heuristic principles that have been used by
previous researchers. The following software can evaluate database performance based on response time and
Queries per Second or QPS. The recommendations from the following research can improve the performance
of the Monitordom database by 43.65% and the Adventureworks 2012 database to 2.77%.
1 INTRODUCTION
The database is an integral part of our daily lives that
we use unconsciously (Connolly and Begg, 2005).
Database performance is often an important measure-
ment related to user satisfaction. The performance of
a database depends on the logical structure, physical
structure and design of the database. So that the better
the logical structure, physical structure and database
design, the better the performance of the database
(Osman and Knottenbelt, 2012). Almost all fields
in life today have used databases and demanded the
most optimal performance from a database. There-
fore, a database must have the ability to optimize per-
formance that can be obtained, one of which is by se-
lecting an optimal index.
Index is an orderly arrangement that is logically
used to sort rows in a table (Connolly and Begg,
2005). index plays an important role in determin-
ing the performance of queries executed against the
system. (Chaudhuri et al., 2004). The right index
can increase speed in data processing (Ameri et al.,
2015). Misunderstanding often occurs, the more in-
dexes the better, where it can increase database main-
tenance such as storage and does not provide optimal
database performance results (Winand, 2012). The
more indexes the more storage will be used, and this
will have a negative effect on the performance of the
database. The solution to overcome this is to choose
the optimal index as proven in a database performance
measured by transaction events per minute, can in-
crease up to three times and can reduce the cost of
a workload of 50% to 90% (Pedrozo and Vaz, 2014)
(Valentin et al., 2000), so the index selection with a
certain method needs to be done to get optimal per-
formance improvement.
There are several index selection methods accord-
ing to previous research, one of which is by using an
algorithm developed based on the heuristic method
and using the na
¨
ıve probability method in the algo-
rithm (S. Choenni, 1993) (Chaudhuri and Narasayya,
1997) (Chaudhuri et al., 2004). Almost all research
for index recommendations is to use the heuristic
method principle by developing an algorithm that is
oriented towards optimal results.
Some DBMS have provided a default feature
for index recommendations including SQL server
(Chaudhuri et al., 2004). Various studies on index
optimization with several other DBMS cases have
also been developed such as AISIO for PostgreSQL
92
Wibowo, R., Sujanawati, R., Mahananto, F. and Ananto, P.
Implementation of Heuristic Principles for Index Recommendations using Java in the MariaDB Database.
DOI: 10.5220/0009906200920097
In Proceedings of the International Conferences on Information System and Technology (CONRIST 2019), pages 92-97
ISBN: 978-989-758-453-4
Copyright
c
2020 by SCITEPRESS Science and Technology Publications, Lda. All r ights reserved
8.4 (Pedrozo and Vaz, 2014), other recommendation
index studies make MySQL 5.5 a trial and imple-
mentation. Even no exception for NoSQL databases
such as MongoDB has become the object of research
for index recommendations using mining algorithms
(Ameri et al., 2015). However, open source such
as Maria DB does not yet have a default feature in-
dex recommendation and there has not been much re-
search on recommendations with the MariaDB case
study. One way to overcome the problem of index se-
lection on MariaDB can be developed by implement-
ing heuristic principles using Java.
Java is an object-based programming language
(Hermawan, 2004). Java is also multiplatform which
can be interpreted for various operating systems so
that java will be used by various operating system
users. Java has a pattern matcher method such as Reg-
ular Expression which can filter the SQL you want to
optimize.
This research is intended to make a software as
a java-based index recommendation by implementing
a heuristic method that has an index selection algo-
rithm as previously done ((Chaudhuri et al., 2004)but
what differs in the following research is in the index
selection process using the General log or can be in
the form of query as input and provide an output in
the form of a recommended column name as an index
as a result of the selection process carried out in the
Maria DB database.
2 BACKGROUND
2.1 Index Selection
Logical and description, which is designed to meet
the information needs of an organization (Connolly
and Begg, 2005). Meanwhile, to control the database
users can use a DBMS that is the Database Manage-
ment System. (Connolly and Begg, 2005). According
to Conolly, Database has three design phases namely
Conceptual, Logical, and Physical. While the tools of
this research are in the scope of the Physical Design
phase where the output is intended to help the phys-
ical design database process. Performance measure-
ment from the database can be done by assessing sev-
eral aspects according to winand, as follows (Winand,
2012): Data Volume, System Load, and Response
Time and Throughput. In this study will use measure-
ments in the form of response time and throughput
which can be interpreted as Query per Second (QPS)
that calculate from average query execution in itera-
tion of 60 seconds.
Index is a data structure that organizes data
records in storage to optimize certain types of re-
trieval operations (Raghu and Gehrke, 2004). The
right index can increase speed in data processing
(Ameri et al., 2015). The basis of the algorithm in
the following research is the AISIO heuristic principle
(Pedrozo and Vaz, 2014) where in the selection of the
index has four main processes but there are some de-
tails that are still not very clear, such as when making
single and multi-level-index candidates so that they
can use some other research references such as the in-
dex selection process on SQL server (Chaudhuri and
Narasayya, 1997) is by doing permutations for col-
umn names, and obtaining heuristics that are in ac-
cordance with MariaDB and applied in the following
research:
1. Parsing and Filter Queries. To get the initial in-
dex candidates in the form of the names of the
columns contained in the input as did all research
on the previous index recommendations.
2. Retrieving previous indexes. The following is an
addition to this study because in previous stud-
ies the existing index was rarely considered, even
though the index could be an optimal index.
3. Identifying index Configuration. After getting an
index candidate in the form of a column name
the next is to make an index configuration or in
the form of a set of indexes based on a combina-
tion of column names so that a multi-level-index
will appear like the main heuristic and making this
configuration adheres to the details of making the
configuration set in research on sql server (Chaud-
huri and Narasayya, 1997) which found that the
optimal combination is with the value j = 2, the
combination is concerned with sequences so that
it is done by permutation.
4. Configuration cost evaluation Cost evaluation is
an evaluation activity with measurements in the
form of QPS or Response time which is per-
formed to find the best performance of all index
configurations prior to enumeration as in previous
studies (Chaudhuri and Narasayya, 1997).
5. Configuration enumeration. After getting a set of
index configurations with the best performance,
the enumeration was done as in the research for
sql server (Chaudhuri and Narasayya, 1997), but
the selection of the best enumeration candidates
was not based on the most fit storage but used
the selection factor by selecting the candidates
who most often appeared in the workload based
on other studies (Ameri, 2016). This selection
factor in other references is considered a selec-
tivity factor calculated using the na
¨
ıve probabil-
Implementation of Heuristic Principles for Index Recommendations using Java in the MariaDB Database
93
ity (Choenni, Blanken and Chang, 1993). To ob-
tain better correlation probabilities this research
enhances the selectivity factor using na
¨
ıve bayes
probability.
6. Generate recommended index query After get-
ting the selected index results, the recommenda-
tion will be generated to be made to create an in-
dex query that is ready to be implemented in the
database.
2.2 Data Sample
The data used in the following research is Monitor-
dom and Adventureworks 2012 which represent two
different database types, namely a database with a
simple table structure condition and an existing index
only as a foreign key, and other database conditions
that have a complex table structure with a number of
tables of 70 and more than one supporting index for
each.
3 METHODOLOGY
Adopting previous heuristic algorithm method of this
research is in Figure 1. Index Recommendation and
more detail explanation is conducted in this section
3.1 Settings
When making the initial encoding settings done with
java and using the library in the form of keneksi
owned by MariaDB with java which can be obtained
on the official MariaDB website openly (MariaDB,
2017). Next is to prepare the page for log and query
input because the database admin can only choose one
of these inputs.
The main purpose of this coding is to be able to
connect to the database and determine the parsing
steps to be carried out and starting from here is the
application of the heuristic algorithm (Chaudhuri and
Narasayya, 1997) from previous studies to the process
of selecting selectivity factors.
space
Figure 1: Index Recommendation
3.2 Parsing and Taking Index
Beforehand
The process of retrieving column and table names
when parsing is done using the JSqlParser library that
is obtained openly on the GitHub forum (Binus uni-
versity, 2016). Meanwhile, to display the previous
index, a query is called which calls the index from
informatics schema from the database.
3.3 Create Configuration Index and
Selecting the Performance of the
Best Configuration
ow to make configuration is to combine several col-
umn names that appear and the possibility of the index
that occurs per configuration can be calculated with
the following equation (1). Number of Configuration
(Raghu and Gehrke, 2004):
Σ
n
i
= 1
n!
(n! j!)
(1)
Where n is the number of columns that are index
candidates and j is the number of different columns
that you want to index. Due to the combination of
index configurations that can be very numerous, as
in the study of chauduri (Chaudhuri and Narasayya,
1997) that the most effective number of column com-
binations in an index is 2 so j = 2. Cost Evalua-
tion itself is calculated based on Response time and
throughput in the form of Queries per Second.
3.4 Configuration Enumeration
The enumeration configuration that is dividing the in-
dex candidates from the index configuration into units
CONRIST 2019 - International Conferences on Information System and Technology
94
and selecting candidate units using the na
¨
ıve bayes
method as in equation (2). Selectivity factor:
S f (I j|S) =
P(S|I j)P(I j)
P(S)
(2)
Where S is the number of query statements con-
tained in the workload and Ij is the probability of
the index candidate appearing in the workload against
other index candidates. Index candidates with the
highest Sf will be the index to be recommended
(Ameri, 2016). But beforehand it was determined in
advance the type of cluster or non-clustered it.
3.5 Create Index Recommendation
Query
Last is to display the index recommendation in the
form of a query that has been made based on the re-
sults of the index recommendation. In addition to
displaying the recommended index, this section will
show the results of increasing the percentage of per-
formance by calculating in equation (3). QPS Eval-
uation and equation (4). Response time Evaluation.
The two assessments of improvement are different be-
cause for QPS the more the better and the less the Re-
sponse time the better.
QPS =
QPS(rec) QP(initial)
QPS(initial)x100%
(3)
Responsetime(RT ) =
RT (rec) RT (initial)
RT (initial)x100%
(4)
4 RESULT AND DISCUSSION
4.1 Implementation Environment
Implementation and result of this research depended
on the environment since performance was influ-
enced by hardware. This research was developed
and implemented in the environment type Sony
SVF14319SGB, Processor Intel Core i5, RAM 8GB,
Hard Disk Drive 1000G. And software environment
in this research is MariaDB 10.1.19 as Database and
NetBeans IDE 8.2 for development process of appli-
cation.
4.2 Monitordom Result
The index recommendation system uses response
time, each query executed is iterated 50 times to get
the stability of the average response time. Stability is
tested by testing several times with the same environ-
mental conditions. With 50 iterations, the system can
provide consistent recommendations. Result of re-
sponse time increase in average up to 30.84%. Ressult
from monitordom in Figure 2 shows using this rec-
ommendation index, the responsetime decrease from
average 0.020 to 0.015 second.
Queries in Monitordom are executed in only one
table using similar pattern that is using type ‘SE-
LECT’ and operator like ‘GROUP BY’, ‘ORDER
BY’, ‘DISTINC’, and ‘COUNT’. Result of Query
Persecond (QPS) in this database sample give signifi-
cant number increase after recommendation as shows
in Figure 3.
Figure 2: Responsetime result from Monitordom
Figure 3: QPS Result Monitordom.
4.3 Adventureworks2013 Result
Testing conducted with same method as database
sample before in Adventureworks2012 resulting in-
crease on average response time 1.26% and detail
each test shows in Figure 2. Tests also carried out us-
ing QPS measurements performance increase on av-
erage is 9% in queries that executed in Adventure-
Works2012 that detailed in Figure 5. The queries is
quite different from Monitordom because it has more
complex operator and executed in seven tables that
representing sales and marketing activities. The in-
significant increase even decrease performance is due
to AdventureWorks is a database that has an index that
is in accordance with their needs. This is also evi-
denced in the AdventureWorks 2012 Data Dictionary
documentation (sourceforge, 2009) regarding which
indexes are already owned. Performance also found
decreases in one of the response time test because
the new index created does not actually help search
Implementation of Heuristic Principles for Index Recommendations using Java in the MariaDB Database
95
but burdens database work because the Adventure-
works2012 database already has an index with col-
umn names as recommended by the following soft-
ware.
Figure 4: Responsetime result from AdventureWorks
Figure 5: QPS Result AdventureWorks.
5 CONCLUSIONS
This section explains what the conclusion that was
made from this research and author’s suggestion for
further research. After conducting the implementa-
tion and testing of the research ”Implementation of
Heuristic Principles for Index Recommendations us-
ing Java on MariaDB” using the exact environment
explained, it can be concluded that performance of a
database according to response time using the index
of recommendations from the following studies can
increase by around 1.26% in database that already in-
dexed before and up to 30.84% in database that has
not indexed before. according to QPS can increase
by around 9% in database that already indexed before
and up to four times faster in database that has not
indexed before.
From the two sample databases used for imple-
mentation, it can be concluded that the selection
of indexes with the following heuristic principles
works better on databases that do not yet have in-
dexes supporting operational activities and query type
‘SELECT’ that used operator like ‘’ORDER BY’,
‘GROUB BY’, ‘DISTINC’ and ‘COUNT’.
This study also found that the measurement of the
response value of a query that was executed had the
best value for 50 times. The results of these values
are taken from the results of the experiment for sev-
eral times to get consistent recommendations, so that
with consistent recommendations it can be said that
the average response time has been stable.
Author’s suggestions that can be done for further
research are develop an application by adding features
such as combining QPS and response time assess-
ments for example by using linear regression equa-
tions or other metrics assessments. Pay attention to
the database structure such as the storage engine used
in order to determine recommendations in more detail
such as the type of index cluster or non-cluster.
Implement other algorithms in the selection of in-
dex recommendations such as Genetic algorithm to
get better performance. This research could be ref-
erence to perform index selection algorithms imple-
mentation in other programming languages or other
database objects.
Algorithm improvement also might improve by
applying the configuration of the table name not only
configuring column name in the step of creating can-
didate of index recommendations.
REFERENCES
Ameri, P. (2016). On a self-tuning index recommendation
approach for databases. In 2016 IEEE 32nd Inter-
national Conference on Data Engineering Workshops
(ICDEW), pages 201–205. IEEE.
Ameri, P., Meyer, J., and Streit, A. (2015). On a new ap-
proach to the index selection problem using mining
algorithms. In 2015 IEEE International Conference
on Big Data (Big Data), pages 2801–2810. IEEE.
Binus university, W. (2016). Pengertian methode, class dan
objek dalam oop.
Chaudhuri, S., Datar, M., and Narasayya, V. (2004). Index
selection for databases: A hardness study and a princi-
pled heuristic solution. IEEE Transactions on Knowl-
edge and Data Engineering, 16(11):1313–1323.
Chaudhuri, S. and Narasayya, V. R. (1997). An effi-
cient, cost-driven index selection tool for microsoft sql
server. In VLDB, volume 97, pages 146–155. Citeseer.
Connolly, T. M. and Begg, C. E. (2005). Database systems:
a practical approach to design, implementation, and
management. Pearson Education.
Hermawan, B. (2004). Menguasai java 2 & object oriented
programming. Yogyakarta: Penerbit Andi.
MariaDB (2017). Mariadb connector/j 2.0.3 stable.
CONRIST 2019 - International Conferences on Information System and Technology
96
Osman, R. and Knottenbelt, W. J. (2012). Database sys-
tem performance evaluation models: A survey. Per-
formance evaluation, 69(10):471–493.
Pedrozo, W. G. and Vaz, M. S. M. G. (2014). A tool for au-
tomatic index selection in database management sys-
tems. In 2014 International Symposium on Computer,
Consumer and Control, pages 1061–1064. IEEE.
Raghu, R. and Gehrke, J. (2004). Sistem manajemen
database edisi 3.
S. Choenni, H. B. o. T. C. (1993). Index selection relational
database.
sourceforge (2009). Adventureworks database for mysql.
Valentin, G., Zuliani, M., Zilio, D. C., Lohman, G., and
Skelley, A. (2000). Db2 advisor: An optimizer smart
enough to recommend its own indexes. In Proceed-
ings of 16th International Conference on Data Engi-
neering (Cat. No. 00CB37073), pages 101–110. IEEE.
Winand, M. (2012). SQL performance explained: ev-
erything developers need to know about SQL perfor-
mance;[covers all major SQL databases]. M. Winand.
Implementation of Heuristic Principles for Index Recommendations using Java in the MariaDB Database
97