Extending the Hybridization of Metaheuristics with Data Mining
to a Broader Domain
Marcos Guerine, Isabel Rosseti and Alexandre Plastino
Institute of Computing, Federal Fluminense University, Niterói, 24210-240 Rio de Janeiro, Brazil
Keywords:
Hybrid Metaheuristic, Data Mining, GRASP, 1-PDTSP.
Abstract:
The incorporation of data mining techniques into metaheuristics has been efficiently adopted to solve several
optimization problems. Nevertheless, we observe in the literature that this hybridization has been limited to
problems in which the solutions are characterized by sets of (unordered) elements. In this work, we develop a
hybrid data mining metaheuristic to solve a problem for which solutions are defined by sequences of elements.
In this way, we extend the domain of combinatorial optimization problems that can benefit from the
combination of data mining and metaheuristics. Computational experiments show that the proposed approach
improves on the pure algorithm both in average solution quality and in execution time.
1 INTRODUCTION
Over the last decades, strategies based on metaheuris-
tics have been proposed to solve a large set of hard op-
timization problems, achieving sub-optimal solutions
in an acceptable computational time. Each meta-
heuristic is supported by a different paradigm and of-
fers mechanisms to escape from local optimal solu-
tions (Gendreau and Potvin, 2010).
A trend in metaheuristic research is to combine
components of classical metaheuristics, providing ro-
bust hybrid heuristics (Talbi, 2002). Moreover, con-
cepts and processes from other research areas may
also be used to improve metaheuristics. An example
of this latter case is a hybrid version of the GRASP
metaheuristic which incorporates a data mining (DM)
process, called Data Mining GRASP (DM-GRASP
for short) (Ribeiro et al., 2004).
GRASP (Feo and Resende, 1995), which stands
for Greedy Randomized Adaptive Search Procedures,
is an iterative metaheuristic that has been success-
fully applied to a large class of optimization prob-
lems (Festa and Resende, 2009a; Festa and Resende,
2009b). Each GRASP iteration is divided into two
phases. First, a feasible solution is built in a
construction phase. Then, in a second phase, its
neighbourhood is explored by a local search procedure
in order to find a better solution. The best solution
found over all iterations is returned as the result.
In its original form, GRASP has independent it-
erations that do not use information about solutions
from previous iterations. Because of this, GRASP is
considered memoryless. In order to overcome this
weakness, some ideas on keeping track of recurrent
good sub-optimal solutions and fixing variables have
been successfully investigated, e.g., adaptive mem-
ory (Fleurent and Glover, 1999), vocabulary building
(Berger et al., 2000) and path relinking (Resende and
Ribeiro, 2005).
Based on the hypothesis that patterns found in
good quality solutions may be used to guide the
exploration of the solution space, the hybrid DM-
GRASP metaheuristic was proposed (Ribeiro et al.,
2004; Ribeiro et al., 2006). Data mining refers to the
automatic extraction of knowledge from datasets, ex-
pressed in terms of patterns or rules (Han and Kam-
ber, 2011). Some techniques that extract these pat-
terns or rules have been used to improve state-of-the-
art metaheuristics for different optimization problems
(Plastino et al., 2011; Santos et al., 2008).
The main idea of this hybridization is to mine a
subset of elements that frequently occur in an elite
set of high quality solutions and use these patterns
to guide the search in the solution space. This ap-
proach was first introduced by (Ribeiro et al., 2004;
Ribeiro et al., 2006), combining a frequent itemset
mining technique with GRASP metaheuristic, and ap-
plying it to the set packing problem, achieving very
promising results both in terms of solution quality
and computational time. This framework was also
evaluated in other problems, such as the maximum
diversity problem (Santos et al., 2005), the efficient
server replication for reliable multicast problem
(Santos et al., 2006b), the p-median problem (Plastino
et al., 2009; Plastino et al., 2011) and, recently, the
2-path network design problem (2PNDP) (Barbalho
et al., 2013).
Guerine M., Rosseti I. and Plastino A.
Extending the Hybridization of Metaheuristics with Data Mining to a Broader Domain.
DOI: 10.5220/0004891303950406
In Proceedings of the 16th International Conference on Enterprise Information Systems (ICEIS-2014), pages 395-406
ISBN: 978-989-758-027-7
Copyright © 2014 SCITEPRESS (Science and Technology Publications, Lda.)
All these applications of DM-GRASP have a com-
mon property: their solutions are represented by sub-
sets of elements, without setting any ordering. In the
p-median problem, for example, the order of the cho-
sen facilities does not change the solution. However,
in some optimization problems the order is essential,
which is the case of the one-commodity pickup-and-
delivery travelling salesman problem (1-PDTSP).
In this work, we propose the incorporation of
a data mining technique into an existing heuris-
tic for the 1-PDTSP, based on both GRASP meta-
heuristic and the local improvement Variable Neigh-
borhood Descent (VND), developed by (Hernández-
Pérez et al., 2009). We intend to show, as the main
contribution of this work, that the hybridization of
metaheuristics with data mining is successfully ap-
plied not only to problems in which solutions are rep-
resented by a subset of elements, but also to problems
in which solutions are represented by a sequence of
elements, considering the order. Extensive
computational analysis shows that adding a data
mining module to the original heuristic yields an
algorithm that outperforms the pure one both in
solution quality and in computational effort.
The remainder of this work is organized as follows:
Section 2 presents the 1-PDTSP and some related
work. In Section 3, the GRASP/VND heuristic
proposed in (Hernández-Pérez et al., 2009) for the
1-PDTSP is reviewed. Section 4 describes how the
hybrid data mining technique is adapted to consider
the order of customers and how this technique is
inserted into the original heuristic. In Section 5,
computational results obtained by this strategy and
the original GRASP/VND are compared. Finally,
Section 6 presents the conclusions and points out
some future work.
2 THE OPTIMIZATION
PROBLEM
Introduced by (Hernández-Pérez and Salazar-
González, 2004a), the 1-PDTSP is a generalization
of the well-known travelling salesman problem (TSP)
in which each city (or customer) is associated with
a demand of a given product. As in the TSP, each
customer must be visited exactly once by a
capacitated vehicle, minimizing the distance
travelled by the vehicle and satisfying the customers'
requirements without violating the vehicle capacity.
The order in which the customers are visited is
important to both the quality and the feasibility of
the route. Exchanging two or more customers in a
path, for example, may affect all other customers,
and the entire solution might become infeasible.
When the vehicle capacity is extremely large, the
1-PDTSP coincides with the TSP and, hence, is
NP-hard. Moreover, verifying whether a given
instance admits a feasible solution is NP-complete.
On the other hand, checking whether a given solution
is feasible takes linear time (Hernández-Pérez, 2004).
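The linear-time feasibility check mentioned above can be sketched as follows. This is a minimal illustration, not the procedure from the cited work. It assumes (this is not stated in the text) that the vehicle may leave the first customer of the route with any initial load between 0 and the capacity, and that the demands along the full route sum to zero. The names `is_feasible`, `route`, `demand` and `capacity` are hypothetical.

```python
from itertools import accumulate

def is_feasible(route, demand, capacity):
    """Single-pass capacity check for a route (list of customers).

    Assumption: any initial load in [0, capacity] is allowed, so only the
    spread of the running load matters, not its absolute level.
    """
    # Load change accumulated after each visit, relative to the initial load.
    prefix = [0] + list(accumulate(demand[c] for c in route))
    # An initial load keeping every intermediate load within [0, capacity]
    # exists exactly when the spread of the prefix sums fits in the capacity.
    return max(prefix) - min(prefix) <= capacity
```

With demands `{1: 0, 2: 5, 3: -5, 4: 5, 5: -5}` and capacity 5, the route `[1, 2, 3, 4, 5]` is accepted, while `[1, 2, 4, 3, 5]` is rejected because two pickups in a row would require a load of 10.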
(Hernández-Pérez and Salazar-González, 2004a)
presented an integer linear programming formulation
for the 1-PDTSP. Let $G = (V, A)$ be a complete
graph, where $V = \{1, \ldots, n\}$ is the vertex set and
$A = \{(i, j) : i, j \in V\}$ the arc set between all vertices.
Each vertex $i \in V$ is associated with an integer demand
$q_i$, with $q_i < 0$ for a delivery customer and $q_i > 0$ for
a pickup customer. The travel distance $c_{ij}$ from $i$ to
$j$ is given for all pairs of locations. For each subset
$S \subset V$, let $\delta^+(S) = \{(i, j) \in A : i \in S, j \notin S\}$ and
$\delta^-(S) = \{(i, j) \in A : i \notin S, j \in S\}$ be, respectively, the
sets of arcs going out from and into $S$.
Vehicle capacity is represented by $Q$, and $q_1$ is the
demand of the depot. The latter can be considered as
a customer that receives or provides an amount of
goods to ensure that the equation $q_1 = -\sum_{i=2}^{n} q_i$ is
satisfied.
Equation 1 guarantees the overall flow conservation
for 1-PDTSP solutions.
$$\sum_{i \in V : q_i > 0} q_i + \sum_{i \in V : q_i < 0} q_i = 0 \qquad (1)$$
Let $x_{ij}$ be a binary decision variable that indicates
whether the arc $(i, j)$ is ($x_{ij} = 1$) or is not ($x_{ij} = 0$) in the
solution, and $f_{ij}$ a continuous variable indicating the
flow through arc $(i, j) \in A$. The mathematical
formulation for the 1-PDTSP is given below.
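$$\min \sum_{(i, j) \in A} c_{ij} x_{ij} \qquad (2)$$
subject to:
$$\sum_{(i, j) \in \delta^+(\{i\})} x_{ij} = 1, \quad \forall i \in V \qquad (3)$$
$$\sum_{(i, j) \in \delta^-(\{i\})} x_{ij} = 1, \quad \forall i \in V \qquad (4)$$
$$\sum_{(i, j) \in \delta^+(S)} x_{ij} \geq 1, \quad \forall S \subset V \qquad (5)$$
$$\sum_{(i, j) \in \delta^+(\{i\})} f_{ij} - \sum_{(i, j) \in \delta^-(\{i\})} f_{ij} = q_i, \quad \forall i \in V \qquad (6)$$
$$0 \leq f_{ij} \leq Q x_{ij}, \quad \forall (i, j) \in A \qquad (7)$$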
The objective function presented in Equation 2
minimizes the total cost (travel distance) of the
solution. Constraints (3) and (4) ensure that each
customer is visited exactly once. Constraint (5)
prohibits subcycles and disconnected routes by
ensuring that, for each subset of customers, there is
at least one arc going out from it. Constraint (6)
guarantees that the demand of each customer is
satisfied by requiring that this value is exactly the
difference between the flows going out from and
into this customer. Finally, constraint (7) defines the
domain of the flow variables, ranging from zero to
the capacity of the vehicle.
The 1-PDTSP has practical applications in
repositioning scenarios (Hernández-Pérez and
Salazar-González, 2004a). For example, the problem
arises when a company needs to rebalance the stock
of a product across its whole set of stores. Suppose
that some stores have a surplus of the product in
stock while other stores lack it. Then one could move
products from the former stores to the latter so that
both are served.
There are a few methods to solve the 1-PDTSP
in the literature. (Hernández-Pérez and Salazar-
González, 2004a) described an exact branch-and-
cut algorithm to solve instances with up to 60 cus-
tomers. The same authors proposed two heuristics
(Hernández-Pérez and Salazar-González, 2004b) to
deal with bigger instances. The first one consists of
a local search to provide a primal upper bound to the
previous branch-and-cut algorithm, and the second is
the same branch-and-cut considering only a subset of
variables, associated to promising edges, and hence
reducing the search space.
A new version of the branch-and-cut method
was developed later (Hernández-Pérez and Salazar-
González, 2007) with a new set of constraints for the
problem, based on some valid inequalities of the
capacitated vehicle routing problem. (Martinovic et al.,
2008) presented a modified, iterative Simulated
Annealing that uses a greedy randomized construction.
(Hernández-Pérez et al., 2009) proposed a hybrid
heuristic based on GRASP and VND. The initial so-
lution is built iteratively, selecting a new client over
a restricted candidate list to be inserted at the end of
the path. The single local search phase of the GRASP
is replaced by a VND procedure that contains
modified versions of the 2-opt and 3-opt moves. In
Section 3, we review this approach in detail.
(Zhao et al., 2009) presented a Genetic Algorithm
composed of a new constructive heuristic to generate
the initial population and a local search procedure
to speed up the convergence of the search. (Paes
et al., 2010) proposed a multi-start algorithm based
on GRASP (as constructive phase), ILS (as main
method) and a VND procedure with a random order
of neighbourhoods (as local search).
Recently, (Mladenović et al., 2012) developed
an algorithm based on the Variable Neighbourhood
Search (VNS) that uses a new and efficient way to
verify the feasibility of solutions. This check uses a
binary indexed tree that stores specific data on the
solutions to reduce the computational effort of the
local search phase.
In the next section we review the hybrid
GRASP/VND heuristic proposed by (Hernández-
Pérez et al., 2009). This heuristic was chosen as the
base of the proposed data mining hybrid strategy be-
cause it is a competitive heuristic for the 1-PDTSP
and because the GRASP has been successfully com-
bined with data mining procedures (Ribeiro et al.,
2004; Ribeiro et al., 2006; Santos et al., 2005; San-
tos et al., 2006b; Barbalho et al., 2013).
3 GRASP/VND HEURISTIC FOR
THE 1-PDTSP
The hybrid heuristic presented in (Hernández-Pérez
et al., 2009) has the same structure as a classic
GRASP metaheuristic, as shown in Algorithm 1.
This strategy consists of a main loop whose
termination criterion is the number of iterations. Each
iteration of this loop has a construction phase (line 4)
and a local search phase (line 5). At the end, after
all iterations, a post-optimization phase is run, using
another local search procedure (line 10) to try to
improve the best overall solution found.
Algorithm 1: Hybrid GRASP/VND for 1-PDTSP.
1: GRASP/VND(maxIter)
2: f(s*) ← ∞;
3: for iter = 1 until maxIter do
4:    s ← ConstructionGRASP();
5:    s ← VND_1(s);
6:    if s is feasible and f(s) < f(s*) then
7:       s* ← s;
8:    end if;
9: end for;
10: s* ← VND_2(s*);
11: return s*;
ExtendingtheHybridizationofMetaheuristicswithDataMiningtoaBroaderDomain
397
In the construction phase, one client is selected at
random to be the depot. After that, clients are inserted
iteratively at the end of the path as follows. In each
iteration, the clients that can be feasibly inserted into
the solution under construction are sorted by their
distance to the last customer in the solution, and only
the first l elements become part of the restricted
candidate list (RCL). If no client can be feasibly added
to the path, the RCL is built from the l unvisited
customers closest to the one at the end of the current
solution. Finally, one client of the RCL is chosen at
random and inserted at the end of the path. The
construction ends when all clients are in the solution.
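The construction phase described above can be sketched as follows. This is an illustrative sketch rather than the implementation of (Hernández-Pérez et al., 2009): the function name `construct`, its parameters, and the feasibility rule (the spread of the running load must fit in the capacity, which assumes a free initial load) are assumptions made for the example.

```python
import random

def construct(clients, dist, demand, capacity, rcl_size, rng):
    """Greedy randomized construction; the result may be infeasible."""
    start = rng.choice(clients)              # random first client (the depot)
    route = [start]
    unvisited = set(clients) - {start}
    load = demand[start]                     # running load, relative to the start
    lo, hi = min(0, load), max(0, load)      # extremes reached by the running load

    while unvisited:
        last = route[-1]

        def fits(c):
            # Assumed feasibility rule: the load spread must fit in the capacity.
            new = load + demand[c]
            return max(hi, new) - min(lo, new) <= capacity

        # Prefer feasible insertions; otherwise fall back to all unvisited clients.
        pool = [c for c in unvisited if fits(c)] or list(unvisited)
        # RCL: the rcl_size candidates closest to the last client of the path.
        rcl = sorted(pool, key=lambda c: (dist[last][c], c))[:rcl_size]
        nxt = rng.choice(rcl)
        route.append(nxt)
        unvisited.remove(nxt)
        load += demand[nxt]
        lo, hi = min(lo, load), max(hi, load)
    return route
```

Passing an explicit `random.Random(seed)` object as `rng` keeps runs reproducible, which mirrors the use of a different seed per run in the experiments.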
The local search phase, named VND_1, is based
on the variable neighbourhood descent procedure
(Mladenović and Hansen, 1997), which consists of
applying multiple neighbourhood structures to a given
solution in a predefined order; whenever the current
solution is improved, the procedure returns to the
first neighbourhood structure. VND_1 consists of
two classic moves, 2-opt and 3-opt, modified to
accept infeasible solutions as a starting point. These
moves are applied in the following order: first, the
2-opt heuristic, which removes two non-adjacent
edges and reconnects the route in a different way;
next, the 3-opt, which is almost the same as the
previous one but handles three edges.
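As an illustration of the descent scheme just described, the sketch below implements a generic VND loop with a plain 2-opt neighbourhood. It deliberately ignores the capacity constraint and the modifications that let the original moves start from infeasible solutions, so it should be read as a TSP-style simplification. The names `tour_cost`, `two_opt_best` and `vnd` are hypothetical.

```python
def tour_cost(route, dist):
    """Cost of the closed tour that visits route in order."""
    n = len(route)
    return sum(dist[route[i]][route[(i + 1) % n]] for i in range(n))

def two_opt_best(route, dist):
    """Best 2-opt neighbour (segment reversal), or route itself if none improves."""
    best, best_cost = route, tour_cost(route, dist)
    n = len(route)
    for i in range(1, n - 1):                # position 0 stays fixed
        for j in range(i + 1, n):
            cand = route[:i] + route[i:j + 1][::-1] + route[j + 1:]
            c = tour_cost(cand, dist)
            if c < best_cost:
                best, best_cost = cand, c
    return best

def vnd(route, dist, neighbourhoods):
    """Try neighbourhoods in order; return to the first one after any improvement."""
    k = 0
    while k < len(neighbourhoods):
        cand = neighbourhoods[k](route, dist)
        if tour_cost(cand, dist) < tour_cost(route, dist):
            route, k = cand, 0               # improved: restart from the first move
        else:
            k += 1                           # no improvement: next neighbourhood
    return route
```

A 3-opt neighbourhood with the same signature could be appended to the `neighbourhoods` list to mimic the 2-opt, 3-opt order used in VND_1.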
After the end of the main loop, the post-
optimization phase is performed with another VND
procedure, named VND_2, which is applied to the best
solution found so far. VND_2 consists of two
other neighbourhood structures based on the
Reinsertion move, also well known for the TSP. This
move is divided into two smaller structures, applied in
this order: first, Reinsertion Forward, which removes a
client and reinserts it in a position after its original
position, and second, Reinsertion Backward, which is
similar to the first one except that the removed client
is reinserted in an earlier position.
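The two reinsertion neighbourhoods can be enumerated as in the sketch below. It only generates the candidate routes; evaluating their cost and feasibility is left out. Keeping position 0 fixed (as the depot) is an assumption of this example, and the function names are hypothetical.

```python
def reinsert(route, src, dst):
    """Move the client at index src so that it ends up at index dst."""
    rest = route[:src] + route[src + 1:]
    rest.insert(dst, route[src])
    return rest

def reinsertion_forward(route):
    """Neighbours in which one client moves to a later position."""
    n = len(route)
    return [reinsert(route, i, j) for i in range(1, n) for j in range(i + 1, n)]

def reinsertion_backward(route):
    """Neighbours in which one client moves to an earlier position."""
    n = len(route)
    return [reinsert(route, i, j) for i in range(1, n) for j in range(1, i)]
```

Note that moving a client one position forward gives the same route as moving its successor one position backward, so the two neighbourhoods overlap on adjacent swaps.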
In the next section, we present the proposed
data mining hybrid heuristic for the 1-PDTSP,
called DM-GRASP/VND, which is a hybrid ver-
sion of the GRASP/VND metaheuristic presented in
(Hernández-Pérez et al., 2009) with a data mining
technique.
4 THE HYBRID DATA MINING
PROPOSAL: DM-GRASP/VND
The data mining area offers several techniques to
extract patterns and rules from databases. Among them,
Frequent Itemset Mining (FIM) techniques extract
subsets of items that appear frequently in a dataset
of transactions, where each transaction is a subset of
elements from the application domain.
In this work, the dataset is a set of sub-optimal
solutions, also called an elite set. Each transaction
corresponds to a solution of the 1-PDTSP. The main
idea is to use a FIM technique to mine patterns from
the elite set and use them to guide the construction of
new solutions.
The proposed hybrid DM-GRASP/VND heuristic
is divided into two main phases. The first one is called
the elite set generation phase and consists of execut-
ing n pure GRASP/VND iterations which generate a
set of different solutions, storing the d best solutions
in the elite set. For this reason, the elite set can be
viewed as a long term memory added to the original
GRASP/VND heuristic.
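A possible way to maintain such an elite set is sketched below. The text only states that the d best of the generated (different) solutions are stored, so the tie-breaking and the representation as (cost, route) pairs are assumptions of this example, and `update_elite_set` is a hypothetical name.

```python
def update_elite_set(solution, cost, elite, d):
    """Return a new elite list holding the d lowest-cost distinct solutions.

    elite is a list of (cost, route) pairs, with routes stored as tuples.
    """
    entry = (cost, tuple(solution))
    if entry in elite:                 # skip solutions already stored
        return elite
    # Sort by cost (ties broken by route) and keep only the d best entries.
    return sorted(elite + [entry])[:d]
```

Calling it once per iteration, for example `elite = update_elite_set(s, f_s, elite, d)`, keeps the list sorted, so `elite[0]` is always the best solution stored so far.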
Having built the elite set, an intermediate step is
executed to apply a FIM technique and obtain the pat-
terns. At this point, it is important to remember that
a solution of the 1-PDTSP is a sequence of elements
and their order is important, which makes the use of
a FIM technique not directly applicable. To allow the
use of a FIM technique, we propose to transform the
solutions of the elite set in a way that each solution
is represented by a set of elements, but without losing
its sequence.
For each pair of consecutive clients (i and j) in
a solution, an arc (i, j) is generated, mapping each
solution to a set of arcs. After that, we can apply a FIM
technique to mine patterns over the elite set, selecting
the $lp$ largest patterns. Each mined pattern consists
of a group of arcs that appear together in at least
$sup_{min}$ solutions of the elite set, a parameter known as
minimum support. The quantity and size of the mined
patterns may vary according to this parameter.
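The transformation from sequences to arc sets, followed by a mining step, can be illustrated as below. The mining routine is a brute-force stand-in that intersects groups of at least `min_support` solutions; it is not the FIM algorithm used by the authors (the text does not name one) and is only practical for small elite sets. Omitting the arc that closes the tour, and the names `route_to_arcs` and `mine_patterns`, are assumptions of this sketch.

```python
from itertools import combinations

def route_to_arcs(route):
    """Map a sequence of clients to the set of arcs between consecutive clients."""
    return frozenset(zip(route, route[1:]))

def mine_patterns(elite_routes, min_support, lp):
    """Return the lp largest arc sets shared by at least min_support solutions.

    min_support is an absolute count (>= 1) of elite solutions.
    """
    transactions = [route_to_arcs(r) for r in elite_routes]
    patterns = set()
    for k in range(min_support, len(transactions) + 1):
        for group in combinations(transactions, k):
            common = frozenset.intersection(*group)   # arcs shared by the group
            if common:
                patterns.add(common)
    # Largest patterns first; ties are broken deterministically by content.
    return sorted(patterns, key=lambda p: (-len(p), sorted(p)))[:lp]
```

Because each arc keeps its direction, the order of the clients is preserved inside the patterns even though every transaction is an unordered set.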
Inside a pattern, an arc (i, j) has an origin client
i and a destination client j. Moreover, one can find
that, in the same pattern, two or more arcs may be
consecutive and can be easily connected to set up a
bigger route segment, named path segment (PS). This
way, each pattern is made of one or more PS.
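Connecting the consecutive arcs of a pattern into path segments can be sketched as follows. The sketch assumes that the arcs of a pattern come from a common solution, so that every client has at most one successor and one predecessor and no cycle occurs. The name `path_segments` is hypothetical.

```python
def path_segments(pattern):
    """Chain the arcs of a pattern into maximal path segments, longest first."""
    succ = dict(pattern)                        # origin -> destination
    # A segment starts at an origin that is not the destination of any arc.
    heads = set(succ) - set(succ.values())
    segments = []
    for head in sorted(heads):
        seg = [head]
        while seg[-1] in succ:                  # follow arcs until the chain ends
            seg.append(succ[seg[-1]])
        segments.append(seg)
    return sorted(segments, key=len, reverse=True)
```

For the pattern {(1, 2), (2, 3), (5, 6)}, the arcs (1, 2) and (2, 3) are merged into the segment 1, 2, 3, while (5, 6) remains a segment on its own.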
The second phase of the proposed DM-GRASP/VND
consists of executing another n iterations, replacing
the original construction phase by an adapted
construction which uses the patterns extracted in the
first phase to build new solutions.
Algorithm 2 presents the adapted construction. In
each construction, one pattern p from the $lp$ patterns
is selected in a round-robin way (line 2). After that,
one PS from p, denoted ps, is chosen (line 3). In the
first use of p, we choose the largest PS, in the second,
the next largest one, and so on.
Once ps is chosen, the construction is guided as
follows. We identify all the solutions from the elite set
that contain ps as a subroute and choose one at
random, s_cho (line 4). The solution s to be built initially
receives the part of s_cho that holds all clients from the
depot until the end of ps in s_cho (line 5). From this
point on, a distinct route is built by inserting unvisited
clients at the end of the solution, applying the same
idea as the original constructive heuristic (line 6).
Algorithm 2: Adapted construction using mined patterns.
1: AdaptedConstruction(listOfPatterns, ES)
2: p ← SelectPattern(listOfPatterns);
3: ps ← SelectPS(p);
4: s_cho ← SelectSolutionWithPS(ps, ES);
5: s ← ExtractSubroute(ps, s_cho);
6: s ← ConstructionGRASP(s);
7: return s;
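Lines 4 and 5 of the adapted construction can be sketched as below. The helper names follow the pseudocode loosely and are hypothetical, routes are assumed to be Python lists whose first element is the depot, and the completion of the route (line 6) is left to the ordinary construction heuristic.

```python
import random

def contains_segment(route, ps):
    """True if ps occurs in route as a contiguous subroute."""
    n = len(ps)
    return any(route[i:i + n] == ps for i in range(len(route) - n + 1))

def extract_subroute(ps, route):
    """Prefix of route from its first client up to the last client of ps."""
    n = len(ps)
    for i in range(len(route) - n + 1):
        if route[i:i + n] == ps:
            return route[:i + n]
    raise ValueError("ps is not a subroute of route")

def adapted_start(ps, elite_routes, rng=random):
    """Pick at random an elite solution containing ps and copy its prefix."""
    candidates = [r for r in elite_routes if contains_segment(r, ps)]
    chosen = rng.choice(candidates)
    return extract_subroute(ps, chosen)
```

The returned prefix already contains ps at its end, so the remaining unvisited clients can be appended one by one with the RCL rule of the original construction.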
Algorithm 3 presents the hybrid heuristic with
data mining. The main modification regarding Al-
gorithm 1 is represented by lines 9, 11, and 13. It
is possible to see that this algorithm consists of two
loops that are almost identical to the main loop of
Algorithm 1, using half the number of iterations in
each one. The elite set is built in the first loop (line
9), the data mining process is called between those
loops (line 11) and the new construction heuristic is
performed in the second phase (line 13).
Algorithm 3: Hybrid heuristic with data mining.
1: DM-GRASP/VND(maxIter, sup_min, d, lp)
2: f(s*) ← ∞; ES ← ∅;
3: for iter = 1 until maxIter/2 do
4:    s ← ConstructionGRASP();
5:    s ← VND_1(s);
6:    if s is feasible and f(s) < f(s*) then
7:       s* ← s;
8:    end if;
9:    UpdateEliteSet(s, ES, d);
10: end for;
11: listOfPatterns ← Mine(ES, sup_min, lp);
12: for iter = 1 until maxIter/2 do
13:    s ← AdaptedConstruction(listOfPatterns, ES);
14:    s ← VND_1(s);
15:    if s is feasible and f(s) < f(s*) then
16:       s* ← s;
17:    end if;
18: end for;
19: s* ← VND_2(s*);
20: return s*;
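The two-phase control flow of Algorithm 3 can be expressed as a small driver that receives every problem-specific component as a function. This is only a structural sketch: all parameter names are hypothetical, and the elite set handling (the d lowest-cost distinct solutions) is one possible reading of UpdateEliteSet.

```python
import math

def dm_grasp_vnd(max_iter, d, construct, adapted_construct, vnd1, vnd2,
                 mine, cost, feasible):
    """Skeleton of the hybrid heuristic; components are passed in as callables."""
    best, best_cost = None, math.inf
    elite = []                                   # list of (cost, route) pairs

    def keep_if_better(s):
        nonlocal best, best_cost
        if feasible(s) and cost(s) < best_cost:
            best, best_cost = s, cost(s)

    for _ in range(max_iter // 2):               # phase 1: elite set generation
        s = vnd1(construct())
        keep_if_better(s)
        elite = sorted(set(elite) | {(cost(s), tuple(s))})[:d]

    # Intermediate step: mine patterns from the elite solutions.
    patterns = mine([list(route) for _, route in elite])

    for _ in range(max_iter // 2):               # phase 2: pattern-guided iterations
        s = vnd1(adapted_construct(patterns, elite))
        keep_if_better(s)

    # Post-optimization of the best feasible solution, if one was found.
    return vnd2(best) if best is not None else None
```

Keeping the components as parameters makes it easy to compare the pure and hybrid variants: the pure GRASP/VND corresponds to running only the first loop for all `max_iter` iterations.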
In the next section we present the computational
experiments conducted with both GRASP/VND and
DM-GRASP/VND strategies.
5 COMPUTATIONAL RESULTS
In this section, the computational results obtained for
GRASP/VND and the proposed DM-GRASP/VND
are presented and compared. Since the GRASP/VND
original implementation was not available, we had to
develop it based on (Hernández-Pérez et al., 2009).
Both heuristics were coded in C++, using the g++
version 4.6.3 compiler, and all tests were carried out
on a personal computer with an Intel® Core™ i5 CPU
650 @ 3.20GHz, 4GB of RAM and Linux Fedora
version 15. The parallel capability of the processor
was not used.
In order to evaluate the algorithms, we used a set
of problem instances for the 1-PDTSP provided by
(Hernández-Pérez et al., 2009). This set contains
randomly generated instances with 100 to 500
clients and a vehicle capacity equal to 10. These
instances are the biggest in terms of number of clients
and the most difficult in terms of vehicle capacity. The
maximum number of iterations (maxIter), the elite
set size (d), the minimum support value ($sup_{min}$) and
the number of patterns selected ($lp$) are, respectively,
200, 10, 20% and 10. Except for the number of
iterations, which was chosen according to the original
parameter reported in (Hernández-Pérez et al., 2009),
the others were defined based on the settings used in
(Plastino et al., 2011).
The remainder of this section is organized thus:
first, we compare the computational results obtained
by both strategies and then check whether the dif-
ferences of mean values reached by the evaluated
algorithms are statistically significant. Finally, we
present some additional analysis on the computational
experiments to illustrate the behaviour of the DM-
GRASP/VND after the mining step.
5.1 Comparing GRASP/VND and
DM-GRASP/VND
In this section, we report the computational results ob-
tained for the GRASP/VND and DM-GRASP/VND
approaches, comparing the best solutions reached, the
average cost solution values obtained, and the aver-
age running times required by each method. Both
GRASP/VND and DM-GRASP/VND were run 10
times with a different random seed in each run.
In Table 1, the results related to the quality of the
solutions obtained are shown. The first column shows
the instance identifier. The second and fifth columns
have the best cost values obtained by the original
and the DM-GRASP/VND approaches, respectively.
The third and seventh columns present the average
cost values obtained by them. The fourth and ninth
columns show the average execution time (in
seconds) for the GRASP/VND and DM-GRASP/VND.
The sixth, eighth and tenth columns report the
percentage difference (Diff %) of the DM-GRASP/VND
over the GRASP/VND for each criterion, as evaluated
by Equation 8.
$$\text{Diff \%} = \frac{\text{DM-GRASP/VND} - \text{GRASP/VND}}{\text{GRASP/VND}} \qquad (8)$$
Table 1: Computational results for GRASP/VND and DM-GRASP/VND.
                       GRASP/VND                        DM-GRASP/VND
Instance          Best  Average Time(s)   Best Diff %  Average Diff % Time(s)  Diff %
n100q10A         12369  12514.4    4.01  11915  -3.67  12375.5  -1.11    2.97  -25.98
n100q10B         13668  13885.7    3.86  13596  -0.53  13823.1  -0.45    2.77  -28.07
n100q10C         14619  14810.8    4.01  14310  -2.11  14603.0  -1.40    2.85  -28.92
n100q10D         14806  14993.4    4.15  14666  -0.95  14772.7  -1.47    3.12  -24.76
n100q10E         12594  12819.7    3.94  12018  -4.57  12587.1  -1.81    2.63  -33.27
n100q10F         12082  12297.2    3.57  11891  -1.58  12125.1  -1.40    2.67  -25.24
n100q10G         12344  12623.4    3.84  12176  -1.36  12481.5  -1.12    2.71  -29.56
n100q10H         13405  13590.7    3.72  13362  -0.32  13459.8  -0.96    2.68  -27.93
n100q10I         14512  14715.9    3.74  14514   0.01  14698.0  -0.12    2.60  -30.58
n100q10J         13700  13992.0    4.00  13713   0.09  13905.9  -0.62    2.99  -25.28
Group Average                                   -1.50           -1.05          -27.96
n200q10A         18707  19053.1   34.34  18319  -2.07  18725.7  -1.72   24.00  -30.10
n200q10B         19046  19406.7   33.27  18689  -1.87  19273.4  -0.69   21.90  -34.18
n200q10C         17445  17740.2   37.19  17430  -0.09  17630.7  -0.62   27.45  -26.17
n200q10D         22428  22772.4   33.65  22047  -1.70  22524.4  -1.09   22.69  -32.58
n200q10E         20409  20738.2   36.77  20323  -0.42  20639.7  -0.47   24.63  -33.02
n200q10F         22483  22709.4   37.10  22295  -0.84  22615.9  -0.41   27.22  -26.63
n200q10G         18585  18855.3   34.72  18147  -2.36  18735.5  -0.64   21.81  -37.16
n200q10H         22165  22588.2   39.85  21907  -1.16  22348.4  -1.06   26.65  -33.12
n200q10I         19533  19859.3   34.22  19362  -0.88  19504.1  -1.79   22.76  -33.47
n200q10J         20179  20471.6   32.80  20011  -0.83  20244.1  -1.11   23.15  -29.42
Group Average                                   -1.22           -0.96          -31.59
n300q10A         24942  25148.1  136.01  24392  -2.21  24738.4  -1.63   92.17  -32.23
n300q10B         24413  24802.3  133.15  24347  -0.27  24595.0  -0.84   89.63  -32.68
n300q10C         23212  23418.2  142.24  22838  -1.61  23170.2  -1.06   92.90  -34.69
n300q10D         27080  27614.3  147.46  26325  -2.79  27113.1  -1.82   99.18  -32.74
n300q10E         28643  28914.2  147.16  27980  -2.31  28425.1  -1.69   99.90  -32.11
n300q10F         25843  26213.9  143.07  25592  -0.97  25895.3  -1.22  108.49  -24.17
n300q10G         25631  25814.5  144.66  25105  -2.05  25413.8  -1.55  108.70  -24.86
n300q10H         23590  23795.3  138.41  23143  -1.89  23512.1  -1.19   93.02  -32.79
n300q10I         26018  26358.4  136.85  25444  -2.21  25965.2  -1.49   94.40  -31.02
n300q10J         24050  24466.0  140.90  23806  -1.01  24139.1  -1.34   98.85  -29.84
Group Average                                   -1.73           -1.38          -30.71
n400q10A         33087  33266.8  393.04  32170  -2.77  32620.1  -1.94  282.19  -28.20
n400q10B         26677  26797.2  347.47  26107  -2.14  26395.1  -1.50  246.68  -29.01
n400q10C         30394  30682.2  399.14  29838  -1.83  30235.7  -1.46  269.07  -32.59
n400q10D         25814  26267.5  400.79  25291  -2.03  25750.1  -1.97  264.62  -33.98
n400q10E         26795  27313.9  355.53  26393  -1.50  26824.5  -1.79  260.04  -26.86
n400q10F         28107  28910.0  361.85  28188   0.29  28539.2  -1.28  256.23  -29.19
n400q10G         25697  26220.6  398.57  25113  -2.27  25492.7  -2.78  279.50  -29.88
n400q10H         27158  27773.1  393.40  26813  -1.27  27238.1  -1.93  278.53  -29.20
n400q10I         30115  30898.7  387.77  30208   0.31  30549.5  -1.13  263.87  -31.95
n400q10J         27655  28059.0  383.00  26921  -2.65  27536.1  -1.86  268.10  -30.00
Group Average                                   -1.59           -1.76          -30.08
n500q10A         29874  30661.4  825.94  29558  -1.06  30246.4  -1.35  579.69  -29.82
n500q10B         28559  29042.9  846.08  28253  -1.07  28583.9  -1.58  573.82  -32.18
n500q10C         32360  33162.5  867.24  32065  -0.91  32569.1  -1.79  577.93  -33.36
n500q10D         32750  33074.3  863.71  32117  -1.93  32484.5  -1.78  593.99  -31.23
n500q10E         32298  32667.1  881.04  31704  -1.84  32263.6  -1.24  598.28  -32.09
n500q10F         30856  31354.6  813.26  30432  -1.37  30991.2  -1.16  511.36  -37.12
n500q10G         28879  29123.4  885.22  28357  -1.81  28642.5  -1.65  597.69  -32.48
n500q10H         38579  39023.5  849.81  37926  -1.69  38350.5  -1.72  596.57  -29.80
n500q10I         32718  33217.7  858.72  32330  -1.19  32624.5  -1.79  547.15  -36.28
n500q10J         32407  33131.7  873.12  32530   0.38  32720.7  -1.24  576.76  -33.94
Group Average                                   -1.25           -1.53          -32.83
Overall Average                                 -1.46           -1.34          -30.63
The intermediate rows show the partial averages
of the percentage differences for each group of
instances of the same size, and the last row of the table
presents the overall average of the percentage
differences. The smallest values considering the best
solution, the average solution and the average time, i.e.,
the best results among them, are bold-faced.
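As a worked illustration of the Diff % columns and of the group averages, the sketch below recomputes them from the best-solution values of the n100 group in Table 1. The result is multiplied by 100 so that it matches the percentages shown in the table; the function names are hypothetical.

```python
def diff_percent(dm_value, grasp_value):
    """Relative difference of DM-GRASP/VND over GRASP/VND, as a percentage."""
    return 100.0 * (dm_value - grasp_value) / grasp_value

def group_average(dm_values, grasp_values):
    """Average of the per-instance percentage differences of one group."""
    diffs = [diff_percent(dm, g) for dm, g in zip(dm_values, grasp_values)]
    return sum(diffs) / len(diffs)

# Best solution values of the n100 group, copied from Table 1.
grasp_best = [12369, 13668, 14619, 14806, 12594, 12082, 12344, 13405, 14512, 13700]
dm_best = [11915, 13596, 14310, 14666, 12018, 11891, 12176, 13362, 14514, 13713]
n100_average = group_average(dm_best, grasp_best)
```

For instance n100q10A, `diff_percent(11915, 12369)` is about -3.67, and `n100_average` rounds to -1.50, in agreement with the corresponding entries of Table 1. A negative value means that DM-GRASP/VND obtained the lower (better) cost or time.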
These results show that the proposed DM-
GRASP/VND method produced better solutions in
less computational time for almost all instances. In
only five out of 50 instances did DM-GRASP/VND
fail to outperform GRASP/VND in terms of best
solution found. Overall, its best solutions were 1.46%
better on average, and it was on average 30.63% faster
than the original method. In terms of average solution
quality, the average percentage difference between
these heuristics was 1.34%.
There are two main reasons for the faster be-
haviour of DM-GRASP/VND. First, the adapted con-
struction is faster than the original one because it uses
a subroute of an existing high-quality solution. Sec-
ondly, the quality of the solutions constructed after the
data mining process is usually better than that of the
original construction and, therefore, makes the local
search effort considerably smaller.
5.2 Analysis of Statistical Significance
In order to verify whether or not the differences
of mean values obtained by the evaluated strategies
shown in Table 1 are statistically significant, we
employed the non-parametric Friedman test
(Siegel and Castellan Jr, 1988), with a significance
level of 0.05 as the p-value threshold. This test is
usually applied to compare algorithms with some
random features and to identify whether the difference
in performance between them is due to random causes.
Table 2 shows the number of better average so-
lutions found by each strategy, for each group of the
same size instances. The number of cases where p-
value is less than 0.05 is shown in brackets. When
comparing DM-GRASP/VND with GRASP/VND,
we see that the DM-GRASP/VND obtained the best
result for all the instances and, in almost all cases, the
difference is statistically significant. These results in-
dicate the superiority of the proposed strategy.
Table 2: Analysis of statistical significance.
                         Instance Group
Algorithm        n100     n200     n300     n400     n500
GRASP/VND        0 (0)    0 (0)    0 (0)    0 (0)    0 (0)
DM-GRASP/VND     10 (6)   10 (4)   10 (7)   10 (9)   10 (8)
The Wilcoxon-Mann-Whitney non-parametric
test was also applied to check if the DM-
GRASP/VND method could find better solutions
than the original approach. According to (Siegel and
Castellan Jr, 1988), this statistical test is commonly
used when two independent samples are analysed
and whenever it is necessary to have a statistical
test to reject the null hypothesis (i.e., there are no
significant differences between these two samples),
with a significance level of $\alpha$ (i.e., it is possible to
reject the null hypothesis with the probability of
$(1 - \alpha) \times 100\%$). Two hypotheses were used in this
test:
null hypothesis (H0): there are no significant dif-
ferences between the solutions found by DM-
GRASP/VND and the original method; and
alternative hypothesis (H1): there are significant
differences between the solutions found by DM-
GRASP/VND and the original algorithm.
Considering the results shown in Table 1, using
the R package (The R Project for Statistical
Computing, 2013), it is possible to reject H0 with
$\alpha = 2.2 \times 10^{-16}$. Thus, with a probability greater than
99%, we can conclude that there are significant
differences between the solutions found by the
DM-GRASP/VND and GRASP/VND heuristics.
5.3 Complementary Analysis
Figures 1 and 2 illustrate the behaviour of the
construction and local search phases, for both
GRASP/VND and DM-GRASP/VND, in terms of the
solution cost values obtained along the execution of
1000 iterations for the n500q10G instance with a
specific random seed. As the 1-PDTSP is a
minimization problem, the local search reduces
the cost of the solution obtained by the construction
phase.
In Figure 1, we notice that the GRASP/VND
heuristic behaves similarly throughout the iterations.
Furthermore, we can also see that the GRASP/VND
and the DM-GRASP/VND (see Figure 2) have exactly
the same behaviour until the 500th iteration, where
the data mining procedure is executed. From this
point on, the quality of the solutions obtained by DM-
GRASP/VND, both in the construction and local
search procedures, is improved.
[Plot: Cost (0 to 350000) versus Iteration (0 to 1000); series: Construction, Local Search.]
Figure 1: Cost × iteration plot of GRASP/VND for instance
n500q10G.
Figure 2: Cost × iteration plot of DM-GRASP/VND for instance n500q10G (horizontal axis: iteration, 0 to 1000; vertical axis: cost, 0 to 350000; series: Construction and Local Search).
To make the improvement of the local search phase
after the data mining call visible, we enlarged
Figures 1 and 2, as shown in Figures 3 and 4. Each of
them presents the cost of the solutions obtained over
the 1000 iterations, but with the cost axis restricted
to the range from 27000 to 33000. Figure 3 shows that
the GRASP/VND heuristic found only a few solutions
with cost below 29000 throughout the iterations. The
DM-GRASP/VND approach, however, reached several
solutions with cost below 29000 after the 500th
iteration (see Figure 4). We can also
notice that the DM-GRASP/VND strategy constructs
initial solutions, which are based on the adapted
construction method, as good as those already explored
by the local search phase.
Figure 3: Cost × iteration enlarged plot of GRASP/VND for instance n500q10G (horizontal axis: iteration, 0 to 1000; vertical axis: cost, 27000 to 33000; series: Construction and Local Search).
Figure 4: Cost × iteration enlarged plot of DM-GRASP/VND for instance n500q10G (horizontal axis: iteration, 0 to 1000; vertical axis: cost, 27000 to 33000; series: Construction and Local Search).
An additional experiment was run to evalu-
ate the time required for GRASP/VND and DM-
GRASP/VND to achieve a solution as good as a tar-
get solution value. Each strategy was run 100 times
(with different random seeds) until a target solution
cost value was reached for a specific instance. The
instance n500q10G was used, with the target value
equal to 29123. For each seed, the time (in seconds)
in which the target was reached is plotted, as shown
in Figure 5. We see that in almost all executions
the DM-GRASP/VND reached the target before the
GRASP/VND.
Figure 6 presents another comparison between
these algorithms, based on time-to-target plots
(TTT-plots) (Aiex et al., 2007), which are used to
analyse the behaviour of algorithms with random
components. These plots show, on the vertical axis,
the cumulative probability that an algorithm reaches a
prefixed target solution within the running time
indicated on the horizontal axis.
Figure 5: Analysis of convergence with a target for instance n500q10G (horizontal axis: seed, 0 to 100; vertical axis: time in seconds, 0 to 9000; series: GRASP/VND and DM-GRASP/VND).
Figure 6: Time-to-target plot for a target for instance n500q10G (horizontal axis: time to target solution in seconds, 0 to 8000; vertical axis: cumulative probability, 0 to 1; series: GRASP/VND and DM-GRASP/VND).
In the TTT-plots experiment, we sorted the
execution times required for each algorithm to reach a
solution at least as good as the target solution (these
times were already shown in Figure 5). Then, the
i-th sorted running time, $t_i$, is associated with a
probability $p_i = (i - 0.5)/100$, and the points
$z_i = (t_i, p_i)$ are plotted. We see that the proposed
strategy outperforms the pure GRASP/VND. For example,
the cumulative probability for DM-GRASP/VND to find the
prefixed target within 1000 seconds is almost 100%, while
the same probability for the pure GRASP/VND is
about 55%.
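The construction of the TTT-plot points can be sketched in a few lines. The sketch generalizes the denominator from the 100 runs used here to any number of runs n; the function names and the sample times are hypothetical.

```python
def ttt_points(times):
    """Return the TTT-plot points (t_i, p_i) with p_i = (i - 0.5) / n."""
    ordered = sorted(times)
    n = len(ordered)
    return [(t, (i - 0.5) / n) for i, t in enumerate(ordered, start=1)]


def empirical_probability(times, budget):
    """Fraction of runs that reached the target within `budget` seconds."""
    return sum(1 for t in times if t <= budget) / len(times)


# Hypothetical times-to-target in seconds (not the paper's measurements).
sample_times = [30.0, 10.0, 20.0, 40.0]
points = ttt_points(sample_times)
within_25s = empirical_probability(sample_times, 25.0)
```

Reading a value such as "almost 100% within 1000 seconds" off the plot corresponds to evaluating `empirical_probability` at a budget of 1000 on the measured times.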
Figures 7 and 8 illustrate the running time spent
by the construction and the local search phases
of both algorithms for the n500q10G instance.
While the computational time required by
GRASP/VND for both the construction and local search
phases remains the same throughout the iterations (see
Figure 7), the hybrid DM-GRASP/VND method achieved
a significant time reduction after the 500th iteration,
when the data mining call occurs. This time
reduction, seen in Figure 8 for both the construction
and local search phases, corroborates the fact that the
adapted construction is faster than the original
construction. It also shows that the local search
benefits from the patterns, making the DM-GRASP/VND
strategy converge faster.
Figure 7: Time × iteration plot of one execution of GRASP/VND for instance n500q10G (horizontal axis: iteration, 0 to 1000; vertical axis: time in seconds, logarithmic scale from 0.001 to 100; series: Local Search and Construction).
Figure 8: Time × iteration plot of one execution of DM-GRASP/VND for instance n500q10G (horizontal axis: iteration, 0 to 1000; vertical axis: time in seconds, logarithmic scale from 0.001 to 100; series: Local Search and Construction).
In Figure 9, we analyze the impact of fixing clients
in the adapted construction, which depends on how
far from the depot a pattern is fixed: patterns
far from the depot fix more clients in the adapted
construction, while patterns closer to the depot fix
fewer clients. This figure indicates that the larger the
number of fixed clients, the smaller the cost of the
solution.
Figure 9: Solution cost with different numbers of fixed clients in the construction phase for instance n500q10G (horizontal axis: number of fixed clients, 0 to 500; vertical axis: cost, 30000 to 100000; series: Construction and Local Search).
Figure 10: Varying the number of iterations (horizontal axis: iteration, 100 to 2000; vertical axis: percentage difference over GRASP/VND (%), logarithmic scale from 1 to 100; series: Best, Average and Time).
In the last experiment, each strategy was run with
100, 200, 400, 600, 800, 1000, 1200, 1600, and 2000
iterations, evaluating the best solution, the average
quality of the solution and the computing time.
Figure 10 shows the percentage difference of
DM-GRASP/VND over GRASP/VND for each of the
criteria. We see that the percentage difference for the
best solution and the average solution quality rose as
more iterations were performed, apparently stabilizing
only after 1200 iterations. As regards execution time, the
percentage difference varies slightly, though always
remaining above 30%.
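The exact formula behind the percentage difference is not restated in this section. A minimal sketch, assuming it is the relative reduction obtained by the hybrid method with respect to the pure heuristic (for criteria where smaller is better, such as cost and time), would be the following; the function name and the sample values are hypothetical.

```python
def percentage_difference(baseline, hybrid):
    """Relative reduction of `hybrid` with respect to `baseline`, in percent.

    Assumes a minimization criterion (cost or running time) and baseline > 0.
    """
    return 100.0 * (baseline - hybrid) / baseline


# Hypothetical running times in seconds (not the paper's measurements).
time_reduction = percentage_difference(200.0, 140.0)
```

Under this assumed definition, a value above 30% for execution time means that the hybrid method needed less than 70% of the time spent by the pure heuristic.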
6 CONCLUSIONS
The hybridization of GRASP heuristics with data
mining techniques has been successfully applied to
different combinatorial optimization problems. Until
now, all the problems explored had in common the
fact that their solutions were characterized by a set
of elements. We showed, as the main contribution of
this work, that this hybridization can also be applied
to problems in which solutions are represented by a
sequence of elements.
In this work we developed a hybrid data mining
ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems
404
heuristic for the 1-PDTSP (called DM-GRASP/VND)
by incorporating a frequent itemset mining technique
into the existing GRASP/VND algorithm presented
by Hernández-Pérez et al. (2009). The experimental
results showed that the DM-GRASP/VND method
outperformed the GRASP/VND strategy, as the former
was able to obtain better solutions in less
computational time.
As future work, the goal is to implement a multi-
mining version of the DM-GRASP/VND, running
the data mining procedure more than once. This
idea, successfully applied in other hybrid data min-
ing strategies (Barbalho et al., 2013; Plastino et al.,
2013), consists of executing the data mining method
whenever the elite set becomes stable.
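One possible way to detect that the elite set has become stable is sketched below. It assumes that "stable" means the elite set has not changed for a given number of consecutive iterations; this criterion, the `patience` parameter and the class name are hypothetical and are not taken from the cited works.

```python
class EliteStabilityMonitor:
    """Signals when data mining should run in a multi-mining scheme.

    Mining is triggered once the elite set has stayed unchanged for
    `patience` consecutive iterations, and only if the elite set has
    changed since the previous mining call.
    """

    def __init__(self, patience):
        self.patience = patience
        self.unchanged_iterations = 0
        self.changed_since_mining = True

    def update(self, elite_changed):
        """Record one iteration; return True if mining should run now."""
        if elite_changed:
            self.unchanged_iterations = 0
            self.changed_since_mining = True
            return False
        self.unchanged_iterations += 1
        if (self.changed_since_mining
                and self.unchanged_iterations >= self.patience):
            # Mining the same elite set again would yield the same patterns,
            # so wait for a new change before the next trigger.
            self.changed_since_mining = False
            return True
        return False
```

In the main loop, `update` would be called once per iteration with a flag telling whether the current solution entered the elite set, and the mined patterns would replace the previous ones whenever it returns True.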
ACKNOWLEDGEMENTS
The authors would like to thank CNPq and CAPES
for the partial support of this research work.
REFERENCES
Agrawal, R. and Srikant, R. (1994). Fast algorithms for
mining association rules. In Proceedings of 20th
International Conference on VLDB, pages 487–499.
Aiex, R. M., Resende, M. G. C., and Ribeiro, C. C. (2007).
TTT plots: A Perl program to create time-to-target
plots. Optimization Letters, 1:355–366.
Barbalho, H., Rosseti, I., Martins, S. L., and Plastino,
A. (2013). A hybrid data mining GRASP with
path-relinking. Computers & Operations Research,
40:3159–3173.
Berger, D., Gendron, B., Potvin, J.-Y., Raghavan, S., and
Soriano, P. (2000). Tabu search for a network loading
problem with multiple facilities. Journal of Heuris-
tics, 6:253–267.
Feo, T. A. and Resende, M. G. C. (1995). Greedy random-
ized adaptive search procedures. Journal of Global
Optimization, 6:109–133.
Festa, P. and Resende, M. G. C. (2009a). An annotated bib-
liography of GRASP part I: Algorithms. International
Transactions in Operational Research, 16:1–24.
Festa, P. and Resende, M. G. C. (2009b). An annotated bib-
liography of GRASP part II: Applications. Interna-
tional Transactions in Operational Research, 16:131–
172.
Fleurent, C. and Glover, F. (1999). Improved construc-
tive multistart strategies for the quadratic assignment
problem using adaptive memory. INFORMS Journal
on Computing, 11:198–204.
Gendreau, M. and Potvin, J.-Y. (2010). Handbook of Meta-
heuristics, volume 146 of International Series in Op-
erations Research & Management Science. Springer,
2nd edition.
Goethals, B. and Zaki, M. J. (2003). Advances in fre-
quent itemset mining implementations: Introduction
to FIMI03. In Goethals, B. and Zaki, M. J., editors,
Frequent Itemset Mining Implementations (FIMI’03),
Proceedings of the ICDM 2003 Workshop on Frequent
Itemset Mining Implementations. Melbourne, Florida,
USA. Available at http://CEUR-WS.org/Vol-90.
Han, J. and Kamber, M. (2011). Data Mining: Concepts
and Techniques. Morgan Kaufmann Publishers, 3rd
edition.
Han, J., Pei, J., and Yin, Y. (2000). Mining frequent pat-
terns without candidate generation. SIGMOD Record,
29:1–12.
Hernández-Pérez, H. (2004). Travelling salesman problems
with pickups and deliveries. PhD thesis, University
of La Laguna, Spain.
Hernández-Pérez, H. and Salazar-González, J. (2004a).
A branch-and-cut algorithm for a traveling salesman
problem with pickup and delivery. Discrete Applied
Mathematics, 145:453–459.
Hernández-Pérez, H. and Salazar-González, J. (2004b).
Heuristics for the one commodity pickup-and-delivery
traveling salesman problem. Transportation Science,
38:245–255.
Hernández-Pérez, H. and Salazar-González, J. (2007). The
one-commodity pickup-and-delivery traveling sales-
man problem: Inequalities and algorithms. Networks,
50:258–272.
Hernández-Pérez, H., Salazar-González, J., and Rodríguez-
Martín, I. (2009). A hybrid GRASP/VND heuris-
tic for the one-commodity pickup-and-delivery travel-
ing salesman problem. Computers & Operations Re-
search, 36:1639–1645.
Martinovic, G., Aleksi, I., and Baumgartner, A. (2008).
Single-Commodity Vehicle Routing Problem with
Pickup and Delivery Service. Mathematical Problems
in Engineering, pages 1–18.
Mladenović, N. and Hansen, P. (1997). Variable
neighborhood search. Computers & Operations Research,
24:1097–1100.
Mladenović, N., Urošević, D., Hanafi, S., and Ilić, A.
(2012). A general variable neighborhood search
for the one-commodity pickup-and-delivery travelling
salesman problem. European Journal of Operational
Research, 220:270–285.
Paes, B. C., Subramanian, A., and Ochi, L. S. (2010). Uma
heurística híbrida para o problema do caixeiro via-
jante com coleta e entrega envolvendo um único tipo
de produto. In Anais do XLII Simpósio Brasileiro
de Pesquisa Operacional, pages 1513–1524. (In Por-
tuguese).
Plastino, A., Barbalho, H., Santos, L., Fuchshuber, R., and
Martins, S. (2013). Adaptive and multi-mining ver-
sions of the DM-GRASP hybrid metaheuristic. Jour-
nal of Heuristics, pages 1–36.
Plastino, A., Fonseca, E. R., Fuchshuber, R., Martins, S. L.,
Freitas, A. A., Luis, M., and Salhi, S. (2009). A hybrid
data mining metaheuristic for the p-median problem.
In Proceedings of the SIAM International Conference
on Data Mining, pages 305–316.
ExtendingtheHybridizationofMetaheuristicswithDataMiningtoaBroaderDomain
405
Plastino, A., Fuchshuber, R., Martins, S. L., Freitas, A. A.,
and Salhi, S. (2011). A hybrid data mining meta-
heuristic for the p-median problem. Statistical Analy-
sis and Data Mining, 4:313–335.
Resende, M. G. C. and Ribeiro, C. C. (2005). GRASP with
path-relinking: Recent advances and applications. In
Ibaraki, T., Nonobe, K., and Yagiura, M., editors,
Metaheuristics: Progress as Real Problem Solvers,
volume 32 of Operations Research/Computer Science
Interfaces Series, pages 29–63. Springer.
Ribeiro, M. H., Plastino, A., and Martins, S. L. (2006). Hy-
bridization of GRASP metaheuristic with data mining
techniques. Journal of Mathematical Modelling Algo-
rithms, 5:23–41.
Ribeiro, M. H., Trindade, V. F., Plastino, A., and Martins,
S. L. (2004). Hybridization of GRASP metaheuristics
with data mining techniques. In Proceedings of the
ECAI workshop on hybrid metaheuristics, pages 69–
78.
Santos, H. G., Ochi, L. S., Marinho, E. H., and Drummond,
L. M. A. (2006a). Combining an evolutionary algo-
rithm with data mining to solve a single-vehicle rout-
ing problem. Neurocomputing, 70:70–77.
Santos, L. F., Martins, S. L., and Plastino, A. (2008).
Applications of the DM-GRASP heuristic: a survey.
International Transactions in Operational Research,
15:387–416.
Santos, L. F., Milagres, R., Albuquerque, C. V., Martins,
S. L., and Plastino, A. (2006b). A hybrid GRASP with
data mining for efficient server replication for reliable
multicast. In Proceedings of the IEEE GLOBECOM
conference, pages 1–6.
Santos, L. F., Ribeiro, M. H., Plastino, A., and Martins,
S. L. (2005). A hybrid GRASP with data mining for
the maximum diversity problem. In Proceedings of
the International Workshop on Hybrid Metaheuristics,
volume 3636 of Lecture Notes in Computer Science,
pages 116–127, Barcelona, Spain.
Siegel, S. and Castellan Jr, N. J. (1988). Nonparametric
Statistics for the Behavioral Sciences. McGraw-Hill,
2nd edition.
Talbi, E.-G. (2002). A taxonomy of hybrid metaheuristics.
Journal of Heuristics, 8:541–564.
The R Project for Statistical Computing (2013).
http://www.r-project.org/, last visited on 10/18/2013.
Zhao, F., Li, S., Sun, J., and Mei, D. (2009). Genetic al-
gorithm for the one-commodity pickup-and-delivery
traveling salesman problem. Computers & Industrial
Engineering, 56:1642–1648.
ICEIS2014-16thInternationalConferenceonEnterpriseInformationSystems
406