the two classes in the pair (Mitzev and Younan, 2015). The training starts with N-3 random sequences, where N is the length of the shortest time series in the dataset. These random sequences have different lengths, and each of them is considered a candidate shapelet. On every iteration of the PSO algorithm, the values of the random sequences are adjusted so as to improve the information gain, which measures how well a candidate separates the two classes. The initial proposal in (Mitzev and Younan, 2015) suggested using N-3 random sequences, but our tests showed that decreasing the number of competing sequences does not significantly influence the accuracy. The number of competing candidate shapelets was therefore reduced to 20, which decreases the overall training time. The pseudocode in Algorithm 1 gives a detailed picture of the process. The changes to each candidate's values are dictated by the cognitive and social acceleration constants C1 and C2 and the inertia weight constant W, while the randomness of the process is maintained by the random values R1 and R2 (lines 11-15). The function CheckCandidate (line 21) evaluates the fitness of the current candidate shapelet and maintains the candidate's best information gain. The iteration process stops when the best gain of the current iteration is not significantly better than the previously found best information gain (line 29). The class-label pairs, along with their corresponding shapelets, form the nodes of the decision tree for a given combination.
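For concreteness, the per-iteration update that moves the candidate shapelets can be sketched as below. This is a minimal illustration only: it assumes equal-length candidates stored as NumPy arrays (the actual candidates have different lengths), and the function name pso_step and the concrete values of W, C1, and C2 are ours, not the paper's.

import numpy as np

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO update of the candidate shapelets (cf. Algorithm 1, lines 11-15).

    pos:   (n_candidates, shapelet_len) current candidate values
    vel:   current velocities, same shape as pos
    pbest: each candidate's personal best position so far
    gbest: the best candidate found by the whole swarm
    """
    r1 = np.random.rand(*pos.shape)  # R1, R2 keep the search stochastic
    r2 = np.random.rand(*pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel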
The final step of the training process is building a decision pattern for every time series in the train dataset. Each time series is classified by all of the trained decision trees. During this classification, every decision tree produces a decision path: the character “R” is appended to the path when the process takes the right branch of the tree, and the character “L” when it takes the left branch (Fig. 1). The decision paths from all trees are concatenated to produce the decision pattern (Fig. 2). It turns out that time series from the same class have similar decision patterns, while differing significantly from the decision patterns of the other classes. The decision patterns of all time series in the train dataset are stored and used to classify the incoming time series from the test dataset.
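A minimal sketch of how one such pattern could be assembled follows. The Node structure, its attribute names, and the shapelet_distance callback are hypothetical, since the paper does not fix an implementation.

from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class Node:
    """One decision tree node: a shapelet and its optimal split distance."""
    shapelet: Sequence[float] = ()
    split_distance: float = 0.0
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def decision_path(node: Node, series, dist: Callable) -> str:
    """Walk one decision tree, appending 'L' or 'R' at every inner node."""
    path = []
    while node.left is not None and node.right is not None:
        if dist(series, node.shapelet) < node.split_distance:
            path.append("L")
            node = node.left
        else:
            path.append("R")
            node = node.right
    return "".join(path)

def decision_pattern(trees, series, dist: Callable) -> str:
    """Concatenate the decision paths from all trees into one pattern."""
    return "".join(decision_path(t, series, dist) for t in trees)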
3.1.2 Classification  
An incoming time series from the test dataset also produces a decision pattern, which is compared with the decision patterns stored during training. The two decision pattern strings are compared character by character, by both value and position (Fig. 3). The comparison is quantified with a comparison coefficient, equal to the number of characters that coincide in position and value divided by the total number of characters in the decision pattern. The incoming time series is assigned to the class with the most similar decision pattern, i.e., the one with the highest comparison coefficient.
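Since every time series passes through the same trees, all decision patterns have equal length, and the classification rule reduces to a nearest-neighbour search with the following similarity. The function names are illustrative.

def comparison_coefficient(a: str, b: str) -> float:
    """Fraction of positions where two decision patterns agree by value."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def classify(test_pattern, train_patterns):
    """train_patterns: list of (pattern, class_label) pairs kept from training."""
    pattern, label = max(
        train_patterns,
        key=lambda pl: comparison_coefficient(test_pattern, pl[0]))
    return label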
3.2  CDP Method Extension for Datasets with Fewer Class Labels
The original algorithm, as specified in (Mitzev and Younan, 2016), limits the number of combinations in the subset. With only two classes there is only one such combination; for the “Gun_point” dataset this combination is {1, 2}. Testing that single decision tree with the test time series from the “Gun_point” dataset yields an accuracy of 67.33%.
Our research confirmed that on every run the PSO algorithm produces a different shapelet and optimal split distance for the pair {1, 2}. This is because the initial candidates are randomly generated and therefore differ on every trial. Thus, even when the decision trees have the same indexes, they have different decision conditions. Each different decision condition offers a different viewpoint and contributes a new decision path to the decision pattern. Table 1 illustrates the concept of using same-indexes decision trees with different decision conditions for the “Gun_point” dataset. It shows three scenarios: with one, two, and three decision trees. As shown, every presented decision tree node has a different shapelet and split distance. Increasing the pattern length from 1 to 3 in this particular case increases the overall accuracy by almost 10%.
Experiments with other datasets confirmed that the accuracy increases when CDP re-trains and combines paths from same-indexes decision trees. Increasing the pattern length leads to higher accuracy, but the accuracy plateaus beyond a certain pattern length. The reuse of same-indexes trees may also be applied to datasets with more than 5 class indexes, but the goal of this work is to overcome the initial limits of the CDP method and show that it is applicable to every dataset.
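The re-training loop itself is simple. The sketch below takes the training routine as a parameter; a hypothetical train_pso_tree(train_set, pair) stands in for the PSO training of Section 3.1 applied to one class pair.

def build_same_index_trees(train_pso_tree, train_set, pair, n_trees):
    """Train n_trees decision trees for the same class pair.

    Each call starts from fresh random candidates, so every tree
    ends up with a different shapelet and split distance, lengthening
    the decision pattern for a two-class dataset such as "Gun_point"
    from 1 to n_trees characters.
    """
    return [train_pso_tree(train_set, pair) for _ in range(n_trees)]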
 
 