Subcaterpillar Isomorphism Between Caterpillars:
Subtree Isomorphism Restricted Text and Pattern Trees to Caterpillars
Tomoya Miyazaki and Kouich Hirata
Kyushu Institute of Technology, Kawazu 680-4, Iizuka 820-8502, Japan
Keywords:
Subcaterpillar Isomorphism Between Caterpillars, Subcaterpillar Isomorphism, Subtree Isomorphism. Rooted
Labeled Caterpillar, Rooted Labeled Unordered Tree, Caterpillar Inclusion.
Abstract:
In this paper, as a pattern matching for rooted labeled caterpillars (caterpillars, for short), we discuss a
subcaterpillar isomorphism between caterpillars whether or not a pattern caterpillar is a subcaterpillar of a
text caterpillar. Then, we design the algorithms to solve it by simplifying the algorithms for subcaterpillar
isomorphism (between a caterpillar and a tree) when a pattern caterpillar is a subcaterpillar of a text tree
designed by Miyazaki and Hirata (2022). These algorithms run in O(hHσ) time and O(h) space, where
h is the height of a pattern caterpillar, H is the height of a text caterpillar and σ is the number of labels
in the caterpillars. Finally, we give experimental results of computing these algorithms by comparing with
subcaterpillar isomorphism and caterpillar inclusion.
1 INTRODUCTION
The pattern matching for tree-structured data such as
HTML and XML documents for web mining or DNA
and glycan data for bioinformatics is one of the fun-
damental tasks for information retrieval or query pro-
cessing. As such pattern matching for rooted labeled
unordered trees (a tree, for short), a subtree isomor-
phism is the problem of determining, for a pattern
tree P and a text tree T , whether or not there ex-
ists a subtree of T which is isomorphic to P. It is
known that the subtree isomorphism can be solved in
O(p
1.5
t/ log p) time (Shamir and Tsur, 1999), where
p is the number of vertices in P and t is the num-
ber of vertices in T . On the other hand, it cannot be
solved in O(t
2ε
) time for every ε (0 < ε < 1) under
SETH (Abboud et al., 2018).
Recently, by focusing on a rooted labeled cater-
pillar (a caterpillar, for short) (cf., (Gallian, 2007))
as the restriction of trees, Miyazaki and Hirata have
discussed the subcaterpillar isomorphism when a pat-
tern tree is a caterpillar (Miyazaki and Hirata, 2022).
Then, they have designed the algorithms of the sub-
caterpillar isomorphism running in (i) O(tDhσ) time
and O(Dh) space and (ii) O(tDσ) time and O(D(h +
H)) space, respectively
1
. Here, h is the height of P,
H is the height of T, D is the degree of T and σ is the
1
In this paper, we ignore the time complexity of the ini-
tialization of storing structures by traversing data, as same
as (Miyazaki et al., 2022).
number of alphabets for labels in a pattern and a text.
Note that these algorithms return all of the positions
in T where P is a subcaterpillar of T .
As another pattern matching for tree-structured
data, it is known the inclusion problem of determining
whether or not a text tree T achieves to a pattern tree P
by deleting vertices in T is NP-complete (Kilpel
¨
ainen
and Mannila, 1995). This statement also holds even
if P is a caterpillar (Kilpel
¨
ainen and Mannila, 1995).
On the other hand, Miyazaki et al. (Miyazaki et al.,
2022) have shown that, if both P and T are cater-
pillars, then we can solve the inclusion problem in
O((h + H)σ) time. We call this problem a caterpillar
inclusion. Note that this algorithm returns “yes” if a
patter caterpillar P is included in a text caterpillar T
and “no” otherwise.
In this paper, we investigate a subcaterpillar iso-
morphism between caterpillars that is a subcaterpillar
isomorphism when both a pattern tree P and a text tree
T are caterpillars. The subcaterpillar isomorphism be-
tween caterpillars is the special problem of not only
the subcaterpillar isomorphism but also the caterpil-
lar inclusion, because it is regarded as a caterpillar
inclusion that T achieves P by deleting leaves or the
roots in T .
In this paper, by simplifying the algorithms (i) and
(ii) for subcaterpillar isomorphism, we design two
algorithms CATCATISO and CATCATISO2 for sub-
caterpillar isomorphism between caterpillars. Then,
both CATCATISO and CATCATISO2 run in O(hHσ)
Miyazaki, T. and Hirata, K.
Subcaterpillar Isomorphism Between Caterpillars: Subtree Isomorphism Restricted Text and Pattern Trees to Caterpillars.
DOI: 10.5220/0011659600003411
In Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2023), pages 89-94
ISBN: 978-989-758-626-2; ISSN: 2184-4313
Copyright
c
2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
89
time and O(h) space. Furthermore, we give experi-
mental results of computing CATCATISO and CAT-
CATISO2, by comparing with subcaterpillar isomor-
phism between a caterpillar and a tree (Miyazaki
and Hirata, 2022) and caterpillar inclusion (Miyazaki
et al., 2022).
2 PRELIMINARIES
A tree is a connected graph without cycles. For a tree
T = (V,E), we denote V and E by V (T ) and E(T ).
We sometimes denote v V (T ) by v T . A rooted
tree is a tree with one vertex r chosen as its root,
which we denote by r(T ).
For each vertex v in a rooted tree with the root r,
let UP
r
(v) be the unique path from v to r. The parent
of v(̸= r), which we denote by par(v), is its adjacent
vertex on UP
r
(v) and the ancestors of v(̸= r) are the
vertices on UP
r
(v) \ {v}. We denote u < v if v is an
ancestor of u, and we denote u v if either u < v
or u = v. The parent and the ancestors of the root r
are undefined. We say that u is a child of v if v is
the parent of u, and u is a descendant of v if v is an
ancestor of u. We denote the set of all children of v by
ch(v). Two vertices with the same parent are called
siblings. A leaf is a vertex having no children and we
denote the set of all the leaves in T by lv(T ). We call
a vertex that is not a leaf an internal vertex.
For a rooted tree T = (V,E) and a vertex v T ,
the complete subtree of T at v, denoted by T (v), is a
rooted tree S = (V
,E
) such that r(S ) = v, V
= {w
V | w v} and E
= {(u,w) E | u, w V
}.
The height h(v) of a vertex v is defined as
|UP
r
(v)| 1 and the height h(T ) of T is the maxi-
mum height for every vertex v T . The degree d(v)
of a vertex v is the number of the children of v, and
the degree d(T ) of T is the maximum degree for every
vertex in T .
We say that a rooted tree is ordered if a left-to-
right order among siblings is given; Unordered other-
wise. For a fixed finite alphabet Σ, we say that a tree
is labeled over Σ if each vertex is assigned a symbol
from Σ. We denote the label of a vertex v by l(v), and
sometimes identify v with l(v). In this paper, we call
a rooted labeled unordered tree over Σ a tree, simply.
Definition 1. Let T and S be trees.
1. We say that T is a subtree of S, denoted by T S,
if T is a tree such that V (T ) V (S) and E(T ) =
{(v,w) E(S) | v,w V (T )}.
2. We say that T and S are isomorphic, denoted by
T S, if T S and S T .
3. We say that T is a subtree isomorphism of S, de-
noted by T S, if there exists a tree S
S such
that T S
.
In this paper, we deal with a subtree isomorphism
problem of P for T whether or not P T for trees
P and T . We call P a pattern tree and T a text tree.
Then, the following theorem holds.
Theorem 1. (Shamir and Tsur, 1999) Let P and T be
trees where p = |P| and t = |T |. Then, the problem
of determining whether or not P T is solvable in
O(p
1.5
t/ log p) time.
As the restricted form of trees, we introduce a
rooted labeled caterpillar (a caterpillar, for short) as
follows.
Definition 2. We say that a tree is a caterpil-
lar (cf. (Gallian, 2007)) if it is transformed to a rooted
path after removing all the leaves in it. For a caterpil-
lar C, we call the remained rooted path a backbone of
C and denote it by bb(C).
It is obvious that r(C) = r(bb(C)) and V (C) =
V (bb(C))lv(C) for a caterpillar C, that is, every ver-
tex in a caterpillar is either a leaf or an element of the
backbone.
We call a subtree isomorphism when P is a cater-
pillar, that is, the problem of determining whether or
not P T , a subcaterpillar isomorphism. Then, the
following theorem holds.
Theorem 2. (Miyazaki and Hirata, 2022) Let P be
a caterpillar and T a tree, where t = |T |, h = h(P),
H = h(T ), D = d(T ) and σ = |Σ|. Then, the problem
of determining whether or not P T is solvable (i)
in O(tDhσ) time and O(Dh) space and (ii) in O(tDσ)
time and O(D(h + H)) space.
We refer the algorithm of (i) (resp., (ii)) in Theo-
rem 2 to CATTREEISO (resp., CATTREEISO2). Note
that both CATTREEISO and CATTREEISO2 return all
of the positions in T where P is a subcaterpillar of T .
Finally, we introduce a tree inclusion and a cater-
pillar inclusion. For a tree T and a vertex v T , the
deletion of v in T is to delete a non-root vertex v in
T with a parent v
, making the children of v become
the children of v
that are inserted in the place of v as
a subset of the children of v
. We denote the result of
the deletion of v in T by delete(T,v). See Figure 1.
Definition 3. Let P and T be trees. Then, we say that
P is an inclusion of T, denoted by P T , if either P
T or there exists a sequence of vertices v
1
,...,v
k
in T
such that T
0
T , T
k
P and T
i+1
delete(T
i
,v
i+1
)
(0 i k 1).
For trees P and T , if P T then P T , because
T achieves P by deleting leaves or roots in T . On the
other hand, the converse does not hold in general.
ICPRAM 2023 - 12th International Conference on Pattern Recognition Applications and Methods
90
T delete(T,v)
Figure 1: delete(T,v).
We call the tree inclusion when both P and T are
caterpillars a caterpillar inclusion. Then, the follow-
ing theorems hold.
Theorem 3. (Kilpel
¨
ainen and Mannila, 1995) For
trees P and T , the problem of determining whether or
not P T is NP-complete. This statement also holds
even if the maximum height of T is at most 3.
Theorem 4. (Miyazaki et al., 2022) Let P and T be
caterpillars, where h = h(P), H = h(T ) and σ = |Σ|.
Then, the problem of determining whether or not P
T is solvable in O((h + H)σ) time.
We refer the algorithm in Theorem 4 to CATCAT-
INC. Note that CATCATINC returns “yes” if P T
and “no” otherwise.
3 SUBCATERPILLAR
ISOMORPHISM BETWEEN
CATERPILLARS
In this paper, we focus on a subcaterpillar isomor-
phism between caterpillars that is a subcaterpillar iso-
morphism when both P and T are caterpillars. In
other words, we focus on the problem of whether or
not P T for caterpillars P and T . We call P and T a
pattern caterpillar and a text caterpillar, respectively.
Throughout of this section, we refer p = |P|, t = |T |,
h = h(P), H = h(T ), D = d(T ) and σ = |Σ|.
For a pattern caterpillar P, we refer the backbone
of P to a sequence v
1
,...,v
n
such that (v
i
,v
i+1
)
E(P) and v
n
= r(P). We denote the children of v
i
by ch(v
i
). For a text caterpillar T , we refer the
backbone of T to a sequence w
1
,...,w
m
such that
(w
j
,w
j+1
) E(T ) and w
m
= r(T ). We denote the
children of w
j
by ch(w
j
).
Suppose that P T and let P
T be a subcater-
pillar in T such that P P
and bb(P
) = v
1
,...,v
n
,
where v
n
= r(P
). Then, we call the index j such that
v
1
= w
j
in T a matching position of P in T .
As same as the algorithms of CATTREEISO and
CATTREEISO2, we use a multiset of labels in order
to compare two sets of vertices. A multiset on Σ is
a mapping S : Σ N. For two multisets S
1
and S
2
,
S
1
S
2
if S
1
(a) S
2
(a) for every a Σ.
For a set V of vertices, we denote the multiset of
labels occurring in V by
e
V . Then, it is necessary for
the subcaterpillar isomorphism to check whether or
not
^
ch(v
i
)
^
ch(w
j
) for v
i
bb(P) and w
j
bb(T ). It
is realized to check
^
ch(v
i
)
(a)
^
ch(w
j
)
(a) for
every a Σ in O(σ) time (cf. (Muraka et al., 2019)).
By simplifying the algorithm CATTREEISO in
Theorem 2 (i), we design the algorithm CATCATISO
in Algorithm 1 to determine whether or not P T
and to output all of the matching positions if P T .
Here, the table match(i) stores j such that v
i
bb(P)
is corresponding to w
j
bb(T ).
procedure CATCATISO(P, T )
/* P : caterpillar, bb(P) = v
1
,. . .,v
n
*/
/* T : caterpillar, bb(T) = w
1
,. . .,w
m
*/
for i = 1 to n 1 do match(i) 0;1
for j = 1 to m do2
for i = n 1 downto 1 do3
if match(i) ̸= 0 then4
k match(i); match(i) 0;5
if l(v
i+1
) = l(w
j
) and6
^
ch(v
i+1
)
^
ch(w
j
) then
if i + 1 = n then output k;7
else match(i + 1) k;8
if l(v
1
) = l(w
j
) and
^
ch(v
1
)
^
ch(w
j
) then9
match(1) j;10
Algorithm 1: CATCATISO.
Example 1. Consider the pattern caterpillar P and
the text caterpillar T in Figure 2. Here, bb(P) =
v
1
,v
2
,v
3
and bb(T ) = w
1
,w
2
,w
3
,w
4
,w
5
,w
6
, so it
holds that n = 3 and m = 6.
a
v
3
b
a
v
2
b
a
v
1
a
b
c
a
w
6
b
c a
w
5
a
b
a
w
4
b
c a
w
3
a
b
a
w
2
b
c a
w
1
a
b
c
P T
Figure 2: A pattern caterpillar P and a text caterpillar T in
Example 1.
For P and T , the algorithm CATCATISO(P,T )
stores the values of match(i) and outputs the match-
ing positions as Table 1 Then, the matching positions
of P in T are 1, 2 and 4.
On the other hand, by simplifying the algorithm
CATTREEISO2 in Theorem 2 (ii), we design another
Subcaterpillar Isomorphism Between Caterpillars: Subtree Isomorphism Restricted Text and Pattern Trees to Caterpillars
91
Table 1: The execution of the algorithm CATCATISO(P, T ).
j 1 2 3 4 5 6
match(1) 1 2 0 4 0 6
match(2) 0 1 2 0 4 0
output 1 2 4
algorithm CATCATISO2 in Algorithm 2. The differ-
ence between CATCATISO2 and CATCATISO is that
CATCATISO2 does not always access all the values of
match(i) for 1 i n 1 but just access the values
of match(i) such that i CHK.
procedure CATCATISO2(P, T )
/* P : caterpillar, bb(P) = [v
1
,. . .,v
n
] */
/* T : caterpillar, bb(T) = [w
1
,. . .,w
m
] */
CHK
/
0;1
for i = 1 to n 1 do match(i) 0;2
for j = 1 to m do3
foreach i CHK do4
k match(i); match(i) 0;5
CHK CHK \ {i};
if l(v
i+1
) = l(w
j
) and6
^
ch(v
i+1
)
^
ch(w
j
) then
if i + 1 = n then output k;7
else match(i + 1) k;8
CHK CHK {i + 1};
if l(v
1
) = l(w
j
) and
^
ch(v
1
)
^
ch(w
j
) then9
match(1) j; CHK CHK {1};10
Algorithm 2: CATCATISO2.
Example 2. Consider the pattern caterpillar P and
the text caterpillar T in Example 1 (Figure 2). Then,
the algorithm CATCATISO2(P,T ) stores the values of
match(i) and outputs the matching positions, and ad-
ditionally updates the set of CHK as Table 2. Hence,
the matching positions of P in T are 1, 2 and 4.
Table 2: The execution of the algorithm CAT-
CATISO2(P,T ).
j 1 2 3 4 5 6
match(1) 1 2 0 4 0 6
match(2) 0 1 2 0 4 0
output 1 2 4
CHK 1 1, 2 2 1 2 1
For subcaterpillar isomorphism between caterpil-
lars, the following theorem holds.
Theorem 5. Let P and T be caterpillars, where h =
h(P), H = h(T ) and σ = |Σ|. Then, the algorithms of
CATCATISO(P,T ) and CATCATISO2(P,T ) output all
the matching positions of P in T correctly in O(hHσ)
time and O(h) space.
Proof. The following proof of the correctness is sim-
ilar as (Miyazaki and Hirata, 2022).
The algorithm CATCATISO first stores the candi-
date j of the matching point corresponding to v
1
to
match(1) if l(v
1
) = l(w
j
) and
^
ch(v
1
)
^
ch(w
j
) (line
9). Then, for the current j, the algorithm CATCATISO
removes the candidate k from match(i) and stores k to
match(i+1) if l(v
i+1
) = l(w
j
),
^
ch(v
i+1
)
^
ch(w
j
) and
i + 1 < n (lines 6 and 8). If i + 1 = n, then the algo-
rithm CATCATISO outputs k (line 7).
Hence, every output k at line 7 satisfies that l(v
i
) =
l(par
i1
(w
k
)) and
^
ch(v
i
) =
^
ch(par
i1
(w
k
)) for every
i (1 i n), where par
0
(v) = v and par
i+1
(v) =
par(par
i
(v)). As a result, the algorithm SUBCATISO
outputs all of the matching points of P in T .
On the other hand, the difference between the al-
gorithms CATCATISO and CATCATISO2 is the us-
ages of the set CHK, which is stored to all the indices
i such that match(i) ̸= 0 for 1 i n 1. Hence, the
algorithm CATCATISO2 can access all the values of
match(i) such that match(i) ̸= 0 for every 1 j m,
which implies the correctness of the algorithm CAT-
CATISO2.
Next, consider the computational complexity of
the algorithms.
For the algorithm CATCATISO, we can check the
lines 6 and 9 in O(σ) time. Also, the for-loop in line
3 is repeated at h 1 times and the for-loop in line 2
is repeated at H times. Then, the total running time
of CATCATISO is O(hHσ) time. Also the space is the
size of the table match, which is O(h).
On the other hand, for the algorithm CAT-
CATISO2, the foreach-loop in line 3 is repeated at
most h 1 times. Then, the total running time of
CATCATISO2 is O(hHσ) time. Also the space is the
sizes of the table match and the set CHK, which is
O(h + h) = O(h).
4 EXPERIMENTAL RESULTS
In this section, we give the experimental results of
computing CATCATISO and CATCATISO2. Here, the
computer environment is that OS is Ubuntu 18.04.4,
CPU is Intel Xeon E5-1650 v3(3.50GHz) and RAM
is 3.8GB.
We deal with caterpillars for N-glycans and all-
ICPRAM 2023 - 12th International Conference on Pattern Recognition Applications and Methods
92
glycans from KEGG
2
, CSLOGS
3
, the largest 51,395
caterpillars (1%) in dblp
4
(refer to dblp
1%
) and Swis-
sProt from UW XML Repository
5
. Also we deal
with non-isomorphic caterpillars obtained by deleting
the root in Nasa (refer to NASA
), Protein (refer to
Protein
) and University (refer to University
) from
UW XML Repository. Table 3 illustrates the infor-
mation of such caterpillars. Here, #, n, d, h, λ and β
are the number of caterpillars, the average number of
vertices, the average degree, the average height, the
average number of leaves and the average number of
labels.
Table 3: The information of caterpillars.
data # n d h λ β
N-glycans 513 6.40 1.84 4.22 2.19 3.24
all-glycans 7,984 4.74 1.49 3.02 1.72 2.84
CSLOGS 41,592 5.84 3.05 2.20 3.64 5.18
dblp
1%
51,395 21.29 20.21 1.04 20.25 9.73
SwissProt 6,804 35.10 24.96 2.00 33.10 16.79
Nasa
33 7.27 5.15 1.64 5.64 3.18
Protein
5,150 4,97 3.63 1.16 3.81 4.57
University
26 1.35 0.35 0.19 1.15 1.35
We compare all the pairs (P, T ) in the caterpillars
in Table 3. The number of pairs is # × (# 1), and
Table 4 summarizes such number as #pairs.
Table 4: The number (#pairs) of all the pairs in caterpillars
in Table 3.
data #pairs
N-glycans 262,656
all-glycans 63,736,272
CSLOGS 1,729,852,872
dblp
1%
2,641,394,630
data #pairs
SwissProt 46,287,612
Nasa
1,056
Protein
26,517,350
University
650
First, we compare the running time of the al-
gorithms CATCATISO and CATCATISO2 in Sec-
tion 3 with the algorithms of CATTREEISO and CAT-
TREEISO2 (Miyazaki and Hirata, 2022).
Then, Table 5 illustrates the total and average run-
ning time of computing P T for all the data by
CATCATISO and CATCATISO2. Here, the bold faces
present the smaller total running time.
Table 5 shows that, whereas CATCATISO is faster
than CATCATISO2 for N-glycans, dblp
1%
, Nasa
and University
, CATCATISO2 is faster than CAT-
2
Kyoto Encyclopedia of Genes and Genomes,
http://www.kegg.jp/
3
http://www.cs.rpi.edu/
zaki/www-new/pmwiki.php/
Software/Software
4
http://dblp.uni-trier.de/
5
http://aiweb.cs.washington.edu/research/projects/xmlt
k/xmldata/www/repository.html
Table 5: The total and average running time (msec.) of
computing P T for all the data by CATCATISO and CAT-
CATISO2.
CATCATISO CATCATISO2
data total ave. total ave.
N-glycans 4,719 0.02 4,753 0.02
all-glycans 617,932 0.01 616,863 0.01
CSLOGS 13,703,530 0.01 13,636,908 0.01
dblp
1%
73,453,734 0.03 73,637,350 0.01
SwissProt 3,706,628 0.08 3,696,157 0.08
Nasa
12 0.01 17 0.02
Protein
166,717 0.01 170,123 0.01
University
1 0.00 4 0.01
CATISO for all-glycans, CSLOGS and SwissProt. On
the other hand, since the difference of the computa-
tion time between CATCATISO and CATCATISO2 is
not large, the usage of the set CHK in CATCATISO2
is not effective for our experimental data.
Table 6 illustrates the total and average running
time of computing P T for all the data by CAT-
TREEISO and CATTREEISO2.
Table 6: The total and average running time (msec.) of com-
puting P T for all the data by CATTREEISO and CAT-
TREEISO2.
CATTREEISO CATTREEISO2
data total ave. total ave.
N-glycans 6,315 0.02 6,325 0.02
all-glycans 726,173 0.01 855,131 0.01
CSLOGS 15,444,331 0.01 19,242,569 0.01
dblp
1%
78,063,181 0.03 80,077,842 0.01
SwissProt 3,790,730 0.08 3,934,301 0.09
Nasa
17 0.02 14 0.01
Protein
174,212 0.01 213,539 0.01
University
4 0.00 4 0.01
Tables 5 and 6 show that the algorithms of CAT-
CATISO and CATCATISO2 are faster than the algo-
rithms of CATTREEISO and CATTREEISO2. The
reason is that, whereas CATTREEISO and CAT-
TREEISO2 are necessary to traverse a whole text
caterpillar, CATCATISO and CATCATISO2 just tra-
verse the backbone of a text caterpillar.
Next, we compare the algorithms CATCATISO
and CATCATISO2 with CATCATINC. Note that CAT-
INC is a decision algorithm to return just “yes” or
“no. Then, we use the decision versions of the algo-
rithms of CATCATISO and CATCATISO2, designed
by changing line 7 as follows and by adding the fol-
lowing line 11 to the last of the algorithms.
line 7 if i + 1 = n then output “yes”; halt;
line 11 output “no”;
We refer the decision versions of CATCATISO and
CATCATISO2 to CATCATISO* and CATCATISO2*.
Subcaterpillar Isomorphism Between Caterpillars: Subtree Isomorphism Restricted Text and Pattern Trees to Caterpillars
93
Then, Table 7 illustrates the total and average run-
ning time (msec.) of determining whether or not
P T by CATCATISO* and CATCATISO2* and of
determining whether or not P T by CATCATINC.
Table 7: The total and average running time (msec.) of de-
termining whether or not P T by CATCATISO* and CAT-
CATISO2* and of determining whether or not P T by
CATCATINC.
CATCATISO*/ISO2* CATCATINC
data total ave. total ave.
N-glycans 4,788/4,721 0.02 16,075 0.06
all-glycans 611,103/603,052 0.01 2,221,026 0.04
CSLOGS 13,211,929/13,497,908 0.01 83,129,368 0.05
dblp
1%
73,315,987/73,548,291 0.03 143,584,440 0.05
SwissProt 3,706,628/3,696,157 0.08 7,291.159 0.16
Nasa
11/11 0.01 29 0.03
Protein
164,107/164,240 0.01 596,332 0.02
University
1/1 0.00 9 0.01
As stated in the previous sections, the algorithm
CATCATINC runs in O((h + H)σ) time (Theorem 4)
and the algorithms CATCATISO* and CATCATISO2*
run in O(hHσ) time (Theorem 5). On the other hand,
Table 7 shows that the algorithms CATCATISO* and
CATCATISO2* are much faster than the algorithm
CATCATINC. One of the reasons is that, whereas
the main loop in the algorithm CATCATINC is re-
peated at near to h + H times, the for-loop in the
algorithms CATCATISO* and CATCATISO2* are re-
peated at much smaller than H times.
Furthermore, Table 8 illustrates the number
(#pairs) of pairs (P,T ) such that P T and P
T (Miyazaki et al., 2022) with its ratio (%) in all the
pairs.
Table 8: The number (#pairs) of pairs (P,T ) such that P T
and P T with its ratio (%) in all the pairs.
P T P T
data #pairs % #pairs %
N-glycans 17,505 6.67 21,919 8.35
all-glycans 646,170 1.01 907,776 1.42
CSLOGS 1,979,560 0.11 2,277,568 0.13
dblp
1%
364,182,693 13.79 364,184,642 13.79
SwissProt 1,400,455 3.03 1,400,455 3.03
Nasa
108 10.23 108 10.23
Protein
3,701 0.01 3,701 0.01
University
1 0.15 1 0.15
Table 8 shows that, whereas #pairs such that P T
is smaller than #pair such that P T for N-glycans,
all-glycans, CSLOGS and dblp
1%
, the former is equal
to the latter for SwissProt, Nasa
, Protein
and
University
; Nevertheless, for these data, we can de-
termine P T faster than P T shown in Table 7.
5 CONCLUSION
In this paper, we have designed the algorithms of
CATCATISO and CATCATISO2 to solve the subcater-
pillar isomorphism between caterpillars and given the
experimental results of comparing them with the sub-
caterpillar isomorphism algorithms of CATTREEISO
and CATTREEISO2 and the caterpillar inclusion al-
gorithm CATCATINC.
Then, the algorithms of CATCATISO and CAT-
CATISO2 are faster than the algorithms of CAT-
TREEISO and CATTREEISO2 for subcaterpillar iso-
morphism between caterpillars. Also, whereas the al-
gorithm CATCATINC is faster than the decision ver-
sions CATCATISO* and CATCATISO2* in theoreti-
cal, the latter is faster than the former in experimental.
Since Theorem 1 for the subtree isomorphism also
holds for unrooted trees, it is a future work to extend
the algorithms in this paper to unrooted subcaterpil-
lar isomorphism between caterpillars. In particular,
it is necessary to investigate whether or not the un-
rooted subcaterpillar isomorphism between caterpil-
lars can avoid to the SETH-hardness of subtree iso-
morphism (Abboud et al., 2018).
REFERENCES
Abboud, A., Backurs, A., Hansen, T. D., v. Williams, V.,
and Zamir, O. (2018). Subtree isomorphism revisited.
ACM Trans. Algo., 14:27.
Gallian, J. A. (2007). A dynamic survey of graph labeling.
Electorn. J. Combin., 14:DS6.
Kilpel
¨
ainen, P. and Mannila, H. (1995). Ordered and un-
ordered tree inclusion. SIAM J. Comput., 24:340–356.
Miyazaki, T., Hagihara, M., and Hirata, K. (2022). Cater-
pillar inclusion: Inclusion problem for rooted labeled
caterpillars. In Proc. ICPRAM ’22, pages 280–287.
Miyazaki, T. and Hirata, K. (2022). Subcaterpillar isomor-
phism: Subtree isomorphism restricted pattern trees to
caterpillars. In Proc. FedCSIS ’22, pages 351–356.
Muraka, K., Yoshino, T., and Hirata, K. (2019). Vertical
and horizontal distances to approximate edit distance
for rooted labeled caterpillars. In Proc. ICPRAM’19,
pages 590–597.
Shamir, R. and Tsur, D. (1999). Faster subtree isomor-
phism. Algorithmica, 33:267–280.
ICPRAM 2023 - 12th International Conference on Pattern Recognition Applications and Methods
94