Hadamard Code Graph Kernels for Classifying Graphs
Tetsuya Kataoka and Akihito Inokuchi
School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo, Japan
Keywords:
Graph Classification, Support Vector Machine, Graph Kernel, Hadamard Code.
Abstract:
Kernel methods such as Support Vector Machines (SVMs) are becoming increasingly popular because of their
high performance on graph classification problems. In this paper, we propose two novel graph kernels called
the Hadamard Code Kernel (HCK) and the Shortened HCK (SHCK). These kernels are based on the Hadamard
code, which is used in spread spectrum-based communication technologies to spread message signals. The
proposed graph kernels are equivalent to the Neighborhood Hash Kernel (NHK), one of the fastest graph
kernels, in terms of time and space complexities, and comparable to the Weisfeiler-Lehman Subtree Kernel
(WLSK), one of the most accurate graph kernels, in terms of expressiveness. The fundamental performance
and practicality of the proposed graph kernels are evaluated using three
real-world datasets.
1 INTRODUCTION
A natural way of representing structured data is to
use graphs (Vinh, et al., 2010). As an example,
the structural formula of a chemical compound is a
graph, where each vertex corresponds to an atom in
the compound and each edge corresponds to a bond
between the two atoms therein. Using such graph rep-
resentations, a new research field called graph min-
ing has emerged from data mining with the objective
of mining information from a database consisting of
graphs. With the potential to find meaningful infor-
mation, graph mining has raised great interest, and
research in the field has increased rapidly in recent
years. Furthermore, because the need for classifying
graphs has increased in many real-world applications,
e.g., the analysis of proteins in bioinformatics and
chemical compounds in cheminformatics (Schölkopf,
et al., 2004), graph classification has also been widely
researched worldwide. The main objective of graph
classification is to classify graphs of similar structures
into the same classes. This originates from the fact
that instances represented by graphs usually have similar
properties if their graph representations have high
structural similarity.
Kernel methods such as Support Vector Machine
(SVM) are becoming increasingly popular because of
their high performance on graph classification prob-
lems (Kashima, et al., 2003). Most graph kernels
are based on the decomposition of a graph into sub-
structures and a feature vector containing counts of
these substructures. Because the dimensionality of
these feature vectors is typically very high and this ap-
proach includes the subgraph isomorphism matching
problem that is known to be NP-complete (Garey and
Johnson, 1979), kernels deliberately avoid the explicit
computation of feature values and instead employ ef-
ficient procedures.
One representative graph kernel is the Random Walk Kernel (RWK) (Schölkopf and Smola, 2002; Kashima, et al., 2003), which computes $k(g_i, g_j)$ in $O(|V(g)|^3)$ for graphs $g_i$ and $g_j$, where $|V(g)|$ is the number of vertices in $g_i$ and $g_j$. The kernel returns a high value if the random walk on the graph generates many sequences with the same labels for vertices and edges, i.e., the graphs are similar to each
other. The Neighborhood Hash Kernel (NHK) (Hido and Kashima, 2009) and the Weisfeiler-Lehman Subtree Kernel (WLSK) are two other recently proposed kernels that compute $k(g_i, g_j)$ faster than the RWK. The NHK uses logical operations such as the exclusive OR on the label set of adjacent vertices, while the WLSK uses a concatenation of the label strings of adjacent vertices to compute $k(g_i, g_j)$. The labels updated by repeating the hash or concatenation propagate the label information over the graph and uniquely represent the higher-order structures around the vertices beyond the vertex or edge level. An SVM with these two graph kernels works very well with benchmark data consisting of graphs.
The computation of NHK is very efficient be-
cause its computation is a logical operation between
fixed-length bit strings and does not require any string
sorting. However, its drawback is hash collision, which occurs when different induced subgraphs have an identical hash value. Although WLSK must sort the vertex labels, it has high expressiveness because each vertex $v$ has a distribution of vertex labels within $i$ steps from $v$. To overcome these drawbacks, in this paper, we propose a novel graph kernel that is equivalent to NHK in terms of time and space complexities and comparable to WLSK in terms of expressiveness. The graph kernel proposed in this paper is based on the Hadamard code. The Hadamard code is used in spread spectrum-based communication technologies such as Code Division Multiple Access (CDMA) to spread message signals. Because the probabilities of occurrence of 1 and $-1$ are equal in each column of the Hadamard matrix except for the first column, labels assigned by our graph kernel follow a binomial distribution with zero mean under a certain assumption. Therefore, the expected value of each label is 0, and such labels do not require a large memory space. This characteristic is used to compress vertex labels in graphs, enabling the proposed graph kernel to be computed quickly.
The rest of this paper is organized as follows. In
Section 2, we define the graph classification prob-
lem and explain the framework of the existing graph
kernels. In Section 3, we propose the Hadamard
Code Kernel (HCK), based on the Hadamard code,
and another graph kernel called the Shortened HCK
(SHCK), which is a version of HCK that compresses
vertex labels in graphs. In Section 4, we provide a
theoretical discussion of the effect of overflow on the
proposed graph kernel. In Section 5, the fundamental
performance and practicality of the proposed method
are demonstrated through experiments. Finally, we
conclude the paper in Section 6.
2 GRAPH KERNELS
2.1 Framework of Representative
Graph Kernels
This paper tackles the classification problem of graphs. A graph is represented as $g = (V, E, \Sigma, \ell)$, where $V$ is a set of vertices, $E \subseteq V \times V$ is a set of edges, $\Sigma$ is a set of vertex labels, and $\ell: V \to \Sigma$ is a function that assigns a label to each vertex in the graph. Additionally, the set of vertices in graph $g$ is represented as $V(g)$. Although we assume that only the vertices in the graphs have labels in this paper, the methods in this paper can be applied to graphs where both the vertices and edges have labels. The vertices adjacent to vertex $v$ are represented as $N(v) = \{u \mid (v, u) \in E\}$. A sequence of vertices from $v$ to $u$ is called a path, and its step refers to the number of edges on that path. A path is called simple if and only if the path does not have repeating vertices. Paths in this paper are not always simple. Given
two graphs $g = (V, E, \Sigma, \ell)$ and $g' = (V', E', \Sigma', \ell')$, $g'$ is called a subgraph of $g$ if there exists an injective function $\varphi: V' \to V$ that satisfies the following three conditions for $v, v_1, v_2 \in V'$:

1. $(\varphi(v_1), \varphi(v_2)) \in E$ if $(v_1, v_2) \in E'$,
2. $\ell'(v) = \ell(\varphi(v))$,
3. $\ell'((v_1, v_2)) = \ell((\varphi(v_1), \varphi(v_2)))$.

Additionally, a subgraph $g'$ of $g$ is an “induced subgraph” if $\varphi(v_1)$ and $\varphi(v_2)$ are adjacent in $g$ if and only if $v_1$ and $v_2$ in $V(g')$ are adjacent in $g'$.
The graph classification problem is defined as follows. Given a set of $n$ training examples $D = \{(g_i, y_i)\}$ $(i = 1, \cdots, n)$, where each example is a pair consisting of a labeled graph $g_i$ and the class $y_i \in \{+1, -1\}$ to which it belongs, the objective is to learn a function $f$ that correctly predicts the classes of the test examples.
In this paper, graphs are classified by an SVM that uses graph kernels. Let $\Sigma = \{\sigma_1, \sigma_2, \cdots, \sigma_{|\Sigma|}\}$ and $c(g, \sigma) = |\{v \in V(g) \mid \ell(v) = \sigma\}|$. A function $\phi$ that converts a graph $g$ to a vector is defined as

$$\phi(g) = \left(c(g, \sigma_1), c(g, \sigma_2), \cdots, c(g, \sigma_{|\Sigma|})\right)^T.$$

The function $k'(g_i, g_j)$, defined as $\phi(g_i)^T \phi(g_j)$, is a positive semidefinite kernel. This function is calculated as follows:

$$k'(g_i, g_j) = \phi(g_i)^T \phi(g_j) = \sum_{v_i \in V(g_i)} \sum_{v_j \in V(g_j)} \delta(\ell(v_i), \ell(v_j)),$$

where $\delta$ is the Kronecker delta.
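To make this computation concrete, the following is a minimal Java sketch of $k'$ (Java being the language of our implementation in Section 5); it builds the two label histograms and takes their dot product instead of evaluating the double sum directly. The method and parameter names are illustrative, and graphs are assumed to be given as arrays of vertex-label indices.

```java
// Sketch of k'(g_i, g_j) = phi(g_i)^T phi(g_j) via label histograms.
// Assumption: vertex labels are encoded as indices in {0, ..., |Sigma| - 1}.
static long vertexLabelKernel(int[] labelsI, int[] labelsJ, int sigmaSize) {
    long[] histI = new long[sigmaSize];   // histI[s] = c(g_i, sigma_s)
    long[] histJ = new long[sigmaSize];   // histJ[s] = c(g_j, sigma_s)
    for (int l : labelsI) histI[l]++;
    for (int l : labelsJ) histJ[l]++;
    long k = 0;                           // dot product of the two histograms
    for (int s = 0; s < sigmaSize; s++) k += histI[s] * histJ[s];
    return k;
}
```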
Given a graph $g^{(h)} = (V, E, \Sigma, \ell^{(h)})$, a procedure that converts $g^{(h)}$ into another graph $g^{(h+1)} = (V, E, \Sigma', \ell^{(h+1)})$ is called a relabel. Although the relabel function $\ell^{(h+1)}$ is defined later in detail, the label of a vertex $v$ in $g^{(h+1)}$ is defined using the labels of $v$ and $N(v)$ in $g^{(h)}$, and is denoted as $\ell^{(h+1)}(v) = r(v, N(v), \ell^{(h)})$. Let $\{g^{(0)}, g^{(1)}, \cdots, g^{(h)}\}$ be a series of graphs obtained by iteratively applying a relabel $h$ times, where $g^{(0)}$ is a graph contained in $D$. Given two graphs $g_i$ and $g_j$, a graph kernel is defined using $k'$ as

$$k(g_i, g_j) = k'(g_i^{(0)}, g_j^{(0)}) + k'(g_i^{(1)}, g_j^{(1)}) + \cdots + k'(g_i^{(h)}, g_j^{(h)}).$$

Because $k$ is a summation of positive semidefinite kernels, $k$ is also positive semidefinite (Cristianini and Shawe-Taylor, 2000).
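As a sketch of this framework, the fragment below sums $k'$ over the relabel iterations, treating labels abstractly as strings; the relabel rule itself is kernel-specific and is defined below. The names and the string representation of labels are illustrative assumptions, not the paper's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// k(g_i, g_j) = k'(g_i^(0), g_j^(0)) + ... + k'(g_i^(h), g_j^(h)).
// labelsI[t] holds the vertex labels of g_i after t relabels (likewise labelsJ).
static long graphKernel(String[][] labelsI, String[][] labelsJ) {
    long k = 0;
    for (int t = 0; t < labelsI.length; t++) {
        Map<String, Long> hist = new HashMap<>();
        for (String l : labelsI[t]) hist.merge(l, 1L, Long::sum); // c(g_i, sigma)
        for (String l : labelsJ[t])                               // add c(g_i, l(v_j))
            k += hist.getOrDefault(l, 0L);
    }
    return k;
}
```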
Recently, various graph kernels have been applied to the graph classification problem. Representative graph kernels such as the NHK and WLSK follow the above framework, in which the graphs contained in $D$ are iteratively relabeled. In these kernels, $\ell^{(h)}(v) = r(v, N(v), \ell^{(h-1)})$ characterizes a subgraph induced by the vertices that are reachable from $v$ within $h$ steps in $g^{(0)}$. Therefore, given $v_i \in V(g_i)$ and $v_j \in V(g_j)$, if the subgraphs induced by the vertices reachable from $v_i$ and $v_j$ within $h$ steps are identical, the relabel assigns an identical label to both vertices. Additionally, it is desirable for a graph kernel to fulfill the converse of this condition. However, it is not an easy task to design such a graph kernel.

We now review the representative graph kernels, NHK and WLSK.
NHK: Given a fixed-length bit string $\ell_1^{(0)}(v)$ of length $L$, $\ell_1^{(h)}(v)$ is defined as follows:

$$\ell_1^{(h)}(v) = \mathrm{ROT}\left(\ell_1^{(h-1)}(v)\right) \oplus \bigoplus_{u \in N(v)} \ell_1^{(h-1)}(u),$$

where ROT is bit rotation to the left and $\oplus$ is the exclusive OR of bit strings. NHK is efficient in terms of computation and space complexities because the relabel of NHK is computable in $O(L|N(v)|)$ for each vertex and its space complexity is $O(L)$.
Figure 1 shows an example of an NHK relabel and its detailed calculation for a vertex $v_2$, assuming that $L = 3$. First, $\ell_1^{(0)}(v_2) = \#011$ is rotated to return $\#110$. We then obtain $\#001$ as the exclusive OR of $\#110$, $\ell_1^{(0)}(v_1) = \#011$, $\ell_1^{(0)}(v_3) = \#001$, $\ell_1^{(0)}(v_4) = \#001$, and $\ell_1^{(0)}(v_5) = \#100$. In this computation, we do not require sorted bit strings because the exclusive OR is commutative. Three bits are required for $\ell_1^{(0)}(v_2)$ in this example, and $\ell_1^{(h)}(v_2)$ also requires three bits, even if $h$ is increased.
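A minimal Java sketch of one NHK relabel step for a single vertex follows; it assumes the $L$-bit string fits in an int, and the names are ours, not those of the original implementation.

```java
// One NHK relabel step: rotate the vertex's own L-bit label one position to the
// left, then XOR in the neighbors' labels (order-independent, so no sorting).
static int nhkRelabel(int ownLabel, int[] neighborLabels, int L) {
    int mask = (L == 32) ? -1 : (1 << L) - 1;
    int h = ((ownLabel << 1) | (ownLabel >>> (L - 1))) & mask;  // ROT
    for (int u : neighborLabels) h ^= u;                        // exclusive OR
    return h;
}
// Fig. 1 example (L = 3): nhkRelabel(0b011, new int[]{0b011, 0b001, 0b001, 0b100}, 3)
// returns 0b001, i.e., #001.
```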
NHK has a drawback with respect to accidental hash collisions. For example, vertices $v_1$, $v_3$, and $v_4$ in $g^{(1)}$ in Fig. 1 have an identical label after the relabel. This is because $v_3$ and $v_4$ in $g^{(0)}$ have identical labels and the same number of adjacent vertices. However, despite the different labels and numbers of adjacent vertices of $v_1$ and $v_3$, these vertices have the same vertex labels in $g^{(1)}$, leading to low graph expressiveness and low classification accuracy.
We next describe the WLSK, which is based on
the Weisfeiler-Lehman algorithm, an algorithm that
determines graph isomorphism.
Figure 1: Relabeling $g^{(0)}$ to $g^{(1)}$ in NHK.

WLSK: When $\ell_2^{(0)}(v)$ returns a string of characters, $\ell_2^{(h)}(v)$ is defined as

$$\ell_2^{(h)}(v) = \ell_2^{(h-1)}(v) \cdot \bigodot_{u \in N(v)} \ell_2^{(h-1)}(u),$$

where $\cdot$ and $\bigodot$ are string concatenation operators. Because concatenation is not commutative, $u$ is an iterator that obtains the vertices $N(v)$ adjacent to $v$ in alphabetical order. Because $\ell_2^{(h)}(v)$ has information on the distribution of labels for $h$ steps from $v$, it has high graph expressiveness.¹ If the labels are sorted using bucket sort, the time complexity of WLSK is $O(|\Sigma||N(v)|)$ for each vertex.
Figure 2 shows an example of a relabel using WLSK. Vertices $v_1$, $v_2$, $v_3$, $v_4$, and $v_5$ in $g^{(0)}$ have labels A, A, B, B, and C, respectively. For each vertex, WLSK sorts the labels of the vertices adjacent to that vertex and then concatenates these labels. In $g^{(1)}$, $v_3$ has the label BAC, meaning that $v_3$ has label B in $g^{(0)}$ and two adjacent vertices whose labels are A and C.
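One WLSK relabel step for a single vertex can be sketched in Java as follows (illustrative names; in practice the resulting long string would be mapped back to a fresh length-1 label, as described in footnote 1):

```java
import java.util.Arrays;

// One WLSK relabel step: concatenate the vertex's own label with its neighbors'
// labels, the latter sorted because concatenation is not commutative.
static String wlskRelabel(String ownLabel, String[] neighborLabels) {
    String[] sorted = neighborLabels.clone();
    Arrays.sort(sorted);
    StringBuilder sb = new StringBuilder(ownLabel);
    for (String u : sorted) sb.append(u);
    return sb.toString();   // e.g., "B" with neighbors {"A", "C"} yields "BAC"
}
```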
In addition to NHK and WLSK, we define the La-
bel Aggregate Kernel (LAK) to facilitate the under-
standing of the other kernels proposed in this paper.
LAK: In this kernel, $\ell_3^{(0)}(v)$ is a vector in $|\Sigma|$-dimensional space. In concrete terms, if a vertex in a graph has a label $\sigma_i$ among $\Sigma = \{\sigma_1, \sigma_2, \cdots, \sigma_{|\Sigma|}\}$, the $i$-th element in the vector is 1; otherwise, it is 0. In LAK, $\ell_3^{(h)}(v)$ is defined as

$$\ell_3^{(h)}(v) = \ell_3^{(h-1)}(v) + \sum_{u \in N(v)} \ell_3^{(h-1)}(u).$$
¹When $\ell_2^{(0)}(v)$ is a string of length 1, $\ell_2^{(1)}(v)$ is a string of length $|N(v)| + 1$. By replacing the latter string with a new string of length 1, both the computation time and memory space that WLSK requires are reduced.
Figure 2: Relabeling $g^{(0)}$ to $g^{(1)}$ in WLSK.

Figure 3: Relabeling $g^{(0)}$ to $g^{(1)}$ in LAK.

Figure 4: Relabeling $g^{(3)}$ to $g^{(4)}$ in LAK.
The $i$-th element in $\ell_3^{(h)}(v)$ is the frequency of occurrence of the character $\sigma_i$ in the string $\ell_2^{(h)}(v)$ concatenated by WLSK. Therefore, $\ell_3^{(h)}(v)$ has information on the distribution of labels within $h$ steps from $v$, and LAK thus has high graph expressiveness. However, when $h$ is increased, the number of paths from $v$ that reach vertices labeled $\sigma_i$ increases exponentially, and thus the elements in $\ell_3^{(h)}(v)$ also increase exponentially. For example, if the average degree of vertices is $d$, there are $(d + 1)^h$ vertices reachable from $v$ within $h$ steps. Thus, LAK requires a large amount of memory space.

Table 1: Graph kernel characteristics.

        advantages                          drawbacks
NHK     computation time                    hash collision
WLSK    expressiveness                      computation time
LAK     expressiveness & computation time   memory space
Figures 3 and 4 show an example of a relabel using LAK, assuming that $|\Sigma| = 3$. The vertex label of $v_5$ in $g^{(1)}$ is $(1, 2, 1)$, which means that there are one, two, and one vertices reachable from $v_5$ within one step that have labels $\sigma_1$, $\sigma_2$, and $\sigma_3$, respectively. Compared with relabeling $g^{(0)}$ to $g^{(1)}$, the increase in the values in $\ell_3^{(h)}(v)$ when relabeling $g^{(3)}$ to $g^{(4)}$ is large.
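One LAK relabel step is simply a component-wise vector addition, as the following sketch (with illustrative names) shows:

```java
// One LAK relabel step: the new label of v is its own |Sigma|-dimensional count
// vector plus the vectors of its neighbors. Entries grow like (d + 1)^h, which is
// exactly LAK's memory drawback.
static long[] lakRelabel(long[] ownLabel, long[][] neighborLabels) {
    long[] next = ownLabel.clone();
    for (long[] u : neighborLabels)
        for (int i = 0; i < next.length; i++) next[i] += u[i];
    return next;
}
```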
2.2 Existing Graph Kernel Drawbacks

We here summarize the characteristics of the above three graph kernels. NHK is efficient because its computation is a logical operation between fixed-length bit strings and does not require string sorting. However, its drawback is a tendency for hash collision, where different induced subgraphs have identical hash values. Although WLSK requires vertex label sorting, it has high expressiveness because $\ell_2^{(h)}(v)$ contains the distribution of the vertex labels within $h'$ steps $(0 \le h' \le h)$ from $v$. LAK requires a large amount of memory space to store vectors for high $h$, although it does not require label sorting. To overcome these drawbacks, in this paper, we propose a novel graph kernel that is equivalent to NHK in terms of time and space complexities and equivalent to LAK in terms of expressiveness.
3 GRAPH KERNELS BASED ON
THE HADAMARD CODE
In this section, we propose a novel graph kernel based on the Hadamard code to overcome the aforementioned drawbacks. A Hadamard matrix is a square $(1, -1)$-matrix in which any two row vectors are orthogonal, defined as follows:

$$H_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \qquad (1)$$

$$H_{2^k} = \begin{pmatrix} H_{2^{k-1}} & H_{2^{k-1}} \\ H_{2^{k-1}} & -H_{2^{k-1}} \end{pmatrix} \qquad (2)$$
A Hadamard code is a row vector of the Hadamard matrix. Given a Hadamard matrix of order $2^k$, $2^k$ Hadamard codes having $2^k$ elements are generated from this matrix. Using these Hadamard codes, we propose the HCK as follows.
HCK: Let $H$ be a Hadamard matrix of order $2^{\lceil \log_2 |\Sigma| \rceil}$ and $\ell_4^{(0)}(v)$ be a Hadamard code of length $|H|$. If a vertex $v$ has the label $\sigma_i$, the $i$-th row of the Hadamard matrix of order $|H|$ is assigned to the vertex. Then $\ell_4^{(h)}(v)$ is defined as follows:

$$\ell_4^{(h)}(v) = \ell_4^{(h-1)}(v) + \sum_{u \in N(v)} \ell_4^{(h-1)}(u).$$

When $\ell_{\sigma_i}$ is the Hadamard code for a vertex label $\sigma_i$, $\ell_{\sigma_i}^T \ell_4^{(h)}(v) / |H|$ is the number of occurrences of $\sigma_i$ in the string $\ell_2^{(h)}(v)$ generated by WLSK. Therefore, HCK has the same expressiveness as LAK.
Figure 5 shows an example of a relabel using HCK. Each vertex $v$ in $g^{(1)}$ is represented as a vector produced by the summation of the vectors of the vertices adjacent to $v$ in $g^{(0)}$. Additionally, after the relabel, we can obtain the distribution of the vertex labels within one step of $v$ using the following calculation:

$$\frac{1}{|H|} H \ell_4^{(1)}(v_5) = \frac{1}{4} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 0 \\ 2 \\ -2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}.$$

That is, there are one $\sigma_1$, two $\sigma_2$, and one $\sigma_3$ labels within one step of $v_5$. Furthermore, the result is equivalent to $\ell_3^{(1)}(v_5)$, as shown in Fig. 3. The reason we divide $H \ell_4^{(h)}(v)$ by four is that the order of the Hadamard matrix used here is $|H| = 4$.
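The two HCK ingredients, the Sylvester construction of $H_{2^k}$ and the decoding of a relabeled vector into label counts, can be sketched in Java as follows (illustrative names; decoding relies on $H$ being symmetric with $HH = |H| \cdot I$):

```java
// Sylvester construction of a Hadamard matrix of the given power-of-two order.
static int[][] hadamard(int order) {
    int[][] H = {{1}};
    for (int n = 1; n < order; n *= 2) {
        int[][] G = new int[2 * n][2 * n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                G[i][j] = H[i][j];      G[i][j + n] = H[i][j];
                G[i + n][j] = H[i][j];  G[i + n][j + n] = -H[i][j];
            }
        H = G;
    }
    return H;
}

// Decode label counts as (1/|H|) H l4, as in the v5 example above:
// decodeCounts(hadamard(4), new long[]{4, 0, 2, -2}) returns {1, 2, 1, 0}.
static long[] decodeCounts(int[][] H, long[] l4) {
    long[] counts = new long[H.length];
    for (int i = 0; i < H.length; i++) {
        long dot = 0;
        for (int j = 0; j < H.length; j++) dot += H[i][j] * l4[j];
        counts[i] = dot / H.length;
    }
    return counts;
}
```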
If each element in $\ell_4^{(h)}(v)$ is stored in four bytes (the commonly used size of integers in C, Java, and other languages), the space complexity of HCK is equivalent to that of LAK. Therefore, we have not yet overcome the drawback of LAK. In this paper, we assume that each vertex label is assigned to a vertex with equal probability. Because the probabilities of occurrence of 1 and $-1$ are equal in each column of the Hadamard matrix except for the first column, the $i$-th element $(1 < i \le |\Sigma|)$ in $\ell_4^{(h)}(v)$ follows a binomial distribution with zero mean under this assumption. Therefore, the expected value of each such element in $\ell_4^{(h)}(v)$ is 0, and these elements do not require a large memory space.

Figure 5: Relabeling $g^{(0)}$ to $g^{(1)}$ in HCK.

For example, Tables 2 and 3 show the values of the $i$-th elements in $\ell_3^{(h)}(v_2)$ and $\ell_4^{(h)}(v_2)$, respectively, in a graph $g^{(h)}$, when $g^{(0)}$ (shown in Fig. 6) is relabeled iteratively $h$ times. Under this assumption of vertex label probability, the expected value of every element in $\ell_4^{(h)}(v_2)$ except for the first element becomes 0. The first element represents the number of paths from $v_2$ to the vertices reachable within $h$ steps. Based on this observation, we assign bit arrays of length $\rho$ within the $L$-bit array to the elements as follows.
SHCK: Similar to NHK, $\ell_5^{(0)}(v)$ is a fixed-length bit array of length $L$. The bit array is divided into $|H|$ fragments, one of which is a bit array of length $L - \rho(|H| - 1)$ and the rest of which are bit arrays of length $\rho$. The first fragment of length $L - \rho(|H| - 1)$ is assigned to store the first element of $\ell_4^{(0)}(v)$, the next fragment of length $\rho$ is assigned to store the second element, and so on. Here, $\rho$ is a positive integer fulfilling $\rho(|H| - 1) = \rho(2^{\lceil \log_2 |\Sigma| \rceil} - 1) \le L$. Additionally, each element of $\ell_4^{(0)}(v)$ is represented by its one's complement in $\ell_5^{(0)}(v)$ for the purpose of the following summation, which defines $\ell_5^{(h)}(v)$:

$$\ell_5^{(h)}(v) = \ell_5^{(h-1)}(v) + \sum_{u \in N(v)} \ell_5^{(h-1)}(u).$$
Because $\ell_5^{(h)}(v)$ is a fixed-length binary bit string and $\ell_5^{(h)}(v)$ is the summation of values represented as bit strings, both the time and space complexities of SHCK are equivalent to those of NHK. Additionally, the expressiveness of SHCK is equivalent to that of LAK if overflow of the fixed-length bit array does not occur.
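The packing idea behind SHCK can be sketched as follows for $L = 64$ and $|H| = 4$. For simplicity the sketch gives every element a $\rho$-bit field (whereas SHCK gives the first element the remaining $L - \rho(|H| - 1)$ bits) and uses Java's native two's complement rather than the one's complement described above; field boundaries are not protected, so carries between fields play the role of SHCK's overflow. All names are illustrative.

```java
// Pack the |H| elements of an initial Hadamard code into one long, one rho-bit
// field per element; code[0] ends up in the low-order field. Negative values
// wrap modulo 2^rho.
static long packCode(int[] code, int rho) {   // assumes code.length * rho <= 64
    long packed = 0;
    for (int i = code.length - 1; i >= 0; i--)
        packed = (packed << rho) | (((long) code[i]) & ((1L << rho) - 1));
    return packed;
}

// One SHCK relabel step is then just |N(v)| native 64-bit additions, which is
// why its time and space complexities match NHK's.
static long shckRelabel(long ownLabel, long[] neighborLabels) {
    long h = ownLabel;
    for (long u : neighborLabels) h += u;
    return h;
}
```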
Table 2: Elements in a label in LAK.

h    Label
0    $\ell_3^{(0)}(v_2) = (0, 1, 0, 0)$
1    $\ell_3^{(1)}(v_2) = (1, 1, 1, 0)$
2    $\ell_3^{(2)}(v_2) = (2, 3, 2, 2)$
3    $\ell_3^{(3)}(v_2) = (7, 7, 7, 6)$
4    $\ell_3^{(4)}(v_2) = (20, 21, 20, 20)$
5    $\ell_3^{(5)}(v_2) = (61, 61, 61, 60)$
6    $\ell_3^{(6)}(v_2) = (182, 183, 182, 182)$
7    $\ell_3^{(7)}(v_2) = (547, 547, 547, 546)$
8    $\ell_3^{(8)}(v_2) = (1640, 1641, 1640, 1640)$
9    $\ell_3^{(9)}(v_2) = (4921, 4921, 4921, 4920)$
10   $\ell_3^{(10)}(v_2) = (14762, 14763, 14762, 14762)$
Figure 6: Relabeled graphs.
Table 3: Elements in a label in HCK.

h    Label
0    $\ell_4^{(0)}(v_2) = (1, -1, -1, 1)$
1    $\ell_4^{(1)}(v_2) = (3, -1, -1, -1)$
2    $\ell_4^{(2)}(v_2) = (9, -1, -1, 1)$
3    $\ell_4^{(3)}(v_2) = (27, -1, -1, -1)$
4    $\ell_4^{(4)}(v_2) = (81, -1, -1, 1)$
5    $\ell_4^{(5)}(v_2) = (243, -1, -1, -1)$
6    $\ell_4^{(6)}(v_2) = (729, -1, -1, 1)$
7    $\ell_4^{(7)}(v_2) = (2187, -1, -1, -1)$
8    $\ell_4^{(8)}(v_2) = (6561, -1, -1, 1)$
9    $\ell_4^{(9)}(v_2) = (19683, -1, -1, -1)$
10   $\ell_4^{(10)}(v_2) = (59049, -1, -1, 1)$
4 SHCK OVERFLOW
As explained in the previous section, in SHCK, the fixed-length bit array of length $L$ is divided into small fragments, each of which corresponds to an element in $\ell_4^{(h)}(v)$. We sum such bit arrays to relabel vertices. Because all elements in $\ell_4^{(h)}(v)$ except for the first element are represented as bit arrays of length $\rho$, we face the possibility of overflow when iteratively summing these bit arrays. In this section, we theoretically discuss the probability of overflow in SHCK.
Let $x_i^k$ be the $i$-th element in $\ell_4^{(h)}(v)$, which is the label of vertex $v$ and is a value generated by summing the base Hadamard codes $k$ times. If $i = 1$, then $x_i^k = k$. For $i \ne 1$, if $-2^\rho \le x_i^k \le 2^\rho - 1$, then $x_i^k$ fits in a fragment of length $\rho$ without overflowing. Let $p(k, j)$ be the probability that the value of $x_i^k$ is $j$ and $x_i^k$ fits in a bit fragment of length $\rho$ without overflowing. Under the assumption that the probability of any label existing on a vertex is uniform, when $k = 1$,

$$p(k, j) = \begin{cases} 1/2 & \text{if } j = 1, \\ 1/2 & \text{if } j = -1, \\ 0 & \text{otherwise}, \end{cases}$$

because an element in the Hadamard matrix is either 1 or $-1$. If $x_i^k$ fits in a bit array of length $\rho$ without overflowing, $x_i^{k-1}$ also fits in the array. In contrast, if $x_i^k$ cannot fit in a bit array of length $\rho$ without overflowing, $x_i^{k+1}$ also cannot fit in the array. Overflow occurs when $x_i^k$ is $2^\rho - 1$ and $+1$ is added to it, or when $x_i^k$ is $-2^\rho$ and $-1$ is added to it. Therefore, $p(k, j)$ is given by the following recurrence formula:

$$p(k, j) = \begin{cases} \frac{1}{2} p(k-1, j-1) & \text{if } j = 2^\rho - 1, \\ \frac{1}{2} p(k-1, j+1) & \text{if } j = -2^\rho, \\ \frac{1}{2} p(k-1, j+1) + \frac{1}{2} p(k-1, j-1) & \text{if } -2^\rho < j < 2^\rho - 1, \\ 0 & \text{otherwise}. \end{cases}$$
Accordingly, $p(k)$, the probability that $x_i^k$ fits in a bit array of length $\rho$ without overflowing, is

$$p(k) = \sum_{j=-2^\rho}^{2^\rho - 1} p(k, j).$$
After $h$ relabels of a graph whose average degree is $d$, $x_i^k$ is a summation of $k = (d + 1)^h$ binary values. The probability $p(\rho, d, h)$ that overflow does not occur for $\rho$, $d$, and $h$ is

$$p(\rho, d, h) = \sum_{j=-2^\rho}^{2^\rho - 1} p\left((d + 1)^h, j\right). \qquad (3)$$

When $h$ increases, $p(\rho, d, h)$ becomes very small. Nevertheless, in the next section, we demonstrate that the proposed graph kernel SHCK has the ability to classify graphs with high accuracy.
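A direct dynamic program for Eq. (3), following the recurrence for $p(k, j)$ literally, is sketched below; it is illustrative only (the names are ours) and is feasible only while $k = (d + 1)^h$ remains small.

```java
// p(rho, d, h): probability that an element built from k = (d + 1)^h random +/-1
// contributions never leaves [-2^rho, 2^rho - 1]; mass stepping outside is lost.
static double overflowFreeProbability(int rho, int d, int h) {
    long k = Math.round(Math.pow(d + 1, h));   // number of summed +/-1 values
    int lo = -(1 << rho), hi = (1 << rho) - 1, off = -lo;
    double[] p = new double[hi - lo + 1];
    p[1 + off] = 0.5;                          // p(1, 1)  = 1/2
    p[-1 + off] = 0.5;                         // p(1, -1) = 1/2
    for (long step = 2; step <= k; step++) {
        double[] q = new double[p.length];
        for (int j = lo; j <= hi; j++) {       // recurrence for p(step, j)
            if (j - 1 >= lo) q[j + off] += 0.5 * p[j - 1 + off];
            if (j + 1 <= hi) q[j + off] += 0.5 * p[j + 1 + off];
        }
        p = q;
    }
    double sum = 0;                            // p(k) = sum over j of p(k, j)
    for (double v : p) sum += v;
    return sum;
}
```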
5 EXPERIMENTAL EVALUATION
The proposed method was implemented in Java. All
experiments were done on an Intel Xeon X5670 2.93
GHz computer with 48 GB memory running Mi-
crosoft Windows 8. We compared the computation
time and accuracy of the prediction performance of HCK and SHCK with those of NHK and WLSK. To learn from the kernel matrices generated by the above graph kernels, we used the LIBSVM package² with 10-fold cross validation.

Table 4: Summary of evaluation datasets.

                              MUTAG      ENZYMES                          D&D
Number of graphs |D|          188        600                              1178
Maximum graph size            84         126                              5748
Average graph size            53.9       32.6                             284.3
Number of labels |Σ|          12         3                                82
Number of classes             2          6                                2
(class distribution)          (126, 63)  (100, 100, 100, 100, 100, 100)   (487, 691)
Average degree of vertices    2.1        3.8                              5.0

Figure 7: Conversion of a graph.
We used three real-world datasets. The first dataset, MUTAG (Debnath, et al., 1991), contains information on 188 chemical compounds and their class labels. The class labels are binary values that indicate the mutagenicity of chemical compounds. The second dataset, ENZYMES, contains information on 600 proteins and their class labels. The class labels are one of six labels corresponding to the six EC top-level classes (Schomburg, et al., 2004). The third dataset, D&D, contains information on 1178 protein structures, in which each amino acid corresponds to a vertex and two vertices are connected by an edge if they are less than 6 Ångströms apart (Dobson and Doig, 2003). Each chemical compound is represented as an undirected graph where each vertex, edge, vertex label, and edge label corresponds to an atom, chemical bond, atom type, and bond type, respectively. Because we assume that only the vertices in graphs have labels, the chemical graphs are converted following the article (Hido and Kashima, 2009); that is, an edge labeled $\ell$ that is adjacent to vertices $v$ and $u$ in a chemical graph is replaced with a vertex labeled $\ell$ that is adjacent to $v$ and $u$ with unlabeled edges, as shown in Fig. 7. Table 4 summarizes the datasets.
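The conversion of Fig. 7 can be sketched as follows; the types and names are illustrative assumptions, not the representation used in our implementation.

```java
import java.util.List;

// Replace every labeled edge (v, u) with a new vertex carrying the edge's label,
// connected to v and u by unlabeled edges (Hido and Kashima, 2009).
static void convertEdgeLabels(List<int[]> edges, List<String> vertexLabels,
                              List<String> edgeLabels) {
    int m = edges.size();
    for (int e = 0; e < m; e++) {
        int w = vertexLabels.size();           // index of the new vertex
        vertexLabels.add(edgeLabels.get(e));
        int[] vu = edges.get(e);
        edges.set(e, new int[]{vu[0], w});     // edge (v, w), now unlabeled
        edges.add(new int[]{w, vu[1]});        // edge (w, u), now unlabeled
    }
    edgeLabels.clear();                        // all edges are now unlabeled
}
```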
5.1 Scalability
Figures 8, 9, and 10 show the computation time required to obtain a graph $g^{(h)}$ from a graph $g^{(0)}$ in
²http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Figure 8: Computation time for various h (MUTAG).
Figure 9: Computation time for various h (ENZYMES).
NHK, WLSK, HCK, and SHCK for various h for
the MUTAG, ENZYMES, and D&D datasets, respec-
tively. As shown in the figures, NHK and SHCK are
faster than HCK, and much faster than WLSK. Ad-
ditionally, the computation time of NHK, HCK, and
SHCK increases linearly when h is increased. The
reason why WLSK requires such a large amount of
computation time is that WLSK must sort the labels
of adjacent vertices and replace a string of length
|N(v)| + 1 with a string of length 1. This is espe-
cially true when h = 11 or 15 for the MUTAG dataset,
h = 8 or 14 for the ENZYMES dataset, and h = 10
or 20 for the D&D dataset. In our implementation,
this replacement is done with Java’s HashMap class,
Figure 10: Computation time for various h (D&D).
Figure 11: Classification accuracy for various h and ρ (MUTAG).
where a string of length |N(v)| + 1 is the hash key
and a string of length 1 is a value corresponding to
that key. Although the average degree in the evalu-
ated datasets is small, WLSK requires further com-
putation time when the average degree of the data in-
creases. HCK requires a large amount of computation
time for the D&D dataset because the number of la-
bels in the dataset is large and its computation time is
proportional to the number of labels.
5.2 Classification Accuracy

Figure 11 shows the classification accuracy of NHK, WLSK, HCK, and SHCK for various $h$ and $\rho$ for the MUTAG dataset. Their maximum accuracies for various $h$ are almost the same. When $h = 0$, the accuracy for SHCK ($\rho = 1$) is very low because 1 or $-1$ (the values in the Hadamard matrix) cannot be stored as a one's complement consisting of one bit. The accuracy of HCK is exactly the same as that of SHCK ($1 < \rho < 5$), which means that although overflow may occur in SHCK, the kernel can assign identical vertex labels to the identical subgraphs induced by a vertex $v$ and the vertices within $h$ steps from $v$. Figure 12
shows the classification accuracy of NHK, WLSK,
Figure 12: Classification accuracy for various h and ρ (ENZYMES).
Figure 13: Classification accuracy for various h and ρ (D&D).
HCK, and SHCK for various $h$ and $\rho$ for the ENZYMES dataset. The accuracy of WLSK is slightly superior to those of HCK and SHCK ($\rho = 2$, $\rho = 3$, and $7 < \rho < 17$), and their accuracies are much superior to those of NHK and SHCK ($\rho = 1$). The performance of HCK is exactly the same as that of SHCK for high $\rho$ ($7 < \rho < 17$) and almost the same as that of SHCK for low $\rho$ ($\rho = 2$ and $\rho = 3$). The maximum accuracy of WLSK is 53.0%, while the maximum accuracy of both HCK and SHCK ($\rho = 3$, 4, and $7 < \rho < 17$) is 51.3%. The reason the accuracy of WLSK is slightly superior to that of HCK is that $\ell_2^{(h)}(v)$ contains information on the distribution of labels at $h$ steps from $v$, while $\ell_4^{(h)}(v)$ contains information on the distribution of all labels within $h$ steps from $v$. Although the latter distribution can be obtained from the former distribution, the former distribution cannot be obtained from the latter distribution. Therefore, WLSK is more expressive than HCK
Figure 14: Qualitative performance of evaluated graph kernels.
and SHCK. When $\rho$ is increased up to 16, the length of the bit string used to store the first element of $\ell_4^{(h)}(v)$ is $L - \rho \times 2^{\lceil \log_2 |\Sigma| \rceil} = 64 - 16 \times 2^{\lceil \log_2 3 \rceil} = 0$. Even in this case, the accuracy of SHCK is equivalent to that of HCK, which means that the overflow of the first element of $\ell_4^{(h)}(v)$ has absolutely no impact on classification accuracy. Figure 13 shows the classification accuracy of NHK, WLSK, HCK, and SHCK for various $h$ and $\rho$ for the D&D dataset. All accuracies except for that of SHCK ($\rho = 1$) are almost equivalent.
6 CONCLUSION
In this paper, we proposed graph kernels based on the
Hadamard code to classify graphs. Figure 14 presents
a qualitative description of the performance of graph
kernels in terms of computation time and classifica-
tion accuracy. These experimental results show that
the proposed graph kernel SHCK is fast and accurate.
REFERENCES

Borgwardt, Karsten M., Ong, Cheng Soon, Schönauer, Stefan, Vishwanathan, S. V. N., Smola, Alex J., and Kriegel, Hans-Peter. 2005. Protein Function Prediction via Graph Kernels. Bioinformatics 21 (suppl 1): i47–i56.

Chang, Chih-Chung, and Lin, Chih-Jen. 2001. LIBSVM: A Library for Support Vector Machines. Available online at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Cristianini, Nello, and Shawe-Taylor, John. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.

Debnath, Asim Kumar, Lopez de Compadre, Rosa L., Debnath, Gargi, Shusterman, Alan J., and Hansch, Corwin. 1991. Structure-Activity Relationship of Mutagenic Aromatic and Heteroaromatic Nitro Compounds. Correlation with Molecular Orbital Energies and Hydrophobicity. Journal of Medicinal Chemistry 34: 786–797.

Dobson, Paul D., and Doig, Andrew J. 2003. Distinguishing Enzyme Structures from Non-enzymes Without Alignments. Journal of Molecular Biology 330(4): 771–783.

Garey, Michael R., and Johnson, David S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.

Hido, Shohei, and Kashima, Hisashi. 2009. A Linear-Time Graph Kernel. In Proc. of the International Conference on Data Mining (ICDM). 179–188.

Kashima, Hisashi, Tsuda, Koji, and Inokuchi, Akihiro. 2003. Marginalized Kernels Between Labeled Graphs. In Proc. of the International Conference on Machine Learning (ICML). 321–328.

Shervashidze, Nino, Schweitzer, Pascal, van Leeuwen, Erik Jan, Mehlhorn, Kurt, and Borgwardt, Karsten M. 2011. Weisfeiler-Lehman Graph Kernels. Journal of Machine Learning Research (JMLR) 12: 2539–2561.

Schölkopf, Bernhard, and Smola, Alexander J. 2002. Learning with Kernels. MIT Press.

Schölkopf, Bernhard, Tsuda, Koji, and Vert, Jean-Philippe. 2004. Kernel Methods in Computational Biology. MIT Press.

Schomburg, Ida, Chang, Antje, Ebeling, Christian, Gremse, Marion, Heldt, Christian, Huhn, Gregor, and Schomburg, Dietmar. 2004. BRENDA, the Enzyme Database: Updates and Major New Developments. Nucleic Acids Research 32: D431–D433.

Vinh, Nguyen Duy, Inokuchi, Akihiro, and Washio, Takashi. 2010. Graph Classification Based on Optimizing Graph Spectra. In Proc. of the International Conference on Discovery Science. 205–220.