Authors:
Kilho Shin 1 and Taro Niiyama 2
Affiliations:
1 University of Hyogo, Japan; 2 NTT DoCoMo, Japan
Keyword(s):
Edit Distance, Kernel, Mapping, Tree.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
Edit distances have been widely used as an effective means of analyzing the similarity of compound data, that is, data
that consist of multiple components, such as strings, trees and graphs. For example, the Levenshtein distance for
strings is known to be effective for analyzing DNA and proteins, and the Taï distance and its variations are attracting
wide attention from researchers who study tree-structured data such as glycans, HTML DOM trees and parse trees
in natural language processing. The problem that we address here is that the engineering of
new edit distances has been ad hoc and has lacked a unified view. To solve this problem, we introduce the concept
of the mapping distance. The mapping distance framework provides a unified view over various distance
measures for compound data by focusing on partial one-to-one mappings between data. These partial one-to-one
mappings generalize what are known as traces in the legacy study of edit distances. This is in clear
contrast to the legacy edit distance framework, which defines distances between compound data through edit
operations and edit paths. Our framework enables us to design new distance measures consistently and to
describe various distance measures using a small number of parameters. Specifically, in this paper, we
take rooted trees as an example and introduce three independent dimensions to parameterize mapping distance
measures. As a result, we define 16 mapping distance measures, 13 of which are novel. In experiments, we
find that some of the novel measures outperform the others, including the legacy edit distances, in accuracy when
used with the k-NN classifier.
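The relationship between edit paths and mappings mentioned above can be illustrated on strings, the simplest kind of compound data. The following is a minimal sketch, not the paper's tree-based definition: it assumes unit costs for insertion, deletion and substitution, and the function names (`levenshtein`, `mapping_cost`, `traces`, `mapping_distance`) are hypothetical names chosen for this illustration. It computes the Levenshtein distance by dynamic programming and, separately, as the minimum cost over all order-preserving partial one-to-one mappings (traces) between the two strings. The brute-force enumeration is only practical for very short strings.

```python
from itertools import combinations


def levenshtein(s, t):
    """Unit-cost Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def mapping_cost(s, t, mapping):
    """Cost of a partial one-to-one mapping given as (i, j) index pairs.

    Assumption: unit costs. Mapped pairs with different symbols count as
    substitutions; unmapped positions of s are deletions and unmapped
    positions of t are insertions.
    """
    substitutions = sum(s[i] != t[j] for i, j in mapping)
    deletions = len(s) - len(mapping)
    insertions = len(t) - len(mapping)
    return substitutions + deletions + insertions


def traces(s, t):
    """Enumerate all order-preserving partial one-to-one mappings."""
    for k in range(min(len(s), len(t)) + 1):
        for left in combinations(range(len(s)), k):
            for right in combinations(range(len(t)), k):
                # Both index tuples are sorted, so pairing them in order
                # yields a mapping that preserves the left-to-right order.
                yield list(zip(left, right))


def mapping_distance(s, t):
    """Minimum mapping cost over all traces (brute force)."""
    return min(mapping_cost(s, t, m) for m in traces(s, t))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))       # dynamic programming
    print(mapping_distance("kitten", "sitting"))  # minimum over traces
```

For strings under unit costs the two functions agree. In this sketch, changing which mappings `traces` admits or how `mapping_cost` scores them would yield a different distance measure, which loosely mirrors the idea of parameterizing distances through the class of mappings.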