ON ORDER EQUIVALENCES BETWEEN DISTANCE AND SIMILARITY MEASURES ON SEQUENCES AND TREES

Martin Emms, Hector-Hugo Franco-Penya

2012

Abstract

Both 'distance' and 'similarity' measures, based on scoring mappings, have been proposed for the comparison of sequences and for the comparison of trees; this paper concerns whether or not the two kinds of measure are equivalent. These measures are usually parameterised by an atomic 'cost' table, defining label-dependent values for swaps, deletions and insertions. We look at the question of whether orderings induced by a 'distance' measure, with some cost-table, can be dualized by a 'similarity' measure, with some other cost-table, and vice-versa. Three kinds of orderings are considered: alignment-orderings, for a fixed source S and target T; neighbour-orderings, where for a fixed S, varying candidate neighbours T_i are ranked; and pair-orderings, where for varying S_i and varying T_j, the pairings ⟨S_i, T_j⟩ are ranked. We show that (1) alignment-orderings by distance can be dualized by similarity, and vice-versa; (2) neighbour-ordering and pair-ordering by distance can be dualized by similarity; (3) neighbour-ordering and pair-ordering by similarity cannot always be dualized by distance. A consequence of this is that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via distance.
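To make the three kinds of ordering concrete, the following is a minimal Python sketch for sequences only. It is not taken from the paper: the function names, the unit cost-table, the constant KAPPA and the particular table conversion are all assumptions chosen for illustration. The conversion used (pair score = KAPPA - swap cost, gap penalty = gap cost - KAPPA/2) is one standard way to turn a distance table into a similarity table; under it, every alignment of S and T satisfies score = KAPPA * (|S| + |T|) / 2 - cost, so the alignment-ordering for a fixed S and T is exactly reversed.

```python
def edit_distance(s, t, sub_cost, gap_cost):
    """Minimum total cost over all alignments of s and t (lower is closer)."""
    prev = [j * gap_cost for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap_cost]
        for j in range(1, len(t) + 1):
            cur.append(min(
                prev[j] + gap_cost,                          # delete s[i-1]
                cur[j - 1] + gap_cost,                       # insert t[j-1]
                prev[j - 1] + sub_cost(s[i - 1], t[j - 1]),  # swap or match
            ))
        prev = cur
    return prev[-1]


def similarity(s, t, pair_score, gap_penalty):
    """Maximum total score over all alignments of s and t (higher is closer)."""
    prev = [-j * gap_penalty for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [-i * gap_penalty]
        for j in range(1, len(t) + 1):
            cur.append(max(
                prev[j] - gap_penalty,                         # delete s[i-1]
                cur[j - 1] - gap_penalty,                      # insert t[j-1]
                prev[j - 1] + pair_score(s[i - 1], t[j - 1]),  # swap or match
            ))
        prev = cur
    return prev[-1]


# Assumed distance table: unit costs (plain Levenshtein distance).
GAP_COST = 1
def unit_sub(x, y):
    return 0 if x == y else 1

# Assumed conversion to a similarity table, with a hypothetical constant KAPPA.
KAPPA = 2
DUAL_GAP = GAP_COST - KAPPA // 2      # = 0
def dual_pair(x, y):
    return KAPPA - unit_sub(x, y)     # 2 for equal labels, 1 otherwise

# Fixed S and T: the best score and the least cost differ by a constant offset.
d = edit_distance("kitten", "sitting", unit_sub, GAP_COST)    # 3
s = similarity("kitten", "sitting", dual_pair, DUAL_GAP)      # 10
assert s == KAPPA * (6 + 7) // 2 - d

# Fixed S, varying T: the offset depends on |T|, so the ranking can change.
for t in ["xy", "abcde"]:
    print(t,
          edit_distance("ab", t, unit_sub, GAP_COST),   # xy: 2, abcde: 3
          similarity("ab", t, dual_pair, DUAL_GAP))     # xy: 2, abcde: 4
```

With these assumed tables, "xy" is the nearer neighbour of "ab" by distance (2 against 3), while "abcde" is the nearer neighbour by similarity (4 against 2). This only shows that this particular conversion, which reverses alignment-orderings, does not by itself carry over to neighbour-orderings; it says nothing about whether some other cost-table would, which is the question the paper addresses.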



Paper Citation


in Harvard Style

Emms M. and Franco-Penya H. (2012). ON ORDER EQUIVALENCES BETWEEN DISTANCE AND SIMILARITY MEASURES ON SEQUENCES AND TREES. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8425-98-0, pages 15-24. DOI: 10.5220/0003712500150024


in Bibtex Style

@conference{icpram12,
author={Martin Emms and Hector-Hugo Franco-Penya},
title={ON ORDER EQUIVALENCES BETWEEN DISTANCE AND SIMILARITY MEASURES ON SEQUENCES AND TREES},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={15-24},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003712500150024},
isbn={978-989-8425-98-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - ON ORDER EQUIVALENCES BETWEEN DISTANCE AND SIMILARITY MEASURES ON SEQUENCES AND TREES
SN - 978-989-8425-98-0
AU - Emms M.
AU - Franco-Penya H.
PY - 2012
SP - 15
EP - 24
DO - 10.5220/0003712500150024