# The Longest Common Subsequence Distance using a Complexity Factor

### Octavian Lucian Hasna, Rodica Potolea

#### Abstract

In this paper we study the classic longest common subsequence (LCSS) problem and use the length of the LCSS as a similarity measure between two time series. We propose an original algorithm that computes the approximate length of the LCSS using a discretization step, a complexity-invariant factor, and a dynamic threshold for skipping the computation.
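
For orientation, the classic problem the abstract builds on can be sketched as follows. This is only a minimal illustration of the textbook dynamic-programming LCSS length on already-discretized series, not the approximate algorithm proposed in the paper: the complexity-invariant factor and the dynamic threshold are not reproduced here. The names `discretize`, `lcs_length`, and `lcs_distance`, the fixed-width binning, and the normalization by the shorter length are all assumptions made for the example.

```python
import math


def discretize(series, step):
    """Map each real value to an integer bin of width `step`.

    Hypothetical fixed-width binning, chosen only for illustration.
    """
    return [math.floor(v / step) for v in series]


def lcs_length(a, b):
    """Length of the longest common subsequence (textbook DP, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence as the row to save memory
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Turn the LCSS length into a distance in [0, 1].

    Assumed normalization: 1 - LCSS / min(len(a), len(b)).
    """
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0 if len(a) == len(b) else 1.0
    return 1.0 - lcs_length(a, b) / shortest
```

A typical use would discretize both series first and then compare them, for example `lcs_distance(discretize(s1, 0.5), discretize(s2, 0.5))`, where `s1` and `s2` are lists of floats. The exact DP costs time proportional to the product of the two lengths, which is the cost an approximate method with early skipping aims to reduce.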

#### Paper Citation

#### in Harvard Style

Hasna O. and Potolea R. (2016). **The Longest Common Subsequence Distance using a Complexity Factor**. In *Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)*, ISBN 978-989-758-203-5, pages 336-343. DOI: 10.5220/0006067603360343

#### in Bibtex Style

@conference{kdir16,

author={Octavian Lucian Hasna and Rodica Potolea},

title={The Longest Common Subsequence Distance using a Complexity Factor},

booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},

year={2016},

pages={336-343},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0006067603360343},

isbn={978-989-758-203-5},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)

TI - The Longest Common Subsequence Distance using a Complexity Factor

SN - 978-989-758-203-5

AU - Hasna O.

AU - Potolea R.

PY - 2016

SP - 336

EP - 343

DO - 10.5220/0006067603360343