TrieMotif - A New and Efficient Method to Mine Frequent K-Motifs from Large Time Series

Daniel Y. T. Chino, Renata R. V. Gonçalves, Luciana A. S. Romani, Caetano Traina Jr., Agma J. M. Traina

2014

Abstract

Finding previously unknown patterns that frequently occur in time series is a core task of time series mining. These patterns, known as time series motifs, are essential for associating events and meaningful occurrences within a time series. In this work we propose a method based on a trie data structure that allows fast and accurate time series motif discovery. Experiments performed on synthetic and real data show that our TrieMotif approach efficiently finds motifs even as the time series grows longer, being on average 3 times faster and requiring 10 times less memory than the state-of-the-art approach. As a case study on real data, we also evaluated our method using time series extracted from remote sensing images of sugarcane crops. Our proposed method was able to find relevant patterns, such as sugarcane cycles and other land covers within the same area.
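The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of counting motifs with a trie, not the TrieMotif method itself. It assumes each length-n subsequence is discretized into a SAX word (the symbolic representation of Lin et al., a common choice in the motif literature; whether TrieMotif uses exactly this encoding is an assumption here). Words are inserted into a trie, and leaves holding two or more non-overlapping occurrences are reported as candidate motifs ranked by frequency. The names `TrieNode` and `find_k_motifs`, the word length `w`, the alphabet, and the trivial-match rule are all hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def znorm(xs):
    """Z-normalize a subsequence; a flat subsequence maps to all zeros."""
    mu, sd = fmean(xs), pstdev(xs)
    if sd < 1e-8:
        return [0.0] * len(xs)
    return [(x - mu) / sd for x in xs]


def paa(xs, w):
    """Piecewise Aggregate Approximation: means of w equal-width frames.
    Assumes len(xs) is a multiple of w."""
    step = len(xs) // w
    return [fmean(xs[i * step:(i + 1) * step]) for i in range(w)]


def sax_word(xs, w, alphabet):
    """Encode a subsequence as a SAX word using equiprobable Gaussian breakpoints."""
    a = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    # The symbol index is the number of breakpoints lying below the PAA value.
    return "".join(alphabet[sum(v > c for c in cuts)] for v in paa(znorm(xs), w))


class TrieNode:
    def __init__(self):
        self.children = {}  # symbol -> TrieNode
        self.offsets = []   # start positions of subsequences whose word ends here


def find_k_motifs(series, n, w=4, alphabet="abcd", k=3):
    """Return up to k (word, offsets) groups with the most non-overlapping occurrences."""
    root = TrieNode()
    for start in range(len(series) - n + 1):
        node = root
        for ch in sax_word(series[start:start + n], w, alphabet):
            node = node.children.setdefault(ch, TrieNode())
        # Hypothetical trivial-match rule: skip a start that overlaps the
        # previous offset already stored at this leaf.
        if node.offsets and start - node.offsets[-1] < n:
            continue
        node.offsets.append(start)

    # Depth-first walk collecting leaves that hold at least two occurrences.
    groups, stack = [], [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(node.offsets) >= 2:
            groups.append((prefix, node.offsets))
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
    groups.sort(key=lambda g: len(g[1]), reverse=True)
    return groups[:k]


if __name__ == "__main__":
    series = [0, 0, 0, 0, 10, 10, 0, 0] * 5
    for word, offsets in find_k_motifs(series, n=8):
        print(word, offsets)
```

On the periodic toy series above, the most frequent group contains the offsets 0, 8, 16, 24 and 32, one per repetition. A complete method would typically also verify each candidate group with the true distance between the raw subsequences; this sketch stops at the symbolic grouping.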



Paper Citation


in Harvard Style

Chino D., Gonçalves R., Romani L., Traina Jr. C. and Traina A. (2014). TrieMotif - A New and Efficient Method to Mine Frequent K-Motifs from Large Time Series. In Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-758-027-7, pages 60-69. DOI: 10.5220/0004891900600069


in Bibtex Style

@conference{iceis14,
author={Daniel Y. T. Chino and Renata R. V. Gonçalves and Luciana A. S. Romani and Caetano Traina Jr. and Agma J. M. Traina},
title={TrieMotif - A New and Efficient Method to Mine Frequent K-Motifs from Large Time Series},
booktitle={Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2014},
pages={60-69},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004891900600069},
isbn={978-989-758-027-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 16th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - TrieMotif - A New and Efficient Method to Mine Frequent K-Motifs from Large Time Series
SN - 978-989-758-027-7
AU - Chino D.
AU - Gonçalves R.
AU - Romani L.
AU - Traina Jr. C.
AU - Traina A.
PY - 2014
SP - 60
EP - 69
DO - 10.5220/0004891900600069