Making the Investigation of Huge Data Archives Possible in an Industrial Context - An Intuitive Way of Finding Non-typical Patterns in a Time Series Haystack

Yavor Todorov, Sebastian Feller, Roger Chevalier

2015

Abstract

Modern nuclear power plants are equipped with a vast variety of sensors and measurement devices. Vibrations, temperatures, pressures and flow rates are just the tip of the iceberg of the huge database formed by the recorded measurements. However, merely storing the data is of no value to an information-centric society; the real value lies in the ability to properly utilize the gathered data. In this paper, we propose a knowledge discovery process designed to identify non-typical or anomalous patterns in time series data. All the data mining tasks employed in this discovery process rest on a proper definition of a non-typical pattern. Building on this definition, the proposed approach develops and implements techniques for identifying, labelling and comparing the sub-sections of the time series data that are of interest for the study. Extensive evaluations on artificial data show the effectiveness and intuitiveness of the proposed knowledge discovery process.



Paper Citation


in Harvard Style

Todorov Y., Feller S. and Chevalier R. (2015). Making the Investigation of Huge Data Archives Possible in an Industrial Context - An Intuitive Way of Finding Non-typical Patterns in a Time Series Haystack. In Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-758-122-9, pages 569-581. DOI: 10.5220/0005542105690581


in Bibtex Style

@conference{icinco15,
author={Yavor Todorov and Sebastian Feller and Roger Chevalier},
title={Making the Investigation of Huge Data Archives Possible in an Industrial Context - An Intuitive Way of Finding Non-typical Patterns in a Time Series Haystack},
booktitle={Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2015},
pages={569-581},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005542105690581},
isbn={978-989-758-122-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - Making the Investigation of Huge Data Archives Possible in an Industrial Context - An Intuitive Way of Finding Non-typical Patterns in a Time Series Haystack
SN - 978-989-758-122-9
AU - Todorov Y.
AU - Feller S.
AU - Chevalier R.
PY - 2015
SP - 569
EP - 581
DO - 10.5220/0005542105690581