Detecting Data Stream Dependencies on High Dimensional Data

Jonathan Boidol, Andreas Hapfelmeier


Intelligent production in smart factories and wearable devices that measure our activities produce an ever-growing amount of sensor data. In these environments, validating measurements to distinguish sensor flukes from significant events is particularly important. We developed an algorithm that detects dependencies between sensor readings, which can be used, for instance, to verify or analyze large-scale measurements. An entropy-based approach allows us to detect dependencies beyond linear correlation and is well suited to high-dimensional, high-volume data streams. Results show statistically significant improvements in reliability and execution time on par with other stream monitoring systems.
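The abstract's central claim is that an entropy-based measure catches dependencies that linear correlation misses. The sketch below illustrates that idea. It is not the authors' algorithm. It assumes a simple equal-width histogram ("plug-in") estimator of mutual information and non-overlapping stream windows. The names `pearson`, `mutual_information` and `windowed_mi` are hypothetical helpers introduced only for this illustration.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation: sensitive only to *linear* dependence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def discretize(values, bins):
    """Map each value to an equal-width bin index in [0, bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant signal
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys, bins=10):
    """Plug-in (histogram) estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits.

    The true value is zero only when X and Y are independent, so it also
    reflects nonlinear dependencies. Assumption: equal-width binning; the
    estimate is biased upward for small samples.
    """
    bx, by = discretize(xs, bins), discretize(ys, bins)
    n = len(bx)

    def entropy(counts):
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    return (entropy(Counter(bx)) + entropy(Counter(by))
            - entropy(Counter(zip(bx, by))))


def windowed_mi(stream, window=1000, bins=10):
    """Hypothetical stream monitor: yield the MI of each non-overlapping
    window of (x, y) readings."""
    buf = []
    for pair in stream:
        buf.append(pair)
        if len(buf) == window:
            xs, ys = zip(*buf)
            yield mutual_information(xs, ys, bins)
            buf.clear()


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(-1, 1) for _ in range(5000)]
    ys = [x * x for x in xs]                         # nonlinear dependence
    zs = [rng.uniform(-1, 1) for _ in range(5000)]   # independent noise
    print("pearson(x, x^2):", round(pearson(xs, ys), 3))    # close to 0
    print("MI(x, x^2):", round(mutual_information(xs, ys), 3))
    print("MI(x, noise):", round(mutual_information(xs, zs), 3))
```

For `y = x^2` with `x` symmetric around zero, the Pearson coefficient is near zero even though `y` is a deterministic function of `x`. The histogram MI estimate, by contrast, is well above the value obtained for independent noise. In a streaming setting, `windowed_mi` would flag a change when a sensor pair drifts from dependent to independent. A production system would need a less biased estimator and an incremental update rather than recomputing each window.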


Paper Citation

in Harvard Style

Boidol J. and Hapfelmeier A. (2016). Detecting Data Stream Dependencies on High Dimensional Data. In Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD, ISBN 978-989-758-183-0, pages 383-390. DOI: 10.5220/0005953303830390

in Bibtex Style

author={Jonathan Boidol and Andreas Hapfelmeier},
title={Detecting Data Stream Dependencies on High Dimensional Data},
booktitle={Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD},

in EndNote Style

JO - Proceedings of the International Conference on Internet of Things and Big Data - Volume 1: IoTBD
TI - Detecting Data Stream Dependencies on High Dimensional Data
SN - 978-989-758-183-0
AU - Boidol J.
AU - Hapfelmeier A.
PY - 2016
SP - 383
EP - 390
DO - 10.5220/0005953303830390