Benchmark Datasets for Fault Detection and Classification in Sensor Data

Bas de Bruijn, Tuan Anh Nguyen, Doina Bucur, Kenji Tei


Data measured and collected from embedded sensors often contains faults, i.e., data points that do not accurately represent the physical phenomenon monitored by the sensor. These data faults may be caused by deployment conditions outside the node's operational bounds, and by short- or long-term hardware, software, or communication problems. At the same time, applications expect accurate sensor data, and recent literature proposes algorithmic solutions for fault detection and classification in sensor data. To evaluate the performance of such solutions, however, the field lacks a set of benchmark sensor datasets. A benchmark dataset ideally satisfies the following criteria: (a) it is based on real-world raw sensor data from various types of sensor deployments; (b) it contains (natural or artificially injected) faulty data points reflecting various problems in the deployment, including missing data points; and (c) all data points are annotated with the ground truth, i.e., whether or not the data point is accurate and, if faulty, the type of fault. We prepare and publish three such benchmark datasets, together with the algorithmic methods used to create them: a dataset of 280 temperature and light subsets of data from 10 indoor Intel Lab sensors, a dataset of 140 subsets of outdoor temperature data from SensorScope sensors, and a dataset of 224 subsets of outdoor temperature data from 16 Smart Santander sensors. The three benchmark datasets total 5,783,504 data points and contain injected data faults of the following types known from the literature: random, malfunction, bias, drift, polynomial drift, and combinations of these. We also present algorithmic procedures and a software tool for preparing further such benchmark datasets.
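To make criteria (b) and (c) concrete, the following is a minimal sketch of how a fault might be injected into a clean series while recording a per-point ground-truth label. It is not the authors' software tool: the function name `inject_faults`, the label strings, the `magnitude` parameter and the exact shape of each fault model (sparse spikes for random, high-variance noise for malfunction, constant offset for bias, linear and quadratic offset growth for drift and polynomial drift, `None` for missing points) are illustrative assumptions.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical label strings used for the per-point ground truth.
NORMAL, RANDOM, MALFUNCTION, BIAS, DRIFT, POLY_DRIFT, MISSING = (
    "normal", "random", "malfunction", "bias", "drift", "polynomial_drift", "missing")


def inject_faults(
    series: List[float],
    fault: str,
    start: int,
    end: int,
    rng: random.Random,
    magnitude: float = 5.0,
) -> Tuple[List[Optional[float]], List[str]]:
    """Return a faulty copy of `series` plus one ground-truth label per point.

    Only indices in [start, end) may be modified; every other point keeps its
    value and the NORMAL label. The fault models are illustrative assumptions,
    not the definitions used to build the published benchmarks.
    """
    values: List[Optional[float]] = list(series)
    labels = [NORMAL] * len(series)
    length = end - start
    for i in range(start, end):
        t = (i - start + 1) / length  # position inside the fault window, in (0, 1]
        if fault == RANDOM:
            # Sparse spikes: each point in the window is hit with probability 0.1.
            if rng.random() >= 0.1:
                continue  # untouched points stay labelled NORMAL
            values[i] = series[i] + rng.choice([-1, 1]) * magnitude
        elif fault == MALFUNCTION:
            values[i] = series[i] + rng.gauss(0.0, magnitude)  # high-variance noise
        elif fault == BIAS:
            values[i] = series[i] + magnitude                  # constant offset
        elif fault == DRIFT:
            values[i] = series[i] + magnitude * t              # linearly growing offset
        elif fault == POLY_DRIFT:
            values[i] = series[i] + magnitude * t ** 2         # quadratically growing offset
        elif fault == MISSING:
            values[i] = None                                   # dropped reading
        else:
            raise ValueError(f"unknown fault type: {fault}")
        labels[i] = fault
    return values, labels


rng = random.Random(0)
clean = [20.0 + 0.1 * i for i in range(50)]
faulty, truth = inject_faults(clean, DRIFT, start=10, end=20, rng=rng)
```

Returning the labels alongside the values keeps the annotation in lockstep with the injection, so every modified point carries its fault type. Combinations could, for instance, be approximated by injecting different faults into disjoint windows of the same series.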

Paper Citation

in Harvard Style

de Bruijn B., Nguyen T., Bucur D. and Tei K. (2016). Benchmark Datasets for Fault Detection and Classification in Sensor Data. In Proceedings of the 5th International Conference on Sensor Networks - Volume 1: SENSORNETS, ISBN 978-989-758-169-4, pages 185-195. DOI: 10.5220/0005637901850195

in Bibtex Style

author={Bas de Bruijn and Tuan Anh Nguyen and Doina Bucur and Kenji Tei},
title={Benchmark Datasets for Fault Detection and Classification in Sensor Data},
booktitle={Proceedings of the 5th International Conference on Sensor Networks - Volume 1: SENSORNETS},

in EndNote Style

JO - Proceedings of the 5th International Conference on Sensor Networks - Volume 1: SENSORNETS
TI - Benchmark Datasets for Fault Detection and Classification in Sensor Data
SN - 978-989-758-169-4
AU - de Bruijn B.
AU - Nguyen T.
AU - Bucur D.
AU - Tei K.
PY - 2016
SP - 185
EP - 195
DO - 10.5220/0005637901850195