Authors: Bas de Bruijn 1 ; Tuan Anh Nguyen 1 ; Doina Bucur 1 and Kenji Tei 2

Affiliations: 1 University of Groningen, Netherlands ; 2 National Institute of Informatics, Japan

ISBN: 978-989-758-169-4

Keyword(s): Benchmark Dataset, Fault Tolerance, Data Quality, Sensor Data, Sensor Data Labelling.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Biomedical Engineering ; Biomedical Signal Processing ; Data Manipulation ; Data Quality and Integrity ; Fault Tolerance and Diagnosis ; Health Engineering and Technology Applications ; Human-Computer Interaction ; Methodologies and Methods ; Multi-Sensor Data Processing ; Neurocomputing ; Neurotechnology, Electronics and Informatics ; Obstacles ; Pattern Recognition ; Physiological Computing Systems ; Sensor Networks ; Soft Computing

Abstract: Data measured and collected from embedded sensors often contains faults, i.e., data points which are not an accurate representation of the physical phenomenon monitored by the sensor. These data faults may be caused by deployment conditions outside the operational bounds of the node, or by short- or long-term hardware, software, or communication problems. Applications, on the other hand, expect accurate sensor data, and recent literature proposes algorithmic solutions for fault detection and classification in sensor data. To evaluate the performance of such solutions, however, the field lacks a set of benchmark sensor datasets. A benchmark dataset ideally satisfies the following criteria: (a) it is based on real-world raw sensor data from various types of sensor deployments; (b) it contains (natural or artificially injected) faulty data points reflecting various problems in the deployment, including missing data points; and (c) all data points are annotated with the ground truth, i.e., whether or not the data point is accurate, and, if faulty, the type of fault. We prepare and publish three such benchmark datasets, together with the algorithmic methods used to create them: a dataset of 280 temperature and light subsets of data from 10 indoor Intel Lab sensors, a dataset of 140 subsets of outdoor temperature data from SensorScope sensors, and a dataset of 224 subsets of outdoor temperature data from 16 Smart Santander sensors. The three benchmark datasets total 5,783,504 data points, containing injected data faults of the following types known from the literature: random, malfunction, bias, drift, polynomial drift, and combinations. We present algorithmic procedures and a software tool for preparing further such benchmark datasets.
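
To make criteria (b) and (c) concrete, the following is a minimal sketch of injecting faults into clean readings while recording the ground truth for each data point. It is not the authors' tool: the `Reading` type, the function names, and the fault models (bias as a constant offset, drift as a linearly growing offset) are illustrative assumptions, since the abstract names the fault types but does not define them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    """One sensor data point with its ground-truth annotation (hypothetical type)."""
    timestamp: int
    value: Optional[float]  # None stands for a missing data point
    label: str = "ok"       # "ok" if accurate, otherwise the fault type


def inject_bias(readings: List[Reading], start: int, length: int, offset: float) -> None:
    """Assumed bias model: add a constant offset over a window and label it."""
    for r in readings[start:start + length]:
        if r.value is not None:
            r.value += offset
            r.label = "bias"


def inject_drift(readings: List[Reading], start: int, length: int, slope: float) -> None:
    """Assumed drift model: add an offset that grows linearly over a window."""
    for step, r in enumerate(readings[start:start + length], start=1):
        if r.value is not None:
            r.value += slope * step
            r.label = "drift"


# Ten clean readings at a constant 20.0 (synthetic values, for illustration only).
series = [Reading(timestamp=t, value=20.0) for t in range(10)]
inject_bias(series, start=2, length=3, offset=5.0)
inject_drift(series, start=6, length=3, slope=0.5)

for r in series:
    print(r.timestamp, r.value, r.label)
```

Because every modified point carries its fault type, the output of a detection or classification algorithm can be compared label by label against the annotation, which is the use the abstract describes for a benchmark dataset.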


Paper citation in several formats:
de Bruijn B., Nguyen T., Bucur D. and Tei K. (2016). Benchmark Datasets for Fault Detection and Classification in Sensor Data. In Proceedings of the 5th International Conference on Sensor Networks - Volume 1: SENSORNETS, ISBN 978-989-758-169-4, pages 185-195. DOI: 10.5220/0005637901850195

author={Bas de Bruijn and Tuan Anh Nguyen and Doina Bucur and Kenji Tei},
title={Benchmark Datasets for Fault Detection and Classification in Sensor Data},
booktitle={Proceedings of the 5th International Conference on Sensor Networks - Volume 1: SENSORNETS},


JO - Proceedings of the 5th International Conference on Sensor Networks - Volume 1: SENSORNETS
TI - Benchmark Datasets for Fault Detection and Classification in Sensor Data
SN - 978-989-758-169-4
AU - de Bruijn B.
AU - Nguyen T.
AU - Bucur D.
AU - Tei K.
PY - 2016
SP - 185
EP - 195
DO - 10.5220/0005637901850195
