Authors:
Sara Santos; Phillip Probst; Luís Silva and Hugo Gamboa
Affiliation:
LIBPhys (Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics), NOVA School of Science and Technology, NOVA University of Lisbon, Caparica, Portugal
Keyword(s):
Unsupervised Learning, Human Activity Recognition, Data Imbalance, Occupational Health.
Abstract:
Office workers spend most of their time sitting for prolonged periods, often in rigid postures. The European Union has recognized this as a risk factor for work-related musculoskeletal disorders. Studying work activities and their distribution over time requires Human Activity Recognition (HAR) techniques. Because supervised learning techniques require labeled data and large datasets for training, unsupervised learning is a viable alternative for HAR. However, unsupervised models may be affected by the highly imbalanced distribution of activities typically observed in office workers. This work therefore studied the impact of data imbalance on clustering performance when sitting activity makes up 33 %, 50 %, 70 %, and 90 % of the dataset. Office activities were collected from 19 subjects, and three traditional clustering models were employed. KMeans and Gaussian Mixture Model were more affected than Agglomerative Clustering, which seems to be more robust to data imbalance. With 90 % sitting time, all three models performed poorly, which emphasizes the need for clustering models that can handle highly imbalanced data.
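The abstract does not say how the imbalanced datasets were assembled. As a minimal sketch of one plausible approach, the snippet below subsamples a labeled recording so that sitting makes up a target share of the samples. The function name `subsample_to_ratio`, the placeholder labels `other_a` and `other_b`, and the sample counts are all hypothetical and are not taken from the paper.

```python
import random


def subsample_to_ratio(labels, majority, ratio, seed=0):
    """Return sorted indices of a subset in which `majority` makes up
    roughly `ratio` of the samples (hypothetical helper, not the authors' code)."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must lie strictly between 0 and 1")
    rng = random.Random(seed)
    maj = [i for i, y in enumerate(labels) if y == majority]
    rest = [i for i, y in enumerate(labels) if y != majority]

    # Solve n_maj / (n_maj + n_rest) = ratio for n_maj, keeping all other samples.
    n_maj = round(ratio * len(rest) / (1 - ratio))
    if n_maj > len(maj):
        # Too few majority samples: keep them all and thin out the rest instead.
        n_rest = min(len(rest), round(len(maj) * (1 - ratio) / ratio))
        rest = rng.sample(rest, n_rest)
        n_maj = len(maj)
    return sorted(rng.sample(maj, n_maj) + rest)


# Assumed toy labels: 900 sitting samples and 50 each of two placeholder activities.
labels = ["sitting"] * 900 + ["other_a"] * 50 + ["other_b"] * 50

for ratio in (0.33, 0.50, 0.70, 0.90):
    idx = subsample_to_ratio(labels, "sitting", ratio)
    share = sum(labels[i] == "sitting" for i in idx) / len(idx)
    print(f"target {ratio:.2f} -> achieved {share:.3f} with {len(idx)} samples")
```

The resulting index lists could then be used to select feature rows before fitting each clustering model, so that every model sees the same subset at each imbalance level.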