Daily Activity Recognition based on Meta-classification of Low-level
Audio Events
Theodoros Giannakopoulos and Stasinos Konstantopoulos
Institute of Informatics and Telecommunication, National Center for Scientific Research ‘Demokritos’, Athens, Greece
Keywords:
Audio Event Recognition, Raspberry PI, Audio Sensors, Activity Recognition, Fusion.
Abstract:
This paper presents a method for recognizing activities taking place in a home environment. Audio is recorded
and analysed in real time, with all computation taking place on a low-cost Raspberry PI. In this way, data acquisition, low-level signal feature calculation, and low-level event extraction are performed without transferring any
raw data out of the device. This first-level analysis produces a time-series of low-level audio events and their
characteristics: the event type (e.g., ‘music’) and acoustic features that are relevant to further processing, such
as energy that is indicative of how loud the event was. This output is used by a meta-classifier that extracts
long-term features from multiple events and recognizes higher-level activities. The paper also presents exper-
imental results on recognizing kitchen and living-room activities of daily living that are relevant to assisted living and remote health monitoring for the elderly. Evaluation on this dataset has shown that our approach
discriminates between six activities with an accuracy of more than 90%, that our two-level classification ap-
proach outperforms one-level classification, and that including low-level acoustic features (such as energy) in
the input of the meta-classifier significantly boosts performance.
1 INTRODUCTION
Home automation is quickly becoming a widely avail-
able commodity, with an increasing variety of sensing
and actuation hardware available in the market. This
is an important enabler for telehealth and assisted liv-
ing environments, transforming into marketable products the considerable volume of research on remotely
collecting medically relevant information. Advance-
ments in artificial intelligence and intelligent moni-
toring have been explored as a way to prolong inde-
pendent living at home (Barger et al., 2005; Hagler
et al., 2010; Mann et al., 2002). Several methods have
been used to detect activities of daily living in real
home environments, focusing on the elderly population
(Vacher et al., 2013; Vacher et al., 2010; Costa et al.,
2009; Botia et al., 2012; Chernbumroong et al., 2013;
Stikic et al., 2008; Giannakopoulos et al., 2017). In
addition, special focus has been given to fall detection
(El-Bendary et al., 2013) and physiological monitor-
ing (Li et al., 2014; Petridis et al., 2015b; Petridis
et al., 2015a).
These opportunities for prolonging independent
living at home are needed to sustain our aging pop-
ulation, but their application also raises concerns re-
garding the acquisition and processing of the data col-
lected. Commercial solutions typically deploy very sim-
ple units at the customers’ location that record and
transfer raw content to cloud services operated by the
solution manufacturer. This is ethically questionable
even in cases of providing non-health related automa-
tion services, but becomes even more so in the face
of medical problems and the collection of data that
pertains to such problems.
Additionally, there is an obvious issue with user
acceptance if raw audio or visual data is sent to the
cloud. In this work, we demonstrate a framework that
uses audio information in order to extract low-level
events and then recognizes a series of higher-level
activities usually performed in the context of a real
home environment. Towards this end, we have inte-
grated a Raspberry PI device that records and analy-
ses audio in real time. The result of this procedure is
a sequence of low-level audio events, i.e., a sequence
of labels that characterize short segments of the audio
signal. Naturally, the selection of labels depends on
the application at hand. For the application and use
cases presented here, we have labels for the sounds
typically occurring in a home environment. At a sec-
ond stage, a meta-classifier maps the low-level events
to high-level and long-term activities (e.g. watching
TV, cleaning up the kitchen, etc.). The conceptual architecture of this rationale is illustrated in Figure 1.
Figure 1: Conceptual architecture of the proposed scheme.
The contributions of the work described here are the following:
- a state-of-the-art sound analyser is integrated on a low-cost sensor that functions both as a data acquisition and a pattern analysis module;
- experimentation on a real-world dataset from a joint living-room and kitchen space has proven that the proposed meta-classifier is very accurate (more than 90% on 6 basic activities), and more accurate than directly recognizing the high-level activities from the raw audio features.
2 LOW-LEVEL AUDIO EVENT
RECOGNITION
Audio-based low-level event recognition is achieved
through a Raspberry PI device that performs both au-
dio acquisition and real-time signal recognition. In
particular, the Raspberry PI captures audio samples
from the attached microphone device and executes a
set of real-time feature extraction and classification
procedures, to provide online audio event recogni-
tion. We have presented more details of this architec-
ture in the context of a practical workflow that helps
the technicians set up the device through a fast, user-friendly and robust tuning and calibration procedure (Siantikos et al., 2016). In this way, ‘training’ of the device is performed without any need for prior
knowledge of machine learning techniques. In partic-
ular, in order to train the classifier that discriminates
between low-level audio events, we developed a Graphical User Interface (GUI) for Android. This app acts
as a ‘remote control’ for the Raspberry Pi device and
through that, the user can: (a) Record and annotate
sounds: the user notifies the Raspberry Pi device that a particular audio event (e.g., music playing) is currently taking place, so that the Raspberry Pi can collect recordings of each class; and (b) Train the audio classifier: the user triggers the training procedure, which results in a probabilistic Support Vector Machine (SVM) classifier of audio segments (Platt, 1999).
Figure 2: MQTT-based calibration procedure.
Communication between the mobile app and the
Raspberry PI device is achieved through MQTT, which is a lightweight messaging protocol. In our
use case, it is used both for sending commands to the
Raspberry PI (for example to start/stop recording a
sample of a given class) and for remotely receiving
the processing results.
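As an illustration of this message flow, the following minimal sketch uses the Python paho-mqtt client (assuming its 1.x API); the broker address, topic names and payload format are assumptions made for the example and are not necessarily those of our implementation.

# Minimal MQTT sketch; topic names and payloads are hypothetical.
import json
import paho.mqtt.client as mqtt  # assuming the paho-mqtt 1.x API

BROKER_HOST = "raspberrypi.local"  # hypothetical broker address

def on_message(client, userdata, msg):
    # Results pushed by the Raspberry PI, e.g. per-segment classification decisions.
    print("result:", msg.topic, json.loads(msg.payload.decode("utf-8")))

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER_HOST, 1883)
client.subscribe("auros/results")  # hypothetical results topic

# Send a command, e.g. start recording a training sample of the class 'music'.
client.publish("auros/commands",
               json.dumps({"cmd": "start_recording", "class": "music"}))
client.loop_forever()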
Example screenshots of such a training procedure are shown in Figure 3. Here, the technician suc-
cessively records sounds from all 6 classes (Figure 3
a–d). As more recordings are added, the total dura-
tion dynamically changes in the respective GUI area.
When the data are sufficient (typically at least 20 sec-
onds are recorded from each class), the user triggers
the classification training by selecting the appropri-
ate audio classes and pressing ‘Create Classifier’ (Fig-
ure 3e). Then, the SVM tuning procedure is executed
on the Raspberry PI device. As soon as this step is completed, the confusion matrix of the internal cross-
validation procedure is returned (Figure 3f) and the
Raspberry PI device is now equipped with the audio classifier and ready to function as an audio event detector.
Figure 3: Screenshots of the training (calibration) Graphical User Interface.
When the audio segment classifier is trained,
the Raspberry PI device is ready to record sounds
from the microphone and (in an online mode) rec-
ognize low-level audio events. Figure 4 shows three
screenshots of the GUI when the Raspberry PI device
is in ‘testing mode’: first the user selects a pre-
trained classifier (in our case the classifier trained as
shown in Figure 3) and then the Raspberry PI device
starts recording and classifying audio segments. The
results of this classification procedure are pushed to
the mobile GUI interface (Figure 4c).
Audio feature extraction and classification have been achieved using the open-source pyAudioAnalysis library (Giannakopoulos, 2015), which implements
a wide range of audio analysis functionalities. The
complete set of features and more details on the classification procedure are presented in (Siantikos et al.,
2016), where this architecture has been adopted for
recognizing low-level audio events in the context of
a bathroom activity monitoring scenario. Finally, we have adopted a signal length of 1 second for each classification decision. In other words, in runtime mode the Raspberry PI device broadcasts a classification decision every second.
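The run-time loop can be sketched as follows; record_one_second and classify_segment are hypothetical placeholders standing in for the actual microphone capture and the pyAudioAnalysis-based SVM classification, and the MQTT topic is again an assumed name.

# Rough sketch of the run-time loop: one decision broadcast per second.
import json
import time

import numpy as np
import paho.mqtt.client as mqtt  # assuming the paho-mqtt 1.x API

client = mqtt.Client()
client.connect("localhost", 1883)

def record_one_second(fs=16000):
    # Placeholder: the real device reads fs samples from the microphone here.
    return np.zeros(fs)

def classify_segment(samples):
    # Placeholder: the real device extracts pyAudioAnalysis features and applies
    # the trained probabilistic SVM; a dummy decision is returned here.
    energy = float(np.mean(samples ** 2))
    return "silence", 1.0, energy

while True:
    samples = record_one_second()
    label, prob, energy = classify_segment(samples)
    client.publish("auros/results", json.dumps({  # hypothetical topic name
        "timestamp": time.time(),
        "winner_class": label,
        "winner_probability": prob,
        "energy": energy,
    }))
    time.sleep(1.0)  # in the real device the 1-second recording paces the loop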
3 HUMAN ACTIVITY
RECOGNITION
The procedure described in Section 2 trains a classi-
fier integrated on the Raspberry PI device, using an
Android mobile device as a ‘remote controller’ for
making the whole process faster and easier. As soon
as this process is completed, the Raspberry PI de-
vice can record audio streams from the microphone
and automatically classify audio segments of 1 sec-
ICT4AWE 2017 - 3rd International Conference on Information and Communication Technologies for Ageing Well and e-Health
222
Figure 4: Screenshots of the testing Graphical User Interface.
ond length to any of the 6 audio classes. The goal of
this work is to extract higher-level labels regarding ev-
eryday human activities that typically occur in the living-room and kitchen area. In the context of this work,
we have defined the following higher-level events:
(1) no activity (NO), (2) talking (TA), (3) watching
TV (TV), (4) listening to music (MU), (5) cleaning
up the kitchen (KI) and (6) other activity (OT). Our
task here is to infer these activities at a long-term rate (e.g., every minute) using the short-term,
low-level events. Towards this end, we have adopted
a simple meta-classification procedure according to
which low-level audio segment decisions are used to
extract feature statistics that are fed as input to a su-
pervised model, namely a Support Vector Machine
meta-classifier. In particular, the adopted features are the following (a computation sketch is given after the list):
- $F_i = \sum_{j=1}^{N} \mathbb{1}\{\lambda(j) = i\}$, where $i = 1, \ldots, 6$, $\lambda(j)$ is the class label of the j-th audio segment and $N$ is the total number of audio segments in a session (recording). These first 6 features are actually the distribution of short-term, low-level audio event labels as extracted by the audio classifier.
- $F_7$ is the average number of transitions from silence to any of the non-silent audio classes per second.
- $F_8$ to $F_{11}$ are the mean, max, min and std statistics of the normalized segment energies.
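As a concrete illustration, the following Python sketch computes such an 11-dimensional vector from a session's per-second decisions; normalizing the label counts by the number of segments is our reading of 'distribution' here and may differ in detail from the actual implementation.

import numpy as np

# Low-level audio event classes, as in Table 1.
CLASSES = ["activity", "boiler", "music", "silence", "speech", "waterbasin"]

def session_features(labels, energies):
    """Map a session's per-second decisions to the 11-dim meta-feature vector.

    labels   -- winner-class label of each 1-second segment
    energies -- normalized energy of each 1-second segment
    """
    labels = list(labels)
    energies = np.asarray(energies, dtype=float)
    n = len(labels)

    # F1..F6: distribution of low-level event labels over the session
    # (shown normalized by the number of segments; an assumption).
    f_dist = [labels.count(c) / n for c in CLASSES]

    # F7: average number of silence -> non-silence transitions per second
    # (each segment lasts 1 s, so the session duration in seconds is n).
    transitions = sum(1 for prev, cur in zip(labels, labels[1:])
                      if prev == "silence" and cur != "silence")
    f7 = transitions / n

    # F8..F11: mean, max, min and std of the normalized segment energies.
    f_energy = [energies.mean(), energies.max(), energies.min(), energies.std()]

    return np.array(f_dist + [f7] + f_energy)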
This 11-dimensional feature vector is used to clas-
sify the whole session (recording) in terms of its re-
spective activity. As shown in the experiments section
of the paper, the simple low-level energy statistics add
to the performance of the low-level event statistics
(i.e. the first 6 high-level features). The classification
methods adopted in order to classify the high-level features to any of the activity classes are the following (a brief set-up sketch is given after the list):
- k-Nearest Neighbor classifier
- Probabilistic SVMs (Platt, 1999)
- Random forests: an ensemble learning method used for classification and regression that uses a multitude of decision trees (Ho, 1995; Pal, 2005)
- Gradient boosting: an ensembling approach interpreted as an optimization task, widely used both in classification and regression tasks (Breiman, 1997; Friedman, 2002)
- Extremely Randomized Trees (or Extra Trees): a modern classification tree-ensemble approach that is based on strong randomization of both attribute and cut-point choice while splitting a tree node (Geurts et al., 2006), and has been adopted in various computer vision and data mining applications.
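A minimal sketch of how such a comparison could be set up with scikit-learn is given below; the hyper-parameters and the number of sub-sampling repetitions are illustrative assumptions, not the values used in our experiments.

from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Candidate meta-classifiers; hyper-parameters are illustrative only.
CANDIDATES = {
    "kNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(kernel="linear", probability=True),  # probabilistic SVM (Platt, 1999)
    "Random Forests": RandomForestClassifier(n_estimators=100),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(),
}

def compare(X, y, n_repeats=50):
    """Repeated random sub-sampling with 10% of the sessions held out for testing.

    X -- (n_sessions, 11) matrix of meta-features, y -- activity label per session.
    The number of repetitions is an assumption, not a value reported in the paper.
    """
    cv = ShuffleSplit(n_splits=n_repeats, test_size=0.1)
    for name, clf in CANDIDATES.items():
        scores = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
        print(f"{name}: mean F1 = {100 * scores.mean():.1f}%")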
It has to be noted that this meta-classification stage
can be either executed as a cloud or a home-PC ser-
vice, or even as an independent submodule in the
Raspberry PI device. In any case, the core concept of
the proposed approach is that the joint audio feature
extraction-audio event recognition module broadcasts
only low-level events and no raw information, in order to comply with the privacy requirements of a health monitoring or assisted living application.
Furthermore, in order to compare the proposed ap-
proach with the straightforward classification of high-level activities directly from low-level audio features, the pyAudio-
Analysis library has also been used to train and eval-
uate an SVM classifier that learns to discriminate be-
tween high-level activities based on the low-level time
sequences of short-term features. As shown in the ex-
periments section, this approach achieves much lower
performance rates compared to the proposed meta-
classification module.
4 EXPERIMENTS
4.1 Datasets
In order to train and evaluate the proposed methodol-
ogy, three separate datasets have been compiled and
manually annotated. The first two of the datasets are
associated with the training and evaluation stages
of the audio segment classification submodule (i.e.
the low-level audio event classifiers), while the third
dataset is used to train and evaluate the meta-classifier
(using cross-validation). In more detail, the following
datasets have been adopted:
Audio segment classification dataset (Training):
This dataset consists of one uninterrupted recording
for each class, recorded through the mobile calibra-
tion GUI (Section 2). The total duration of each
recording is 200 seconds, giving 20 minutes in total
for all six classes. This is the total time required to
‘calibrate’ the audio recognition model, plus the time
required for the SVM model to be trained.
Audio segment classification dataset (Evaluation):
This dataset consists of 600 audio segments (100
from each audio class). Each audio segment is 1-sec
long, since this is the selected decision window dura-
tion. This dataset is used to evaluate the audio clas-
sification module. Obviously, the segments of this dataset are totally independent of the segments of the previous dataset, to avoid bias in the evaluation.
High-level event dataset: This dataset is used to
evaluate the proposed high-level activity recognition
method. It consists of 20 long-term sessions for each
activity class (120 sessions in total). Each long-
term session corresponds to a recording of 20 sec-
onds to 2 minutes that has led to a series of short-
term, low-level audio events. This dataset is openly provided (available at https://zenodo.org/record/376480). Each recording session is stored in a separate file. All files follow the JSON format, with the following fields: (a) winner class probability, (b) (normalized) signal energy, (c) timestamp and (d) winner class. Note that these four fields are provided
for each 1-second audio segment of the recording.
Therefore, each file has several rows that correspond
to several audio segments of the same recording-
session. Training and classification performance evaluation has been carried out on this dataset using repeated random sub-sampling. In particular, in each sub-sampling iteration, 10% of the data is used for testing.
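To make the file layout concrete, the following sketch parses one session file; the JSON key names and the one-object-per-row layout are assumptions based on the description above and may differ from the published files.

import json

def load_session(path):
    """Parse one session file into per-segment sequences.

    The key names used below ('winner_class', 'winner_probability', 'energy',
    'timestamp') are assumptions based on the field description above; the
    actual keys and row layout in the published files may differ.
    """
    labels, probs, energies, times = [], [], [], []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            seg = json.loads(line)  # assumption: one JSON object per row
            labels.append(seg["winner_class"])
            probs.append(seg["winner_probability"])
            energies.append(seg["energy"])
            times.append(seg["timestamp"])
    return labels, probs, energies, times

The loaded label and energy sequences can then be mapped to the 11-dimensional meta-feature vector, e.g. with a function like the session_features sketch given earlier.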
4.2 Experimental Results
4.2.1 Low-level Audio Event Recognition
Evaluation
Table 1 presents the confusion matrix, along with the
respective class recall, class precision and class F1
values for the audio segment classification task. The
second dataset described in Section 4.1 has been used
to this end. The overall F1 measure was found to be
equal to 83.5%. This actually means that, on average, more than 8 out of 10 audio segments (1 second long) are correctly classified into one of the 6 audio classes.
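For reference, the per-class measures follow the standard definitions; reading the overall F1 as the average of the per-class values is our interpretation of the tables:

$\mathrm{Precision}_c = \frac{TP_c}{TP_c + FP_c}, \quad \mathrm{Recall}_c = \frac{TP_c}{TP_c + FN_c}, \quad F1_c = \frac{2\,\mathrm{Precision}_c \cdot \mathrm{Recall}_c}{\mathrm{Precision}_c + \mathrm{Recall}_c}$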
4.2.2 High-level Activity Recognition
First, Table 2 presents the F1 measures for all meta-classifiers applied on (a) meta-features 1 to 6, (b) meta-features 1 to 7 and (c) all meta-features. The
best performance is achieved for the Support Vector
Machine classifier, when all features are used. In
addition, the 7th meta-feature (the average number of transitions per second from silence to other audio classes) adds 2%. Furthermore, if the energy-related features are also combined, the overall performance boost reaches 4%. Note that for the SVM classifier, a linear
kernel has been adopted. This is probably the reason that the SVM classifier outperforms the more sophisticated ensemble-based classifiers, since a linear kernel is more robust to overfitting with a small training sample size. Second, Table 3 presents the detailed
evaluation metrics for the best high-level event recog-
nition method, i.e. the SVM classifier for all adopted
high-level features.
According to the results above, the proposed
methodology achieves a very high recognition ac-
curacy for the six activities, based on simple low-
level audio decisions and signal energy statistics, es-
pecially when the SVM classifier is used. The high-
est confusion is achieved between ‘watching TV’
and ‘talking’–‘music’, as well as between ‘kitchen
cleanup’ and ‘other activity’. As explained in the next
section, our ongoing work focuses on the expansion
of some of the high-level classes to more detailed rep-
resentations: for example, the music-related activities could be expanded to ‘listening to music’ and ‘dancing’, while ‘other activities’ may be further split into ‘eating’, ‘drinking’, ‘exercising’ and ‘cooking’. However,
such information requires both more detailed low-
level audio representations and other modalities, e.g.
visual and motion information, stemming from cam-
eras and accelerometers respectively. An example of
accelerometer information and an explanation of how
it could be used for fusion with audio sensing is pre-
sented in the last section.
Table 1: Audio segment classification results: row-wise normalized confusion matrix, recall, precision and F1 measures. Overall F1 measure: 83.5%.

Confusion Matrix (%)
                       Predicted
True          activity  boiler  music  silence  speech  waterbasin
activity          79.3     0.0    0.0      5.2     6.9         8.6
boiler            32.4    51.4    2.7     13.5     0.0         0.0
music              0.0     0.0   88.0      0.0    12.0         0.0
silence            0.0     0.0    0.0    100.0     0.0         0.0
speech             0.0     0.0   12.9      0.0    87.1         0.0
waterbasin         0.0     0.0    0.0      0.0     0.0       100.0

Performance Measurements (%, per class)
Recall:           79.3    51.4   88.0    100.0    87.1       100.0
Precision:        71.0   100.0   85.0     84.3    82.2        92.1
F1:               74.9    67.9   86.5     91.5    84.6        95.9
Table 2: Comparison between classification methods and
feature sets for the high-level activity recognition task. Per-
formance is quantified based on the F1 measure.
Features
Method 1-6 1-7 1-11 (all)
kNN 84 88 84
SVM 88 90 92
Random Forests 89 89 90
Extra Trees 88 89 88
Gradient Boosting 88 88 89
Table 3: High-level event recognition based on all meta-
features for the best classifier (SVM). Overall F1 mea-
92%.
Confusion Matrix (%)
Predicted
True NO TA TV MU KI OT
NO 100 0 0 0 0 0
TA 0 96 4 0 0 0
TV 7 11 71 5 0 5
MU 0 5 5 89 0 0
KI 0 1 0 0 95 4
OT 0.5 0 0 0 0.5 99
Performance Measurements (%, per class)
Recall: 100 96 71 89 95 99
Precision: 93 85 89 95 99 91
F1: 96 90 79 92 97 95
Finally, for comparison purposes, we have eval-
uated the ability of a one-level audio classifier that
directly maps mid-term audio feature statistics to
high-level activities. Using the same dataset, the
F1 measure for this method was significantly lower
(80%). Additionally, such an approach would require
more training data, since it needs to map a high-dimensional audio feature space to semantically high-level
activity classes. This is not only computationally de-
manding, but impractical, since acquiring annotations
for high-level activities is a laborious and intense task.
On the other hand, the proposed approach has made
training low-level, short-term multidimensional audio
classifiers an easy task (through the calibration proce-
dure described in Section 2), while the training of the
meta-classifier requires only a few training samples,
since it is based on audio segment classification de-
cisions, not on very low-level and multi-dimensional
audio features.
5 CONCLUSIONS AND FURTHER
WORK
We have presented a low-cost solution for recognizing
human activities in a home environment using low-
level audio labels. Towards this end, we expanded
our previous related work (Siantikos et al., 2016)
by (a) training an audio segment classifier that dis-
tinguishes between low-level audio events that usu-
ally appear in the context of a joint living-room and kitchen environment, and (b) proposing and evaluating a
meta-classification technique that uses low-level au-
dio events and simple signal energy values, in order
to recognize long-term and high-level activities. Both
the dataset and the prototype implementation used for the experiments described here are publicly available (see https://zenodo.org/record/376480 and https://github.com/radio-project-eu/AUROS).
Figure 5: Example of accelerometer data for four activities: eating (upper left), kitchen cleaning up (upper right), listening to music (lower left) and dancing (lower right).
A range of modern classifiers has been evaluated on this meta-classification task and the results have
proven that a probabilistic SVM that combines both low-level audio classification decisions and simple energy values of raw audio segments outperforms all other classifiers and feature combinations. In partic-
ular, this experimentation on a real dataset has shown
that the proposed approach successfully classifies ac-
tivities in 92% of the cases.
Further work on this topic aims to refine the ac-
tivity taxonomy, to fuse acoustic results with other
modalities, and to include temporal information about
the low-level events. Specifically, it is obvious that the
classes used here can be expanded to a more detailed
taxonomy of types and sub-types of activities, but also
that further top-level activities related to one’s abil-
ity to function independently (such as preparing and
consuming a meal, currently lumped under ‘other’)
should be added. It is also obvious that as the taxon-
omy gets finer, the performance of acoustic-only clas-
sification will start dropping, as some activities can-
not be distinguished using audio-only information;
consider, for example, making the distinction between
‘listening to music’ and ‘dancing to the music’.
In order to boost the classification performance for
such cases, we have started experimenting with fusing
audio information with accelerometer data, which has
been lately used in human activity recognition for the
elderly (Khan et al., 2010; de la Concepción et al., 2017). Figure 5 illustrates the raw accelerometer val-
ues (along the three spatial axes) for four activities,
namely: eating, cleaning up the kitchen, listening to
music and dancing. It is obvious that these time series
can be used to discriminate between the dancing and
listening to music classes, once acoustic modelling
has established the presence of ‘music’ in the envi-
ronment.
A second opportunity to improve the discrimination ability of the classifier is to include information
about the temporal evolution of the low-level events.
In future work, we will experiment with methods
from complex event recognition (Artikis et al., 2014)
to describe complex events in a way that takes into
account temporal information.
ACKNOWLEDGEMENTS
This project has received funding from the European Union’s Horizon 2020 research and innovation pro-
gramme under grant agreement No 643892. Please
see http://www.radio-project.eu for more details.
REFERENCES
Artikis, A., Gal, A., Kalogeraki, V., and Weidlich, M.
(2014). Event recognition challenges and techniques.
ACM Transactions on Internet Technology, 14(1).
Barger, T. S., Brown, D. E., and Alwan, M. (2005). Health-
status monitoring through analysis of behavioral pat-
terns. Systems, Man and Cybernetics, Part A: Systems
and Humans, IEEE Transactions on, 35(1):22–27.
Botia, J. A., Villa, A., and Palma, J. (2012). Ambient as-
sisted living system for in-home monitoring of healthy
independent elders. Expert Systems with Applications,
39(9):8136–8148.
Breiman, L. (1997). Arcing the edge. Technical Report 486, Statistics Department, University of California at Berkeley.
Chernbumroong, S., Cang, S., Atkins, A., and Yu, H.
(2013). Elderly activities recognition and classifica-
tion for applications in assisted living. Expert Systems
with Applications, 40(5):1662–1674.
Costa, R., Carneiro, D., Novais, P., Lima, L., Machado,
J., Marques, A., and Neves, J. (2009). Ambient as-
sisted living. In 3rd Symposium of Ubiquitous Com-
puting and Ambient Intelligence 2008, pages 86–94.
Springer.
de la Concepción, M. Á. Á., Morillo, L. M. S., García, J. A. Á., and González-Abril, L. (2017). Mobile activity recognition and fall detection system for elderly people using Ameva algorithm. Pervasive and Mobile Computing, 34:3–13.
El-Bendary, N., Tan, Q., Pivot, F. C., and Lam, A. (2013).
Fall detection and prevention for the elderly: A review
of trends and challenges. Int. J. Smart Sens. Intell.
Syst, 6(3):1230–1266.
Friedman, J. H. (2002). Stochastic gradient boosting. Com-
putational Statistics & Data Analysis, 38(4):367–378.
Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely
randomized trees. Machine learning, 63(1):3–42.
Giannakopoulos, T. (2015). pyAudioAnalysis: An open-source Python library for audio signal analysis. PLoS ONE, 10(12):e0144610.
Giannakopoulos, T., Konstantopoulos, S., Siantikos, G.,
and Karkaletsis, V. (2017). Design for a system of
multimodal interconnected adl recognition services.
In Components and Services for IoT Platforms, pages
323–333. Springer.
Hagler, S., Austin, D., Hayes, T. L., Kaye, J., and Pavel,
M. (2010). Unobtrusive and ubiquitous in-home mon-
itoring: a methodology for continuous assessment of
gait velocity in elders. Biomedical Engineering, IEEE
Transactions on, 57(4):813–820.
Ho, T. K. (1995). Random decision forests. In Document
Analysis and Recognition, 1995., Proceedings of the
Third International Conference on, volume 1, pages
278–282. IEEE.
Khan, A. M., Lee, Y.-K., Lee, S. Y., and Kim, T.-
S. (2010). A triaxial accelerometer-based physical-
activity recognition via augmented-signal features and
a hierarchical recognizer. IEEE transactions on infor-
mation technology in biomedicine, 14(5):1166–1172.
Li, X., Chen, J., Zhao, G., and Pietikainen, M. (2014). Re-
mote heart rate measurement from face videos under
realistic situations. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition,
pages 4264–4271.
Mann, W. C., Marchant, T., Tomita, M., Fraas, L., and Kath-
leen, S. (2002). Elder acceptance of health monitor-
ing devices in the home. Care Management Journals,
3(2):91–98.
Pal, M. (2005). Random forest classifier for remote sensing
classification. International Journal of Remote Sens-
ing, 26(1):217–222.
Petridis, S., Giannakopoulos, T., and Perantonis, S. (2015a).
Unobtrusive low-cost physiological monitoring using
visual information. In Handbook of Research on Inno-
vations in the Diagnosis and Treatment of Dementia,
pages 306–316. IGI Global.
Petridis, S., Giannakopoulos, T., and Spyropoulos, C. D.
(2015b). A low cost pupillometry approach. Interna-
tional Journal of E-Health and Medical Communica-
tions, 6(4):49–61.
Platt, J. C. (1999). Probabilistic outputs for support vector
machines and comparisons to regularized likelihood
methods. In Advances in Large Margin Classifiers.
Citeseer.
Siantikos, G., Giannakopoulos, T., and Konstantopoulos, S.
(2016). A low-cost approach for detecting activities
of daily living using audio information: A use case
on bathroom activity monitoring. In Proceedings of
the 2nd International Conference on Information and
Communication Technologies for Ageing Well and e-
Health (ICT4AWE 2016).
Stikic, M., Huynh, T., Van Laerhoven, K., and Schiele,
B. (2008). Adl recognition based on the combina-
tion of rfid and accelerometer sensing. In Pervasive
Computing Technologies for Healthcare, 2008. Per-
vasiveHealth 2008. Second International Conference
on, pages 258–263. IEEE.
Vacher, M., Portet, F., Fleury, A., and Noury, N. (2010).
Challenges in the processing of audio channels for
ambient assisted living. In e-Health Networking Ap-
plications and Services (Healthcom), 2010 12th IEEE
International Conference on, pages 330–337. IEEE.
Vacher, M., Portet, F., Fleury, A., and Noury, N. (2013).
Development of audio sensing technology for ambient
assisted living: Applications and challenges. Digital
Advances in Medicine, E-Health, and Communication
Technologies, page 148.