Improving Activity Mining in a Smart Home using Uncertain and Temporal Databases

Josky Aïzan (1,2), Cina Motamed and Eugene C. Ezin

(1) Institut de Mathématiques et de Sciences Physiques, Université d’Abomey-Calavi, Benin

(2) Laboratoire d’Informatique Signal et Image de la Côte d’Opale, Université du Littoral Côte d’Opale, France

Keywords: Sequential Pattern Mining, Smart Home, Activity of Daily Living.

Abstract: In the context of smart homes, activity mining appears as an interesting and promising solution for learning activities of daily living. This paper extends our previous research work titled Activity Mining in a Smart Home from Sequential and Temporal Databases. It proposes an activity mining method based on uncertain and temporal sequential pattern mining to deal with data uncertainty and with the temporal relationships between events. The method makes it possible to track regular activities and to detect changes in an individual’s behavioural pattern. First, an uncertain sequential pattern mining algorithm is applied to the input sequence database to extract typical sequences. Second, a clustering approach based on sequence alignment methods is performed in order to obtain separated typical activities. The results obtained compare well with existing related works.

1 INTRODUCTION

Home assistance is an emerging area that aims to ensure health care, safety and autonomy for the elderly. The smart home appears as a promising solution: it combines technologies and services through home networking for a better quality of life.

Because of the large amount and the uncertainty of the data generated by the daily activities of an inhabitant, in this paper we extend our previous work (Aïzan et al., 2020) by using a sequential pattern mining technique based on probabilistic and temporal databases to discover models of frequent activities. The proposed method consists of three steps. In the first step, the pre-processing phase converts sensor data into event sequences. In the second step, frequent activities are detected by applying an uncertain sequential pattern mining algorithm. In the third step, the system estimates temporal constraints between events inside each set of frequent sequences and tries to cluster temporally similar sequences to obtain more specific sequences. This makes it possible, for example, to differentiate a short and a long activity based on the same sequence.

The paper is organized as follows. In section 2, related works are reviewed. Section 3 gives a theoretical description of the proposed method while section 4 presents experimental results and analysis. A conclusion ends the paper and gives its future directions.

2 STATE OF THE ART AND RELATED WORKS

Learning daily activities in a smart home is a real research challenge in the field of Sequential Pattern Mining (SPM).

A frequent sequential pattern mining algorithm is proposed in (Schweizer et al., 2015) to learn consumer behaviour and then reduce energy consumption in smart homes. This algorithm, named Window Sliding with De-Duplication (WSDD), slides a window of fixed size over the chronologically ordered events to find all possible frequent patterns. This approach does not consider the time between two events. In the same field of energy consumption behaviour analysis, (Singh and Yassine, 2017) proposed an unsupervised progressive incremental data mining mechanism.

Frequent episode mining is used in (Li et al., 2017) to recognize sequential behaviour patterns. (Suryadevara, 2017) developed a framework to discover data models for smart home and IoT data analytics. A novel sequential pattern mining algorithm called PBuilder is proposed in (Hassani et al., 2015); it uses a batch-free approach to mine activities in a smart home. High utility pattern mining is used in (Menaka and Gayathri, 2013) to model activity in a smart home, with a linked sensor data stream approach to save processing time and memory space. A temporal data mining algorithm is presented in (Moutacalli et al., 2012) to model activities. Their approach mines the timestamps of the events that constitute each activity. A novel method using sequential pattern mining based on the longest common subsequence is proposed in (Raeiszadeh and Tahayori, 2018) to model behaviour in a smart home.

Aïzan, J., Motamed, C. and Ezin, E.

Improving Activity Mining in a Smart Home using Uncertain and Temporal Databases.

DOI: 10.5220/0009827206370642

In Proceedings of the 17th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2020), pages 637-642

ISBN: 978-989-758-442-8

Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

All the approaches and algorithms proposed above in the context of smart homes are applied over deterministic databases. Therefore they do not consider the uncertainty of real-world data. In practice, sensor data are not always reliable and it seems important to take these imperfections into account.

Several works have integrated uncertainty management into sequential pattern mining algorithms. (Muzammal and Raman, 2010) proposed probabilistic models for uncertain sequential pattern mining. (Zhao et al., 2014) developed an algorithm to mine probabilistically frequent sequential patterns in large uncertain databases. (Li et al., 2013) focused on probabilistic frequent spatio-temporal sequential patterns with gap constraints. (Zhang et al., 2017) proposed high utility-probability sequential pattern mining from uncertain databases. (Muzammal et al., 2017) developed a framework for probabilistic trajectory extraction and mining from uncertain data. In (Yang et al., 2002) sequential pattern mining is studied in noisy sequences. Sequential pattern mining in probabilistic databases (Aggarwal, 2009), (Suciu and Dalvi, 2005) is the popular framework for modelling uncertainty.

In our approach, we consider both the uncertainty of the sensor events and the order of the sensors’ activation when extracting the frequent sequential patterns for each activity. Since some activities share some sensor events, we use a sequence alignment-based technique to solve conflicts in the classification of activities, which increases the average classification accuracy over the previous works.

3 PROPOSED METHOD

In this work, we use uncertain and temporal sequential pattern mining to discover typical activities that are frequently performed by an individual in a smart home, and a sequence alignment-based technique to recognize and predict these activities as they occur. The proposed method has three phases, namely pre-processing, uncertain sequential pattern mining and activity modeling. Fig. 1 presents the architecture of the proposed approach, which differs from that used in our previous work (Aïzan et al., 2020) by the integration of uncertainty management in the sequential pattern mining phase and the use of sequence alignment in the activity modeling phase. For our experiments, we have used the Massachusetts Institute of Technology (MIT) smart home data set (Tapia et al., 2004). This data set needs to be transformed into a realistic probabilistic and temporal sequential database. The pre-processing represents the first stage of the architecture. The second stage extracts typical activities by using a sequential pattern mining approach, and a third stage performs activity modeling based on sequence alignment.

Figure 1: Architecture of proposed approach.

3.1 Pre-processing

This stage is the same as that proposed in our previous work (Aïzan et al., 2020). An activity is a time-ordered record of events. Events are generated by sensors and actuators. An event is generated when a sensor changes state (Boolean sensors) or when its numerical value changes significantly. A small change in value is considered as noise and is therefore ignored. In the real world, each sensor is associated with a confidence c with 0 < c < 1. The pre-processing phase aims to convert sensor data into probabilistic event sequences. For illustration, Table 1 shows the “Washing dishes” activity from the dataset. In the pre-processing phase, as shown in Fig. 2, raw sensor data are converted to the (t)eid format, in which t represents the sensor activation or deactivation timestamp and eid represents the event id. The event id eid is of the form XYZ, where X represents the sensor id, Y represents the sensor state, which can be 1 for activation or 0 for deactivation, and Z represents the number of times the sensor has been activated or deactivated during the same activity.
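The conversion described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `to_seconds` and `to_events` are hypothetical, the event id is kept as a tuple (X, Y, Z) because the exact textual encoding of XYZ is not specified here, and the sensor confidence c is left out for brevity. The input is the “Washing dishes” sample of Table 1.

```python
from collections import defaultdict


def to_seconds(clock):
    """Parse 'H:M:S' (fields may be unpadded, e.g. '16:0:15') into seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def to_events(readings):
    """Convert (sensor_id, activation_time, deactivation_time) readings
    into a time-ordered list of (t, (X, Y, Z)) events.

    X is the sensor id, Y is 1 for activation and 0 for deactivation,
    Z counts how many times this (sensor, state) pair has occurred so far
    in the activity (assumed interpretation of Z).
    """
    raw = []
    for sensor_id, on_time, off_time in readings:
        raw.append((to_seconds(on_time), sensor_id, 1))
        raw.append((to_seconds(off_time), sensor_id, 0))
    raw.sort()  # chronological order

    counts = defaultdict(int)
    events = []
    for t, sensor_id, state in raw:
        counts[(sensor_id, state)] += 1
        events.append((t, (sensor_id, state, counts[(sensor_id, state)])))
    return events


# "Washing dishes" sample from Table 1.
washing_dishes = [
    (70, "15:58:31", "15:59:32"),
    (132, "15:58:52", "15:59:19"),
    (132, "15:59:22", "15:59:26"),
    (70, "15:59:39", "16:7:15"),
]

events = to_events(washing_dishes)
for t, eid in events:
    print(t, eid)
```

Keeping the event id as a tuple avoids any ambiguity when sensor ids have different numbers of digits; a concatenated string would need a separator or fixed-width fields.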


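Table 1: Sample of data.

Activity: Going out to work   Date: 4/1/2003   Start: 12:11:26   End: 12:15:12
  Sensor id      81          139           140
  Sensor name    Closet      Jewelry box   Door
  Activation     12:12:29    12:13:27      12:13:45
  Deactivation   12:13:0     12:13:35      12:13:48

Activity: Toileting   Date: 4/4/2003   Start: 12:30:17   End: 12:31:10
  Sensor id      100            67
  Sensor name    Toilet Flush   Cabinet
  Activation     12:30:30       12:30:51
  Deactivation   14:2:12        12:30:54

Activity: Washing dishes   Date: 4/5/2003   Start: 15:57:55   End: 16:0:15
  Sensor id      70           132        132        70
  Sensor name    Dishwasher   Cabinet    Cabinet    Dishwasher
  Activation     15:58:31     15:58:52   15:59:22   15:59:39
  Deactivation   15:59:32     15:59:19   15:59:26   16:7:15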

Figure 2: Pre-Processing phase of the sensor data.

3.2 Sequential Pattern Mining

The second stage applies sequential pattern mining to obtain frequent sequences. Depending on the sequence database, we use deterministic sequential pattern mining when there is no uncertainty in the data; otherwise we must use uncertain sequential pattern mining.

Deterministic Sequential Pattern Mining. This is the standard sequential pattern mining setting. Let $S = \{1, \ldots, p\}$ and $I = \{i_1, i_2, \ldots, i_q\}$ be respectively a set of sources and a set of items. An event $e$ is a collection of items such that $e \subseteq I$. A sequence database $D = \langle s_1, s_2, \ldots, s_N \rangle$ is an ordered list of sequences such that each $s_i \in D$ is of the form $(eid_i, e_i, \sigma_i)$, where $eid_i$ is a unique event-id, including a timestamp (events are ordered by this timestamp), $e_i$ is an event and $\sigma_i$ is a source.

A sequence is an ordered list of events $s = \langle e_1, e_2, \ldots, e_n \rangle$ such that $e_k \subseteq I$ $(1 \le k \le n)$. A sequence $s_a = \langle A_1, A_2, \ldots, A_n \rangle$ is a subsequence of another sequence $s_b = \langle B_1, B_2, \ldots, B_m \rangle$, denoted $s_a \preceq s_b$, if and only if there exist integers $1 \le i_1 < i_2 < \cdots < i_n \le m$ such that $A_1 \subseteq B_{i_1}, A_2 \subseteq B_{i_2}, \ldots, A_n \subseteq B_{i_n}$. Let $d_i$ be the sequence corresponding to a source $i$. For a sequence $s$ and source $i$, let $X_i(s, d_i)$ be an indicator variable, whose value is 1 if $s$ is a subsequence of sequence $d_i$, and 0 otherwise. For any sequence $s$, its support in $D$ is denoted by:

$$\mathrm{Sup}(s, D) = \sum_{i=1}^{p} X_i(s, d_i) \qquad (1)$$

The goal is to find all sequences $s$ such that $\mathrm{Sup}(s, D) \ge \theta p$ for some user-defined threshold $0 \le \theta \le 1$.
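As an illustration of the subsequence test and of the support of Equation (1), here is a minimal sketch. The names `is_subsequence`, `support` and `is_frequent` are hypothetical, events are modelled as Python sets of items, and the three-source database is a toy example chosen for illustration only.

```python
def is_subsequence(s, d):
    """Return True if s = <A_1..A_n> is a subsequence of d = <B_1..B_m>,
    i.e. each A_k is contained in some B_{i_k} with i_1 < i_2 < ... < i_n.
    Greedy left-to-right matching is sufficient for this test."""
    pos = 0
    for a in s:
        # Advance to the first remaining event of d that contains a.
        while pos < len(d) and not a <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1  # the next event of s must match strictly later in d
    return True


def support(s, database):
    """Sup(s, D): number of sources whose sequence contains s."""
    return sum(1 for d in database.values() if is_subsequence(s, d))


def is_frequent(s, database, theta):
    """Check Sup(s, D) >= theta * p, where p is the number of sources."""
    return support(s, database) >= theta * len(database)


# Toy deterministic database: one sequence of events per source.
database = {
    "X": [{"e", "h"}, {"f", "g"}],
    "Y": [{"e", "h"}],
    "Z": [{"e"}, {"e", "f"}],
}
s = [{"e"}, {"f"}]

print(support(s, database))           # sources X and Z support s
print(is_frequent(s, database, 0.5))  # threshold is 0.5 * 3 = 1.5
```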

Uncertain Sequential Pattern Mining. In the context of uncertain sequential pattern mining, there are three different kinds of uncertainty: the source, the event and the time (Muzammal and Raman, 2010). In our work, we focus on sensor data uncertainty, which is linked with event uncertainty.

Under event uncertainty, the database $D^p = (D^p_1, D^p_2, \ldots, D^p_p)$ is represented by a collection of p-sequences. Each p-sequence is an ordered list of events, each with its confidence. An example of a p-sequence database is presented in Table 2.

Table 2: Sample of p-sequence database.

Source   p-sequence
X        (e, h : 0.6)(e, f : 0.3)(f, g : 0.7)
Y        (e, h : 0.4)(e, f : 0.2)
Z        (e : 1.0)(e, f : 0.5)(f, g : 0.3)

The Possible Worlds semantics of $D^p$ is as follows. For each event $e_j$ in a p-sequence $D^p_i$ there are two kinds of worlds: one in which $e_j$ occurs and the other where it does not. For the rest of the algorithm, we use the Possible Worlds semantics $PW(D^p)$ of the database $D^p$. We get $PW(D^p)$ by calculating the $PW(D^p_i)$. Each $PW(D^p_i)$ is obtained by considering the $2^l$ possibilities, with $l$ the length of the p-sequence $D^p_i$. In this approach, the events within a p-sequence are considered probabilistically independent. Table 3 presents $PW(D^p_Y)$.

Table 3: Possible worlds of $D^p_Y$.

Possible world   Probability
⟨⟩               (1 − 0.4) × (1 − 0.2) = 0.48
(e, h)           0.4 × (1 − 0.2) = 0.32
(e, f)           (1 − 0.4) × 0.2 = 0.12
(e, h)(e, f)     0.4 × 0.2 = 0.08
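The enumeration of the $2^l$ possible worlds of a single p-sequence can be sketched as below. The function name `possible_worlds` is hypothetical and events are represented as tuples of items; the example input is the second p-sequence of Table 2, with confidences 0.4 and 0.2, and it relies on the independence assumption stated above.

```python
from itertools import product


def possible_worlds(p_sequence):
    """Enumerate the 2^l possible worlds of a p-sequence.

    p_sequence is a list of (event, confidence) pairs. Each event is kept
    with probability `confidence` and dropped with probability
    1 - confidence, independently of the other events.
    Returns a list of (world, probability) pairs.
    """
    worlds = []
    for keep in product([False, True], repeat=len(p_sequence)):
        probability = 1.0
        world = []
        for (event, confidence), kept in zip(p_sequence, keep):
            probability *= confidence if kept else 1.0 - confidence
            if kept:
                world.append(event)
        worlds.append((tuple(world), probability))
    return worlds


# Second p-sequence of Table 2: (e, h : 0.4)(e, f : 0.2).
p_sequence = [(("e", "h"), 0.4), (("e", "f"), 0.2)]

for world, probability in possible_worlds(p_sequence):
    print(world, round(probability, 3))
```

The probabilities of all worlds of one p-sequence sum to 1, which is a convenient sanity check when building such tables by hand.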

The same method is used to determine $PW(D^p_X)$ and $PW(D^p_Z)$, and Table 4 presents the possible worlds $PW(D^p)$ of the p-sequence database $D^p$.

An example of a possible world $D^*$ is shown in Table 5.

The probability of this possible world is $\Pr(D^*) = 0.294 \times 0.32 \times 0.35 \approx 0.03$ if we consider that the p-sequences are stochastically independent. The expected support is evaluated according to (2).


Table 4: Possible worlds of $D^p$.

$PW(D^p_X)$   {⟨⟩ = 0.084}; {(e, h) = 0.126}; {(e, f) = 0.036}; {(f, g) = 0.196};
              {(e, h)(e, f) = 0.054}; {(e, h)(f, g) = 0.294};
              {(e, f)(f, g) = 0.084}; {(e, h)(e, f)(f, g) = 0.126}
$PW(D^p_Y)$   {⟨⟩ = 0.48}; {(e, h) = 0.32}; {(e, f) = 0.12}; {(e, h)(e, f) = 0.08}
$PW(D^p_Z)$   {(e) = 0.35}; {(e)(e, f) = 0.35}; {(e)(f, g) = 0.15}; {(e)(e, f)(f, g) = 0.15}

Table 5: One possible world.

$D^*_X$   {(e, h)(f, g)}   0.294
$D^*_Y$   {(e, h)}         0.32
$D^*_Z$   {(e)(e, f)}      0.35

$$ES(s, D^p) = \sum_{D^* \in PW(D^p)} \Pr[D^*] \times \mathrm{Sup}(s, D^*) \qquad (2)$$

$\mathrm{Sup}(s, D^*)$ is evaluated according to (1) because $D^*$ is deterministic. We have $|PW(D^p_X)| \times |PW(D^p_Y)| \times |PW(D^p_Z)| = 8 \times 4 \times 4 = 128$ possible worlds for this small example, so Equation (2) becomes impractical when the database is large. To deal with this problem, the expected support is evaluated as follows. Let s = (e)(f) be a sequence and consider the database of Table 2. For each source X, Y and Z, the probability that it supports s is calculated. According to $PW(D^p_X)$ in Table 4, the probability that source X supports s is 0.054 + 0.294 + 0.084 + 0.126 = 0.558 and the probability that it does not is 1 − 0.558 = 0.442. Similarly, the probabilities that Y and Z support s are 0.08 and 0.65. For i = 0, 1, 2, 3, the independence of the p-sequences is used to compute the probability that exactly i sources support s, as shown in Table 6. For example, the probability that s is supported by all three sources is 0.558 × 0.08 × 0.65 = 0.029. Then ES(s) = 0 × 0.142 + 1 × 0.456 + 2 × 0.372 + 3 × 0.029 ≈ 1.288, which is also the sum of the per-source probabilities, 0.558 + 0.08 + 0.65 = 1.288.

Table 6: Support probability distribution.

No. of sources        0       1       2       3
Support probability   0.142   0.456   0.372   0.029
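The computation above can be reproduced with a short sketch. It is an illustration under the stated independence assumptions, not the authors' Java implementation: the names `support_probability` and `support_distribution` are hypothetical, events are modelled as sets, and the per-source probability is still obtained by enumerating the $2^l$ worlds of each p-sequence, which is only reasonable for short sequences.

```python
from itertools import product


def is_subsequence(s, d):
    """Greedy test that s embeds in d in order, with set containment."""
    pos = 0
    for a in s:
        while pos < len(d) and not a <= d[pos]:
            pos += 1
        if pos == len(d):
            return False
        pos += 1
    return True


def support_probability(s, p_sequence):
    """Probability that one source supports s: sum of the probabilities
    of the possible worlds of its p-sequence that contain s."""
    total = 0.0
    for keep in product([False, True], repeat=len(p_sequence)):
        probability = 1.0
        world = []
        for (event, confidence), kept in zip(p_sequence, keep):
            probability *= confidence if kept else 1.0 - confidence
            if kept:
                world.append(event)
        if is_subsequence(s, world):
            total += probability
    return total


def support_distribution(probabilities):
    """Distribution of the number of supporting sources, assuming the
    sources are independent. dist[i] = Pr(exactly i sources support s)."""
    dist = [1.0]
    for q in probabilities:
        updated = [0.0] * (len(dist) + 1)
        for count, mass in enumerate(dist):
            updated[count] += mass * (1.0 - q)  # this source does not support s
            updated[count + 1] += mass * q      # this source supports s
        dist = updated
    return dist


# p-sequence database of Table 2.
database = {
    "X": [({"e", "h"}, 0.6), ({"e", "f"}, 0.3), ({"f", "g"}, 0.7)],
    "Y": [({"e", "h"}, 0.4), ({"e", "f"}, 0.2)],
    "Z": [({"e"}, 1.0), ({"e", "f"}, 0.5), ({"f", "g"}, 0.3)],
}
s = [{"e"}, {"f"}]

probabilities = {src: support_probability(s, seq) for src, seq in database.items()}
distribution = support_distribution(probabilities.values())
expected_support = sum(count * mass for count, mass in enumerate(distribution))

print({src: round(q, 3) for src, q in probabilities.items()})
print([round(mass, 3) for mass in distribution])
print(round(expected_support, 3))
```

By linearity of expectation, the expected support is simply the sum of the per-source probabilities, so the full distribution is only needed when a probabilistic frequentness threshold is applied.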

3.3 Activity Modeling

At this stage, each input (test) sequence is compared against the frequent sequence patterns of all the activities produced by the uncertain sequential pattern mining stage. A score is computed for each pair of sequence and activity class. This score measures how similar the triggered sensors of the input sequence are to the triggered sensors of the frequent sequence patterns discovered for each activity. The activity class with the maximum score is selected for that sequence.

Figure 3: Activities recognition accuracy according to sensor confidence (x-axis: confidence; y-axis: accuracy; curves: subject 1, subject 2).

To define a similarity measure between two sequences (3), we used the Levenshtein distance $e(A, B)$ (Rashidi et al., 2011), which is the number of edits (insertions, deletions, and substitutions) required to transform an event sequence $A$ into another event sequence $B$.

$$sim(A, B) = 1 - \frac{e(A, B)}{\max(|A|, |B|)} \qquad (3)$$
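A minimal sketch of the similarity of Equation (3) and of the maximum-score selection is given below. The function names are hypothetical, and two points are assumptions made for the example rather than details stated in the text: an activity's score is taken as the best similarity over its frequent patterns, and the patterns themselves are invented from the sensor ids of Table 1.

```python
def levenshtein(a, b):
    """Number of insertions, deletions and substitutions needed to turn
    sequence a into sequence b (classic dynamic programming, two rows)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """sim(A, B) = 1 - e(A, B) / max(|A|, |B|), as in Equation (3)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are identical
    return 1.0 - levenshtein(a, b) / longest


def classify(sequence, patterns_by_activity):
    """Return (best_activity, scores). Assumption: the score of an
    activity is the best similarity over its frequent patterns."""
    scores = {
        activity: max(similarity(sequence, pattern) for pattern in patterns)
        for activity, patterns in patterns_by_activity.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical frequent patterns, written as sequences of sensor ids.
patterns_by_activity = {
    "Washing dishes": [[70, 132, 132, 70]],
    "Toileting": [[100, 67]],
}

label, scores = classify([70, 132, 70], patterns_by_activity)
print(label, scores)
```

Because the distance is normalized by the longer sequence, the similarity always lies between 0 and 1, which makes scores comparable across activities whose patterns have different lengths.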

4 EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we present the results obtained with the proposed method. We used the MIT smart home dataset, which is described in subsection 4.1.

4.1 MIT Dataset

The MIT dataset is a two-week collection of human activity in two single-person apartments containing respectively 77 and 84 sensors (see Fig. 5 for illustration). The sensors were installed in everyday objects such as containers, drawers, refrigerators, etc. to record opening and closing events (activation and deactivation events) as the subject carried out everyday activities. Activities are labeled into 16 different classes and the number of occurrences of each class by subject is shown in Table 7.

4.2 Results and Analysis

Our implementation, written in Java, is executed on a machine with an Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz 2.90 GHz running Windows 10. With a support value fixed to 0.5, our method applied on the MIT dataset (see Fig. 3) shows that the accuracy rate is maximum for confidences between 50% and 70% and decreases for confidences greater than 70%. The maximum values of accuracy are 99.38% and 100% for subject 1 and subject 2 respectively.

Figure 4: Processing time according to sensor confidence (x-axis: confidence; y-axis: process time; curves: subject 1, subject 2).

Figure 5: (a) apartment of subject one. (b) apartment of subject two.

In summary, the decrease in accuracy for confidences greater than 70% is explained by an increase in the share of unreliable frequent patterns produced by the uncertain sequential pattern mining phase when these confidences are applied.

Fig. 4 depicts the processing time for the MIT dataset. The processing time increases for confidences greater than 70% and is low for the confidences that give the maximum accuracy. This is explained by the fact that, when the confidences that give the maximum accuracy are applied at the uncertain sequential pattern mining phase, we obtain only useful and reliable frequent patterns. Other confidences yield not only useful and reliable frequent patterns but also unreliable ones, and therefore need more processing time. Table 8 shows a comparison of our results with other methods.

Table 7: Activities labeled (number of examples per class).

Activity               Subject 1   Subject 2
Preparing dinner       8           14
Preparing lunch        17          20
Listening to music     -           18
Taking medication      -           14
Toileting              85          40
Preparing breakfast    14          18
Washing dishes         7           21
Preparing a snack      14          16
Watching TV            -           15
Bathing                18          -
Going out to work      12          -
Dressing               24          -
Grooming               37          -
Preparing a beverage   15          -
Doing laundry          19          -
Cleaning               8           -

Table 8: Comparison of results with MIT dataset.

Approach                          Technique              Subject 1   Subject 2
Proposed method                   Uncertain SPM          99.38%      100%
(Raeiszadeh and Tahayori, 2018)   UP-DM + RandomForest   97.45%      91.37%
(Tapia et al., 2004)              Naive Bayes Classifier 60.6%       41.09%

5 CONCLUSION

We have used a sequential pattern mining algorithm over probabilistic and temporal databases to bring out typical activities in the smart home. By taking sensor uncertainty into account, the recognition focuses on the reliable parts of the sensor data. We use temporal relationships between events for a more accurate characterization and classification of frequent activities. Future work is the development of online, adaptive, unsupervised learning of activities in the context of uncertain observations and temporal constraints.


REFERENCES

Aggarwal, C. C. (2009). Managing and mining uncertain data.

Aïzan, J., Motamed, C., and Ezin, E. (2020). Activity mining in a smart home from sequential and temporal databases. In Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods.

Hassani, M., Beecks, C., Töws, D., and Seidl, T. (2015). Mining sequential patterns of event streams in a smart home application. In The LWA 2015 Workshops: KDML, FGWM, IR, and FGD.

Li, L., Li, X., Lu, Z., Lloret, J., and Song, H. (2017). Sequential behavior pattern discovery with frequent episode mining and wireless sensor network. In Communications Magazine. IEEE.

Li, Y., Bailey, J., Kulik, L., and Pei, J. (2013). Mining probabilistic frequent spatio-temporal sequential patterns with gap constraints from uncertain databases. In IEEE International Conference on Data Mining (ICDM). IEEE.

Menaka, J. and Gayathri, K. S. (2013). Activity modeling in smart home using high utility pattern mining over data streams. In The Journal of Computer Science and Network.

Moutacalli, M. T., Bouzouane, A., and Bouchard, B. (2012). Unsupervised activity recognition using temporal data mining. In The First International Conference on Smart Systems, Devices and Technologies.

Muzammal, M., Gohar, M., Rahman, A. U., and Qu, Q. (2017). Trajectory mining using uncertain sensor data. In The Journal of IEEE Access. IEEE.

Muzammal, M. and Raman, R. (2010). On probabilistic models for uncertain sequential pattern mining. In International Conference on Advanced Data Mining and Applications.

Raeiszadeh, M. and Tahayori, H. (2018). A novel method for detecting and predicting resident’s behavior in smart home. In 6th Iranian Joint Congress on Fuzzy and Intelligent Systems. IEEE.

Rashidi, P., Cook, D. J., Holder, L. B., and Schmitter-Edgecombe, M. (2011). Discovering activities to recognize and track in a smart environment. In IEEE Trans Knowl Data Eng, volume 23(4), pages 527–539.

Schweizer, D., Zehnder, M., Wache, H., and Witschel, H. (2015). Using consumer behavior data to reduce energy consumption in smart homes. In 14th International Conference on Machine Learning and Applications.

Singh, S. and Yassine, A. (2017). Mining energy consumption behavior patterns for households in smart grid. In Transactions on Emerging Topics in Computing. IEEE.

Suciu, D. and Dalvi, N. N. (2005). Foundations of probabilistic answers to queries. In the ACM SIGMOD International Conference on Management of Data.

Suryadevara, N. (2017). Wireless sensor sequence data model for smart home and IoT data analytics. In First International Conference on Computational Intelligence and Informatics, Advances in Intelligent Systems and Computing.

Tapia, E. M., Intille, S. S., and Larson, K. (2004). Activity recognition in the home setting using simple and ubiquitous sensors. In Pervasive Computing.

Yang, J., Wang, W., Yu, P. S., and Han, J. (2002). Mining long sequential patterns in a noisy environment. In ACM SIGMOD International Conference on Management of Data.

Zhang, B., Lin, J. C., Fournier-Viger, P., and Li, T. (2017). Mining of high utility-probability sequential patterns from uncertain databases. In The Journal of PLoS One.

Zhao, Z., Yan, D., and Ng, W. (2014). Mining probabilistically frequent sequential patterns in large uncertain databases. In The Journal of IEEE Transactions on Knowledge and Data Engineering. IEEE.
