PROTECTING PRIVATE DATA ON MOBILE SYSTEMS BASED ON SPATIO–TEMPORAL ANALYSIS

Sausan Yazji, Robert P. Dick, Peter Scheuermann and Goce Trajcevski

EECS Dept., Northwestern University, Evanston, IL 60208, U.S.A.

EECS Dept., University of Michigan, Ann Arbor, MI 48109, U.S.A.

Keywords:

Mobile security, Trajectory analysis.

Abstract:

Mobile devices such as smart phones and laptops are in common use and carry a vast amount of personal data.

This paper presents an efﬁcient behavior-based system for rapidly detecting the theft of mobile devices in order

to protect the private data of their users. Our technique uses spatio-temporal information to construct models

of user motion patterns. These models are used to detect theft, which may produce anomalous spatio-temporal

patterns. We consider two types of user models, each of which builds on the relationship between location

and time of day. Our evaluation, based on the Reality Mining dataset, shows that our system is capable of

detecting an attack within 15 minutes with 81% accuracy.

1 INTRODUCTION

Mobile devices such as smart phones, iPhones, and laptops are used in a number of applications, including email, text messaging, gaming, web browsing, navigation, and recording pictures/videos (Thornton and Houser, 2004). Such devices are also used for financial transactions including Mobile Money (Chen, 2008), which is extensively used in China and Japan. Mobile computing devices store a lot of personal information and, if a device is stolen, loss of control over these data may be even more important than loss of the device itself.

Some prior work on mobile device security has focused on physical aspects and/or access control (e.g., strong passwords, voice recognition, or fingerprints). However, such approaches do not protect the private data on stolen devices in the post-authentication state. Many mobile devices (e.g., from Apple, Blackberry, Sony Ericsson, and Nokia) are equipped with location identification tools such as association with a cellphone tower ID, WiFi, Bluetooth, or Global Positioning System (GPS) receivers, which can be used to track location in case of theft. However, existing systems that use the GPS feature to protect users (e.g., GadgetTrak (GadgetTrak, 2010) and RecoveryCop (Monitoring, 2010)) depend on the owner to report the theft. It may take hours before the owner discovers the theft of a device, at which point private data may have already been compromised. Even Laptop Cop (LaptopCop, 2010), which has the goal of protecting data on stolen devices by remotely and manually deleting it, requires user intervention to initiate this process. In addition, these systems require cellular connections to protect the data, while our system is capable of detecting attacks and reacting without cellular access.

Our main goal is to develop efficient techniques for protecting data saved on mobile devices. Our approach is based on detecting the spatio-temporal behavior of intruders, which may be anomalous compared to the regular motion patterns of owners. In a previous study (Yazji et al., 2009), we used network access patterns and file system activities to build a behavioral model that permitted attack detection with a latency of 5 minutes and an accuracy of 90%. Here, we investigate the complementary approach of using spatio-temporal information and trajectory analysis to model user behavior and support anomaly detection.

There has been recent research (Sun et al., 2007; Hall et al., 2005; Yan et al., 2009) on mobility-based intrusion detection. To the best of our knowledge, ours is the first such technique to use spatio-temporal information and trajectory analysis to enable detection of an attack in 15 minutes and with 81% accuracy. The simple data structures used to model the user's spatio-temporal behavior (2-dimensional and 3-dimensional matrices) enable efficient lookup-based attack detection.

The rest of this paper is organized as follows. Section 2 describes related work. Section 3 introduces the system architecture and detection techniques. Section 4 presents the evaluation of our technique. Section 5 concludes the paper and indicates possible directions for future work.

Yazji S., Dick R. P., Scheuermann P. and Trajcevski G. (2011).

PROTECTING PRIVATE DATA ON MOBILE SYSTEMS BASED ON SPATIO–TEMPORAL ANALYSIS.

In Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems, pages 114-123

DOI: 10.5220/0003373301140123

SciTePress

2 RELATED WORK

Spatio-temporal data management and efficient query processing techniques have been the topics of intensive research in the field of Moving Objects Databases (Güting and Schneider, 2005). In particular, trajectory analysis and similarity detection have yielded numerous research results in recent years (Dodge et al., 2009; Gómez et al., 2008; Jeung et al., 2008). Several results from this arena have goals similar to ours. For example, Mouza and Rigaux (Mouza and Rigaux, 2005) propose regular expression based algorithms for detecting mobility patterns. However, those patterns do not explicitly model the temporal dimension of the motion, i.e., the focus is more on routes than trajectories. Hadjieleftheriou et al. (Hadjieleftheriou et al., 2005) describe efficient indexing techniques and refinement algorithms for processing spatio-temporal pattern queries. The main distinction of our work is the use of probabilistic location-in-time patterns, which establish a threshold for detecting anomalous behavior.

The importance of adding semantic information to trajectory data has been previously recognized. For example, in order to improve application awareness during trajectory data analysis, Alvares et al. (Alvares et al., 2007) proposed adding semantic information during trajectory preprocessing. Hung, Chang, and Peng (Hung et al., 2009) proposed the complementary approach of using a probabilistic suffix tree to measure separation among users' trajectories. Xie, Deng, and Zhou (Xie et al., 2009) addressed the problem of predicting social activities based on users' trajectories. In addition, Trestian et al. (Trestian et al., 2009) used association rule mining to investigate the relationships between geographic locations and use habits for mobile devices. In this work, we introduce two types of mobility models and combine them for efficient detection of anomalous use.

Some intrusion detection research has objectives similar to ours, but differs in approach. Sun et al. (Sun et al., 2007) proposed mobile intrusion detection based on the Lempel–Ziv compression algorithm and Markov chains. Their technique used three-level Markov chains and did not consider the association between time of day and location. Its ability to detect attacks is limited to the times at which the user is making phone calls and moving faster than 60 miles per hour. Yan et al. (Yan et al., 2009) improved on this work, yet the delay in detecting an attack was 24 hours, since the traces were obtained once a day, with a sampling period of 30 minutes. Our technique has an attack detection latency of 15 minutes. Hall, Barbeau, and Kranakis (Hall et al., 2005) proposed an intrusion detection method based on mobility traces. Their focus was on public transportation traces in which the paths are pre-defined. Their results are inapplicable for detecting attacks based on individual motion patterns.

3 SYSTEM ARCHITECTURE

We now explain the main results of our work. First,

we explain our detection system. We then describe

techniques for data collection and feature extraction

and present two user models for anomaly detection.

The main objectives of this work are to

1. develop efficient algorithms for deriving user models from spatio-temporal information and trajectory analysis;

2. determine the accuracy with which users can be distinguished using such models; and

3. ideally achieve a high detection accuracy with low latency and low energy cost.

The methodology proposed in this paper is based

on the following observations:

• most mobile systems have location identification tools and can gather location traces;

• each individual typically has a small set of locations that are visited with high frequency, e.g., every day (González et al., 2008); and

• individuals tend to take the same paths when moving among particular locations (González et al., 2008).

3.1 System Components

The system for automatic generation of mobility

models and detection of spatio-temporal behavioral

anomalies has the following main modules:

1. data collection,

2. feature extraction,

3. user proﬁle building, and

4. anomaly detection.

Figure 1 illustrates the integration of these modules into the system architecture, which consists of the following sub-systems.


[Figure 1 flowchart: the information capturing system logs mobile location information, performs feature extraction, and sends log files to the information management system; the information management system logs and analyzes the data, builds the user profile model, and performs anomaly detection; when the check against the profile fails, an alert is triggered; the response management system receives the alert and takes appropriate action.]

Figure 1: System architecture.

• (ICS) – the information capturing system, residing on the mobile device, which contains an application to track the device location, register it periodically, and save it in a new log file every T minutes. It also contains the feature extraction module.

• (IMS) – the information management system, which collects the log files from the ICS and resides on a computer with higher performance and much looser power consumption constraints than the mobile device. It is responsible for building mobility models and performing anomaly detection. Upon building the user model, the IMS sends it to the mobile device, allowing the detection of attacks in the absence of a wireless connection, at some computation and power consumption penalty.

• (RMS) – the response management system, which resides on both the mobile device and the remote server that hosts the IMS. Upon receiving an alert, the RMS identifies the appropriate action to protect data on the mobile device, e.g., notifying the device owner, locking the device, or automatically deleting private data.

In this paper, we focus on the algorithms and implementation details for the ICS and the IMS modules, since the RMS consists of user-dependent actions that should be executed in case an attack is detected.

3.2 Data Collection and Feature Extraction

Motion traces are essential for model construction and

anomaly detection. We considered human motion

data which is

• continuous: collected for a long period of time

continuously;

• consistent: collected at the same time every day;

and

• frequent: collected at a high enough frequency to

support fast anomaly detection.

The sampling frequency used by González et al. (González et al., 2008) was too low for our application. The OpenStreetMap (OSM, 2010) data, as well as the data used by Rhee et al. (Rhee et al., 2008), were neither continuous nor consistent. Hence, we used the Reality Mining data set (Eagle et al., 2007), which contains data for over 100 users during a nine-month period. It consists of phone call logs, locations identified by tower IDs and area IDs, application usage logs, and device-specific data. The data collection interval ranged from a few seconds to 15 minutes, with an average of 2.5 minutes, except when the mobile device was off.

Our spatio-temporal analysis technique depends on extracting the following features from the Reality Mining log: (1) user ID $u_i$; (2) location information $l_j$, represented by the area ID in the traces; and (3) timestamps $t_k$ of the data records in the trace. Thus, our input data records are tuples of the form $(u_i, l_j, t_k)$.
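The feature extraction step can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes each raw log row already provides a user ID, an area ID, and a Unix timestamp, and it assumes UTC when mapping the timestamp to a one-minute slot. The names `Record`, `minute_of_day`, and `extract` are hypothetical, and the slot index is 0-based here whereas the text indexes intervals from 1 to NT.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Record(NamedTuple):
    """One input tuple (user, location, time slot)."""
    user: int      # user ID
    location: int  # area ID
    minute: int    # minute of day, 0..1439, derived from the timestamp


def minute_of_day(unix_seconds: float) -> int:
    """Map a timestamp to its one-minute slot within a 24-hour day (UTC assumed)."""
    ts = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return ts.hour * 60 + ts.minute


def extract(user: int, area_id: int, unix_seconds: float) -> Record:
    """Reduce one raw log row to the three features used by the models."""
    return Record(user, area_id, minute_of_day(unix_seconds))
```

Reducing timestamps to a minute-of-day index is what lets records from different days fall into the same row of a per-user table.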

We developed and evaluated two modeling techniques for anomaly detection: Model #1 considers time–location relationships and Model #2 considers time–location sequences of recently visited locations. We relate the anomaly detection rate to the total number of distinct locations for each user, based on which we propose a method to adaptively select the best model.

In the following sections, we describe each of the models in greater detail.

3.3 Model #1: Spatio-temporal Information

In Model #1, for each user $u_i$, we extract the location $l_j$ and timestamp $t_k$. For conciseness, we will sometimes neglect notation for user ID when it is clear from the context.

PECCS 2011 - International Conference on Pervasive and Embedded Computing and Communication Systems


3.3.1 Building User Proﬁle

Our goal is to protect private data on mobile devices by detecting attacks based on identification of unacceptable deviation from the user's normal behavior. Our first step is to model each user's normal behavior. To build the user profiles for the 100 users in the Reality Mining data set, we divided the data evenly into two consecutive series: model data (used for model construction) and test data (used for evaluation).

Using the model data, each user profile was constructed as follows.

1. Build a list of the user's distinct locations ($L_i$).

2. Extract from the distinct location list the user's common locations list ($UCL_i$), which consists of locations the user visited more than 1% of the time during the data collection period.

3. Construct the LOC-IN-TIME$_i$ table for a 24-hour time period using one-minute intervals. Each entry LOC-IN-TIME$_i(j, k)$ is the weighted probability value Prob$(l_j, t_k)$, which represents the fraction of time in the model data the user $u_i$ was at location $l_j$ at time $t_k$, where $1 \le j \le |UCL_i|$ and $1 \le k \le NT$.

As explained above, $UCL_i$ denotes the set of locations visited by $u_i$ more than 1% of the time during the data collection period, and NT denotes the number of one-minute intervals.

At any given time $t_k$, the user $u_i$ should be at only one location $l_j$ from the location list $L_i$. Therefore, the total probability value calculated for that time of the day should always be equal to one. The weighted probability value Prob$(l_j, t_k)$ is the probability of user $u_i$ being at location $l_j$ at time $t_k$, divided by the number of records in the model data set that represent the locations in $UCL_i$.

The profile construction process is formally described in Algorithm 1. This process is repeated for each user, as shown in Line 3. The first step is to construct a list of all locations visited by user $u_i$, as shown in Lines 4–5. All locations that have been visited at most 1% of the time are excluded, as shown in Lines 6–11. In Lines 13–14, we calculate the weighted probability value. In Line 15, $P_{trust}$ is calculated as described in Section 3.3.2.

Figure 2 shows the profile for user $u_i$. The user profile is a two-dimensional matrix with ($|UCL_i| \times NT$) elements. Rows correspond to minutes of the day and columns correspond to locations.

Algorithm 1: Build User Profile based on Spatio-temporal Information.
1: INPUT: model data log
2: OUTPUT: user profile LOC-IN-TIME$_i$
3: for all users $u_i$ do
4:   Read each record in the model data log
5:   Identify the list of distinct locations ($L_i$) visited by the user
6:   Build the infrequent location list ($IF_i$) where
7:   if $\sum$ records($l_j$) $\le$ 1% size of model data then
8:     $l_j \in IF_i$
9:   end if
10:  Let RP represent the total number of records in the model data where $l_j \in IF_i$
11:  Build list of the user common locations $UCL_i = L_i - IF_i$
12:  Allocate space for table LOC-IN-TIME$_i$ with $|UCL_i|$ columns and NT rows
13:  Calculate the weighted probability value
14:  LOC-IN-TIME$_i(j, k)$ = Prob$(l_j, t_k)$ / (size of (model data) − RP)
15:  Calculate the ($P_{trust}$) value for each user
16: end for

Figure 2: User proﬁle for Model #1.
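The profile construction described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. It assumes the records of a single user have already been reduced to (location, minute-of-day) pairs, stores the table sparsely in a dictionary instead of a dense matrix, and interprets the weighted probability as the number of model records at a given location and minute divided by the number of records at common locations. The names `build_profile` and `min_share` are hypothetical.

```python
from collections import Counter

NT = 1440  # number of one-minute intervals in a day


def build_profile(records, min_share=0.01):
    """Build a sparse LOC-IN-TIME table for one user.

    records: list of (location, minute) pairs, minute in 0..NT-1.
    Returns (ucl, table): the sorted common locations and a dict mapping
    (location, minute) to its weighted value.
    """
    visits = Counter(loc for loc, _ in records)
    total = len(records)
    # Infrequent locations hold at most min_share of all model records.
    infrequent = {loc for loc, n in visits.items() if n <= min_share * total}
    rp = sum(visits[loc] for loc in infrequent)  # records dropped (RP)
    ucl = sorted(set(visits) - infrequent)       # UCL = L - IF
    kept = total - rp
    counts = Counter(
        (loc, minute) for loc, minute in records if loc not in infrequent
    )
    table = {key: n / kept for key, n in counts.items()}
    return ucl, table
```

With this normalization the stored values sum to one over the whole table. A dense `|UCL| x NT` array would give the same lookups and may be preferable when memory layout matters.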

3.3.2 Anomaly Detection

Attacks are detected via mismatches between limited-duration spatio-temporal traces and the model of normal user behavior, yielding an attack detection latency $\le T$. When the probability of a specific trace being generated by the user model drops below the trust value ($P_{trust}$), our system concludes that the mobile device is used by someone other than its owner.

To calculate the $P_{trust}$ associated with a given user profile, we used the test data set. We randomly selected 100 samples ($S_1, S_2, \ldots, S_{100}$) from the test data, for which the time span is T minutes. A random sample $S_m$ of span T corresponds to a contiguous sequence of records $(u_i, l_1, t_1), (u_i, l_2, t_2), \cdots, (u_i, l_k, t_k), \cdots, (u_i, l_n, t_n)$ satisfying the conditions $t_1 \le t_2 \le \cdots \le t_n$ and $(t_n - t_1) = T$.


[Figure 3 diagram: a trace sequence of location labels (A, B, C, D) split into samples of duration T, up to $S_{100}$, with example sample probabilities $SP_1$ = 0.08%, $SP_2$ = 0.083%, $SP_3$ = 0.089%, $SP_m$ = 0.0%, and $SP_{100}$ = 0.098%.]

Figure 3: Example of calculating the $P_{trust}$ value.

Table 1: System Sensitivity to False Rejection Rate.

FRR 0% 10% 20% 50% 80% 100%

FAR 28.5% 19.4% 14.9% 11.6% 1.2% 0%

Figure 3 illustrates a T-duration trace sequence containing 100 samples. The number of records per sample varies among samples due to variation in the data collection interval. For each sample $S_m$, we calculate the cumulative probability $SP_m$ of the records in the sequence using the probability distribution table established on the model data representative of the user as follows:

$$SP_m = \sum_{(j,k) \in S_m} \text{LOC-IN-TIME}_i(j, k). \qquad (1)$$

Most SP values are similar, with few outliers (see Figure 3). Selecting $P_{trust}$ equal to the smallest SP value of zero implies no tolerance of false rejection, resulting in a False Acceptance Rate (FAR) of 100%. In contrast, if we have no tolerance for false acceptance, then $P_{trust}$ should equal the highest SP value, which would result in a very low FAR but a very high False Rejection Rate (FRR). We use a $P_{trust}$ resulting in an FRR of 10% based on the results of a sensitivity study, in which we selected different FRR values and calculated the corresponding $P_{trust}$ and FAR. Table 1 shows the sensitivity results.
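One way to turn a target FRR into a trust threshold is to take an order statistic of the owner's own sample probabilities. The Python sketch below illustrates that idea under stated assumptions; the text does not spell out the exact selection rule, so this is not necessarily the procedure the authors used. It assumes a sample is rejected when its SP value is strictly below the threshold, and the names `choose_p_trust` and `false_rejection_rate` are hypothetical.

```python
def choose_p_trust(sp_values, target_frr=0.10):
    """Pick a threshold so that roughly target_frr of the owner's samples
    fall strictly below it and would therefore be rejected."""
    if not sp_values:
        raise ValueError("at least one sample probability is required")
    ordered = sorted(sp_values)
    # Number of the owner's samples we are willing to reject.
    n_reject = min(round(target_frr * len(ordered)), len(ordered) - 1)
    return ordered[n_reject]


def false_rejection_rate(sp_values, p_trust):
    """Percentage of the owner's samples that fall below the threshold."""
    rejected = sum(1 for sp in sp_values if sp < p_trust)
    return 100.0 * rejected / len(sp_values)
```

When several samples share the same SP value, the achieved FRR can be lower than the target, because tied samples at the threshold are accepted.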

After calculating the $P_{trust}$ for each user, the anomaly detection process can start. Algorithm 2 gives a formal description of the anomaly detection algorithm. Upon receiving the user trajectory in Line 2, the system initializes the cumulative probability value TP for the received trajectory as shown in Line 5. In Lines 6–11, the system calculates the TP value based on every $l_j$ in the trajectory. In Line 13, the system compares the TP value with $P_{trust}$ in order to detect anomalous behavior.

Algorithm 2: Detect Mobile Theft based on Location Information.
1: INPUT: LOC-IN-TIME$_i$
2: INPUT: User trajectory every T minutes
3: OUTPUT: Alarm in case of attack
4: Initialize the Trajectory Probability value $TP_i$
5: $TP_i = 0$
6: for all $l_j$ in the obtained trace do
7:   if $l_j \in UCL_i > 0$ then
8:     Get the probability value LOC-IN-TIME$_i(j, k)$
9:     Calculate the cumulative probability value for the trace $TP_i = TP_i +$ LOC-IN-TIME$_i(j, k)$
10:  end if
11: end for
12:
13: if $TP_i \le P_{trust,i}$ then
14:   Trigger an alarm
15: end if
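The lookup-based check in Algorithm 2 can be sketched in Python as follows. This is an illustrative reading, not the authors' code: it assumes the profile is a dictionary keyed by (location, minute) pairs, and it treats a location outside the common locations list, or one never seen at that minute, as contributing zero to the sum. The function names are hypothetical.

```python
def trace_probability(trace, table):
    """Cumulative probability TP of a trace of (location, minute) records."""
    tp = 0.0
    for loc, minute in trace:
        # Missing entries (uncommon location or unseen minute) add nothing.
        tp += table.get((loc, minute), 0.0)
    return tp


def is_attack(trace, table, p_trust):
    """Raise the alarm when TP <= P_trust, as in Line 13 of Algorithm 2."""
    return trace_probability(trace, table) <= p_trust
```

Each record costs one dictionary lookup, so the check is linear in the number of records collected during the T-minute window.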

3.4 Model #2: Trajectory Analysis

The main feature of Model #2 is that it considers the probabilities of moves implicitly contained in the sequence of (time, location) points visited by the user in the model data. Conceptually, the user's location–duration trace is divided into sequences, i.e., trajectories. Each trajectory consists of a start point, a number of intermediate points, and an end point, and may differ semantically due to the notion of stopping time STP.

• Stopping Point (STP) is the time interval for which the user is stationary. Based on observations from other researchers (Xie et al., 2009), we use STP = 30 minutes for all users.

• Start Point (SSP) = $(u_i, l_k)$ is the first location identified in the sequence, where $(t_k - t_{k-1}) \ge$ STP.

• Intermediate Point (SIP) = $(u_i, l_m)$ is a point in the sequence where $t_m > t_k$ and $(t_m - t_k) \le T$.

• End Point (SEP) = $(u_i, l_n)$ is the last location identified in the sequence, where $(t_{n+1} - t_n) \ge$ STP.
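The segmentation implied by these definitions can be sketched in Python. This is a simplified illustration, assuming a time-ordered list of (location, time-in-minutes) records for one user: a gap of at least STP minutes between consecutive records closes one trajectory and opens the next. It does not enforce the bound T on intermediate points, and the name `split_trajectories` is hypothetical.

```python
STP = 30  # stopping time in minutes, the value used in the text


def split_trajectories(records, stp=STP):
    """Split a time-ordered list of (location, t) records into trajectories.

    The first record after a gap of at least stp minutes is a start point;
    the last record before such a gap is an end point.
    """
    trajectories, current = [], []
    for loc, t in records:
        if current and t - current[-1][1] >= stp:
            trajectories.append(current)
            current = []
        current.append((loc, t))
    if current:
        trajectories.append(current)
    return trajectories
```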

3.4.1 Building User Proﬁle

During user profile construction, the Model #1 feature extraction technique is used (see Section 3.3) and the list $UCL_i$ is constructed as described in Algorithm 1. However, for Model #2, the user profile is a three-dimensional table LOC-TIME-MOVE of size ($|UCL| \times |UCL| \times NT$). Each entry in this table, LOC-TIME-MOVE$_i(j, j', k, k')$, represents the probability of the user $u_i$ moving from location $l_j$ at time $t_k$ to location $l_{j'}$ at time $t_{k'}$.


Figure 4: State graph representing the user sequences when the user starts at location A at time $t_1$. [Diagram: states at minutes $t_2$ and $t_3$ connected by edges labeled with transition probabilities.]

[Figure 5 diagram: one table per start location (A, B, C, ..., N), with rows for the minute of day (1 to 1440) and columns for the destination location (A, B, C, ..., N), holding move probabilities.]

Figure 5: Mobility model for user $u_i$ (Model #2).

Similarly to the corresponding structure used in Section 3.3, each entry LOC-TIME-MOVE$_i(j, j', k, k')$ represents the weighted probability Prob$(l_j \to l_{j'}, t_k \to t_{k'})$. Figure 4 presents an example of a trace of sequence information. Figure 5 shows the user profile data structure.
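A sparse version of the LOC-TIME-MOVE table can be sketched in Python as follows. The text does not give the exact weighting, so this sketch assumes each entry is the share of all observed moves between common locations that went from one location and minute to the next location and minute. That normalization, the dictionary layout, and the name `build_move_profile` are assumptions made for illustration.

```python
from collections import Counter


def build_move_profile(trajectories, ucl):
    """Build a sparse move table for one user.

    trajectories: list of time-ordered [(location, minute), ...] lists.
    ucl: the user's common locations.
    Returns a dict mapping (j, j2, k, k2) to the share of observed moves
    from location j at minute k to location j2 at minute k2.
    """
    common = set(ucl)
    moves = Counter()
    for traj in trajectories:
        # Consecutive records form one move each.
        for (loc_a, t_a), (loc_b, t_b) in zip(traj, traj[1:]):
            if loc_a in common and loc_b in common:
                moves[(loc_a, loc_b, t_a, t_b)] += 1
    total = sum(moves.values())
    return {key: n / total for key, n in moves.items()} if total else {}
```

Storing only the observed moves keeps the structure small, since most location pairs never occur at most minutes of the day.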

3.4.2 Anomaly Detection

The computation of trust values ($P_{trust}$) for each user is similar to that described in Section 3.3; however, for Model #2 we calculate the joint probability value for each trace rather than the cumulative probability value, as follows:

$$SP_m = \prod_{(j,j',k,k') \in S_m} \text{LOC-TIME-MOVE}_i(j, j', k, k'). \qquad (2)$$

The joint probability value is the product of the probabilities of all records in the trace, as indicated in the LOC-TIME-MOVE table. Equation 2 indicates that if any record in the sequence has a probability of zero, which indicates that the user has never been at that location at that time, the trace will be considered an attack. To reduce the penalty of deviation from the normal path, we introduce the concept of Trace Threat Level (TL), which represents the percentage of the sequence that has no representation in the user profile. Thus, if LOC-TIME-MOVE$_i(j, j', k, k') = 0$, we eliminate this value from the calculation of the trace joint probability value, and increase the threat level value by one. We use a threat level threshold of $TL_{trust}$ = 10% of the total records in the trace, based on empirical analysis.

Figure 6: User path analysis.

As an example, Figure 6 shows two paths. The solid curve represents the normal path in the user's profile and the dashed curve represents the currently detected trajectory. In this example, the user profile indicates that when the starting point at time t is location B, the normal path of duration T is B→C→D→E→F→G. In contrast, the captured user trajectory that starts at location B at time t consists of the sequence B→A→B→C→D→E→F. To determine whether this is an expected or anomalous user behavior, we compare the calculated probability of this path with the profile of the particular user. The calculated value should be equal to or greater than the trust value for that user.

To calculate the captured trace joint probability TP, we first identify the starting point SSP = $l_j$ and the time $t_k$. Then we check whether $l_j \in UCL_i$ or not. If not, we increase the value of the threat level TL. Otherwise, we identify the next location $l_{j'}$ at time $t_{k'}$. If $l_{j'} \in UCL_i$, we obtain the joint probability value LOC-TIME-MOVE$_i(j, j', k, k')$. If not, we increase the TL value again. This process is repeated for the entire sequence and, upon completion, if TL $\ge TL_{trust}$, this sequence is judged to have been generated by someone other than the user, i.e., an attacker. If not, we subsequently check the TP value. If TP $\ge P_{trust}$, the sequence is judged to belong to the user; otherwise, it is treated as a sequence generated by an attacker. A formal description of this anomaly detection technique is presented in Algorithm 3.

Algorithm 3: Detect Mobile Device Attack based on User Trajectory.
1: INPUT: LOC-TIME-MOVE$_i$
2: INPUT: User trajectory every T minutes
3: OUTPUT: Alarm in case of attack
4: Initialize the Trace Probability ($TP_i$) and Trace Threat Level ($TL_i$) values
5: $TP_i = 1$, $TL_i = 0$
6: for all n records in the sequence do
7:   Read $l_j$ at time $t_k$ and $l_{j'}$ at time $t_{k'}$
8:   if (($l_j$) and ($l_{j'}$)) $\in UCL_i > 0$ then
9:     calculate the joint probability value $TP_i = TP_i \times$ LOC-TIME-MOVE$_i(j, j', k, k')$
10:  else
11:    $TL_i = TL_i + 1$
12:  end if
13: end for
14: Check for anomaly
15: if ($TL_i \le TL_{trust,i}$) and ($TP_i \ge P_{trust,i}$) then
16:   Continue
17: else
18:   Trigger an alarm
19: end if
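The trajectory check in Algorithm 3 can be sketched in Python as follows. This is one possible reading rather than the authors' implementation. It assumes the profile is a dictionary keyed by (from location, to location, from minute, to minute), treats a move as unrepresented when either location is outside the common locations list or the stored probability is zero, and expresses the threat level as the fraction of moves in the trace. The name `detect_model2` and the default `tl_trust=0.10` parameterization are hypothetical.

```python
def detect_model2(trace, move_table, ucl, p_trust, tl_trust=0.10):
    """Return True when the trace should raise an alarm.

    trace: time-ordered [(location, minute), ...] covering T minutes.
    """
    common = set(ucl)
    tp, tl = 1.0, 0
    pairs = list(zip(trace, trace[1:]))
    for (loc_a, t_a), (loc_b, t_b) in pairs:
        p = move_table.get((loc_a, loc_b, t_a, t_b), 0.0)
        if loc_a in common and loc_b in common and p > 0:
            tp *= p   # joint probability over represented moves
        else:
            tl += 1   # move has no representation in the profile
    threat = tl / len(pairs) if pairs else 0.0
    return not (threat <= tl_trust and tp >= p_trust)
```

Skipping zero-probability moves in the product, and counting them in the threat level instead, is what keeps a single detour from forcing the joint probability to zero.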

4 EXPERIMENTAL RESULTS

We now describe the experimental setup and present

the results from the evaluation of our techniques.

As discussed in Section 3, we used the Reality Mining mobility traces of students and staff at a major university. The traces had the following sources: 60% graduate students, 27% incoming students at the university's business school, and 8% staff. The number of distinct locations per user ranges from 1 to 100, with an average of 28. We eliminated single-location users and those with fewer than 1,000 records (i.e., 3.5 records per day) because it was not possible to build models for users with very few records, leaving 93 users.

Each user log was divided into training (model data) and testing (test data) portions as described in Section 3.3. For each user, we randomly selected 100 samples of duration T from the test data log. We repeated each test for four different T values (5 min, 15 min, 30 min, and 60 min). The T value is the attack detection latency.

Figure 7: 24-hour probability distribution diagram of location ID = 1 for two users, (a) and (b), based on 9 months of data. [Plots: probability versus time of the day in minutes.]

4.1 Results for Spatio-temporal Model (Model #1)

For each of the 93 users, we constructed models and calculated trust values $P_{trust,i}$ following the steps described in Section 3.3. Attacker behavior traces are not presently available. However, traces for different users are available. We therefore evaluated the probability of detecting the anomalous mobility patterns of other Reality Mining study participants.

The limited number of locations, and the fact that around 68% of the study participants worked in the same set of locations (buildings), but in different rooms and floors (lab, library, office, etc.), made this a challenging dataset for motion-based anomaly detection. Using the area ID rather than the cellphone tower ID during feature extraction was necessary to enable this study. Figure 7 shows that users sharing the same locations in their profiles can have very different probability distributions over a 24-hour period. Figure 7(a) shows the probability distribution for one user and location ID = 1, while Figure 7(b) shows the probability distribution of the same location ID = 1 over 24 hours for a different user.

We calculated the $TP_{m,y}$ value for all 100 test samples for each user $u_y$, where $y \ne i$ and $1 \le m \le 100$. Subsequently, we calculated the $FAR_i$ value that represents the percentage of the test samples for which the total probability value is $TP_{m,y} \ge P_{trust,i}$. Figure 8 illustrates the $TP_{m,y}$ results for a randomly selected user $u_{30}$. Specifically, Figure 8(a) shows the probability distribution of $TP_{m,30}$, where $y = i = 30$. Figure 8(b) shows the probability distribution of $TP_{m,y}$ where $y \ne i$ and $i = 30$. We observed that for user $u_{30}$ only 5% of the samples have $TP_{m,30} \le 0.02$, while more than 80% of the samples for each of the other 92 users have $TP_{m,30} \le 0.02$.

Figure 8: Histogram of total probability $TP_{m,y}$ for the 93 users and 100 test samples when (a) $y = i = 30$, and (b) $y \ne i$ and $i = 30$.

Figure 9 illustrates the ability to distinguish the behavior of a given user $u_i$ from that of the other 92 users, given 100 samples each. The accuracy is $(100 - FAR_i)$ when T = 5 minutes. Figure 11(a) shows that the average accuracy of Model #1 is in the same range (76.08%–76.6%) for all sample sizes. Therefore, we conclude that the sample size does not have a large impact on attack detection accuracy for Model #1. Figure 11(b) shows a standard deviation above 20%, which is also clear from Figure 9, in which detection accuracy was 100% for some users and ranged from 9%–47% for others. High accuracy is possible for users with few distinct locations (3–8). Accuracy is low for users with many distinct locations (69–100). Section 4.3 provides more details.
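The FAR and accuracy figures used in this evaluation follow directly from the definitions above and can be computed as in the Python sketch below. It assumes the TP values of the other users' samples, scored against the owner's profile, are already available as a list. The function names are hypothetical.

```python
def false_acceptance_rate(other_tp_values, p_trust):
    """Percentage of other users' samples whose TP reaches the owner's threshold."""
    accepted = sum(1 for tp in other_tp_values if tp >= p_trust)
    return 100.0 * accepted / len(other_tp_values)


def accuracy(other_tp_values, p_trust):
    """Detection accuracy in percent, defined as 100 - FAR."""
    return 100.0 - false_acceptance_rate(other_tp_values, p_trust)
```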

4.2 Results based on Trajectory Analysis (Model #2)

We followed the same steps described in the previous section to calculate $FAR_i$ values. Figure 10 shows the results of the trajectory analysis for different test sample lengths. In Model #2, detection accuracy is affected by test sequence length, with T = 15 minutes yielding the highest accuracy and T = 60 minutes yielding the lowest accuracy (see Figure 11(a)). Lower $P_{trust}$ values are associated with the longer traces, which indicates that it is uncommon for normal users to make large day-to-day changes in motion patterns affecting short intervals within a trace. However, longer intervals are more likely to change from day to day.

Figure 9: Accuracy in detecting theft according to allowed delay using the spatio-temporal model (Model #1). [Plot: accuracy in detecting attack (spatio-temporal analysis) per user ID, with the average; T = 5 minutes and FRR = 10%.]

Figure 10: Accuracy in detecting theft according to allowed delay using the trajectory-based model (Model #2). [Plot: accuracy in detecting attack (trajectory analysis) per user ID, with the average; T = 5 minutes and FRR = 10%.]

4.3 Model Comparison

As illustrated in Figure 11(a), the average accuracy is slightly better for Model #2 than for Model #1 for small sample intervals (less than 30 min). The standard deviation is significantly better for Model #2, as shown in Figure 11(b). Compared with Model #1, there is an improvement in accuracy for users with many distinct locations and a degradation in accuracy for users with few distinct locations. Hence, Model #1 is more accurate when users have few distinct locations and Model #2 is more accurate for users with many distinct locations. Thus, a combined approach might be useful.

NL is the discrete location count threshold at which Models #1 and #2 have equal accuracies. If the size of UCL ≤ NL, then Model #1 should be used. Otherwise, Model #2 should be used. To determine NL, we tested a combined approach with several values (5, 6, 7, ..., 30), where 28 is the average number of distinct locations in the data set. Figure 12 illustrates the average accuracies for each NL value depending on time T. NL = 10 allowed the highest


Table 2: Comparison with Existing Theft Detection Systems.

                     Our System   GadgetTrak           Recovery Cop         Laptop Cop
                                  (GadgetTrak, 2010)   (Monitoring, 2010)   (LaptopCop, 2010)
Detection Latency    15 min       N/A                  N/A                  N/A
Accuracy             81%          N/A                  N/A                  N/A
Data Protection      Yes          No                   No                   Yes
User Intervention    No           Yes                  Yes                  Yes

[Plots: (a) detection accuracy and (b) standard deviation versus size of test time sample (5, 15, 30, and 60 minutes) for Model #1 and Model #2.]

Figure 11: Average (a) accuracy and (b) standard deviation values for Models #1 and #2.

accuracy: 80.59% when T = 15 minutes. Therefore, our recommendation is to use a combined approach to permit

• faster detection if there are few distinct locations and

• lower energy consumption due to decreased calculations.
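The threshold rule and the sweep over candidate NL values described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the record layout (`locations`, `acc1`, `acc2`), and the sample users are all hypothetical, and the per-model accuracies are placeholders rather than results from the Reality Mining evaluation.

```python
def choose_model(distinct_locations, nl):
    """Select Model #1 when the distinct-location count is at most nl, else Model #2."""
    return 1 if distinct_locations <= nl else 2


def combined_accuracy(users, nl):
    """Average accuracy when each user is handled by the model chosen for nl."""
    total = 0.0
    for user in users:
        model = choose_model(user["locations"], nl)
        total += user["acc1"] if model == 1 else user["acc2"]
    return total / len(users)


def best_threshold(users, candidates=range(5, 31)):
    """Sweep candidate NL values (5..30) and return the first one with the highest accuracy."""
    return max(candidates, key=lambda nl: combined_accuracy(users, nl))


# Hypothetical users: distinct-location count and per-model accuracy (placeholders).
users = [
    {"locations": 6, "acc1": 0.90, "acc2": 0.70},
    {"locations": 9, "acc1": 0.85, "acc2": 0.75},
    {"locations": 20, "acc1": 0.60, "acc2": 0.80},
    {"locations": 40, "acc1": 0.55, "acc2": 0.85},
]

nl = best_threshold(users)
print(nl, round(combined_accuracy(users, nl), 3))
```

With these placeholder values, every threshold from 9 to 19 gives the same average, and `max` returns the first of them. In practice, the sweep would be repeated for each test interval T, as in Figure 12.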

5 CONCLUDING REMARKS

We presented an approach for detecting anomalous use of mobile devices. Our system uses spatio-temporal mobility data to build models that have high anomaly detection accuracy. Combining the spatio-temporal model (for users with few locations) and the trajectory-based model (for users with many locations) allowed an average attack detection rate of 81%, with a latency of 15 minutes. The simplicity

[Plot: accuracy versus number of distinct locations (5 to 29) for T = 5, 15, 30, and 60 minutes.]

Figure 12: Detection accuracy according to number of distinct locations.

Figure 13: Anomaly detection elapsed time according to sample interval.

of the resulting user models yields an efficient anomaly detection process with an average detection time of 0.02 seconds, as shown in Figure 13. A comparison between our results and those of existing systems is given in Table 2.

In the future, we plan to expand this study to cover additional mobile computing data sources such as phone and application logs in order to determine the change in detection accuracy when more user-specific data are acquired.

ACKNOWLEDGEMENTS

We thank Prof. Ruoming Jin for his insightful comments. This work was supported in part by the National Science Foundation under awards TC-0964545, CNS-0720691, CNS-0347941, and CNS-0910952.

PECCS 2011 - International Conference on Pervasive and Embedded Computing and Communication Systems


REFERENCES

Alvares, L. O., Bogorny, V., Kuijpers, B., de Macedo, J. A. F., Bart, B., and Vaisman, A. (2007). A model for enriching trajectories with semantic geographical information. In GIS ’07: Proceedings of the 15th International Symposium on Advances in Geographic Information Systems. ACM.

Chen, L. D. (2008). A model of consumer acceptance of mobile payment. International Journal of Mobile Communications, 6(1):32–52.

Dodge, S., Weibel, R., and Forootan, E. (2009). Revealing the physics of movement: Comparing the similarity of movement characteristics of different types of moving objects. Computers, Environment and Urban Systems, 33(6):419–434.

Eagle, N., Pentland, A., and Lazer, D. (2007). Inferring social network structure using mobile phone data. Proceedings of the National Academy of Sciences (PNAS), 106(36):15274–15278.

GadgetTrak (2010). GadgetTrak System. http://www.gadgettrak.com/.

Gómez, L. I., Kuijpers, B., and Vaisman, A. (2008). Querying and mining trajectory databases using places of interest. In Annals of Information Systems, volume 3.

González, M. C., Hidalgo, C. A., and Barabási, A. L. (2008). Understanding individual human mobility patterns. Nature, 453:479.

Güting, R. H. and Schneider, M. (2005). Moving Objects Databases. Morgan Kaufmann.

Hadjieleftheriou, M., Kollios, G., Bakalov, P., and Tsotras, V. J. (2005). Complex spatio-temporal pattern queries. In VLDB ’05: Proceedings of the 31st International Conference on Very Large Databases. ACM.

Hall, J., Barbeau, M., and Kranakis, E. (2005). Anomaly-based intrusion detection using mobility profiles of public transportation users. In WiMob ’05: Proceedings of the Wireless and Mobile Computing, Networking and Communications. IEEE.

Hung, C., Chang, C., and Peng, W. (2009). Mining trajectory profiles for discovering user communities. In LBSN ’09: Proceedings of the 2009 International Workshop on Location Based Social Networks. ACM.

Jeung, H., Liu, Q., Shen, H. T., and Zhou, X. (2008). A hybrid prediction model for moving objects. In ICDE ’08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering. IEEE.

LaptopCop (2010). Laptop Cop Software. http://www.laptopcopsoftware.com/index.html.

Monitoring, M. S. (2010). Windows mobile security monitoring software. http://www.recoverycop.com/index.html.

Mouza, C. D. and Rigaux, P. (2005). Mobility patterns. GeoInformatica, 9(4):297–319.

OSM (2010). Open street map. http://www.OpenStreetMap.org.

Rhee, I., Shin, M., Hong, S., Lee, K., and Chong, S. (2008). On the Levy-Walk nature of human mobility. In INFOCOM ’08: Proceeding of the IEEE Conference on Computer Communications. IEEE.

Sun, B., Yu, F., Wu, K., Xiao, Y., and Leung, V. (2007). Enhancing security using mobility-based anomaly detection in cellular mobile networks. In IEEE Transactions on Vehicular Technology, volume 55, pages 1385–1396.

Thornton, P. and Houser, C. (2004). Using mobile phones in education. In WMTE ’04: Proceedings of the 2nd IEEE International Workshop on Wireless and Mobile Technologies in Education. IEEE.

Trestian, I., Ranjan, S., Kuzmanovic, A., and Nucci, A. (2009). Measuring serendipity: Connecting people, locations and interests in a mobile 3g network. In IMC ’09: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement. ACM.

Xie, K., Deng, K., and Zhou, X. (2009). From trajectories to activities: a spatio–temporal join approach. In LBSN ’09: Proceedings of the 2009 International Workshop on Location Based Social Networks, Seattle, Washington. ACM.

Yan, G., Eidenbenz, S., and Sun, B. (2009). Mobi-watchdog: You can steal, but you can’t run! In WiSec ’09: Proceedings of the Second ACM Conference on Wireless Network Security. ACM.

Yazji, S., Chen, X., Dick, R. P., and Scheuermann, P. (2009). Implicit user re-authentication for mobile devices. In UIC ’09: Proceedings of the 6th International Conference on Ubiquitous Intelligence and Computing, pages 325–339. Springer-Verlag.
