Use of Machine Learning for Expanding Realistic and Usable Routes for Data Analysis on Sustainable Mobility

Fabian Schirmer, Andreas Freymann, Anamaria Cristescu and Niklas Geisinger

Fraunhofer IAO, KEIM, Esslingen, Germany

Keywords:

Neural Networks, Cloaking, Data Protection, Location Data, Mobility, Machine Learning.

Abstract:

Current mobility and the transition to more sustainable alternatives are constantly changing. To promote sustainable mobility and to invest in proper infrastructure, we need accurate data on mobility behavior. Gathering location information such as GPS data can help to improve the charging infrastructure and bicycle or pedestrian paths, which in turn motivates citizens to use sustainable means of transportation such as bicycles or electric cars. However, using personal information from GPS data poses a challenge: preserving data privacy while keeping the data quality high enough to obtain useful analysis results. This paper presents an advanced approach to processing GPS data based on machine learning and spatial cloaking, in contrast to current approaches that focus on common algorithms only. The evaluation was conducted on simulated GPS trips. As a result, the presented approach provides an algorithm that prevents a complete loss of useful data while protecting the privacy of each user in cases where cloaking areas are close together.

1 INTRODUCTION

One of the biggest challenges that governments worldwide must face is to achieve the Paris Agreement goals, which demand keeping the global temperature rise below 2°C and reducing emissions (Pan et al., 2017). The transport sector is a crucial field of intervention since it plays a major role in global carbon emissions (Ortmeyer and Pillay, 2001). In fact, almost 85% of transport emissions can be attributed to road travel (Ritchie, 2020). In Germany, the sustainability of the transport sector has become a major concern of the government (Merkel, 2018). Germany is one of the most motorized countries globally: private car use accounts for up to 43% of the total number of trips (Bundesministerium für Verkehr und digitale Infrastruktur, 2020). Chemnitz, a German city, is no exception; there, the private car is a preferred mode of transport (Chemnitz, Stadt der Moderne, 2020).

Promoting sustainable development in the transport sector is, therefore, an essential topic for science as well. According to recent research on transportation planning measures, a better infrastructure can motivate citizens to shift to more sustainable modes of transport; for example, separated bicycle paths increase passengers' readiness to use the bicycle up to 55% (Wardman, 2007). Martens (2007) demonstrates that high-quality cycling infrastructure increases bike-and-ride mobility behavior. This implies that satisfaction with the urban mobility infrastructure is crucial for shifting to sustainable modes of transport, such as public transport, walking, or cycling. Designing infrastructure that facilitates this mode shift effectively therefore requires further studies on passengers' mobility behavior and needs.

https://orcid.org/0000-0002-7032-8242
https://orcid.org/0000-0002-3735-4545
https://orcid.org/0000-0002-5299-1972
https://orcid.org/0000-0003-0224-8213

The city of Chemnitz recently launched the project New Urban Awareness of Mobility in Chemnitz (NUMIC) to address the issue of mobility shift (Chemnitz, Stadt der Moderne, 2020). Within this project, a smartphone application has been developed to track trips of the citizens of Chemnitz via GPS. By sharing the GPS information of their trips, citizens can contribute to the understanding of mobility behavior patterns, which helps to increase the sustainability of the mobility infrastructure. GPS data collection has become a popular method for evaluating mobility behavior (Niu et al., 2014). Such an approach can be used to connect information about the infrastructure with specific location data.

Schirmer, F., Freymann, A., Cristescu, A. and Geisinger, N.
Use of Machine Learning for Expanding Realistic and Usable Routes for Data Analysis on Sustainable Mobility.
DOI: 10.5220/0010395201560163
In Proceedings of the 6th International Conference on Internet of Things, Big Data and Security (IoTBDS 2021), pages 156-163
ISBN: 978-989-758-504-3

Another feature of the application is that users can suggest infrastructure improvements for an exact GPS location. The tracking of the entire trip and the location-based notifications can then be used to analyze mobility behavior and needs. Based on the results, planners can improve and develop the existing mobility infrastructure. However, the main challenge in this context is to ensure data privacy, because tracked GPS trips usually contain important personal information. A possible data protection solution is to reduce the precision of the GPS trips by cloaking the start point and end point of each trip. Standard cloaking algorithms do this by replacing the GPS information of a trip within a specific area. This prevents the extraction of personal information and, as a result, the identification of a tracked person. However, even though a standard cloaking algorithm protects privacy, it inevitably leads to the loss of the most relevant GPS data.

This paper presents an advanced approach that preserves the relevant information of such GPS trips while simultaneously ensuring the data protection and privacy of each tracked person. The approach proposes an algorithm that expands the GPS data points of a trip into other, freely defined cloaking areas to blur the start and the end of the trip. We call this algorithm the Expanded Cloaking Algorithm (EXCL-Algorithm). We applied it to two specific scenarios to demonstrate that it maintains relevant data while ensuring data privacy. These evaluation scenarios are described in the following sections.

The paper is structured as follows. Section 2 reviews related publications and provides insights into relevant topics, including cloaking algorithms and data privacy. Section 3 describes the problem scenarios, including the use of a standard cloaking algorithm. The privacy conditions are explained in Section 4. Section 5 presents the solution for cloaking GPS tracks based on the EXCL-Algorithm. Section 6 summarizes the results and outcomes of the suggested solution and, finally, Section 7 closes the paper with a conclusion and an outlook on future work.

2 RELATED WORK

Tracking has become a prominent IT field comprising various technologies and methods. The use of location-based data in daily life is especially widespread (Zhu et al., 2013). However, the data protection associated with tracking has become a controversial topic since GPS data can reveal highly personal information (Trujillo-Rasua and Domingo-Ferrer, 2015; Feng and Timmermans, 2017). This poses a challenge for the developers of tracking services because the data needed for the analysis has, in many cases, to be protected. In the case of the NUMIC project, the mobility tracking application requires sensitive data on commuters' daily routes to analyze mobility patterns and infrastructure deficiencies; some parts of this data must be protected to avoid identifying the participants. This section discusses several common approaches to data privacy protection, namely data aggregation, k-anonymity, cloaking and extension.

Data aggregation lifts information to a higher level by removing specific identifiers. However, a well-known weakness of this approach is that specific information can be re-identified, e.g., by enriching aggregations with other data (Sweeney, 2012).

With k-anonymity, the user's identity is moved away from the original position (Zhu et al., 2013). Claudio et al. (2005) address this issue in the context of location-based services (i.e., navigation services for weather forecasting or location-based marketing), which involve sensitive private information. Their contribution is a framework that evaluates the risk of using pseudonyms to protect sensitive data and presents k-anonymity as a more secure approach. Niu et al. (2014) use k-anonymity for location-based services and increase privacy with an algorithm based on dummy locations; their results also show an increased level of privacy protection. However, the more persons data is collected from, the higher the probability that all persons have similar sensitive data. In that case, k-anonymity no longer provides sufficient anonymity (Claudio et al., 2005). In the specific case described in this paper, k-anonymity is not an appropriate approach for evaluating the collected data, since the number of app users participating in the NUMIC project cannot be estimated.

Another technique to protect data privacy is spatial cloaking (Jeansoulin et al., 2010; Chow and Mokbel, 2011), which is also acknowledged in the context of location-based services (Wang and Wang, 2010). Chow et al. (2011) describe a common approach that blurs sensitive information into a cloaking area. They contribute a spatial cloaking algorithm designed for mobile peer-to-peer environments, which can increase the scalability, privacy protection, and effectiveness of the algorithm. Ghinita et al. (2007) go one step further and present a distributed architecture for anonymous location-based queries, which is intended to fill the gap left by existing solutions. They claim, for example, that centralized solutions cause a bottleneck related to the number of location-based queries. Furthermore, Wang and Wang (2010) claim that traditional spatial cloaking approaches lack a central service where all information comes together. Thus, they address in-device spatial cloaking and propose using a cloud service to obtain relevant information on the cloak region.

The literature review leads us to a central dilemma: the necessary data protection automatically reduces the accuracy of the GPS data, which in turn decreases the informative value of the data. Hoh et al. addressed this issue by presenting an uncertainty-aware path cloaking algorithm that uses a novel time-to-confusion metric in the context of GPS trips of vehicles (Hoh et al., 2007).

Similarly, Scheider et al. (2020) presented and examined a strategy to obfuscate GPS trips by extending the trip. In addition, they introduced a method for simulated crowding. These methods have numerous advantages but also a few disadvantages. Firstly, the length of a trip increases by 50%. Secondly, the runtime more than doubles.

Thus, this paper builds on the research findings of Scheider et al. (2020) and adds spatial cloaking (Wang and Wang, 2010) to eliminate the disadvantages of each algorithm. Spatial cloaking by itself deletes significant GPS data needed for further analysis, whereas the simulated crowding in the study of Scheider et al. produces too much unrealistic data (?). However, combining the two methods compensates for their individual disadvantages.

3 PROBLEM DEFINITION

Figure 1 shows a running example of how common spatial cloaking can be applied. It illustrates a GPS trip between a start and a target area in the city of Chemnitz, Germany. Additionally, feedback messages from the user can be added to the trip. While the path between these two areas is visible, the exact GPS points within the start area and target area have been blurred into cloak regions (i.e., the GPS points have been deleted). This deletion protects the privacy of a user who, for example, records a trip with a smartphone application. The start and target points of the trip can no longer be determined, which makes it impossible for an outsider to map the trip to the home, workplace, or other visited places of a specific person. However, the main trip between the start and the target area is maintained and can be used to connect a text message to a specific location on the path.

Figure 1: Spatial cloaking.

The size of each cloaking area depends on the population density. Large cloaking areas represent urban areas with a lower population density, whereas small cloaking areas represent urban areas with a higher population density. The described scenario shows a common way to use cloaking. However, in some specific scenarios, the start and target points might be too close to each other. With the standard cloaking algorithm, this results in a vast loss of GPS data or even in the deletion of the entire trip. Two scenarios affected by this problem can be identified. They are described in the following.

Figure 2: Scenario one: One area.

Scenario One - One Area: In some cases, especially for short trips, there is a high probability that the start and target points of a trip lie within a single cloaking area. More precisely, the cloaking areas of the start point and the target point are identical, i.e., congruent (see Figure 2). If we use the common cloaking algorithm and the user remains in this cloaking area, all data points are deleted, and no GPS data points remain. The only information left is the red area.

Figure 3: Scenario two: bordering areas.

Scenario Two - Areas Adjoin Each Other: Another scenario occurs when the start point and the target point lie in different cloaking areas, but these areas directly adjoin each other, as shown in Figure 3. Similar to scenario one, there would be no GPS data points between these two areas since all data points would be deleted by the common cloaking algorithm. As a result, the intended analysis could not be performed. Since two areas replace the tracked GPS data, even more data is lost.

For both of these reasons, we created a dedicated cloaking algorithm (EXCL-Algorithm) that handles such situations by preserving GPS data while protecting privacy.
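The data loss in both scenarios can be illustrated with a minimal sketch of a standard cloaking step. It assumes planar coordinates in metres and rectangular cloaking areas; the names `inside` and `standard_cloak` and all coordinates are hypothetical and not taken from the NUMIC application.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # planar (x, y) position in metres (assumption)
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def inside(point: Point, area: Rect) -> bool:
    """Return True if the point lies inside the rectangular cloaking area."""
    x, y = point
    min_x, min_y, max_x, max_y = area
    return min_x <= x <= max_x and min_y <= y <= max_y


def standard_cloak(trip: List[Point], start_area: Rect, target_area: Rect) -> List[Point]:
    """Delete every point that falls inside the start or target cloaking area."""
    return [p for p in trip if not inside(p, start_area) and not inside(p, target_area)]


# Scenario one: start and target share a single cloaking area.
area = (0.0, 0.0, 100.0, 100.0)
short_trip = [(10.0 + 2.0 * i, 50.0) for i in range(20)]   # x runs from 10 to 48
remaining_one = standard_cloak(short_trip, area, area)

# Scenario two: two cloaking areas that border each other at x = 100.
left, right = (0.0, 0.0, 100.0, 100.0), (100.0, 0.0, 200.0, 100.0)
border_trip = [(80.0 + 2.0 * i, 50.0) for i in range(20)]  # x runs from 80 to 118
remaining_two = standard_cloak(border_trip, left, right)

print(len(remaining_one), len(remaining_two))  # no point survives in either scenario
```

In both cases the result is an empty trip, which is exactly the situation the EXCL-Algorithm is meant to avoid.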

4 PRIVACY CONDITIONS

For training and testing the EXCL-Algorithm, GPS trips were generated by simulation in order to comply with data protection regulations. We used the open source software tool SUMO (Simulation of Urban MObility) (Sumo, 2020) since it makes it possible to create realistic GPS trips. This includes, for example, waiting times due to traffic lights or multimodal traffic (a combination of several different modes of transportation). Finally, the tool makes it possible to test different modes of transportation as well as the plausibility of the generated and expanded trips.

For the simulation of GPS trips, two random adjacent areas within the city of Chemnitz were chosen. The map data is retrieved from OpenStreetMap (OSM) and imported into the SUMO model. For scenario one, a single area is used, whereas in scenario two, two cloaking areas lie directly next to each other. For each of the two scenarios, 20 trips simulated by SUMO are used. Several irregular trips are also included (e.g., trips that start or end in dead-end roads) in order to verify the algorithm even for such complex cases. Furthermore, each scenario comprises ten trips simulated for vehicles and ten trips simulated for pedestrians, which also pass through parks and other pedestrian-only areas. The routes have various lengths, ranging from 24 meters in scenario one to 973 meters in scenario two.

With the trips specified as input for SUMO, the simulation results comprise the position of each vehicle or pedestrian. To obtain realistic GPS trips, the exported data is then thinned out to include only data points at a random distance of between 0.5 and 3 meters from each other, which results in 15 to 632 data points per trip. These varying distances reflect the stop-and-go behavior of each user. Table 1 shows the range of total length (column: "Length"), the number of data points (column: "Points"), and the average distance between two consecutive data points (column: "Avg. dist.") of the trips for the two scenarios, each further divided into car routes (denoted as C) and pedestrian routes (denoted as P).

Table 1: Overview of the simulated data.

Scenario   Length        Points      Avg. dist.
One (C)    303 - 831 m   177 - 460   1.73 m
One (P)    24 - 692 m    15 - 365    1.54 m
Two (C)    135 - 870 m   79 - 367    1.72 m
Two (P)    52 - 973 m    36 - 632    1.53 m
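The thinning step described before Table 1 could look roughly like the following sketch. The paper does not state how the random spacing is drawn, so the uniform draw, the function name `thin_out`, and the synthetic input are assumptions made for illustration only.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) position in metres (assumption)


def thin_out(positions: List[Point], min_gap: float = 0.5, max_gap: float = 3.0,
             seed: int = 0) -> List[Point]:
    """Keep only points whose distance to the last kept point reaches a random gap."""
    if not positions:
        return []
    rng = random.Random(seed)
    kept = [positions[0]]
    gap = rng.uniform(min_gap, max_gap)       # spacing required for the next point
    for p in positions[1:]:
        if math.dist(kept[-1], p) >= gap:
            kept.append(p)
            gap = rng.uniform(min_gap, max_gap)  # draw a fresh spacing
    return kept


# Dense synthetic export: a 300 m straight line sampled every 10 cm.
dense = [(0.1 * i, 0.0) for i in range(3001)]
sparse = thin_out(dense)
gaps = [math.dist(a, b) for a, b in zip(sparse, sparse[1:])]
print(len(sparse), round(sum(gaps) / len(gaps), 2))
```

Because the input is sampled every 10 cm, each resulting gap lies between 0.5 m and roughly 3.1 m, which mirrors the varying spacing reported in Table 1.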

5 SOLUTION APPROACH

As mentioned before, applying the common cloaking approach to both presented scenarios would delete all data points within the cloaking areas. On the one hand, this ensures data protection as it becomes impossible to retrieve any information from the trips. On the other hand, detailed data analysis becomes impossible. This section presents the Expanded Cloaking Algorithm (EXCL-Algorithm), which preserves the individual GPS data points while simultaneously preserving data privacy.

One important characteristic of the EXCL-Algorithm is the expansion of the available GPS trip (red bubbles), which is shown in Figure 4. For that, the EXCL-Algorithm creates new cloaking areas (blue rectangles) in which new artificial start and target points are located. These artificial points (green bubbles) are essential because they extend the trip beyond the original start and target points, so that the original start and target points become ordinary GPS points without a starting or ending character. This protects private information such as home addresses. In contrast to the common cloaking algorithm, which would delete all original GPS data points (red bubbles), the EXCL-Algorithm uses the additional artificial GPS data points (green bubbles) to create new artificial cloaking areas (blue rectangles): a start area and a target area. As a result, the data points between these two new areas can be maintained and used for further analysis and studies. The protection of private data is also ensured since the trip cannot be linked to a specific person. Furthermore, with the EXCL-Algorithm it is not apparent which data points are artificially generated and which are real. The only information available about each GPS trip is that it was manipulated by an expansion algorithm.

Figure 4: Scenario Two: Two areas.

The cloaking of each GPS trip using the EXCL-Algorithm is done in two directions: from the start point as well as from the target point of the trip. For that, a new artificial start point and a new artificial target point are generated by adding and expanding a new start and target cloaking area. Both ends of each GPS trip are expanded until the original cloaking area (red area) is left. The new GPS track thus consists of a new artificial start point and a new artificial target point, which lie in two new and distinct cloaking areas (blue rectangles). All GPS points within the original cloaking area are preserved.

Figure 5: Steps of the EXCL-Algorithm.

Figure 5 gives an overview of the main steps of the EXCL-Algorithm, each of which is explained in its own subsection. The overall outcome of these steps is the set of artificially generated GPS points of a trip. The first step (S1) generates new artificial GPS data for the new cloaking areas with the help of machine learning (see subsection 5.1). The result of the first step is then checked in step two (S2), which checks whether the trajectory is realistic, and in step three (S3), which checks the area trajectory. If S2 or S3 fails, the generation of GPS points is carried out again. The checks in S2 and S3 cover the distances between the points of the trajectory and whether the original area has been left (see subsections 5.2 and 5.3). In the last step (S4), the results are added to the start and to the target of the existing trip, which makes it possible to use the whole trip for further data analysis.

5.1 Generating New GPS Data (S1)

Step S1 generates new artificial GPS data, which is essential for expanding the original trip into newly defined cloaking areas. The new GPS data is generated in addition to the existing data with one of the machine learning algorithms of section 5.1.2. The following subsections therefore deal, firstly, with the structure of the input data needed for the machine learning algorithms and, secondly, with the comparison of different machine learning algorithms.

5.1.1 Input Data

The input data (i.e., the GPS data points of the original GPS trip) is always different because each trip has a different length. Large areas may contain more GPS data than smaller areas. Moreover, it is important to consider the distances between the GPS points: due to the different modes of transportation, these distances vary significantly. The training data is structured in such a way that the data point d(t) is used to estimate the next data point d(t+1). This training method is applied to the whole GPS data. Figure 6 shows how the input data is processed by the EXCL-Algorithm. The black arrow at the top indicates the course of the training.

Figure 6: Structuring the input data.

The last GPS point, which is represented by the blue rectangle in Figure 6, is used for the first check of the realistic trajectory to control the distance between the GPS points in S2. This structure enables the machine learning algorithm to capture the overall trajectory.
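The pairing of d(t) with d(t+1) can be sketched as follows. The function name `make_training_pairs` and the tuple representation of a GPS point are illustrative assumptions; the paper does not specify the data layout.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) position in metres (assumption)


def make_training_pairs(trip: List[Point]) -> List[Tuple[Point, Point]]:
    """Pair every data point d(t) with its successor d(t+1)."""
    return list(zip(trip[:-1], trip[1:]))


trip = [(0.0, 0.0), (1.5, 0.2), (3.1, 0.3), (4.4, 0.5)]
pairs = make_training_pairs(trip)
last_point = trip[-1]  # the point from which the expansion and the S2 check continue

for current, following in pairs:
    print(current, "->", following)
```

A trip with n points yields n - 1 supervised samples, so very short trips provide very little training data.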

5.1.2 Comparison of Different Machine Learning Algorithms

Based on the input data described above, different machine learning algorithms (e.g., Neural Networks, Linear Regression, or Long Short-Term Memory) need to be trained and evaluated. The comparison is based on accuracy and on how realistic the generated values are (S2 and S3). In addition, the amount of training data required to generate the GPS data points is considered. For this reason, trips of different lengths are simulated, as shown in Table 1. The machine learning algorithms are based on supervised learning, where the training data consists of the existing GPS data points. Whether the output of an algorithm is appropriate is decided by checking that the distances between the GPS data points are approximately the same and that the original area is left.

For our EXCL-Algorithm we compare three machine learning algorithms for generating new GPS data: Linear Regression (LR), Neural Network (NN) and Long Short-Term Memory (LSTM). Table 2 compares these machine learning algorithms in terms of their accuracy and the duration of training in epochs. The values in the table are based on the smallest amount of training data. Increasing the training data produces slightly better values for model accuracy and for the number of epochs needed for good results. Finally, we picked the algorithm with the best training accuracy for the first attempt to expand the existing trip.

Table 2: Overview machine learning.

Criterion LR NN LSTM

Model accuracy 10

−8

−9

Epochs - 40 20

Each machine learning algorithm is equipped with Early Stopping, which stops the training as soon as the error reported as model accuracy starts to increase.
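The stopping rule can be sketched independently of any machine learning library. The following minimal example assumes that a list of per-epoch error values is available; the function name `early_stopping_epoch` and the `patience` parameter are hypothetical and not taken from the paper.

```python
from typing import Sequence


def early_stopping_epoch(errors: Sequence[float], patience: int = 1) -> int:
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    waited = 0
    for epoch, err in enumerate(errors):
        if err < best:
            best = err      # error still improving, keep training
            waited = 0
        else:
            waited += 1     # error stopped improving
            if waited >= patience:
                return epoch
    return len(errors) - 1  # never triggered: train for all epochs


print(early_stopping_epoch([1.0, 0.5, 0.2, 0.3, 0.1]))  # stops at epoch index 3
```

With a patience of one epoch, training halts at the first epoch whose error is not lower than the best error seen so far.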

Linear Regression. Linear Regression consistently produces good results for expanding each trip. It uses the method of least squares, which finds the best-fitting line for the training data by minimizing the sum of the squares of the vertical deviations from each training point to the line (Seber and Lee, 2012, p. 35 ff.).
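As an illustration of the least-squares idea, the following sketch fits one line per coordinate over the point index and extrapolates it beyond the last point. Regressing on the index t is an assumption chosen for simplicity; the paper does not state the exact regression setup, and the names `fit_line` and `extrapolate` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) position in metres (assumption)


def fit_line(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of values[t] = slope * t + intercept for t = 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two points are needed to fit a line")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    slope = s_tv / s_tt
    return slope, mean_v - slope * mean_t


def extrapolate(trip: List[Point], steps: int) -> List[Point]:
    """Predict `steps` further points by extending the fitted lines."""
    slope_x, icpt_x = fit_line([p[0] for p in trip])
    slope_y, icpt_y = fit_line([p[1] for p in trip])
    n = len(trip)
    return [(slope_x * t + icpt_x, slope_y * t + icpt_y) for t in range(n, n + steps)]


trip = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(extrapolate(trip, 2))  # continues the straight line through the trip
```

For a curved trip, such a fit only captures the overall direction, which is one reason why the generated points are checked afterwards in S2 and S3.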

Neural Network. A general neural network consists of an input layer, one or more hidden layers, and an output layer. Each of the layers, in turn, consists of neurons (Géron, 2018, p. 253 ff.).

$$\sum_{i=1}^{n} x_i \cdot w_i \tag{1}$$

Calculation of the neurons (Géron, 2018, p. 253 ff.)

The output depends on the inputs x, the weights w (see equation 1, where n denotes the number of inputs), and the activation function of each neuron. The activation function is linear. The optimizer is Adam (a method of stochastic optimization) and the loss function is the mean squared error. This specific choice is based on the most accurate result for the output of the model.
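Equation 1 combined with a linear activation amounts to a plain weighted sum. The following sketch shows this for a single neuron; the function name `neuron_output` and the example values are hypothetical, and no bias term is included because equation 1 does not contain one.

```python
from typing import Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the inputs; a linear activation leaves the sum unchanged."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


print(neuron_output([1.0, 2.0], [0.5, 0.25]))  # 1.0 * 0.5 + 2.0 * 0.25
```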

Long Short-Term Memory. One strength of Long Short-Term Memory (LSTM) is extracting useful information from historical records. Its cells learn long-range dependencies to forecast the data, which most other algorithms still have problems with (Hua et al., 2019). We used it here to capture the whole trajectory and predict realistic values. The chosen optimizer is Adam and the loss function is the mean squared error.

5.2 Check Realistic Trajectory (S2)

After the artificial GPS points have been generated in S1, we check their feasibility. The purpose of the first check (S2) is to ensure that the generated GPS data points are as realistic as possible, i.e., that the distances between consecutive GPS points are nearly the same. If this first validation succeeds, the next step S3 is checked. If not, a new GPS point is generated. The GPS data points should be generated in a realistic manner, so that the generated points are spaced in the same way as the given data. This means that the distance between each pair of consecutive GPS points d(t) and d(t+1) is nearly identical to the distance between the points that follow (see equation 2).

d(t + 1) − d(t) ≈ d(t + 2) − d(t + 1)... (2)
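A check in the spirit of equation 2 could be sketched as follows. The relative `tolerance` and the comparison against the mean spacing of the original trip are assumptions, since the paper does not quantify what "nearly the same" means; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) position in metres (assumption)


def step_lengths(points: List[Point]) -> List[float]:
    """Distances between consecutive points."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


def is_realistic(original: List[Point], generated: List[Point],
                 tolerance: float = 0.5) -> bool:
    """Accept generated points whose spacing stays close to the original mean spacing."""
    reference = step_lengths(original)
    if not reference:
        raise ValueError("the original trip needs at least two points")
    mean_step = sum(reference) / len(reference)
    # Include the last original point so the first generated step is checked too.
    candidate = step_lengths([original[-1]] + generated)
    return all(abs(s - mean_step) <= tolerance * mean_step for s in candidate)


original = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0)]
print(is_realistic(original, [(4.5, 0.0), (6.0, 0.0)]))  # spacing matches the trip
print(is_realistic(original, [(10.0, 0.0)]))             # jump of 7 m is rejected
```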

5.3 Check Area Trajectory (S3)

Once the generated GPS data points for the start and target area have been added and the first check (S2) has succeeded, the next check (S3) is performed. This second check examines whether the original area has been left. It preserves the whole trip while keeping the number of artificially generated GPS data points as small as possible: the expansion of each GPS trip must be as short as possible. The automated generation of new GPS points follows a precisely specified rule. As soon as a new area is entered, the generation of new GPS data is finished. However, if the newly entered areas at the beginning and at the end of the trip lie directly next to each other (scenario two, see Figure 3), the generation of the GPS points is repeated. Likewise, if the beginning and the end of the trip lie in the same area (scenario one, see Figure 2), the generation is repeated.

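// S3: keep generating points while the trip is still inside the origin area
while (GPSPointWithinOriginArea) {
    generateNewGPSPoints();
}
// Start over if both ends of the trip ended up in the same cloaking area
if (startArea == targetArea) {
    startNewGenerationGPSPoints();
}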

The listing above shows the logic of the second check (S3). As long as the currently generated GPS points stay within the original area, the generation of new GPS points continues. Once the original area is left, the generation of new GPS points ends. If the start and target areas are the same, the generation of new GPS points starts from the beginning in order to obtain different start and target areas.
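The same logic can be written as a small runnable sketch for scenario one. It makes several assumptions that are not part of the paper: cloaking areas are modelled as square grid cells of a fixed size, and a constant step vector stands in for the machine learning model of S1. All names (`CELL_SIZE`, `area_of`, `expand`) are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) position in metres (assumption)
Area = Tuple[int, int]       # grid cell index standing in for a cloaking area

CELL_SIZE = 100.0  # hypothetical edge length of a square cloaking area in metres


def area_of(point: Point) -> Area:
    """Map a point to the grid cell that stands in for its cloaking area."""
    return (int(point[0] // CELL_SIZE), int(point[1] // CELL_SIZE))


def expand(last_point: Point, step: Point, origin: Area,
           max_points: int = 1000) -> List[Point]:
    """Generate points in direction `step` until the origin area is left."""
    generated: List[Point] = []
    current = last_point
    while area_of(current) == origin and len(generated) < max_points:
        current = (current[0] + step[0], current[1] + step[1])
        generated.append(current)
    return generated


# Scenario one: the whole trip lies in a single area, grid cell (0, 0).
trip = [(40.0 + 2.0 * i, 50.0) for i in range(11)]
origin = area_of(trip[0])

start_ext = expand(trip[0], (-2.0, 0.0), origin)   # grows backwards from the start
target_ext = expand(trip[-1], (2.0, 0.0), origin)  # grows forwards from the target
start_area, target_area = area_of(start_ext[-1]), area_of(target_ext[-1])

# As in the listing, equal start and target areas would trigger a new generation.
needs_regeneration = start_area == target_area

expanded = start_ext[::-1] + trip + target_ext
print(start_area, target_area, needs_regeneration, len(expanded))
```

All original points remain part of the expanded trip, and the new start and target points lie in two different cells on either side of the origin area.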


5.4 Generated New GPS Data (S4)

S4 does not only add the new artificial GPS data to the trip. To make it more difficult to differentiate between the original GPS data and the artificially generated GPS data, S4 also adds a small amount of noise, as simulated data always looks identical. Introducing a small deviation makes the data more realistic and harder to distinguish from the original data. Situations such as waiting at traffic lights or bus stops produce more GPS data at one specific point than moving at a higher pace. Both situations are also taken into account.
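Adding a small deviation to generated points can be sketched as follows. Gaussian noise and the standard deviation `sigma` are assumptions, as the paper does not name the noise model or its magnitude; `add_noise` is a hypothetical helper.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # planar (x, y) position in metres (assumption)


def add_noise(points: List[Point], sigma: float = 0.2,
              seed: Optional[int] = None) -> List[Point]:
    """Shift every point by a small random offset in x and y."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in points]


generated = [(2.0 * i, 50.0) for i in range(10)]  # perfectly regular, looks artificial
noisy = add_noise(generated, sigma=0.2, seed=42)
for clean, shifted in zip(generated[:3], noisy[:3]):
    print(clean, "->", shifted)
```

The offset should stay well below the typical spacing between points, otherwise the check of S2 would no longer hold for the noisy trip.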

6 RESULTS

The results show that the EXCL-Algorithm can be performed with any of the machine learning algorithms presented in section 5.1.2. They produce similar results and provide realistic new trips that are indistinguishable from real trips. Unlike the common cloaking algorithm, the EXCL-Algorithm preserves the data of both presented scenarios for further analysis and studies and simultaneously protects privacy, as the trip cannot be mapped to a specific user. This also means that additional data, such as the feedback assigned to each trip, is not deleted in scenario one and scenario two. In particular, trips over shorter distances can now be used for subsequent analysis. In the following, the results and benefits of the EXCL-Algorithm are presented for each of the two scenarios described in Section 3.

6.1 Extrapolation Scenario One

Figure 8 shows the trip expanded with the EXCL-Algorithm. In contrast to the initial situation in Figure 7, the entire trip is preserved. With the common cloaking algorithm, all data points lying in the red area would be deleted, so only the red area itself would be available for further analysis.

Figure 7: Initial situation: scenario one.

Figure 8: Result of scenario one using EXCL-Algorithm.

Thanks to the EXCL-Algorithm, the trip can still be used because the algorithm defines a new start and target area for the trip (yellow areas in Figure 8). For an outsider, it is impossible to distinguish between the artificially generated GPS data and the original GPS data tracked from the users.

6.2 Extrapolation Scenario Two

Figure 9: Initial situation: scenario two.

Figure 10: Result of scenario two using EXCL-Algorithm.

Like scenario one, scenario two can also be improved by our EXCL-Algorithm. Figure 10 shows the result of the EXCL-Algorithm. Compared to the initial situation (see Figure 9), two completely new areas (yellow areas in Figure 10) are entered, which are the new start area (top left corner of Figure 10) and the new target area (lower right corner of Figure 10). All data points within the red area and the orange area (Figure 9) are preserved. With the common cloaking algorithm, none of the data points in Figure 9 could be used for the mobility analysis. With our EXCL-Algorithm, all data points can now be used for further work.

7 CONCLUSION

In this paper, we presented an expanded cloaking algorithm, called EXCL-Algorithm, which is based on supervised learning and expands the real GPS data points of a trip by adding new artificial start and target points within newly defined cloaking areas. The EXCL-Algorithm ensures the privacy of each user and preserves the main trip data in two specific scenarios in which the original start and target points are close together. The essential advantage of this method is that it protects trip data against deletion while complying with data protection legislation.

The core of the EXCL-Algorithm consists of four steps. They include, for example, the examination of different machine learning algorithms in order to generate suitable artificial GPS data points. The artificially generated GPS points are verified by checking the distance between the GPS data points and by ensuring that new areas are entered. This prevents scenario one (only one area without any GPS data) and scenario two (two areas lying directly next to each other without any GPS data) from occurring again. Finally, these verification steps enable us to evaluate our EXCL-Algorithm on the two specific scenarios.
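The two verification criteria named above, the distance between GPS points and the entry into new areas, can be sketched as follows. This is a simplified illustration under stated assumptions: cloaking areas are modelled as square grid cells of `cell_deg` degrees, and `max_step_m`, `cell_of` and `verify` are hypothetical names rather than parts of the published algorithm.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees
Cell = Tuple[int, int]

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cell_of(p: Point, cell_deg: float) -> Cell:
    """Grid cell containing p (assumed stand-in for a cloaking area)."""
    return (math.floor(p[0] / cell_deg), math.floor(p[1] / cell_deg))


def verify(extended: List[Point], original: List[Point],
           max_step_m: float, cell_deg: float) -> bool:
    """Check step distances and that both new endpoints enter new areas."""
    # Criterion 1: consecutive points must not be further apart than max_step_m.
    steps_ok = all(haversine_m(p, q) <= max_step_m
                   for p, q in zip(extended, extended[1:]))
    # Criterion 2: the new start and target cells must differ from the
    # original start and target cells, and from each other.
    old_cells = {cell_of(original[0], cell_deg), cell_of(original[-1], cell_deg)}
    new_start = cell_of(extended[0], cell_deg)
    new_target = cell_of(extended[-1], cell_deg)
    areas_ok = (new_start not in old_cells
                and new_target not in old_cells
                and new_start != new_target)
    return steps_ok and areas_ok
```

A trip that fails either check would be regenerated or discarded in this sketch; how the actual algorithm reacts to a failed verification is not specified here.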

IoTBDS 2021 - 6th International Conference on Internet of Things, Big Data and Security


This research takes a generic view of all modes of transport. Consequently, the generation of new GPS data is not based solely on existing car or bicycle routes. The generated data can, for example, deviate from roads or enter green areas. For an outsider, it is nevertheless impossible to distinguish between artificially generated data and real data, because all modes of transportation can be combined when tracking routes in the application.

Finally, the EXCL-Algorithm preserves data that can be used for further studies and protects the privacy of each user who contributes by tracking their daily trips. Further studies could focus on generating only those GPS data points that lie on roads for cars or on pavements for pedestrians, depending on the provided data. Such studies would refine the EXCL-Algorithm and its data generation and would focus on only one mode of transportation.

ACKNOWLEDGEMENTS

This paper is based on the joint research and development project NUMIC - New Urban Awareness of Mobility in Chemnitz. It is funded by the Federal Ministry of Education and Research (BMBF) and the European Social Fund under grant number 01UR1804A. The responsibility for this publication lies with the authors.

