Explainable Federated Learning for Taxi Travel Time Prediction
Jelena Fiosina (https://orcid.org/0000-0002-4438-7580)
Institute of Informatics, Clausthal Technical University, Julius-Albert Str. 4, D-38678 Clausthal-Zellerfeld, Germany
Keywords:
FCD Trajectories, Traffic, Travel Time Prediction, Federated Learning, Explainability.
Abstract:
Transportation data are geographically scattered across different places, detectors, companies, or organisations
and cannot be easily integrated under data privacy and related regulations. The federated learning approach
helps process these data in a distributed manner, considering privacy concerns. The federated learning archi-
tecture is based mainly on deep learning, which is often more accurate than other machine learning models.
However, deep-learning-based models are non-transparent, unexplainable black-box models, which should be
explained to both users and developers. Although various model explanation methods have been investigated
extensively, there are not enough solutions for explaining federated models. We propose an explainable
horizontal federated learning approach, which enables processing of distributed data while preserving their
privacy, and investigate how state-of-the-art model explanation methods can explain it. We demonstrate this
approach for predicting travel time on real-world floating car data from Brunswick, Germany. The proposed
approach is general and can be applied to process data in a federated manner for other prediction and
classification tasks.
1 INTRODUCTION
Most real-world data are geographically scattered
across different places, companies, or organisations,
and, unfortunately, cannot be easily integrated under
data privacy and related regulations. This is especially
topical for the transportation domain, in which ubiquitous traffic sensors and the Internet of Things create a
world-wide network of interconnected, uniquely addressable, cooperating objects that enable the exchange
and sharing of information. With the increase in traffic, a large amount of decentralised data becomes available.
Fuelled by the large amounts of data collected in various domains and the high computing power available,
analytical procedures and statistical models for data interpretation and processing rely on methods of
artificial intelligence (AI). In recent years, AI has made considerable progress. These data-driven
methods replace complex analytical procedures with multiple calculations. They are easily applicable and,
owing to their machine learning (ML) ancestry, in most cases more accurate. Accuracy and interpretability
are two dominant features of successful predictive models. However, the more accurate black-box models are
not sufficiently explainable and transparent. This feature of AI-driven systems complicates user acceptance
and can be troublesome even for model developers.
Therefore, contemporary AI technologies should be capable of processing these data in a decentralised
manner, in accordance with data privacy regulations. Moreover, the algorithms should be maximally
transparent, making the decision-making process user-centric.
Federated learning is a distributed ML approach that enables model training on a large corpus of
decentralised data (Konecný et al., 2016). It has three major advantages: 1) it does not need to transmit
the original data to the cloud, 2) the computational load is distributed among the participants, and 3) it
assumes synchronisation of the distributed models with a server to obtain more accurate models. The main
assumption is that the federated model should be parametric (e.g., deep learning) because the algorithm
synchronises the models by synchronising their parameters. A known limitation of deep learning is that its
neural networks are unexplainable black-box models. Numerous model-agnostic and model-specific (e.g.,
integrated gradients, DeepLIFT) methods for the explanation of black-box models are available (Kraus et al.,
2020). Distributed versions of these methods exist, which allow them to be executed across various processes
or on graphics processing units (GPUs), e.g., with the captum library (https://captum.ai/). However, to
the best of our knowledge, studies on the explanation of geographically distributed federated deep learning
models are lacking.
The mentioned challenges are topical in the trans-
portation domain, in which the generation and pro-
cessing of big data are necessary. We propose
a privacy-preserving explainable federated model,
which achieves a comparable accuracy to that of the
centralised approach on the considered real-world
dataset. We predict the Brunswick taxi travel time
based on floating car data trajectories obtained from
different taxi service providers, which should remain
private. The proposed model makes predictions for the stated problem and enables a joint learning process
over different users: the data stored by each user are processed without exchanging raw data, only model
parameters, and joint explanations of variable importance are provided.
We address several research questions. 1) Which is the most accurate ML prediction method for the
given data in a centralised setting? We identify the best hyper-parameters for each method. 2) Under
which conditions is federated learning effective? We distribute the dataset among various providers and
analyse at which point the distributed, non-synchronised models lose their accuracy and federated learning
becomes beneficial. We define an optimal synchronisation plan for parameter exchange, identifying the
hyper-parameters and frequency of parameter exchange that are acceptable and beneficial. 3) Do existing
black-box explanation methods successfully explain federated learning models? We investigate how the
state-of-the-art explainability methods can explain federated models.
The rest of the paper is organised as follows. Section 2 describes the state of the art. Section 3 de-
scribes the proposed explainable federated deep learn-
ing concept and parameter synchronisation mech-
anisms. Section 4 introduces the available data,
presents the experimental setup, and provides insights
into the data preprocessing step. Section 5 presents
the model validation and experimental results.
2 STATE OF THE ART
2.1 Distributed Data Analysis
The large amount of data generated in the transportation domain today requires the application of
state-of-the-art methods of distributed/decentralised data analysis, as well as the investigation of novel
approaches capable of processing distributed data, often under data privacy requirements.
When data centralisation is available, accurate
prediction models can be developed, which address
the big data challenge through smart ‘artificial’ par-
titioning and parallelisation of data and computation
within a cloud-based architecture or on powerful supercomputers (Fiosina and Fiosins, 2017).
Often, data should be physically and logically dis-
tributed without transmission of big information vol-
umes, without the need to store, manage, and pro-
cess massive datasets in one location. This approach
enables a data analysis with smaller datasets. How-
ever, scaling it up requires novel methods to effi-
ciently support the coordinated creation and main-
tenance of decentralised data models. Specific de-
centralised architectures (e.g., multi-agent systems
(MAS)) should be implemented to support the de-
centralised data analysis, which requires a coordi-
nated suite of decentralised data models, including
parameter/data exchange protocols and synchronisa-
tion mechanisms among the decentralised data mod-
els (Fiosina et al., 2013a).
The MAS-based representation of transportation networks helps overcome the limitations of centralised
data analyses and will enable autonomous vehicles to make better and safer routing decisions
(Dotoli et al., 2017). Various cloud-based architectures for intelligent transportation systems have been
proposed (Khair et al., 2021). A cloud-based architecture focusing on decentralised analysis of big
transportation data was presented in (Fiosina et al., 2013a).
Travel time is an important parameter of transportation networks; its accurate prediction helps to reduce
delays and delivery costs, improves reliability through better route selection, and increases the service
quality of commercial delivery by bringing goods within the required time window (Ciskowski et al., 2018).
A centralised deep-learning-based approach to travel time estimation for the ride-sharing problem was
discussed in (Al-Abbasi et al., 2019).
Often, a proper travel time forecasting model requires pre-processing such as data filtering and
aggregation. Travel time aggregation models (non-parametric, semi-parametric) for decentralised data
clustering and corresponding coordination and parameter exchange algorithms were investigated in
(Fiosina et al., 2013b). Travel time estimation and forecasting using decentralised multivariate linear
and kernel-density regression with corresponding parameter/data exchange was proposed in (Fiosina et al.,
2013a).
Initial requirements and ideas for methods of de-
centralised data analysis development in the trans-
portation domain operating with big data flows have
been identified (Fiosina et al., 2013b). The impact
of incorporating decentralised data analysis methods
into MAS-based applications taking into account in-
dividual user routing preferences has been assessed
(Fiosina and Fiosins, 2014).
2.2 Federated Learning
Federated learning (Konecn
´
y et al., 2016) was pro-
posed by Google and continues the research line
of distributed data analyses, which focuses mainly
on development of privacy-preserved ML models for
physically distributed data. When the isolated dataset
used by each company cannot create an accurate
model, the mechanism of federated learning enables
access to more data and better training of models.
Federated learning enables different devices to collab-
oratively learn a shared prediction model while main-
taining all training data on the device, without the
need to store the data in the cloud. The main difference between federated learning and distributed
learning lies in the assumptions on the properties of the local datasets: distributed learning originally
aims to parallelise computing power, whereas federated learning originally aims to train on heterogeneous
datasets (Konecný et al., 2016). This approach may use a central server that orchestrates the different
steps of the algorithm and acts as a reference clock, or it may be peer-to-peer, with no such central
server. In this study, we use a central server for this aggregation, while local nodes perform the local
training (Yang et al., 2019).
The general principle consists of training local
models on local data samples and exchanging param-
eters (e.g., the weights of a deep neural network) be-
tween these local models at some frequency to gener-
ate a global model (Bonawitz et al., 2019). To ensure a good task performance of the final central ML
model, federated learning relies on an iterative process broken down into an atomic set of client-server
interactions referred to as a federated learning round (Yang et al., 2019). Each round of this process con-
sists of transmitting the current global model state to
participating nodes, training local models on these lo-
cal nodes to produce a set of potential model updates
at each node, and aggregating and processing these
local updates into a single global update and applying
it to the global model.
We consider N data owners F_1, ..., F_N who wish to train an ML model by consolidating their respective
data D_1, ..., D_N. A centralised approach uses all the data together, D = D_1 ∪ ... ∪ D_N, to train a
model M_Σ. A federated system is a learning process in which the data owners collaboratively train a model
M_FD without any data owner F_i exposing its data D_i to the others. In addition, the accuracy of M_FD,
denoted as V_FD, should be very close to the performance V_Σ of M_Σ (Yang et al., 2019). Each row of the
matrix D_i represents a sample, while each column represents a feature. Some datasets may also contain
label data. The features X, labels Y, and sample IDs I constitute the complete training dataset (I, X, Y).
The feature and sample spaces of the data parties may not be identical. We classify federated learning
into horizontal, vertical, and federated transfer learning based on the data distribution among the various
parties. In horizontal or sample-based federated learning, the datasets share the same feature space but
differ in their samples. In vertical or feature-based federated learning, the datasets share the same
sample ID space but differ in their feature space. In federated transfer learning, the datasets differ in
both sample and feature space and have only small intersections.
Figure 1: Architecture of horizontal federated learning.
In this study, we develop a joint privacy-
preserving model of travel time forecasting of differ-
ent service providers based on the horizontal feder-
ated learning architecture. The training process of such a system is independent of specific ML
algorithms. All participants share the final model parameters (Figure 1).
2.3 Explainable AI
Conventional ML methods, such as linear regression, decision trees, and support vector machines, are inter-
pretable in nature. Typically, highly accurate complex
deep learning-based black-box models are favoured
over less accurate but more interpretable conventional
ML models. Extensive studies have been carried out
in the past few years to design techniques to make AI
methods more explainable, interpretable, and trans-
parent to developers and users (Molnar, 2020). Combining such methods in hybrid systems (e.g., ensembling)
further increases their explainability and accuracy (Holzinger, 2018). AI for explaining decisions in MAS
was discussed in (Kraus et al., 2020).
Model-agnostic and model-specific explanation methods have been reported (Molnar, 2020). Model-agnostic
methods, such as LIME and Shapley values, are applicable to any model. However, they require a large
number of computations and are often not applicable to the big datasets used in deep learning
(Molnar, 2020). The Shapley values method was used in (Wang, 2019) to interpret a vertical federated
learning model. Model-specific methods, which focus on only one type of model and are more computationally
efficient, are more suitable for deep learning (Molnar, 2020), e.g., Integrated Gradients
(Sundararajan et al., 2017) and DeepLIFT (Shrikumar et al., 2017; Ancona et al., 2018). These methods have
an additive nature, which enables computing them in a distributed manner across processors, machines, or
GPUs. For example, the explainability scores can be calculated on the participants' local data and then
accumulated at the server (e.g., with the captum library, https://captum.ai/).
In this study, we focus on integrated gradients, which represent the integral of the gradients with
respect to the inputs along the path from a given baseline to the input.
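As an illustration, the following is a minimal sketch of how integrated gradients attributions could be computed with the captum library for a PyTorch regression model; the model, tensor names, and feature dimension are placeholders, and the averaged-feature baseline mirrors the choice described later in Section 5.3.

import torch
from captum.attr import IntegratedGradients

# Placeholder regression model and (normalised) test features.
model = torch.nn.Sequential(
    torch.nn.Linear(12, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
x_test = torch.rand(256, 12)

# Baseline: the average value of each feature, broadcast to all instances.
baseline = x_test.mean(dim=0, keepdim=True).expand_as(x_test)

# The integral of gradients along the baseline-to-input path is approximated
# with n_steps evaluations; squeeze() yields a scalar output per example.
ig = IntegratedGradients(lambda x: model(x).squeeze(-1))
attributions = ig.attribute(x_test, baselines=baseline, n_steps=50)

# Per-feature importance: average attribution over the test instances.
feature_importance = attributions.mean(dim=0)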
3 EXPLAINABLE FEDERATED
LEARNING
We propose a strategy for explaining federated models and illustrate it on a travel time prediction
problem. Our aim is to describe the application of state-of-the-art explainability methods to federated
learning while maintaining data privacy. We apply the federated learning architecture and explainability
methods to the focal problem and consider what information should be exchanged and how often. Moreover,
the application of an explainability method to a concrete task produces only baseline results, because the
interpretation of the results is specific to the particular task or application at hand
(Fiosina et al., 2019).
Let N participants F_1, ..., F_N own datasets D_1, ..., D_N as previously defined. For the federated
learning process, each participant F_i divides its dataset D_i = D_i^TR ∪ D_i^TE into a training set
D_i^TR and a test set D_i^TE. We train the models on D_i^TR and calculate attribution scores on D_i^TE.
M_1, ..., M_N are the participants' local models, while M_FD is the federated model. As we consider
learning on batches, M_i^<epoch,batch> is the local model of participant F_i for the current epoch and
batch of data, while M_FD^<epoch,batch> is the current federated model. w_i^<epoch,batch> denotes the
current parameters of the model M_i^<epoch,batch>: w_i^<epoch,batch> = w(M_i^<epoch,batch>). The training
process is described in Algorithm 1. Note that synchronisation need not occur at every batch, so a logical
variable synchronisation supervises this process. FedAverage() is a parameter synchronisation procedure on
the server side, which, in the simplest case, computes the average of each parameter over all local models.
We start the variable explanation process when the federated training process is finished and a copy of the
common federated model, M_i^<localFD>, is locally available at each F_i. The scoring algorithm is one of
the explainability methods mentioned above.
Figure 2: Explainable federated learning architecture.
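To make the synchronisation and aggregation steps concrete, the following is a minimal PyTorch sketch of the server-side FedAverage step and the weighted score aggregation of Algorithm 1; the function names and the use of state_dict dictionaries are illustrative assumptions, not the exact PySyft-based implementation used in the experiments.

from typing import Dict, List
import torch

def fed_average(local_states: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    # FedAverage in its simplest form: the element-wise mean of each
    # parameter tensor over all participants' local models.
    return {
        name: torch.stack([state[name] for state in local_states]).mean(dim=0)
        for name in local_states[0]
    }

def aggregate_scores(local_scores: List[torch.Tensor], test_sizes: List[int]) -> torch.Tensor:
    # Aggregation of per-feature attribution scores: average of the local
    # mean scores, weighted by the size of each participant's test set.
    total = sum(test_sizes)
    return sum(s * n for s, n in zip(local_scores, test_sizes)) / total

# Illustrative synchronisation step (every 2nd batch, cf. Table 1):
# if batch_idx % 2 == 0:
#     global_state = fed_average([m.state_dict() for m in local_models])
#     for m in local_models:
#         m.load_state_dict(global_state)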
4 EXPERIMENTAL SETUP
We predict the Brunswick taxi travel time based on
floating car data (FCD) trajectories obtained from two
different taxi service providers. The data were avail-
able for the period January 2014 to January 2015. Each raw data item contains a car identifier, a time
stamp, a geographical point, and a status (free, driving to a client, driving with a client, etc.). The
data were received approximately every 40 s. We consider the map of Brunswick and its surroundings as a box
with coordinates in the latitude range 51.87°-52.62° and longitude range 10.07°-11.05° (Figure 3).
To evaluate whether the travel times depend on weather conditions, we used the corresponding historical
data on rain, wind, temperature, atmospheric pressure, etc. However, good-quality weather data for the
given region were available only until the end of September 2014, so we reduced our trajectory dataset
accordingly.
Algorithm 1: Explainable federated learning training process.

Result: trained federated model M_FD
Define initial w_i for each M_i; epoch = 1;
while the loss function does not converge do
    foreach batch of data do
        foreach F_i in parallel do
            Train(M_i^<epoch,batch>, current batch of D_i^TR);
            if synchronisation then
                F_i sends w_i^<epoch,batch> to the server;
        if synchronisation then
            the server averages the parameters and broadcasts them:
                w_FD^<epoch,batch> = FedAverage(w_i^<epoch,batch>);
            foreach F_i in parallel do
                F_i receives the updated parameters from the server and updates its model:
                    M_i^<epoch,batch> = M_FD^<epoch,batch>;
    epoch = epoch + 1;
The training process is over; the last obtained M_FD^<epoch,batch> becomes the final model,
available as a local copy M_i^<localFD> at each F_i;
foreach participant F_i in parallel do
    foreach instance j of the D_i^TE dataset do
        calculate attribution scores: sc_{i,j} = ScoringAlgorithm(M_i^<localFD>, D_{i,j}^TE);
    F_i calculates its average scores and sends the result to the server:
        sc_i = (Σ_j sc_{i,j}) / |D_i^TE|;
The server aggregates the participant scores and broadcasts the result:
    sc_FD = (Σ_i sc_i · |D_i^TE|) / |∪_i D_i^TE|;
foreach participant F_i in parallel do
    each F_i updates its attribution scores: sc_i = sc_FD;
We developed a multi-step data pre-processing
pipeline. We constructed a script to transform the data
into trajectories according to time points, locations,
and car identifier. The raw trajectories were then anal-
ysed to determine their correctness. We split trajecto-
ries with long stay periods into shorter trips. Round
trips with the same source and destination were sepa-
rated into two trips. Some noisy, unrealistic data with probably incorrect global positioning system (GPS)
signals, incorrect status, or unrealistic average speeds (less than 13 km/h or more than 100 km/h) were
also removed. After this cleaning, 542,066 trajectories remained.
Figure 3: Road network of Brunswick and its surroundings.
We then connected the trajectories with OpenStreetMap and obtained a routable graph. Additionally, we
divided the map into grids of different sizes (e.g., 200 m × 200 m) to determine whether this aggregation
could improve our forecasts. Thus, we knew to which zones the start and end points of each trip belong.
Moreover, we found the nearest road graph node to the source and destination of each trip and calculated
the shortest-path distance. To improve the prediction model and filter noisy data, we also calculated the
distance between the start and end points of each trip along the FCD trajectory. If the shortest-path
distance and the FCD trajectory distance differed considerably, we analysed the trip more closely and
divided it into several more realistic trips, excluding places with false GPS signals.
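As a small illustration of the speed-based filtering step, the following sketch drops trips with implausible average speeds; it assumes the trips have already been assembled into a pandas DataFrame with the illustrative column names fcd_distance_km and duration_s.

import pandas as pd

def filter_unrealistic_trips(trips: pd.DataFrame) -> pd.DataFrame:
    # Average speed in km/h computed from the FCD trip distance and duration.
    speed_kmh = trips["fcd_distance_km"] / (trips["duration_s"] / 3600.0)
    # Keep only trips with a plausible average speed (13-100 km/h).
    return trips[speed_kmh.between(13.0, 100.0)].reset_index(drop=True)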
We stored the raw data, trip data, weather data, and graph data obtained from OpenStreetMap (roads and
nodes) in a PostgreSQL database.
For the visual presentation of trips, their sources and destinations, as well as the graph representation
of the Brunswick road network, we used QGIS (https://www.qgis.org/) (Figure 3). We predicted the travel
time using different methods (Table 1) and found the corresponding best hyper-parameters by grid search.
The forecasting was based on the following factors: coordinates of the start and end zones
(200 m × 200 m), FCD distance, weekday and hour transformed with sine and cosine, as well as temperature,
air pressure, and rain. We forecasted the travel time in seconds. We divided the dataset into training
(80%) and test (20%) sets. The dataset was normalised with a MinMax scaler before the application of the
above methods. We used the mean squared error (MSE) as the efficiency criterion and 5-fold cross-validation
for model comparison. An MSE of .0010 corresponds to 5 min, while an MSE of .0018 corresponds to 7.5 min.
We used the Python programming language, PyTorch for the deep learning models, the PySyft library
(https://pysyft.readthedocs.io/) for federated learning (Ryffel et al., 2018), and the captum library
(https://captum.ai/) to interpret the models.
Table 1: Optimal model hyper-parameters.

Model                 Hyper-parameters
Regression            Linear (none); Ridge (α = 0.09); Lasso (α = 1e-9)
XGBoost               colsample_bytree = 0.7; learning_rate = 0.12; max_depth = 9; α = 15; n_estimators = 570
Random forest         num_trees = 100; max_depth and min_samples_leaf not restricted
Deep learning         fully connected perceptron with 2 hidden layers, number of neurons: 64-100;
                      ReLU activation function; 0.2 dropout between hidden layers; SGD optimiser;
                      MSE loss function; NN batch size = 128; 800 epochs; learning rate = 0.02
Federated learning    synchronisation every 2nd batch; each provider's NN batch size proportional to
                      the size of its dataset; the providers' batch sizes sum to 128
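The following is a minimal sketch, based on the hyper-parameters in Table 1, of the cyclical time encoding, MinMax normalisation, and centralised deep learning model; the feature layout and the hidden-layer widths of 64 and 100 neurons are our reading of Table 1, so the code is an approximation of the setup rather than the exact experimental implementation.

import numpy as np
import torch
from sklearn.preprocessing import MinMaxScaler

def encode_cyclic(values: np.ndarray, period: int) -> np.ndarray:
    # Sine/cosine encoding of a cyclic feature, e.g. hour (24) or weekday (7).
    angle = 2.0 * np.pi * values / period
    return np.stack([np.sin(angle), np.cos(angle)], axis=1)

def build_features(x_raw: np.ndarray, hour: np.ndarray, weekday: np.ndarray) -> np.ndarray:
    # x_raw: start/end zone coordinates, FCD distance, temperature,
    # air pressure, rain (8 columns, illustrative layout).
    x = np.hstack([x_raw, encode_cyclic(hour, 24), encode_cyclic(weekday, 7)])
    return MinMaxScaler().fit_transform(x)

# Fully connected network with two hidden layers (cf. Table 1).
model = torch.nn.Sequential(
    torch.nn.Linear(12, 64), torch.nn.ReLU(), torch.nn.Dropout(0.2),
    torch.nn.Linear(64, 100), torch.nn.ReLU(), torch.nn.Dropout(0.2),
    torch.nn.Linear(100, 1),
)
optimiser = torch.optim.SGD(model.parameters(), lr=0.02)
loss_fn = torch.nn.MSELoss()  # trained with batch size 128 for 800 epochs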
5 EXPERIMENTS
5.1 Alternative Predictors
We identified the best ML prediction model (Table
2). For a single data provider (centralised approach),
the best results were obtained by the XGBoost and
random forest methods (.00097 and .0010). Conven-
tional regression methods such as linear, Lasso and
Ridge regressions provided the same inaccurate re-
sults. Deep learning exhibited a slightly lower perfor-
mance than those of the best models. The application
of federated deep learning was impossible because of
a single data owner.
Our next task was to determine under which conditions federated deep learning was effective. Although
XGBoost and random forest provided the most accurate results in the centralised setting, federated learning
can be implemented only with parametric models such as deep learning. Unfortunately, it was unknown which
data belonged to which provider. Thus, we randomly distributed the data among the providers, which led to
the assumption of identically distributed and equally sized local datasets, which is often not true in
federated learning.
Table 2: MSE of travel time prediction with different ML methods.

Model                         Number of data providers
                              1        4       8       16      32
Linear, Lasso, Ridge regr.    .0019    .0019   .0019   .0020   .0020
XGBoost                       .00097   .0011   .0012   .0012   .0013
Random forest                 .0010    .0011   .0012   .0012   .0013
Deep learning                 .0011    .0012   .0013   .0014   .0015
Federated deep learning       -        .0011   .0011   .0011   .0011
Then, we executed various ML models
locally on each provider without synchronisation. Finally, we analysed at which point the distributed,
non-synchronised models lose their accuracy and federated learning outperforms the other locally executed
ML methods.
The remaining columns of Table 2 show the MSE of travel time prediction on the distributed data. The
average accuracy of all models except federated deep learning was reduced. The federated approach led to
the same result as centralised deep learning; therefore, the accuracy of the federated model was comparable
to that of XGBoost and random forest. Starting at eight data providers, federated learning became
beneficial because it did not lose its accuracy. With more data providers, the benefits of federated
learning became more evident: the MSE of the federated model's prediction remained constant at 0.0011.
Another important parameter that influences the
accuracy of the federated model is the batch size. In
our centralised deep learning model, the optimal so-
lution was obtained with batch size = 128 or smaller.
The experiments show that as the batch size increases, the computing speed increases but the accuracy of
the model decreases. This implies that, to obtain the same accuracy with federated learning, we have to
distribute the batch proportionally among the providers. Accordingly, with eight data providers with
equally sized datasets, the local batch size is 128/8 = 16.
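As a small sketch of the proportional batch-size allocation described above, the helper below splits the global NN batch size of 128 among providers in proportion to their local dataset sizes; the dataset sizes used in the example are illustrative.

def provider_batch_sizes(dataset_sizes, global_batch_size=128):
    # Each provider's local batch size is proportional to its dataset size,
    # so that the local batch sizes sum (approximately) to the global one.
    total = sum(dataset_sizes)
    return [round(global_batch_size * n / total) for n in dataset_sizes]

# Eight providers with equally sized datasets -> 16 samples each per batch.
print(provider_batch_sizes([1000] * 8))  # [16, 16, 16, 16, 16, 16, 16, 16]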
5.2 Synchronisation of Models
We investigated the effect of the synchronisation fre-
quency on the accuracy of the federated model. The
accuracy decreased with a step-wise decrease in the synchronisation frequency (Table 3), and we
investigated this tendency. With synchronisation performed at each batch or even each second batch, the
accuracy remained the same as that of the centralised approach. The accuracy decreased with less frequent
synchronisation.
Table 3: MSE of travel time prediction for different synchronisation frequencies.

Synchronisation each n-th batch (eight providers)   Av. MSE
1, 2                                                .0011
3, 4                                                .0012
5                                                   .0013
5.3 Explaining the Federated Model
In this section, we compare the variable importances obtained with the local, federated, and centralised
approaches. We chose the integrated gradients method as the explainability scoring algorithm because of its
simplicity and speed. The baseline in this algorithm was set to the average value of each feature.
Figure 4 shows the variable importance calculated with the federated model for each of the eight data
providers locally, using their test data. Although the main tendency in variable importance is the same for
all eight providers, the locally obtained results differ from the importance scores calculated with all
test data. This may lead to inaccurate explanations by some local providers, especially with small test set
sizes. The proposed importance score averaging mechanism (Algorithm 1) avoids this inaccuracy without
transmitting the local test sets.
Figure 4: Explainability of individual models (variable importance of the FCD distance, start/end zone coordinates, hour and weekday sine/cosine, temperature, rain, and wind for each of the eight data providers).
Figure 5 presents the variable importances calculated for the centralised and federated learning approaches
using the aggregated test datasets (centralised) or the aggregation of scores (federated), which led to the
same results. We observe that, without raw data transfer, our approach allows a more accurate calculation
of variable importance than each provider can obtain using only its local test set.
Figure 5: Explainability of the federated vs. centralised approach (variable importance of the FCD distance, start/end zone coordinates, hour and weekday sine/cosine, temperature, rain, and wind).
Our next task is to investigate which parameters
have the biggest influence on the results. According to Figures 4 and 5, the most important variable for
all models was the FCD distance, as expected. The next most important variables were the zone coordinates
and the sine and cosine of the travel hour and day of the week. The division of the map into zones improved
the predictions. However, contrary to our expectations, almost all weather parameters did not significantly
influence the predictions.
6 CONCLUSION
We analysed Brunswick taxi FCD trajectories for travel time prediction with different prediction methods.
We identified XGBoost as the best prediction model for the centralised approach, which predicted the travel
time with an MSE of 0.00097. In the case of distributed data providers, starting at eight providers with
equally distributed data sources, the federated approach outperformed the XGBoost and random forest methods
executed locally by each data provider. With synchronisation executed at each second batch, the federated
deep learning model did not lose accuracy compared to the centralised model. We proposed an approach to
explain the horizontal federated learning model with explainability methods without transferring raw data
to the central server. This enabled a more accurate determination of the most important prediction
variables of the federated model.
We illustrated the considered approach on a travel time prediction problem using a horizontal federated
learning architecture. For simplicity of calculation, in our experiments all providers' local datasets were
of the same size and the data were identically distributed across all datasets, which is often not the case
in real-world problems. We plan to consider unequally sized and non-identically distributed datasets in
future experiments. We assume that the participants
with smaller datasets will benefit most from the accurate prediction model. A particularly beneficial case
arises if each taxi's dataset is confidential and must be processed locally (self-interested ride
providers), as in (Ramanan et al., 2020).
Although the proposed explainable federated approach does not depend on the explainability method, only one
explainability method (integrated gradients) was analysed. We plan to compare different explainability
methods and their combinations in future work. We believe that this ap-
proach could be successfully implemented for more
complex prediction and classification tasks with more
complex deep learning and federated learning archi-
tectures. For example, we aim to implement explain-
able federated learning for traffic demand forecasting
models.
ACKNOWLEDGEMENTS
The research was funded by the Lower Saxony Min-
istry of Science and Culture under grant number
ZN3493 within the Lower Saxony “Vorab“ of the
Volkswagen Foundation and supported by the Center
for Digital Innovations (ZDIN).
REFERENCES
Al-Abbasi, A. O., Ghosh, A., and Aggarwal, V. (2019).
Deeppool: Distributed model-free algorithm for ride-
sharing using deep reinforcement learning. IEEE
Transactions on ITSs, 20(12):4714–4727.
Ancona, M., Ceolini, E., Öztireli, C., and Gross, M. (2018). Towards better understanding of
gradient-based attribution methods for deep neural networks. In Proc. of 6th Int. Conf. on Learning
Representations, ICLR.
Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konecný, J.,
Mazzocchi, S., McMahan, H. B., Overveldt, T. V., Petrou, D., Ramage, D., and Roselander, J. (2019). Towards
federated learning at scale: System design. CoRR, abs/1902.01046.
Ciskowski, P., Drzewinski, G., Bazan, M., and Janiczek, T.
(2018). Estimation of travel time in the city using neu-
ral networks trained with simulated urban traffic data.
In DepCoS-RELCOMEX 2018, volume 761 of Ad-
vances in Intelligent Systems and Computing, pages
121–134. Springer.
Dotoli, M., Zgaya, H., Russo, C., and Hammadi, S. (2017).
A multi-agent advanced traveler information system
for optimal trip planning in a co-modal framework.
IEEE Transactions on Intelligent Transportation Sys-
tems, 18(9):2397–2412.
Fiosina, J. and Fiosins, M. (2014). Resampling based mod-
elling of individual routing preferences in a distributed
traffic network. Int. Journal of AI, 12(1):79–103.
Fiosina, J. and Fiosins, M. (2017). Distributed nonpara-
metric and semiparametric regression on spark for big
data forecasting. Applied Comp. Int. and Soft Com-
puting, 2017:13.
Fiosina, J., Fiosins, M., and Bonn, S. (2019). Explainable
deep learning for augmentation of sRNA expression
profiles. Journal of Computational Biology. (to ap-
pear).
Fiosina, J., Fiosins, M., and Müller, J. (2013a). Big data processing and mining for next generation
intelligent transportation systems. Jurnal Teknologi, 63(3).
Fiosina, J., Fiosins, M., and Müller, J. (2013b). Decentralised cooperative agent-based clustering in
intelligent traffic clouds. In Proc. of 11th German Conf. on MAS Technologies, volume 8076 of LNAI, pages
59-72. Springer.
Holzinger, A. (2018). Explainable AI (ex-AI). Informatik
Spektrum, 41:138–143.
Khair, Y., Dennai, A., and Elmir, Y. (2021). A survey on
cloud-based intelligent transportation system, icaires
2020. In AI and Renewables Towards an Energy Tran-
sition, volume 174 of LNNS. Springer.
Konecný, J., McMahan, H. B., Ramage, D., and Richtárik, P. (2016). Federated optimization: Distributed
machine learning for on-device intelligence. CoRR, abs/1610.02527.
Kraus, S., Azaria, A., Fiosina, J., Greve, M., Hazon, N., Kolbe, L., Lembcke, T., Müller, J., Schleibaum,
S., and Vollrath, M. (2020). AI for explaining decisions in multi-agent environment. In AAAI-2020. AAAI
Press.
Molnar, C. (2020). Interpretable Machine Learning. A
Guide for Making Black Box Models Explainable.
Lulu.com.
Ramanan, P., Nakayama, K., and Sharma, R. (2020). Baffle:
Blockchain based aggregator free federated learning.
CoRR, abs/1909.07452.
Ryffel, T., Trask, A., Dahl, M., Wagner, B., Mancuso, J.,
Rueckert, D., and Passerat-Palmbach, J. (2018). A
generic framework for privacy preserving deep learn-
ing. CoRR, abs/1811.04017.
Shrikumar, A., Greenside, P., and Kundaje, A. (2017).
Learning important features through propagating ac-
tivation differences. In Proc. of ML Research, vol-
ume 70, pages 3145–3153, Int. Convention Centre,
Sydney, Australia. PMLR.
Sundararajan, M., Taly, A., and Yan, Q. (2017). Ax-
iomatic attribution for deep networks. In Proc. of the
34th Int. Conf. on ML, volume 70 of ICML’17, page
3319–3328. JMLR.org.
Wang, G. (2019). Interpret federated learning with shapley
values. CoRR, abs/1905.04519.
Yang, Q., Liu, Y., Chen, T., and Tong, Y. (2019). Federated
machine learning: Concept and applications. ACM
Trans. Intell. Syst. Technol., 10(2).