Transforming Data Flow Diagrams for Privacy Compliance

Hanaa Alshareef, Sandro Stucki and Gerardo Schneider

Chalmers University of Technology, Gothenburg, Sweden

University of Gothenburg, Gothenburg, Sweden

Keywords:

Privacy by Design, Data Flow Diagrams, GDPR.

Abstract:

Most software design tools, such as Data Flow Diagrams (DFDs), focus on functional aspects and thus cannot model non-functional aspects like privacy. In this paper, we provide an explicit algorithm and a proof-of-concept implementation to transform DFDs into so-called Privacy-Aware Data Flow Diagrams (PA-DFDs). Our tool systematically inserts privacy checks into a DFD, generating a PA-DFD. We apply our approach to two realistic applications from the construction and online retail sectors.

1 INTRODUCTION

The European General Data Protection Regulation (GDPR) imposes stringent constraints on how individuals’ personal data are to be collected and processed, stipulating heavy penalties in case of violations (European Commission, 2016). Complying with the regulation is a hard task, and software engineers trying to meet the required data protection principles often face a conflict between system and privacy requirements (Oetzel and Spiekermann, 2014).

An additional difﬁculty is that privacy does not

refer to one particular property but rather to a

set of properties, including conﬁdentiality, secrecy,

data minimisation (DM), privacy impact assessment

(PIA), user consent, the right to be forgotten, pur-

pose limitation, etc. So, it does not make sense to talk

about “privacy compliance” but rather to refer to spe-

ciﬁc privacy properties. But even when restricted to

a speciﬁc privacy property, verifying privacy compli-

ance is in general undecidable (Tsormpatzoudi et al.,

2015; Schneider, 2018).

We therefore advocate the Privacy by Design (PbD) principle (Cavoukian, 2012), in which any (computerised) personal data processing environment should be designed taking privacy into account from the very beginning of the (software) development process. It has been argued that PbD is more tractable than retrofitting legacy software for privacy compliance (see e.g. Danezis et al., 2015).

https://orcid.org/0000-0001-5608-8273

https://orcid.org/0000-0003-0629-6853

Still, the implementation of privacy principles

such as PbD, PIA or DM requires a lot of work

from software engineers and developers: they con-

sider such principles to be overly complicated and im-

practical, and they lack the necessary knowledge to

implement them (Senarath and Arachchilage, 2018;

Sirur et al., 2018; Freitas and Mira da Silva, 2018).

Hence, despite having been advocated since the mid-

1990s, PbD has gained momentum only in recent

years, mostly due to the GDPR.

An example is the work by Antignac et al. (2016,

2018), who propose an approach to automatically

add privacy checks already at the design level. The

idea is based on model transformations, enhancing

Data Flow Diagrams (DFDs) with checks for speciﬁc

privacy concepts, notably concerning retention time

and purpose limitation for each operation on sensitive

(personal) data (storage, forwarding, and processing

of data). The enhanced diagram is called a Privacy-

Aware Data Flow Diagram (or PA-DFD for short). In

that proposal the software engineer designs a DFD,

pushes a button to obtain a PA-DFD, inspects it man-

ually, and then generates a program template from the

PA-DFD that guides the programmer in the concrete

implementation of the privacy checks. Antignac et al.

describe their transformation from DFDs to PA-DFDs

through high-level graphical “rules” but provide nei-

ther a full algorithm nor a reference implementation.

The main purpose of our paper is to provide these

missing pieces.

In summary, we make the following contributions.

1. We give algorithms to check and automatically

transform DFDs into PA-DFDs. We identiﬁed

Alshareef, H., Stucki, S. and Schneider, G.

Transforming Data Flow Diagrams for Privacy Compliance.

DOI: 10.5220/0010255002070215

In Proceedings of the 9th International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2021), pages 207-215

ISBN: 978-989-758-487-9


Figure 1: Example of a DFD: high-level design of part of the e-store ordering system. (Diagram: external entity “Customer”; processes “Browsing Amazon Products”, “Create Amazon Account” and “Get Customer Information”; data store “Supplier Item Inventory”; flows “Customer Info”, “Product Info Request” and “Create Account”.)

some ambiguities and inaccuracies in the original hotspot-based translation described by Antignac et al. (2016, 2018) (Section 3).

2. We provide an open-source Python implementa-

tion of our algorithms,

which processes DFD di-

agrams in an XML format compatible with the

popular draw.io platform (Section 3).

3. We evaluate our algorithms on two case studies:

an automated payment system and an online retail

system (Section 4).

2 PRELIMINARIES

We recall here relevant GDPR concepts, the deﬁnition

of DFDs, as well as the transformation into PA-DFDs

given by Antignac et al. (2018).

GDPR. The European General Data Protection

Regulation (GDPR) contains 99 articles regulating

personal data processing. The GDPR is organ-

ised around a number of key concepts, most notably

its seven principles of personal data processing, the

rights of data subjects and six lawful grounds for

data processing operations. Relevant to this paper are

the principles of purpose limitation (data may only

be used for purposes to which the data subject con-

sented) and accountability, as well as the right to be

forgotten and the lawful ground of consent. See (Eu-

ropean Commission, 2016) and Hert and Papakon-

stantinou (2016) for more details on the GDPR.

Data Flow Diagrams (DFDs). A data ﬂow dia-

gram (DFD) is a graphical representation of how data

ﬂows among software components. As shown in

Fig. 1, DFDs are composed of activators and ﬂows.

Activators can be external entities (rectangles), pro-

cesses (ellipses) and data stores (double horizontal

lines). Processes may represent detailed low-level

operations or complex high-level functionality that

https://github.com/alshareef-hanaa/PA-DFD-Paper

Figure 2: Selection of B-DFD hotspots and corresponding PA-DFD elements (Antignac et al., 2018). (Diagram: the hotspots “Collection”, “Disclosure” and “Usage” on the left, their transformations on the right.)

could be refined into sub-processes (the latter are drawn as double-lined ellipses). Data flow is represented by arrows. We chose DFDs since they are widely used for modelling digital systems and for security and privacy analysis (Shostack, 2014; Wuyts et al., 2014).

Antignac et al. (2016, 2018) extended DFDs with a data deletion type of flow and a data structure to specify personal data: (i) the owner of the personal data, (ii) the purposes, consented to by the data subject, for which the data may be used, and (iii) the retention time for the data. This extension is referred to as a Business-oriented DFD (B-DFD).

Adding Privacy Checks to DFDs. Antignac et al. (2016, 2018) aimed to (automatically) add privacy checks to a B-DFD, obtaining a Privacy-Aware Data Flow Diagram (PA-DFD) that contains the relevant privacy checks for purpose limitation and retention time and that ensures accountability and policy management. They defined hotspots in the B-DFD to perform the transformation compositionally.

The left-hand side of Fig. 2 shows three types

of hotspots, each deﬁned by a pattern of activators

and ﬂows that corresponds to a basic data process-

ing operation, such as “collection”, “disclosure”, etc.

The right-hand side of Fig. 2 shows, for each B-DFD

hotspot, the corresponding PA-DFD containing new

activators and ﬂows for speciﬁc privacy mechanisms.

Tables 1 and 2 in Antignac et al. (2018) describe

the privacy properties of interest for each hotspot,


Figure 3: Example of a PA-DFD generated by the old transformation rules.

derived from the GDPR. To capture the (new) pri-

vacy checks and to facilitate the transformation, the

set of activator types in PA-DFDs is augmented with

ﬁve different “Process” subtypes: “Limit”, “Reason”,

“Request”, “Log” and “Clean”, each corresponding to

a particular privacy enforcement mechanism. “Limit”

inspects whether the purpose of data processing is

compatible with the data subject consent, which de-

mands a policy from the data subject, given by “Re-

quest”. “Log” is used to create log ﬁles in a Log data

store. The “Reason” activator is used to get an up-

dated policy corresponding to a newly computed data

value. Finally, “Clean” guarantees that personal data

is eliminated from the data store. See Antignac et al.

(2018) for more details about PA-DFDs.

To illustrate Antignac et al.’s transformation, con-

sider the B-DFD shown in Fig. 1 and (part of) its cor-

responding PA-DFD in Fig. 3. In the ﬁgure, “pol”

is a policy related to data “d”. Two rules (collection

and usage) have been applied to a subset of the B-

DFD from Fig. 1 (the part inside the dashed line). The

external entity “Customer” provides “Customer Info”

data and its associated policy “pol”. The data ﬂows to

the “Limit” process which veriﬁes that the data sub-

ject has consented to the use of “Customer Info” for

downstream processing. The consent is speciﬁed in

the policy “pol”, received via the “Request” process.

The data value and its policy are logged by the “Log”

process in the “Log” store.

Note that the PA-DFD in Fig. 3 contains a dan-

gling arrow “pol”: this is an unfortunate side-effect

of the way the original transformation rules were for-

mulated (Fig. 2). This and other shortcomings and

inaccuracies are discussed at the end of Section 3.2.

3 FROM B-DFDS TO PA-DFDS

Here we present our algorithms for transforming B-DFDs to PA-DFDs. The transformation process consists of two steps: type-inference followed by the actual transformation. Type-inference ensures that the input B-DFD is well-formed before it is transformed into a PA-DFD in the second step. Fig. 4 shows the general architecture of our approach.

Figure 4: A general architecture of the approach. (Pipeline: XML/CSV file of a B-DFD, Type-inference, XML/CSV file of a well-formed B-DFD, Transformation, XML/CSV file of a PA-DFD.)

3.1 Type-inference

The B-DFDs we read from input ﬁles are not nec-

essarily well-formed. They may, for example, con-

nect external entities directly, or contain a data dele-

tion ﬂow connecting two process entities rather than

a process and a data store. Such inconsistencies re-

veal errors made by designers. Our tool detects and

reports such issues. For this purpose, we distinguish

between two kinds of B-DFDs: raw B-DFDs corre-

spond to diagrams read from input ﬁles and may con-

tain inconsistencies; well-formed B-DFDs are free of

inconsistencies and satisfy all the necessary invariants

required by our transformation algorithm.

We represent both kinds of B-DFDs as attributed

multigraphs with activators as nodes and ﬂows as

edges. Attributes allow us to specify properties of

activators and ﬂows, such as their type or associated

privacy information.

Definition 1. An attributed multigraph $G$ is a tuple $G = (N, F, A, V, s, t, \ell_N, \ell_F)$ where $N$, $F$, $A$ and $V$ are sets of nodes, edges, attributes and attribute values, respectively; $s, t \colon F \to N$ are the source and target maps; $\ell_N \colon N \to (A \rightharpoonup V)$ and $\ell_F \colon F \to (A \rightharpoonup V)$ are attribute maps that assign values for the different attributes to nodes and flows, respectively.

Note that the attribute maps are partial, i.e., nodes and

edges may lack values for certain attributes.

Henceforth, we use the letters $n, m$ to denote nodes and $e, f$ to denote edges. We write $e \colon n \to m$ to indicate that $e$ has source $s(e) = n$ and target $t(e) = m$. We use “.” to select attributes, writing $n.a$ for $\ell_N(n)(a)$ and $f.a$ for $\ell_F(f)(a)$. The set $S(G) \subseteq N$ of source nodes in $G$ is defined as $S(G) = \{n \mid \exists e.\ s(e) = n\}$; similarly, $T(G)$ denotes the set of target nodes in $G$.
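As an illustration of Definition 1, the following minimal Python sketch stores the source and target maps and the two partial attribute maps as dictionaries. The class and method names are hypothetical and are not taken from the authors' implementation; a missing dictionary key plays the role of an undefined attribute.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedMultigraph:
    """Hypothetical container mirroring Definition 1 (not the authors' code)."""
    node_attrs: dict = field(default_factory=dict)  # node -> {attribute: value}
    edge_attrs: dict = field(default_factory=dict)  # edge -> {attribute: value}
    source: dict = field(default_factory=dict)      # edge -> source node (map s)
    target: dict = field(default_factory=dict)      # edge -> target node (map t)

    def add_node(self, n, **attrs):
        self.node_attrs[n] = dict(attrs)

    def add_edge(self, e, n, m, **attrs):
        # Edges have their own identity, so parallel edges are allowed.
        self.source[e], self.target[e] = n, m
        self.edge_attrs[e] = dict(attrs)

    def sources(self):
        """S(G): nodes that are the source of at least one edge."""
        return set(self.source.values())

    def targets(self):
        """T(G): nodes that are the target of at least one edge."""
        return set(self.target.values())


g = AttributedMultigraph()
g.add_node("Customer", type="ext")
g.add_node("Get Customer Information", type="proc")
g.add_edge("f1", "Customer", "Get Customer Information", type="pf")
g.add_edge("f2", "Customer", "Get Customer Information", type="pf")  # parallel edge
```

Keying edges by their own identifiers, rather than by (source, target) pairs, is what makes this a multigraph: two flows between the same pair of activators remain distinct.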

A (raw) B-DFD is simply an attributed multigraph

with a ﬁxed choice of attributes A = {type}. The type

attribute describes the type of activators and ﬂows.

Activators can be external entities (ext), processes


(proc) and data stores (db); ﬂows are classiﬁed as ei-

ther plain data ﬂows (pf ) or data deletions (df ). Fig. 1

shows an example of a B-DFD with ﬁve activators (an

external entity, a datastore and three processes) that

are connected by plain ﬂows.

Definition 2. We define the set of data node types as $T_{\mathrm{data}} = \{\mathit{ext}, \mathit{proc}, \mathit{db}\}$ and the set of raw flow types as $T_{\mathrm{raw}} = \{\mathit{pf}, \mathit{df}\}$. A (raw) B-DFD is an attributed multigraph $G$ with activators as nodes and flows as edges, and where we fix $A$ and $V$ to be $A = \{\mathit{type}\}$, $V = T_{\mathrm{data}} \uplus T_{\mathrm{raw}}$. In addition, every activator and flow must have a type, i.e., $n.\mathit{type} \in T_{\mathrm{data}}$ and $f.\mathit{type} \in T_{\mathrm{raw}}$ must be defined for all $n$ and $f$.

Since the type attribute plays an important role in all DFDs, we introduce shorthands for typing activators and flows. We write $n \colon t$ to abbreviate $n.\mathit{type} = t$, and $f \colon n \xrightarrow{t} m$ to indicate that $f \colon n \to m$ and $f.\mathit{type} = t$.

Well-formed B-DFDs differ from raw B-DFDs primarily in the choice of flow types. Flows are typed based on their source and target activators. Only some combinations of sources, targets and flow types are valid. They are shown on the left-hand side of Fig. 5. If a flow does not conform to one of these six cases, it is ill-typed and will be rejected by our type inference algorithm. In addition to these flow typing constraints, we adopt some common rules from the DFD literature for well-formed B-DFDs: diagrams may not contain loops (flows with identical source and target activators), activators cannot be isolated (disconnected from all other activators), and processes must have at least one incoming and one outgoing flow (see e.g. Falkenberg et al., 1991; Dennis et al., 2018).

Definition 3. We define the set of data flow types as $T_{\mathrm{flow}} = \{\mathit{in}, \mathit{out}, \mathit{comp}, \mathit{store}, \mathit{read}, \mathit{delete}\}$. A well-formed B-DFD is an attributed multigraph $G$, where $A = \{\mathit{type}\}$ and $V = T_{\mathrm{data}} \uplus T_{\mathrm{flow}}$. In addition, flows and activators are subject to the following conditions:

• $n.\mathit{type} \in T_{\mathrm{data}}$ and $f.\mathit{type} \in T_{\mathrm{flow}}$;

• if $f \colon n \xrightarrow{\mathit{in}} m$ then $n \colon \mathit{ext}$ and $m \colon \mathit{proc}$;

• if $f \colon n \xrightarrow{\mathit{out}} m$ then $n \colon \mathit{proc}$ and $m \colon \mathit{ext}$;

• if $f \colon n \xrightarrow{\mathit{comp}} m$ then $n \colon \mathit{proc}$, $m \colon \mathit{proc}$ and $n \neq m$;

• if $f \colon n \xrightarrow{\mathit{store}} m$ then $n \colon \mathit{proc}$ and $m \colon \mathit{db}$;

• if $f \colon n \xrightarrow{\mathit{read}} m$ then $n \colon \mathit{db}$ and $m \colon \mathit{proc}$;

• if $f \colon n \xrightarrow{\mathit{delete}} m$ then $n \colon \mathit{proc}$ and $m \colon \mathit{db}$;

• if $n \colon \mathit{proc}$ then $n \in S(G)$ and $n \in T(G)$;

• if $n \colon \mathit{ext}$ or $n \colon \mathit{db}$ then $n \in S(G)$ or $n \in T(G)$.

The Type-inference algorithm (Algorithm 1) detects

and reports any ill-formed ﬂows (lines 1–12) and ac-

tivators (lines 13–19). If type inference is successful,

the resulting well-formed B-DFD can safely be trans-

formed into a PA-DFD.

Algorithm 1: Type-inference.

input: A raw B-DFD G
output: A well-formed version of G

 1  foreach f : m → n ∈ F do
 2      if f.type = pf then
 3          if m: ext ∧ n: proc then f.type ← in;
 4          else if m: proc ∧ n: ext then f.type ← out;
 5          else if m: proc ∧ n: proc ∧ m ≠ n then
 6              f.type ← comp
 7          else if m: proc ∧ n: db then f.type ← store;
 8          else if m: db ∧ n: proc then f.type ← read;
 9          else f is ill-formed;
10      else if f.type = df then
11          if m: proc ∧ n: db then f.type ← delete;
12          else f is ill-formed;
13  foreach n ∈ N do
14      if n: proc ∧ (n ∉ S(G) ∨ n ∉ T(G)) then
15          n is ill-formed
16      else if n: ext ∧ (n ∉ S(G) ∧ n ∉ T(G)) then
17          n is ill-formed
18      else if n: db ∧ (n ∉ S(G) ∧ n ∉ T(G)) then
19          n is ill-formed
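Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' tool: the dictionary-based graph encoding, the function name `infer_types` and the error strings are assumptions made for the example, and the activator names are illustrative.

```python
# (raw flow type, source type, target type) -> refined flow type, as in Algorithm 1.
FLOW_RULES = {
    ("pf", "ext", "proc"): "in",
    ("pf", "proc", "ext"): "out",
    ("pf", "proc", "proc"): "comp",
    ("pf", "proc", "db"): "store",
    ("pf", "db", "proc"): "read",
    ("df", "proc", "db"): "delete",
}


def infer_types(nodes, flows):
    """nodes: {name: node type}; flows: {id: (source, target, raw flow type)}.

    Returns the retyped flows and a list of reported problems.
    """
    typed, errors = {}, []
    for fid, (m, n, raw) in flows.items():
        refined = FLOW_RULES.get((raw, nodes[m], nodes[n]))
        if refined is None or m == n:  # unknown combination or self-loop
            errors.append(f"flow {fid} is ill-formed")
        else:
            typed[fid] = (m, n, refined)
    sources = {m for m, _, _ in flows.values()}  # S(G)
    targets = {n for _, n, _ in flows.values()}  # T(G)
    for name, kind in nodes.items():
        if kind == "proc":
            ok = name in sources and name in targets
        else:  # ext and db only need to be connected
            ok = name in sources or name in targets
        if not ok:
            errors.append(f"activator {name} is ill-formed")
    return typed, errors


nodes = {
    "Customer": "ext",
    "Get Customer Information": "proc",
    "Create Amazon Account": "proc",
    "Customers Database": "db",
}
flows = {
    "f1": ("Customer", "Get Customer Information", "pf"),
    "f2": ("Get Customer Information", "Create Amazon Account", "pf"),
    "f3": ("Create Amazon Account", "Customers Database", "pf"),
}
typed, errors = infer_types(nodes, flows)
```

Encoding the six valid combinations as a lookup table keeps the flow check declarative; a flow directly connecting two external entities has no entry in the table and is therefore reported.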

3.2 Transformation

Well-formed B-DFDs are free of inconsistencies, but they do not yet contain any explicit privacy checks. These checks are introduced by Algorithm 2, which transforms each flow in the well-formed B-DFD into a set of corresponding PA-DFD elements (see Fig. 5). These PA-DFD elements represent the functionality for enforcing purpose limitation, retention time, accountability and policy management.

First we add reason activators for each process in

the well-formed B-DFD. These activators are linked

to each other by a special partner attribute. Each

reason activator is assigned to exactly one process via

this attribute. Likewise, we add a new policy_db ac-

tivator to each data store in the well-formed B-DFD

and link them via their partner attributes. The second

phase of the algorithm transforms each ﬂow based on

its type (i.e., the hotspot that it belongs to). We use

dedicated helper procedures to transform each ﬂow

type (e.g., addInElems, which transforms in ﬂows).

The auxiliary procedures introduce the necessary ac-

tivators and ﬂows for checking and logging each data

ﬂow. The partner attributes of the original ﬂow’s

source and target are used to identify the activators

that supply and transfer the required policy values.

As with B-DFDs, we use attributed graphs to rep-

resent PA-DFDs formally.

Definition 4. Define the set of policy node types as $T_{\mathrm{pol}} = \{\mathit{limit}, \mathit{request}, \mathit{reason}, \mathit{policy\_db}\}$ and the set of admin node types as $T_{\mathrm{adm}} = \{\mathit{log}, \mathit{log\_db}, \mathit{clean}\}$. A PA-DFD is an attributed graph $G$, where $A = \{\mathit{type}, \mathit{partner}\}$ and $V = T_{\mathrm{data}} \uplus T_{\mathrm{pol}} \uplus T_{\mathrm{adm}} \uplus T_{\mathrm{raw}} \uplus N$, with $T_{\mathrm{data}}$ and $T_{\mathrm{raw}}$ the data node types and raw flow types of Definition 2. In addition, the following must hold:

• $n.\mathit{type} \in T_{\mathrm{data}} \uplus T_{\mathrm{pol}} \uplus T_{\mathrm{adm}}$ and $f.\mathit{type} \in T_{\mathrm{raw}}$;

• if $n.\mathit{partner}$ is defined, then $n.\mathit{partner} \in N$.

Figure 5: Well-formed B-DFD and the updated corresponding PA-DFD elements. (Diagram: one row per flow type in, out, comp, store, read and delete, showing the B-DFD flow and the PA-DFD elements that replace it.)

In principle, the ﬂows of PA-DFDs ought to be sub-

ject to similar typing conditions as those for well-

formed B-DFDs. Following the principle used for

well-formed B-DFDs, we could type each ﬂow based

on the types of its source and target. For example, the

ﬂows connecting request to limit activators could be

given type reqlim. This would result in eighteen new

ﬂow types. To simplify presentation, we instead use

just two ﬂow types for PA-DFDs as we did for raw

B-DFDs: plain ﬂows (pf ) and deletion ﬂows (df ).

Algorithm 2: Transformation.

input: A well-formed B-DFD G
output: A PA-DFD

foreach n ∈ N do
    if n: proc then
        add a new node m: reason to G;
        m.partner ← n; n.partner ← m
    if n: db then
        add a new node m: policy_db to G;
        m.partner ← n; n.partner ← m
foreach f ∈ F do
    if f: in then addInElems(n, f, G);
    if f: out then addOutElems(n, f, G);
    if f: comp then addCompElems(n, f, G);
    if f: store then addStoreElems(n, f, G);
    if f: read then addReadElems(n, f, G);
    if f: delete then addDeleteElems(n, f, G);
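The two-phase structure of Algorithm 2 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the six helper procedures are replaced by one hypothetical handler, `add_check_elems`, which only adds a limit, a request and a log activator per flow; the policy routing through the partner activators shown in Fig. 5 is omitted, and all generated names are invented for the example.

```python
def add_check_elems(fid, src, tgt, ftype, nodes, flows):
    """Hypothetical stand-in for addInElems, addOutElems, etc. (simplified)."""
    limit, request, log = (f"{kind}_{fid}" for kind in ("limit", "request", "log"))
    for name, kind in ((limit, "limit"), (request, "request"), (log, "log")):
        nodes[name] = {"type": kind}
    raw = "df" if ftype == "delete" else "pf"   # PA-DFDs use only pf and df flows
    flows[f"{fid}.data"] = (src, limit, raw)    # original data now enters the limit
    flows[f"{fid}.pol"] = (request, limit, "pf")
    flows[f"{fid}.checked"] = (limit, tgt, raw) # forwarded only if compliant
    flows[f"{fid}.log"] = (limit, log, "pf")    # carries data, policy and flag v


HANDLERS = {t: add_check_elems
            for t in ("in", "out", "comp", "store", "read", "delete")}


def transform(nodes, flows):
    """nodes: {name: {"type": t}}; flows: {id: (source, target, flow type)}."""
    out_nodes = {name: dict(attrs) for name, attrs in nodes.items()}
    out_flows = {}
    # First loop: one reason partner per process, one policy_db per data store.
    for name, attrs in nodes.items():
        partner_type = {"proc": "reason", "db": "policy_db"}.get(attrs["type"])
        if partner_type is not None:
            partner = f"{partner_type}({name})"
            out_nodes[partner] = {"type": partner_type, "partner": name}
            out_nodes[name]["partner"] = partner
    # Second loop: dispatch on the flow type inferred by Algorithm 1.
    for fid, (src, tgt, ftype) in flows.items():
        HANDLERS[ftype](fid, src, tgt, ftype, out_nodes, out_flows)
    return out_nodes, out_flows


nodes = {"Customer": {"type": "ext"},
         "Get Customer Information": {"type": "proc"}}
flows = {"f1": ("Customer", "Get Customer Information", "in")}
pa_nodes, pa_flows = transform(nodes, flows)
```

Creating the partner activators before touching any flow means each process or data store gets exactly one partner, however many flows are attached to it, while the per-flow handler is free to instantiate fresh check activators for every flow.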

Comparison of Transformation Rules. The trans-

formation rules presented in Fig. 2 have a few subtle

but important shortcomings that are addressed in our

Transformation algorithm.

First, the rules do not explain how activators with

multiple input and output ﬂows are to be transformed.

Note that all activators on the left of Fig. 2 have at

most one incoming or outgoing ﬂow. It is unclear

which of the new activators and ﬂows shown on the

right should be added only once per rule application,

and which need to be instantiated for every incom-

ing or outgoing ﬂow. We solve this problem by split-

ting the transformation into two distinct steps. In a

ﬁrst step, we create reason and policy_db nodes as

partners for processes and data stores. In the second

step, each original ﬂow is equipped with the activators

and ﬂows implementing the new privacy checks. This

two-step approach cleanly separates the per-activator

and per-ﬂow aspects of each rule.

Second, the limit and log activators in the orig-

inal rules are set up in a problematic way. Every

limit activator is followed by a log activator that re-

ceives both a policy and a data value. The log acti-

vator logs both values and forwards the data value to

downstream activators. But what if a privacy viola-

tion occurs? The limit activator should inhibit such

violations by blocking unintended ﬂows, passing on

only policy-compliant data values. Hence, policy vi-

olation events never reach the log activator, and are

therefore not logged. This seems highly problematic.

Alternatively limit nodes could pass on all data and

policy values (irrespective of violations), leaving the

log activator to perform the actual ﬁltering. But why

have separate limit and log activators in the ﬁrst place

then? We resolve this ambiguity by connecting limit

activators directly to the downstream activators and,


Figure 6: Example of a PA-DFD generated by the updated rules.

separately, to the log activator. The ﬂow connecting

the limit and log activators carries a special ﬂag v in-

dicating whether a violation took place (see Fig. 5).

Finally, the original “Usage” rule contains a subtle error, which is again related to the way it connects the newly introduced limit and log activators (see Fig. 2). After the application of this rule, the process receives a policy value pol in addition to the data value d it received originally. It passes pol to the adjacent log activator, presumably without changing its value, so that any violations related to pol detected by the preceding limit activator can be logged. But this means that the updated data value d’ and the policy value pol are out of sync. Fig. 3 shows an example PA-DFD transformed according to the original rules that clearly illustrates this issue. The “Get Customer Information” process receives the “Customer Info” and the corresponding policy “pol”, then passes it to the log activator with the processed data “Create Account”. This means there is a mismatch between the logged data “Create Account” and the policy value “pol”, which belongs to “Customer Info”. Our algorithm avoids this problem by separating limiting and logging, which are added on a per-flow basis, from processing. Fig. 6 shows the PA-DFD produced by our Transformation algorithm for the same B-DFD. Now the “Create Account” flow, after having been transformed according to the comp rule, is protected by its own limit and log activators, and there are no longer any dangling flows.
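The revised wiring of limit and log activators can be illustrated with a small Python sketch. It assumes, purely for the example, that a policy is a dictionary with a set of consented purposes and that compatibility means set membership; the function name `limit`, the record layout and the “Marketing” purpose are hypothetical.

```python
def limit(data, policy, purpose):
    """Return (value forwarded downstream, record sent to the log activator)."""
    violation = purpose not in policy["purposes"]  # assumed compatibility test
    # The log always receives data, policy and the flag v, violation or not.
    log_record = {"data": data, "policy": policy, "v": violation}
    # Only policy-compliant data reaches the downstream activator.
    forwarded = None if violation else data
    return forwarded, log_record


log_store = []
pol = {"owner": "Customer", "purposes": {"Create Account"}}
for purpose in ("Create Account", "Marketing"):
    forwarded, record = limit("Customer Info", pol, purpose)
    log_store.append(record)
```

Because the log record is produced by the limit step itself instead of by a log activator placed after it, blocked flows still leave a trace, which is the behaviour the flag v is meant to capture.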

3.3 Our Tool

We have modiﬁed the hotspots-based translation

given by Antignac et al. (2016, 2018) in order to

address its ambiguities and inaccuracies. Our tool

for transforming B-DFDs into PA-DFDs implements

algorithms 1 and 2, and uses a third-party applica-

tion for drawing the diagrams. Such drawing soft-

ware should support the drawing of DFDs, be user-

friendly, be easy-to-use, be cross-platform and be

open source. draw.io was the tool of our choice

(draw.io, 2019). As draw.io has no built-in support

Figure 7: Part of Automated Payment System DFD.

for B-DFDs, we suggest installing Henriksen’s open

source library (Henriksen, 2018). Since it is easy to

import and export diagrams from/to XML format in

draw.io, our implementation processes B-DFD dia-

grams represented in an XML format and generates

PA-DFD diagrams in the same format. Our tool is

implemented in Python and has been tested on a Mac-

Book Pro.
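To give an idea of what reading such a diagram involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes an uncompressed draw.io export in which activators and flows are plain `mxCell` elements (vertices marked `vertex="1"`, edges marked `edge="1"` with `source` and `target` attributes); the sample document and the function name `read_diagram` are made up for the example, and mapping shape styles to activator types is left out because it depends on the shape library in use.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the mxGraph cell format (an assumption for this sketch).
SAMPLE = """<mxGraphModel><root>
  <mxCell id="0"/>
  <mxCell id="1" parent="0"/>
  <mxCell id="2" value="Customer" style="rounded=0;" vertex="1" parent="1"/>
  <mxCell id="3" value="Get Customer Information" style="ellipse;" vertex="1" parent="1"/>
  <mxCell id="4" value="Customer Info" edge="1" source="2" target="3" parent="1"/>
</root></mxGraphModel>"""


def read_diagram(xml_text):
    """Collect activators (vertices) and flows (edges) from the cell list."""
    root = ET.fromstring(xml_text)
    nodes, flows = {}, {}
    for cell in root.iter("mxCell"):
        if cell.get("vertex") == "1":
            nodes[cell.get("id")] = cell.get("value", "")
        elif cell.get("edge") == "1":
            flows[cell.get("id")] = (
                cell.get("source"), cell.get("target"), cell.get("value", ""))
    return nodes, flows


nodes, flows = read_diagram(SAMPLE)
```

The two structural cells with ids 0 and 1 carry neither a `vertex` nor an `edge` marker, so they are skipped without special handling.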

4 CASE STUDIES

To validate our algorithms, we have applied our tool

to models of two realistic applications: an automated

payment system and an online retail system. We il-

lustrate the correctness of our algorithm by running

informal simulations of the two models. Here we fo-

cus on the ﬁrst case study. The second case study is

described in our tech report (Alshareef et al., 2020).

The DFD for the secure payment system con-

sidered here is due to Chong and Diamantopoulos

(2020); it has been reviewed by domain experts and

models a system for making automatic payments to

subcontractors in a construction project.

We start our evaluation by augmenting the origi-

nal DFD shown in Fig. 7 with static (or design-time)

policy information. Table 1 shows an extract: each

ﬂow is assigned a unique identiﬁer (F_id), its Label

(from the DFD), a Purpose (to check against the data

subject consent), a PD ﬂag indicating if the data is

personal, and a Data_type (e.g., “image” or “email”).

Next, we transform the B-DFD thus obtained into

a PA-DFD, parts of which are shown in Fig. 8.


Table 1: Design Time Information for B-DFD Automated Payment System.

| F_id | Label | Purpose | PD | Data_type |
|  | Completed sub-tasks | Capturing completed sub-tasks | True | video, images and string |
|  | Scope of Works | Knowing subcontractors contractual duties | True | string |
|  | Real-time Location Information | Project monitoring | True | video, images and string |
|  | Status | Sending up to date project information to BIM | True | video, images and string |

Table 2: Run-time Information for B-DFD/PA-DFD Automated Payment System.

| D_id | F_id | Dsub | Pol/Consent | Expiry | Content | Fwd. in B-DFD | Fwd. in PA-DFD |
|  |  | SubcontractorX | Capturing completed sub-tasks | end of 2020 | "streaming videos" and "image_1.jpg" | Yes | Yes |
|  |  | SubcontractorX | Identifying assigned tasks | end of contract | "facade panel installation" | Yes | Yes |
|  |  | SubcontractorX | Recording the work status | end of contract | "streaming videos" and "image_2.jpg" | Yes | Yes |
|  |  | ProjectX | Assigning project info to BIM | end of 2021 | "Project info:name, desc, status, subcontract,etc" | Yes | Yes |
|  |  | SubcontractorY | Taking pictures for advertisements | end of 2020 | "streaming videos" and "image_3.jpg" | Yes | No |

Figure 8: Part of Automated Payment System PA-DFD.

Finally, we perform an informal simulation of the

payment system, illustrating that the PA-DFD gener-

ated by our proof-of-concept tool enforces the desired

GDPR properties (purpose limitation and account-

ability). To run the informal simulation, we assume a

set of dynamic information provided by users during

runtime. An excerpt is shown in Table 2. Each row

consists of a unique data identiﬁer (D_id) with ﬁve

attributes: F_id identiﬁes the ﬂow carrying the data;

DSub the data subject; Pol/Consent the consented pur-

poses; Expiry the expiration time; Content the ac-

tual data. The last two columns of Table 2 indicate

whether the given data values are forwarded to down-

stream activators in the B-DFD and PA-DFD, respec-

tively, during the simulation. They show that the PA-

DFD prevents some data from being processed, while

the B-DFD processes all data regardless of the data

subject’s consent since there are no privacy checks.

Consider, for example, the “Completed sub-tasks” flow between “Construction Project” and Process 1 in the original DFD. This flow carries sensitive information collected by the smart sensor, which needs to be checked and limited according to the subcontractor’s privacy policy (following the principle of purpose limitation). This is achieved via corresponding limit and request activators in the PA-DFD. To illustrate this, we consider two scenarios, represented by the data values in the first and last rows of Table 2. In Scenario 1, the data subject “SubcontractorX” permitted the smart sensor to collect information until the end of 2020. Hence, the data value in the first row is forwarded from the limit node to Process 1 and logged (according to the accountability principle) in the log store along with its policy and a flag indicating that no violation occurred. In Scenario 2, the limit node prevents the data value in the last row from being forwarded to Process 1 since the intended purpose of the flow (“Capturing completed sub-tasks”) is not compatible with the purpose (“Taking pictures for advertisements”) to which the data subject (“SubcontractorY”) consented. Furthermore, this event is logged in the log store and identified as a privacy violation.
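The two scenarios can be replayed with a short Python sketch that joins the design-time purpose of the flow (Table 1) with the run-time consent of each data value (Table 2). The record layout and the function name `simulate` are hypothetical, purpose compatibility is approximated by string equality, and expiry dates are ignored; all three are assumptions made only for this illustration.

```python
# Design-time purpose of the flow, taken from Table 1.
FLOW_PURPOSE = {"Completed sub-tasks": "Capturing completed sub-tasks"}

# First and last rows of Table 2 (identifier columns omitted).
ROWS = [
    {"dsub": "SubcontractorX", "consent": "Capturing completed sub-tasks"},
    {"dsub": "SubcontractorY", "consent": "Taking pictures for advertisements"},
]


def simulate(flow_label, row):
    """Compare the flow's purpose with the consent attached to the data value."""
    compatible = row["consent"] == FLOW_PURPOSE[flow_label]  # assumed relation
    return {
        "dsub": row["dsub"],
        "fwd_b_dfd": True,         # the B-DFD has no privacy checks
        "fwd_pa_dfd": compatible,  # the limit activator blocks incompatible uses
        "v": not compatible,       # violation flag written to the log
    }


results = [simulate("Completed sub-tasks", row) for row in ROWS]
```

Under these assumptions the sketch reproduces the last two columns of Table 2 for the two rows discussed: both values are forwarded in the B-DFD, and only the first is forwarded in the PA-DFD.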

Contrast the above scenarios with the original

DFD in which the subcontractors’ data is uncondi-

tionally forwarded to processes and stored without

regard for any GDPR principle; the data can be col-

lected and used without limitation, for any purpose.

5 RELATED WORK

Basin et al. (2018) have recently proposed a method-

ology to audit GDPR compliance by using business

process models. They identify “purpose” with “pro-

cess” and show how to automatically generate pri-

vacy policies and detect violations of data minimisa-

tion at the modelling level. They highlight the dif-

ﬁculty of representing purpose at the programming

language level, and provide convincing arguments on

why GDPR compliance cannot be entirely automated.


Schaefer et al. (2018) present a deﬁnition of rules

for achieving Conﬁdentiality-by-Construction, where

functional speciﬁcations are replaced by conﬁdential-

ity speciﬁcations listing which variables contain se-

crets. Though the approach seems interesting, it has

(to the best of our knowledge) not been implemented.

Tuma et al. (2019) analyse information ﬂow poli-

cies at the modelling level. They focus on data conﬁ-

dentiality and integrity, and introduce a graphical no-

tation based on DFDs to algorithmically detect design

ﬂaws “in the form of violations of the intended secu-

rity properties”. They provide an Eclipse-based im-

plementation. Their approach is also based on DFDs

but has different objectives: while we focus on the

implementation of model transformation for speciﬁc

privacy checks, Tuma et al. focus on the detection of

design ﬂaws associated with security properties.

Our paper distinguishes itself in that none of the

above has taken the approach of automatically adding

privacy checks to design models.

6 CONCLUSIONS

We have provided algorithms to automatically trans-

late DFD models into privacy-aware DFDs (PA-

DFDs) as well as a proof-of-concept implementation

integrated into a graphical tool for drawing DFDs.

Deriving the algorithms from the existing conceptual

transformation was not easy, as some aspects of the

transformation were subtle or ambiguous and did not

allow for a direct implementation. We have addressed

these conceptual flaws and evaluated the resulting

algorithms through two case studies: an automated

payment system and an online retail system.

One limitation of our approach is that the dia-

grams resulting from our transformation can be large,

making it difﬁcult to visualise them. That said, the

intended use of this tool is as an intermediate step

in the design and development process, so the soft-

ware architect need only inspect (and modify)

small, relevant subsets of the PA-DFD.

Our next step is to implement an algorithm to auto-

matically synthesise a template from the PA-DFD in

Java or Python. We will provide the programmer with

predeﬁned libraries to be used as building blocks for

implementing such privacy checks.

REFERENCES

Alshareef, H., Stucki, S., and Schneider, G. (2020). Trans-

forming data ﬂow diagrams for privacy compliance

(long version). CoRR, abs/2011.12028.

Antignac, T., Scandariato, R., and Schneider, G. (2016). A

privacy-aware conceptual model for handling personal

data. In ISoLA’16, pages 942–957.

Antignac, T., Scandariato, R., and Schneider, G. (2018).

Privacy compliance via model transformations. In

IWPE’18, pages 120–126. IEEE.

Basin, D., Debois, S., and Hildebrandt, T. (2018). On pur-

pose and by necessity: compliance under the GDPR.

In FC’18, pages 20–37. Springer.

Cavoukian, A. (2012). Privacy by design: origins, mean-

ing, and prospects for assuring privacy and trust in the

information era. In Privacy Protection Measures and

Tech. in Business Org., pages 170–208. IGI Global.

Chong, H.-Y. and Diamantopoulos, A. (2020). Integrat-

ing advanced technologies to uphold security of pay-

ment: Data ﬂow diagram. Automation in Construc-

tion, 114:103–158.

Danezis, G., Domingo-Ferrer, J., Hansen, M., Hoepman,

J.-H., Le Métayer, D., Tirtea, R., and Schiffner, S.

(2015). Privacy and data protection by design. ENISA

Report.

Dennis, A., Wixom, B. H., and Roth, R. M. (2018). Systems

analysis and design. John Wiley & Sons.

draw.io (2019). draw.io. https://www.draw.io/.

European Commission (2016). General data protection

regulation (GDPR). Regulation 2016/679, European

Commission.

Falkenberg, E., Pols, R. V. D., and Weide, T. V. D. (1991).

Understanding process structure diagrams. Informa-

tion Systems, 16(4):417–428.

Freitas, M. and Mira da Silva, M. (2018). GDPR compli-

ance in SMEs: There is much to be done. J. Inform.

Systems Eng., 3(4):30.

Henriksen, M. (2018). Draw.io libraries for threat mod-

eling diagrams. https://github.com/michenriksen/

drawio-threatmodeling.

Hert, P. D. and Papakonstantinou, V. (2016). The new gen-

eral data protection regulation: Still a sound system

for the protection of individuals? Computer Law &

Security Review, 32(2):179–194.

Oetzel, M. C. and Spiekermann, S. (2014). A systematic

methodology for privacy impact assessments: a de-

sign science approach. European Journal of Informa-

tion Systems, 23(2):126–150.

Schaefer, I., Runge, T., Knüppel, A., Cleophas, L., Kourie,

D., and Watson, B. W. (2018). Towards con-

ﬁdentiality-by-construction. In ISoLA’18. Springer.

Schneider, G. (2018). Is privacy by construction possible?

In ISoLA’18, pages 471–485. Springer.

Senarath, A. and Arachchilage, N. A. (2018). Why devel-

opers cannot embed privacy into software systems? an

empirical investigation. In EASE’18, pages 211–216.

Shostack, A. (2014). Threat modeling: Designing for secu-

rity. John Wiley & Sons.

Sirur, S., Nurse, J. R., and Webb, H. (2018). Are we there

yet? Understanding the challenges faced in complying

with the general data protection regulation (GDPR). In

MPS’18, pages 88–95. ACM.

Tsormpatzoudi, P., Berendt, B., and Coudert, F. (2015). Pri-

vacy by design: From research and policy to prac-

tice - the challenge of multi-disciplinarity. In APF’15,

pages 199–212. Springer.

Tuma, K., Scandariato, R., and Balliu, M. (2019). Flaws

in ﬂows: Unveiling design ﬂaws via information ﬂow

analysis. In ICSA’19, pages 191–200. IEEE.

Wuyts, K., Scandariato, R., and Joosen, W. (2014). Empir-

ical evaluation of a privacy-focused threat modeling

methodology. J. of Syst. and Soft., 96:122–138.
