Privacy2Practice: Leveraging Automated Analysis for Privacy Policy Transparency and Compliance

Saja Alqurashi (1,2) and Indrakshi Ray (1)

(1) Department of Computer Science, Colorado State University, U.S.A.
(2) Department of Information Technology, King Abdulaziz University, Saudi Arabia

ORCID: Saja Alqurashi, https://orcid.org/0009-0009-9118-5533; Indrakshi Ray, https://orcid.org/0000-0002-0714-7676

Keywords: Privacy Policy, General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), Natural Language Processing (NLP).
Abstract: Privacy policies play a critical role in safeguarding information systems, yet they are frequently expressed in lengthy, complex natural language documents. The intricate and dense language of these policies poses substantial challenges, making it difficult for both novice users and experts to fully comprehend data collection, sharing practices, and the overall transparency of data handling. This issue is particularly concerning given the necessity of disclosing data practices to users, as mandated by privacy regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). To address these challenges and improve data transparency, this paper introduces Privacy2Practice, a comprehensive automated framework leveraging Natural Language Processing (NLP) techniques to extract and analyze key information from privacy policies. By automating the identification of data practices mandated by privacy regulations, the framework assesses how transparently these practices are disclosed, ensuring better alignment with regulatory requirements. The proposed approach significantly enhances the transparency and compliance of privacy policies by identifying entities (F1-scores: 97% for first-party and 93% for third-party entities), data types (F1-score: 82%), and purposes of data collection and sharing (F1-score: 90%). These results underscore the importance of transparency, particularly when data is shared with external parties, and highlight significant challenges, such as undisclosed third-party sharing, while showcasing the potential of automation to make privacy policies more comprehensive, transparent, and compliant with regulatory standards.
1 INTRODUCTION
A privacy policy is a legal document that outlines
how an organization collects, uses, stores, and shares
user data while establishing the terms of data pri-
vacy. As a legally binding mechanism, it ensures
transparency by informing users of an organization’s
data practices. Privacy policies are crucial for regula-
tory compliance with frameworks such as the General
Data Protection Regulation (GDPR) (Tankard, 2016),
the California Consumer Privacy Act (CCPA) (de la
Torre, 2018), and the Health Insurance Portability and
Accountability Act (HIPAA) (Annas, 2003). These
regulations mandate strict guidelines for transparency
and accountability in handling personal data (Cohen,
2008; Zaeem and Barber, 2020; Erkkilä, 2020). An
effective privacy policy provides a clear and comprehensive explanation of how customer personal data is collected, used, stored, and shared, while explicitly detailing users' privacy rights. To truly serve its purpose, the policy must be transparent and articulated in a way that is easy for users to understand.
Under CCPA, businesses must disclose what per-
sonal information is collected, how it is used, whether
it is shared or sold, and how long it will be retained.
This information is typically provided in a notice
linked on the business’s website before or at the time
of data collection. Consumers are also informed of
their rights, including the right to know, delete, opt
out of the sale of their data, and be free of discrimi-
nation for exercising these rights (de la Torre, 2018).
Similarly, GDPR requires data controllers to provide
clear and accessible information to data subjects, in-
cluding the identity of the data controller, the purpose
of processing, the legal basis for processing, reten-
tion periods, and any potential data transfers. Fur-
thermore, privacy notices must explain the rights of
individuals, such as access, rectification, erasure, data
portability, and the right to object or withdraw consent
where applicable (Tankard, 2016).
However, studies indicate that privacy policies
present significant obstacles to transparency. Their
complexity, dense legal jargon, and excessive length
make them difficult for novice and expert users to un-
derstand, affecting their effectiveness in clearly com-
municating data practices (Antón et al., 2004; Jensen
and Potts, 2004). Although the primary purpose of
privacy policies is to inform users about their rights
and the data practices of an organization, most indi-
viduals do not read them. Even those who attempt to
engage with these documents often find the informa-
tion overwhelming. Fully understanding the privacy
policies of all the services one uses would require an
impractical amount of time and effort, further discour-
aging users from engaging with them. This complex-
ity exacerbates the gap in privacy policy transparency,
leaving users poorly informed about how their per-
sonal information is collected, used, and protected.
To address these challenges, recent research has
focused on developing methods to extract and ana-
lyze critical information from privacy policies auto-
matically. These efforts aim to make privacy poli-
cies more understandable while ensuring compliance
with legal frameworks (Harkous et al., 2018;
Andow et al., 2019; Andow et al., 2020; Elluri et al.,
2020). Despite these advancements, ensuring trans-
parency in privacy policies remains a challenge. The
length and complexity of these policies continue to
make it difficult for consumers to fully understand
how their personal information is used by the orga-
nizations.
Current studies have not thoroughly examined the
specific types of data extracted from privacy policies,
such as personal information categories or the stated
purposes for data collection and sharing, includ-
ing marketing activities or service provision (Andow
et al., 2019; Andow et al., 2020; Bui et al., 2021). To
address these gaps, we introduce Privacy2Practice, a
framework designed to extract and classify data prac-
tices outlined in privacy policies, along with their cor-
responding purposes. We provide a methodology for
transforming intricate natural language requirements
into structured and actionable formats pertaining to
privacy policies. This approach emphasizes system-
atic analysis and alignment of privacy policies with
regulatory mandates, ensuring clarity, transparency,
and compliance, thereby providing a comprehensive
understanding of how organizations manage user data
and the intent behind these practices.
This paper presents Privacy2Practice, an auto-
mated framework designed to analyze privacy poli-
cies articulated in natural language documents. Pri-
vacy2Practice is designed to identify and extract data
practices embedded within these policies that de-
scribe the procedures organizations follow when col-
lecting, storing, sharing, and protecting personal in-
formation. Our approach focuses on three essential
components of data practices: entities, data types, and
purpose types, each representing key aspects of data
collection and sharing.
Privacy2Practice leverages a Natural Language
Processing (NLP)-driven approach to systematically
analyze privacy policies. It begins with (i) Entity
Type Identification, classifying data controllers as first
party, that is, entities that directly collect user data, or
third party, that is, external entities that receive shared
data. This classification relies on binary classification
and dependency parsing. Next, (ii) Data Type Iden-
tification extracts and categorizes the specific types
of data collected, such as personal or financial in-
formation, using a multiclass classification approach.
Following this, (iii) Purpose Type Identification de-
termines the intent behind data collection, mapping
it to categories like advertising or analytics through
multiclass classification. Finally, (iv) Privacy Policy
Analysis evaluates how clearly and transparently pri-
vacy policies align with regulatory frameworks such
as GDPR and CCPA.
The results of this approach are highly promising,
with the framework achieving an impressive F1-score
of 97% for first-party entity identification and 93%
for third-party entities. Additionally, the framework
demonstrated its effectiveness in accurately identify-
ing data types, attaining an F1-score of 82%. Further-
more, it excelled in determining the purposes of data
collection and sharing, achieving a strong F1-score of
90%. These results underscore the framework’s po-
tential to enhance transparency and regulatory com-
pliance in privacy policy analysis. In our analysis
of multiple privacy policies, we identified frequent
instances of non-compliance, often linked to undis-
closed third-party data sharing, particularly with ad-
vertisers. These findings emphasize the importance of
transparency, especially when personal data is shared
externally.
The paper is organized as follows: Section 2 re-
views related work, while Section 3 details the pro-
posed methodology. Section 4 describes the datasets.
Section 5 covers NLP tasks such as entity, data, and
purpose type identification, while Section 6 focuses
on privacy policy analysis. Section 7 concludes with
key findings and future directions.
2 LITERATURE REVIEW
Sunkle et al. (Sunkle et al., 2015) developed a
compliance-checking approach based on the Seman-
tics of Business Vocabulary and Rules (SBVR). Their
methodology translates regulatory requirements into
operational procedures by creating semantic vocab-
ularies and constructing logical expressions for pol-
icy rules. While this approach ensures alignment
with organizational processes, its reliance on semi-
automated processes and semantic similarity models
poses scalability and adaptability challenges, particu-
larly when addressing complex regulatory language.
Elluri et al. (Elluri et al., 2020) introduced an
automated framework for aligning privacy policies
with GDPR guidelines. Using Doc2Vec for seman-
tic similarity, they constructed an ontology to com-
pare compliance levels between organizational poli-
cies and GDPR requirements. Though effective in
identifying gaps, the approach’s reliance on semantic
similarity limits its ability to fully capture legal nu-
ances and contextual intricacies, especially for com-
plex or lengthy documents.
Mousavi et al. (Mousavi Nejad et al., 2018) pro-
posed KnIGHT, an automated tool that maps pri-
vacy policy statements to corresponding GDPR arti-
cles. Using NLP techniques, KnIGHT identifies se-
mantically significant sentences and aligns them with
GDPR clauses. Despite its potential for improving
compliance efforts, expert evaluations highlight its
partial accuracy and inability to address false nega-
tives, underscoring the need for enhanced precision.
Hamdani et al. (Hamdani et al., 2021) presented
a GDPR compliance framework that combines ma-
chine learning and rule-based approaches to evalu-
ate privacy policies. Their system extracts data prac-
tices using multi-label classification and applies pre-
defined rules derived from GDPR Articles 13 and 14.
While promising, the incomplete coverage of GDPR
concepts within their taxonomy and moderate perfor-
mance in data practice extraction indicate room for
improvement in achieving comprehensive compliance
verification.
Amaral et al. (Amaral et al., 2023) introduced
DERECHA, a machine-learning-based method for
verifying the compliance of Data Processing Agree-
ments (DPAs) with GDPR. Using a conceptual model
of compliance requirements, DERECHA assesses
compliance through semantic parsing and NLP tech-
niques, achieving high precision and recall. However,
its reliance on predefined models limits adaptability
to nuanced or context-specific GDPR interpretations.
Andow et al. (Andow et al., 2019) developed Pol-
icyLint, a tool for detecting contradictions in privacy
policies of mobile applications. By analyzing over
11,000 apps, PolicyLint identified contradictions in
14.2% of policies, including misleading statements
and redefined terms. While effective in detecting in-
consistencies, the tool’s inability to distinguish be-
tween data-collecting entities limits its analytical ac-
curacy.
Bui et al. (Bui et al., 2021) proposed PurPliance,
an automated approach for detecting inconsistencies
between privacy policy statements and actual data us-
age behaviors. By analyzing semantic structures and
leveraging predicate-argument patterns, PurPliance
significantly improved detection precision and recall
compared to previous methods. However, its reliance
on context-insensitive patterns may hinder its appli-
cation in more complex scenarios.
Nguyen et al. (Nguyen et al., 2021) examined
compliance issues related to GDPR’s consent man-
agement requirements, analyzing how applications
handle user consent for data processing activities.
Their study provided valuable insights into viola-
tions and recommendations for enhancing user pri-
vacy. However, the research was limited as it assessed
consent without executing the applications, overlook-
ing potential violations occurring later in the data life-
cycle.
Neupane et al. (Neupane et al., 2022) conducted
an extensive analysis of the privacy and security risks
in mobile companion apps, uncovering significant
vulnerabilities and threats to user privacy. They em-
phasized the importance of stronger regulations and
enforcement to protect user data. The study eval-
uated Android companion apps across three dimen-
sions: data privacy, security, and risk. However, it
did not incorporate GDPR considerations within these
pillars.
Alfawzan et al. (Alfawzan et al., 2022) re-
viewed privacy, data sharing, and security policies in
women’s mHealth apps under EU GDPR compliance.
The study highlighted concerns such as inadequate se-
curity measures, unclear third-party data sharing, and
insufficient consent mechanisms. Despite these find-
ings, the study’s scope was limited to 23 apps, reduc-
ing its generalizability.
3 OVERVIEW OF OUR APPROACH
We developed Privacy2Practice, an NLP pipeline de-
signed to overcome the complexities of identifying
and analyzing data practices embedded in privacy pol-
icy documents. Using Few-Shot Learning techniques,
Privacy2Practice facilitates efficient privacy policy
analysis, ensuring that organizations remain aligned
with regulatory frameworks such as GDPR and CCPA
while enhancing transparency and compliance. The
pipeline is specifically engineered to extract critical
data practices mandated by major privacy regulations, including GDPR and CCPA (Harkous et al., 2018). These practices are categorized into
three key dimensions as follows:
Entities: This dimension identifies the parties involved in data collection and sharing processes:
- First Party: The organization or entity directly responsible for collecting user data.
- Third Party: External entities or organizations that receive data from the first party for various purposes.

Data Type: This dimension defines the specific categories of data collected or processed by the organization, such as personally identifiable information (PII), financial details, browsing history, and other relevant data types.

Purpose Type: This dimension captures the underlying reasons for data collection and sharing, including objectives like service improvement, targeted advertising, or compliance with legal obligations.
Figure 1 illustrates the architecture of the Privacy2Practice pipeline used to identify and classify entity, data, and purpose types within privacy policies.
Privacy2Practice analyzes privacy policies lever-
aging advanced natural language processing (NLP)
techniques, including Few-Shot learning, a subfield of
machine learning in which models are trained to per-
form tasks with only a few labeled examples. This is
beneficial in privacy policy analysis, where annotated
datasets are scarce. Traditional supervised learning
approaches require large volumes of labeled data to
generalize well, which is often impractical in the pri-
vacy domain. Few-Shot Learning models (Parnami
and Lee, 2022) overcome this limitation by using pre-
trained sentence transformers and contrastive learning
to quickly adapt to new tasks.
In this work, we adopted SetFit, a state-of-the-art
Few-Shot Learning algorithm proposed by Tunstall et
al. (Tunstall et al., 2022), which has demonstrated
high performance in a range of text classification ap-
plications.
Utilizing the Few-Shot Learning approach, the
implementation of Privacy2Practice consists of four
downstream tasks aimed at automating the identifica-
tion and categorization of data practices (entity, data,
and purpose types). These tasks include:
Task 1: Entity Type Identification: This task ex-
tracts essential entities, including first-party and third-
party entities, from natural language text. Binary clas-
sification, enhanced with dependency parsing, is em-
ployed to identify the specific entities involved in the collection or sharing of data.
Task 2: Data Type Identification: This task detects
the presence of data within policy segments and accu-
rately classifies the specific data types mentioned. To
achieve precise categorization, a multi-class classifi-
cation approach is utilized.
Task 3: Purpose Type Identification: This task
analyzes policy segments to determine whether they
specify the purposes of the data collection and shar-
ing activities, categorizing identified purposes accord-
ingly using a multi-class classification approach.
Task 4: Privacy Policy Analysis: This task eval-
uates the transparency of a privacy policy by veri-
fying whether it adequately discloses all identified
data practices, namely entities, data types, and pur-
poses of data collection and sharing. If any of the ex-
pected data practices are missing or vague, the policy
is flagged for further review. This review should be
conducted by the appropriate compliance personnel,
such as a privacy analyst or auditor.
The following sections of this paper provide a de-
tailed overview of the dataset used for our approach, a
comprehensive explanation of each task, a description
of the experimental methodology, and key insights
gained from the implementation.
4 DATASETS
Access to accurately labeled data is crucial to enhance
our supervised NLP algorithms for the initial three
tasks in our pipeline. After conducting an exhaustive review of the existing literature, we identified a meticulously curated privacy policy corpus, the OPP-115 dataset (Wilson et al., 2016), which we denote as Dataset 1. Moreover, to enrich our analysis for the final task (Policy Analysis), we collected privacy policies from several technology companies, which we denote as Dataset 2.
4.1 Dataset 1
In this study, we used the publicly available OPP-115 dataset (Wilson et al., 2016), a comprehensive resource on privacy policies. Designed by Wilson et
al. (Wilson et al., 2016), this dataset comprises a di-
verse collection of 115 privacy policies spanning 15
sectors, including arts, shopping, business, and news.
It encapsulates a wealth of information, including the
data practices shown in Table 1, and each data prac-
tice category has attributes such as data type, purpose type, and others, meticulously curated by proficient annotators (Wilson et al., 2016). The final annotation scheme comprises ten distinct categories of data practices, as illustrated in Table 1 below. Consequently, we employ this dataset as the cornerstone of our investigation, focusing on the tasks of identifying entities, data types, and purpose types, as elaborated in Sections 5.1, 5.2, and 5.3.

Figure 1: A cross-functional flowchart of the automated privacy policy analysis pipeline (Privacy2Practice).
The OPP-115 dataset comprises a wide array of data practice categories, as shown in Table 1. However, our focus was specifically on the categories and attributes aligned with our research objectives. In particular, we homed in on the entity categories, specifically first and third parties, and on the attributes of data type and purpose type. These chosen categories were carefully extracted from the OPP-115 dataset to form the basis of our training data.
Table 2 provides a detailed breakdown of the category distribution within the training subset of the OPP-115 dataset. It presents the number of samples associated with three key aspects: entities, data type, and purpose type. The entities category, which includes both first party and third party, comprises 1,790 samples. The data type category contains 801 samples, while the purpose type category consists of 827 samples. This distribution offers valuable insight into the data practices drawn from the OPP-115 dataset, which serves as the foundation for model training and evaluation.
Table 1: Categories of Data Practices Defined by Wilson et al. (Wilson et al., 2016).

First Party (Collection): Clarifies the entity that processes and collects the data and the rationale behind the service provider's acquisition of user information.
Third Party (Sharing): Elaborates on the mechanisms through which user information is shared with or gathered by third-party entities.
User Choice and Control: Outlines the choices and control options available to users regarding their information.
User Access, Edit, and Deletion: Describes whether and how users can access, modify, or delete their information.
Data Retention: Indicates the duration for which user information is retained.
Data Security: Explains the measures in place to safeguard user information.
Policy Change: Addresses how users will be notified about changes to the privacy policy.
Do Not Track: Specifies if and how Do Not Track signals for online tracking and advertising are respected.
International and Specific Audiences: Covers practices relevant to specific user groups such as children, Europeans, or California residents.
Other: Includes additional sub-labels for introductory or general text, contact information, and practices not falling under the other defined categories.
4.2 Dataset 2
Table 2: Distribution of the Training Dataset.

Data Practices                        Sample Count
Entities (First Party, Third Party)   1,790
Data Types                            801
Purpose Types                         827

We collected comprehensive privacy policies from various companies operating within the technology sector. This dataset forms the cornerstone of our anal-
ysis. To acquire these data, we developed a crawler
capable of extracting text from websites that host pri-
vacy policy statements. Subsequently, we systemati-
cally parsed each document, breaking them into dis-
crete paragraphs and statements to facilitate our anal-
ysis.
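The paper does not specify the crawler's implementation; a minimal sketch of such a collector, assuming a standard requests plus BeautifulSoup stack (our choice, not stated in the paper), could look as follows. The URL in the usage comment is hypothetical.

```python
import re

import requests
from bs4 import BeautifulSoup

def fetch_policy_sentences(url: str):
    """Download a privacy policy page and split it into rough sentences."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # Drop non-content elements before extracting paragraph text.
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    sentences = []
    for para in paragraphs:
        # Naive sentence split; a parser such as spaCy would be more robust.
        sentences.extend(
            s.strip() for s in re.split(r"(?<=[.!?])\s+", para) if s.strip()
        )
    return sentences

# Example (hypothetical URL):
# print(len(fetch_policy_sentences("https://example.com/privacy")))
```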
We followed a set of predefined criteria to guide
the selection of privacy policies throughout the data
collection process. Based on these criteria, compa-
nies were categorized into two distinct groups. The
first group comprises well-established organizations
that are generally recognized for demonstrating com-
pliance with regulations such as the GDPR and the
CCPA, which may contribute to more detailed dis-
closures of their data practices (Wong et al., 2023).
In contrast, the second group consists of smaller or
less prominent companies that may not provide the
same level of detail regarding their data practices.
To preserve the confidentiality of these organizations, their names have been replaced with symbolic identifiers. We emphasize that this dataset was used in our analysis task, as discussed in Section 6. Table 3
provides a comprehensive breakdown, illustrating the
number of statements collected from each company.
Table 3: Sentence Count in Companies' Privacy Policies.

Group 1: Google, 239 sentences; Microsoft, 243 sentences
Group 2: Company_X, 59 sentences; Company_Y, 33 sentences
5 DOWNSTREAM NLP TASKS
The tasks for automating the analysis of privacy poli-
cies using the Privacy2Practice framework are struc-
tured into three primary downstream NLP tasks, each
addressing critical aspects of data practices outlined
in privacy policies. Below, we delve into the method-
ologies and approaches used for these tasks.
5.1 Entity Type Identification
Our initial focus revolves around identifying who is collecting the data and with whom it is shared. This task therefore centers on identifying the entities engaged in data collection and sharing, as delineated within privacy policies, commonly referred to as first party and third party entities, respectively.
In this task, we propose a two-level classification
approach to precisely identify entities within privacy
policy sentences. Firstly, we classify the entity type,
aiming to differentiate between first party and third
party entities, a process we term as Entity Identifica-
tion. Once this foundational classification is estab-
lished, we develop an algorithm to identify the spe-
cific entities mentioned within each sentence, a pro-
cess we refer to as Entity Detection.
Entity Identification: The objective of this task is to
categorize segments of text based on whether they re-
fer to the entity responsible for collecting the data (re-
ferred to as the first party) or if they pertain to the ex-
ternal entity from which the data is sourced and sub-
sequently obtained by the first party (referred to as
the third party). To accomplish this goal, we adopt
a binary text classification methodology. This entails
designing a model that can effectively classify each
text segment into one of two categories: referencing
the first party or the third party. Given the inherent
challenge posed by the limited availability of training
data, we employ a Few-Shot Learning approach. By
utilizing this approach, we develop a binary classifica-
tion model that effectively classifies text segments de-
spite having access to only a limited dataset for train-
ing. In our approach, we leverage the SetFit model to
build a classification model capable of accurately cat-
egorizing text segments into either first party or third
party entities. The SetFit model has demonstrated no-
table efficiency and effectiveness in similar text clas-
sification tasks, making it a suitable choice for our
purpose. To evaluate the performance of our entity
identification process, we rely on standard evaluation
metrics, including precision, recall, and F1-score.
Entity Detection: In this step, our focus is on locating the first party and third party entities within text
segments. Our approach entails identifying first party
entities by recognizing them as the subject or first
person within a sentence, and it is followed by dis-
cerning their associated actions. This involves ana-
lyzing the syntactic structure of the text to locate in-
stances where the subject of the sentence corresponds
to the first party mentioned. Conversely, for iden-
tifying third party entities, our methodology centers
around pinpointing the name of organizations within
the text. These organizations typically represent the
third parties with whom data sharing agreements are
established. To carry out this analysis, we employ a
dependency parser applied to the preprocessed text
data. The dependency parser enables us to analyze
the grammatical structure and relationships between
words in the text, providing valuable insights into how
different elements within the text are connected and
interact. We utilized SpaCy (Honnibal and Montani,
2017), which is an NLP library, to apply the depen-
dency parser. SpaCy offers robust capabilities for lin-
guistic analysis, including dependency parsing. By
leveraging SpaCy, we can extract detailed syntactic
information from the text data, facilitating the identi-
fication of the first party and third party entities within
text segments.
Experimental Results.
Entity Classification: Fine-tuning large language
models for text classification involves training them
on a curated dataset specifically tailored to the task.
In our case, the goal was to classify text into two
distinct categories: first-party and third-party entities.
To achieve this, we conducted an experiment by fine-
tuning the SetFit model (Tunstall et al., 2022), config-
uring it as a binary classifier using labeled data repre-
sentative of first-party and third-party entities.
The fine-tuning process utilized the following hyperparameters:
- Training Data: 1,000 samples
- Testing Data: 500 samples
- Sentence Transformer: paraphrase-MiniLM-L3-v2
- Number of Epochs: 10
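As a concrete illustration, the sketch below fine-tunes a binary SetFit classifier using the pre-1.0 SetFitTrainer API. The toy training segments and the 0/1 label encoding are our own assumptions, not the OPP-115 annotations.

```python
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# Illustrative labeled segments (assumed): 0 = first party, 1 = third party.
train_ds = Dataset.from_dict({
    "text": [
        "We collect your email address when you register for an account.",
        "Your usage data may be shared with our advertising partners.",
    ],
    "label": [0, 1],
})
eval_ds = Dataset.from_dict({
    "text": ["We store your preferences to remember your settings."],
    "label": [0],
})

# Sentence transformer named above; SetFit fine-tunes it with contrastive
# learning, then trains a lightweight classification head on top.
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-MiniLM-L3-v2"
)

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    num_epochs=10,  # matches the hyperparameters reported above
)
trainer.train()
print(trainer.evaluate())  # accuracy on the evaluation split

# Inference on an unseen policy segment.
print(model.predict(["Data may be disclosed to third-party analytics providers."]))
```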
Based on the experimental results shown in Table 4,
our model demonstrated an exceptional ability to ac-
curately identify first-party entities within the text,
achieving an impressive F1 score of 97%. In addi-
tion, it exhibited notable proficiency in the detection
of third-party entities, achieving an F1 score of 93%.
These high-performance metrics underscore the ef-
fectiveness of the model and highlight its promising
potential for reliable text classification tasks.
Table 4: Performance of the Entity Classification.

Entity Type    P(%)   R(%)   F1(%)
First Party    100    95     97
Third Party    87     100    93
Entity Extraction: To extract relevant entities, we uti-
lized a dependency parser, leveraging the advanced
capabilities of SpaCy. Our objective was to identify
the first party responsible for data collection and the
third party with whom the data is shared. The process
was carried out using the following steps:
1. Tokenization and Iteration: The input sentence
is first tokenized into individual tokens, forming
the foundation for detailed analysis. Each token
is then iterated over to facilitate thorough entity
identification and extraction.
2. Identifying First Party: Tokens are inspected to
determine whether they correspond to first-party
pronouns (e.g., I, we, us, it). These pronouns
typically represent the speaker or the entity col-
lecting the data. Dependency relationships be-
tween tokens are analyzed to verify whether the
first-party pronoun functions as the subject of a
verb (nsubj). If the pronoun is not directly linked
as the subject of a verb, additional analysis is con-
ducted to determine whether it serves as the sub-
ject of an auxiliary verb (aux) that functions as the
head of a main verb. This approach ensures accu-
rate identification of first-party entities in varying
syntactic contexts.
3. Identifying Third Party: Each token is examined
to identify entities tagged as organizations (la-
beled "ORG") using Named Entity Recognition
(NER). The corresponding text for these identi-
fied organization entities is then extracted for fur-
ther analysis.
The following example demonstrates entity extraction
from privacy policies using SpaCy. A sample privacy
policy sentence is analyzed to identify the first party
(the entity collecting the data) and the third party (the
entity with whom the data is shared). Consider the
statement:
"We may collect data that include, but are not lim-
ited to, weight, steps, and height. Based on the initial
setup on iOS, enable us to link to Apple Health."
In this example:
- First Party: We, us
- Third Party: Apple Health
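A minimal sketch of this extraction logic with SpaCy is shown below. The pipeline name en_core_web_sm, the exact pronoun set, and the helper's interface are our own assumptions; the rules mirror the three steps above.

```python
import spacy

# Small English pipeline with a dependency parser and NER (assumed choice).
nlp = spacy.load("en_core_web_sm")

FIRST_PARTY_PRONOUNS = {"i", "we", "us", "it"}

def extract_entities(sentence: str):
    doc = nlp(sentence)
    first_party = []
    for token in doc:
        if token.lower_ not in FIRST_PARTY_PRONOUNS:
            continue
        head = token.head
        # Step 2: the pronoun must act as the subject of a verb (nsubj),
        # or as the subject of an auxiliary that heads a main verb.
        if token.dep_ in ("nsubj", "nsubjpass") and (
            head.pos_ == "VERB"
            or (head.pos_ == "AUX" and head.head.pos_ == "VERB")
        ):
            first_party.append(token.text)
    # Step 3: third parties are organization mentions found by NER.
    third_party = [ent.text for ent in doc.ents if ent.label_ == "ORG"]
    return first_party, third_party

print(extract_entities("We may collect data and share it with Apple Health."))
# Expected output (assumption): (['We'], ['Apple Health'])
```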
Comparative Performance Analysis.
A comparison of our model with prior research, in-
cluding methodologies employing CNN and XLNet models (Hamdani et al., 2021; Harkous et al., 2018), highlights the superiority of the proposed ap-
proach. As shown in Table 5, the SetFit model out-
performed these approaches, achieving significantly
higher F1-scores for both first-party and third-party
entity identification.
By systematically analyzing the dependency
structures of sentences and identifying key linguistic
patterns, our algorithm effectively pinpoints instances
of both first-party and third-party entities. These re-
sults underscore the robustness of the SetFit model,
demonstrating its capability as a few-shot text classi-
fier tailored to the complexities of privacy policy anal-
ysis.
5.2 Data Type Identification
Table 5: F1-score comparison between our approach and prior methods.

Entity Type    CNN (Hamdani et al., 2021)    XLNet (Hamdani et al., 2021)    Polisis (CNN) (Harkous et al., 2018)    Our Approach (SetFit)
First Party    76%    83%    80%    97%
Third Party    78%    81%    81%    93%

This task focuses on classifying the diverse data instances mentioned in privacy policies into specific
categories, each representing a distinct type of infor-
mation. To achieve this, we leverage a taxonomy inspired by OPP-115 (Wilson et al., 2016), categorizing the data into
key groups such as personal information, financial de-
tails, device-related data, cookies and tracking tech-
nologies, user online activities, and unspecified in-
stances. The detailed classification categories are out-
lined below:
1. Personal Information: Includes personally identi-
fiable information (PII), such as names, addresses,
social security numbers, and biometric data.
2. Financial Information: Covers data related to
financial transactions, account details, payment
methods, credit card numbers, and other sensitive
financial data.
3. Computer Information: Pertains to data linked
to electronic devices, including device identifiers,
Internet Protocol (IP) addresses, and other unique
digital identifiers.
4. Cookies and Tracking Elements: Encompasses
data associated with cookies and tracking tech-
nologies used to monitor user behavior and pref-
erences.
5. User Online Activities: Reflects user interactions
with online services and websites, including ac-
tivities tracked and recorded by these platforms.
6. Unspecified: Applies to data instances that do not
clearly fall under any of the above categories or
are not explicitly described in the text.
To systematically identify and categorize data in-
stances within privacy policies, we adopt a two-tiered
classification approach as follows:
1. Existence Checking Phase: This initial phase as-
sesses whether a specific data type is present in
a given text segment. Each segment is catego-
rized as either specified or unspecified based on
the presence of identifiable data types. If data
existence is confirmed, the segment progresses to
the next phase.
2. Data Type Classification Phase: In this phase,
segments containing identifiable data are analyzed
and assigned to one of the predefined categories.
This step ensures a detailed and accurate align-
ment of data instances with their appropriate clas-
sifications.
By employing this structured approach, we ensure
precision and consistency in identifying and catego-
rizing data instances within privacy policies. The two-
tiered methodology enhances the accuracy and effi-
ciency of our analysis, providing a comprehensive un-
derstanding of the data practices disclosed in these
documents.
Experimental Results.
Our study employed the SetFit Few-Shot Learning ap-
proach to tackle the challenge of classifying data seg-
ments within privacy policies. Leveraging the OPP-
115 dataset, which categorizes data types as either
specified or unspecified, we established a solid foun-
dation for our analysis. This dataset enabled us to
extract the data type attribute assigned to each seg-
ment and utilize it as the segment’s label, facilitating
a systematic classification process.
In the initial phase, referred to as Existence Check-
ing, we fine-tuned SetFit to determine whether each
segment represented a specified or unspecified data
type. This phase laid the groundwork for further clas-
sification by identifying relevant segments requiring
detailed analysis. The performance of the model dur-
ing this phase was evaluated using key metrics, in-
cluding precision, recall, and F1-score, with the re-
sults presented in Table 6.
Building upon the outcomes of the first phase,
the subsequent phase, Data Type Classification, fo-
cused on assigning each identified segment to one of
the predefined data type categories. These categories
included personal information, financial records,
computer-related data, and tracking elements. By
further fine-tuning SetFit, we enhanced its ability to
accurately classify segments into these specific cate-
gories. The performance metrics for this phase, in-
cluding precision, recall, and F1-score, are summa-
rized in Table 7.
For our experiments, the fine-tuning process utilized the following hyperparameters:
- Training Data: 600 samples
- Testing Data: 200 samples
- Sentence Transformer: paraphrase-MiniLM-L3-v2
- Number of Epochs: 10
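For illustration, the two phases can be chained at inference time as in the following sketch; the checkpoint paths and label strings are hypothetical placeholders for the two fine-tuned SetFit models described above.

```python
from setfit import SetFitModel

# Hypothetical paths to the two fine-tuned checkpoints.
existence_model = SetFitModel.from_pretrained("models/data-existence")
data_type_model = SetFitModel.from_pretrained("models/data-type")

def classify_data_type(segment: str):
    """Return 'unspecified' or one of the predefined data type categories."""
    # Phase 1 (Existence Checking): does the segment mention a data type?
    if existence_model.predict([segment])[0] == "unspecified":
        return "unspecified"
    # Phase 2 (Data Type Classification): assign the specific category,
    # e.g. personal, financial, computer, cookies/tracking, online activities.
    return data_type_model.predict([segment])[0]

print(classify_data_type(
    "We collect your name, postal address, and date of birth."
))
# Expected output (assumption): 'personal information'
```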
The results, presented in Table 6, highlight the model's performance in detecting the presence of data within text segments, achieving an F1-score of 82% for specified data types. Furthermore, as shown in Table 7, the model demonstrated exceptional proficiency in classifying specific data types, achieving F1-scores of 96% for personal information, 94% for financial information, and 98% for computer information. However, its performance in identifying user online activities and cookies/tracking data was comparatively lower, with F1-scores of 62% and 69%, respectively.
It is important to note that prior studies have not
focused on specifying the types of data handled or
shared within privacy policies, nor have they explored
automated methods for identifying such data types.
By undertaking this task, our research makes a sub-
stantial contribution to the field by introducing an
automated approach to categorize data types within
privacy policies. This contribution marks a signifi-
cant step forward in understanding and addressing the
complexities of privacy policy analysis in an increas-
ingly data-driven world.
Table 6: Performance of the Data Existence Phase.

Classes                  P      R      F1
Specified Data Type      80%    84%    82%
Unspecified Data Type    88%    84%    86%

Table 7: Performance of the Data Type Classification Phase.

Classes                          P      R      F1
Personal Information             94%    98%    96%
Financial Information            93%    95%    94%
Computer Information             95%    100%   98%
Cookies and Tracking Elements    75%    64%    69%
User Online Activities           80%    50%    62%
5.3 Purpose Type Identification
This task aims to classify the purpose types men-
tioned within privacy policies into specific categories,
each representing a distinct purpose. To achieve this,
we focus on key categories inspired by those defined
in OPP-115 (Wilson et al., 2016), which include: ad-
vertising, basic services and features, additional ser-
vices and features, analytical and research, personal-
ization and customization, and legal requirements.
To address this task, we employed a Few-Shot Learning approach for multi-class classification. Our
objective was to identify the purpose of data prac-
tices within segments of privacy policies and catego-
rize them into the following distinct classes:
1. Advertising: Encompasses data practices related
to promotional activities, market research, and
consumer engagement strategies designed to en-
hance brand visibility and customer outreach.
2. Basic Services and Features: Includes data prac-
tices integral to the delivery of core functionalities
and primary offerings within the organization.
3. Additional Services and Features: Covers data
practices pertaining to supplementary services
and features that enhance the user experience and
complement the organization’s core offerings.
4. Analytical and Research: Refers to data prac-
tices focused on data analysis, research initiatives,
and insights generation to better understand mar-
ket trends, consumer behavior, and organizational
performance metrics.
5. Personalization and Customization: Involves data
practices aimed at tailoring services, products,
and user experiences to individual preferences and
requirements, thereby enhancing personalization
capabilities.
6. Legal Requirements: Represents data practices
undertaken to comply with legal obligations re-
lated to the collection, use, or sharing of personal
information online.
7. Unspecified: Applies when no specific purpose is
mentioned within the segment.
We utilized the Few-Shot Learning approach, specifi-
cally the SetFit model, to develop a classifier capable of accurately assigning purpose types to their respective categories. The process involved a
two-phase classification strategy as follows:
Existence Checking Phase: In this initial phase, we
determined whether a specific purpose was mentioned
within the segment. Segments were classified as ei-
ther containing a specified purpose or falling under
the "unspecified" category. If a specified purpose was
identified, the segment proceeded to the next phase.
This step ensured that only relevant segments were
included in the purpose type classification. For the
Existence Checking Phase, the SetFit model was fine-
tuned to classify segments as either specified or un-
specified. Evaluation metrics, including precision, re-
call, and F1-score, are presented in Table 8.
Purpose Type Classification Phase: In this phase,
segments identified as containing a specified purpose
were assigned to one of the predefined categories.
The classification aimed to discern the nature of the
purpose and accurately map it to its corresponding
category. In the Purpose Type Classification Phase,
we further fine-tuned the SetFit model to classify seg-
ments into the predefined purpose categories (adver-
tising, basic services and features, additional services
and features, analytical and research, personalization
and customization, and legal requirements). The per-
formance of this classification is assessed using pre-
cision, recall, and F1-score, as detailed in Table 9.
The model was fine-tuned with the following hyperparameters:
- Training Data: 600 samples
- Test Data: 200 samples
- Sentence Transformer: paraphrase-MiniLM-L3-v2
- Number of Epochs: 10
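For illustration, inference with a fine-tuned purpose classifier mirrors the data type pipeline; the checkpoint path and the label strings in the expected output are hypothetical.

```python
from setfit import SetFitModel

# Hypothetical checkpoint for the fine-tuned purpose type classifier.
purpose_model = SetFitModel.from_pretrained("models/purpose-type")

segments = [
    "We use your browsing history to show you more relevant ads.",
    "Your account details are processed to provide the core service.",
]
print(purpose_model.predict(segments))
# Expected output (assumption): ['advertising', 'basic services and features']
```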
Experimental Results.
Based on the results of our experiment, the model demonstrated a strong capability in discerning whether a purpose exists in a text segment, achieving a 90% F1-score, as shown in Table 8. Furthermore, it displayed proficiency in identifying advertising, basic services, additional services, and legal requirements, achieving F1-scores of 88%, 80%, 95%, and 81%, respectively, as shown in Table 9. However, in identifying analytical and research and personalization purposes, the model achieved lower F1-scores of 72% and 64%, respectively.
Table 8: Performance of the Purpose Existence Phase.

Classes                P      R      F1
Specified Purpose      91%    90%    90%
Unspecified Purpose    91%    92%    92%

Table 9: Performance of the Purpose Type Classification Phase.

Classes                              P      R      F1
Advertising                          87%    89%    88%
Basic services and features          86%    75%    80%
Additional services and features     100%   91%    95%
Analytical and research              68%    76%    72%
Personalization and customization    73%    57%    64%
Legal requirements                   81%    81%    81%
It is noteworthy that no prior research has been dedicated to specifying the purposes for collecting or sharing data within privacy policies, nor has any study addressed the automated identification of such purpose types. Thus, this task constitutes another notable contribution to automated privacy policy analysis.
6 PRIVACY POLICY ANALYSIS
This section provides an in-depth analysis of privacy
policies, emphasizing their effectiveness in achiev-
ing transparency, a cornerstone of trust and regu-
latory compliance in data practices. Transparency,
as introduced earlier, is the principle that ensures
users can understand and evaluate how their personal
data is collected, shared, and used. Privacy poli-
cies serve as critical tools for communicating these
practices, and their transparency directly impacts user
trust and compliance with regulations such as GDPR
and CCPA (Adjerid et al., 2013; Pan and Zinkhan,
2006).
Transparent privacy policies provide clarity, con-
sistency, and comprehensiveness in the disclosure of
data practices. In this study, transparency is evalu-
ated through the lens of consistent disclosure, which
measures the extent to which privacy policies clearly
and accurately represent data collection, sharing, and
usage practices. Using the Privacy2Practice frame-
work, this analysis examines whether privacy policies
meet the criteria for transparency by aligning their
disclosures with regulatory requirements.
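To make this criterion concrete, the sketch below shows one way the consistent-disclosure check could be operationalized on top of the three classifiers from Section 5. The classify_segment interface and the practice labels are our own illustrative assumptions rather than the paper's implementation.

```python
REQUIRED_DISCLOSURES = {"first_party", "third_party", "data_type", "purpose"}

def audit_policy(segments, classify_segment):
    """Flag a policy whose segments never disclose an expected practice.

    `classify_segment` is assumed to wrap the entity, data type, and purpose
    classifiers and return the set of practice labels found in one segment.
    """
    disclosed = set()
    for segment in segments:
        disclosed |= classify_segment(segment)
    missing = REQUIRED_DISCLOSURES - disclosed
    # Policies with missing or vague disclosures are routed to a human
    # reviewer, e.g. a privacy analyst or auditor (Task 4).
    return {"transparent": not missing, "missing_disclosures": sorted(missing)}
```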
To assess transparency, privacy policies from two
distinct groups of companies were analyzed.
Group 1: Two large technology companies were selected: Google (https://policies.google.com/privacy?hl=en-US) and Microsoft (https://privacy.microsoft.com/en-us/privacystatement).
Group 2: Two small organizations were selected
for this study and are referred to as Company_X
and Company_Y to preserve their confidentiality.
Evaluation of Group 1’s Privacy Policy Trans-
parency.
First Party Transparency: Both Google and Microsoft
demonstrate clear responsibility for data collection,
with their privacy policies explicitly stating that the
companies themselves handle this task.
Third Party Transparency: Google restricts data shar-
ing to its own services and applications, providing
users with clarity on third-party involvement. Mi-
crosoft similarly limits data sharing to its ecosystem,
such as media-related services.
Data Type Transparency: The data types collected
by both companies, including personal information,
cookies, and online activity, are identified transpar-
ently. Neither company collects financial information, and this exclusion is clearly communicated.
Purpose Transparency: Both companies provide ex-
plicit justifications for data collection. Google fo-
cuses on improving user experiences and enabling re-
search and analytics, while Microsoft emphasizes ser-
vice delivery and additional features.
Thus, Group 1 sets a high standard for trans-
parency by ensuring clarity of roles, with a clear de-
lineation of first-party and third-party responsibilities,
which helps users understand who is collecting and
handling their data. Additionally, their comprehen-
sive disclosures provide detailed identification of the
types of data collected, enabling users to grasp the
scope and nature of the information being handled.
Furthermore, Group 1 offers transparent explanations
of the purposes behind their data practices, ensur-
ing users are aware of why their data is being col-
lected and how it is utilized. By aligning their dis-
closures with transparency principles, Group 1 not
only demonstrates compliance with regulatory stan-
dards but also fosters trust among its users.
Evaluation of Group 2’s Privacy Policy Trans-
parency.
First Party Transparency: Company_X and Com-
pany_Y both declare their responsibility for data col-
lection.
Third Party Transparency: Company_X explicitly
names third parties like MailChimp, providing some
level of transparency. In contrast, Company_Y fails
to disclose third-party details, leaving users uncertain
about data sharing practices.
Data Type Transparency: Company_X identifies per-
sonal information, cookies, and online activity data. Company_Y collects similar data but lacks clarity in explaining its inclusion of sensitive financial information.
Purpose Transparency: While Company_X offers
some information on the purposes of data collec-
tion, such as research and service enhancement, Company_Y provides limited details, creating ambiguity
in its data practices.
Thus, Group 2 reveals significant transparency
gaps that undermine the clarity and trustworthiness
of their privacy practices. One major issue is the
ambiguity surrounding third-party disclosures, as the
policies provide limited information about the enti-
ties with whom data is shared. Additionally, there
is inadequate clarity regarding the purposes of data
collection and sharing, with insufficient explanations
for why user data is being handled in specific ways.
These shortcomings hinder users’ ability to fully un-
derstand the data practices of Group 2 companies,
ultimately eroding trust and raising concerns about
transparency and compliance.
Thus, this analysis highlights the pivotal role that
transparency plays in privacy policies, serving as a
foundation for building trust and ensuring regulatory
compliance. Group 1 exemplifies best practices by
embracing transparency principles through clearly de-
fined roles for first and third parties, detailed dis-
closures of data types, and purpose-driven explana-
tions for data collection and sharing. In contrast,
Group 2 illustrates the challenges faced in achiev-
ing comparable levels of transparency, often lacking
clarity and consistency. These findings emphasize
the critical need for automated frameworks such as
Privacy2Practice to evaluate and improve the trans-
parency of privacy policies, ensuring that users are
fully informed and their data are adequately pro-
tected.
7 CONCLUSION
Privacy policies are critical in defining how organiza-
tions handle user data, including its collection, stor-
age, sharing, and protection. However, these policies
are often expressed in dense natural language, making
them difficult for users to comprehend and evaluate
for transparency and compliance with privacy regula-
tions.
To address these challenges, this paper introduced
Privacy2Practice, an automated framework designed
to analyze and extract key data practices embedded
in privacy policy documents. Using advanced NLP
techniques and a Few-Shot Learning approach, the
framework focuses on three primary dimensions of
data practices: (1) entities involved in data collection
and sharing, (2) data types handled, and (3) purposes
for data collection and sharing. Through four tasks, namely entity type identification, data type identification, purpose type identification, and privacy policy analysis,
Privacy2Practice ensures a systematic and structured
approach to analyzing privacy policies.
The strength of the proposed approach lies in its
use of a Few-Shot Learning algorithm that excels in
text classification tasks with minimal training data.
For entity type identification, the model achieved F1
scores of 97% and 93% for first-party and third-party
entities, respectively, demonstrating exceptional pre-
cision. Similarly, the framework accurately catego-
rized the types of data (F1 score: 82%) and deter-
mined the purposes of data collection and sharing (F1
score: 90%). This level of precision highlights the
ability of the framework to effectively address the
complexity of privacy policy analysis, providing ac-
tionable information to organizations and end users
alike.
Our future work will focus on enhancing the
framework by detecting and resolving contradictions
between policy claims, actual operational practices,
and software behavior. Additionally, we aim to inte-
grate automated compliance mechanisms within ap-
plication behaviors to ensure continuous alignment
with privacy regulations.
ACKNOWLEDGEMENTS
This work was supported in part by funding from NSF
under Award Numbers DMS 2123761, CNS 1822118,
CNS 2335687, NIST, ARL, Statnett, AMI, NewPush,
and Cyber Risk Research.
Also, this research is supported by a grant (No.
CRPG-00-0000) under the Cybersecurity Research
and Innovation Pioneers Initiative, provided by the
National Cybersecurity Authority (NCA) in the King-
dom of Saudi Arabia.
REFERENCES
Adjerid, I., Acquisti, A., Brandimarte, L., and Loewen-
stein, G. (2013). Sleights of privacy: Framing, disclo-
sures, and the limits of transparency. In Proceedings
of the ninth symposium on usable privacy and secu-
rity, pages 1–11.
Alfawzan, N., Christen, M., Spitale, G., and Biller-
Andorno, N. (2022). Privacy, data sharing, and data
security policies of women’s mhealth apps: Scoping
review and content analysis. JMIR mHealth and uHealth, 10(5):e33735.
Amaral, O., Abualhaija, S., and Briand, L. (2023). Ml-
based compliance verification of data processing
agreements against GDPR. In IEEE 31st Interna-
tional Requirements Engineering Conference (RE),
pages 53–64, Germany. IEEE.
Andow, B., Mahmud, S. Y., Wang, W., Whitaker, J.,
Enck, W., Reaves, B., Singh, K., and Xie, T.
(2019). PolicyLint: Investigating internal privacy policy contradictions on Google Play. In 28th USENIX Security Symposium, pages 585–602, Santa Clara, CA, USA. USENIX Association.
Andow, B., Mahmud, S. Y., Whitaker, J., Enck, W., Reaves,
B., Singh, K., and Egelman, S. (2020). Actions speak
louder than words: Entity-sensitive privacy policy and data flow analysis with PoliCheck. In 29th
USENIX Security Symposium, pages 985–1002, On-
line. USENIX Association.
Annas, G. J. (2003). HIPAA regulations: A new era of
medical-record privacy? New England Journal of
Medicine, 348:1486.
Antón, A. I., Earp, J. B., He, Q., Stufflebeam, W., Bolchini,
D., and Jensen, C. (2004). Financial privacy policies
and the need for standardization. IEEE Security &
privacy, 2(2):36–45.
Bui, D., Yao, Y., Shin, K. G., Choi, J.-M., and Shin, J.
(2021). Consistency analysis of data-usage purposes
in mobile apps. In Proceedings of ACM SIGSAC Con-
ference on Computer and Communications Security,
pages 2824–2843, Republic of Korea. ACM.
Cohen, J. E. (2008). Privacy, visibility, transparency, and
exposure. The University of Chicago Law Review,
75(1):181–201.
de la Torre, L. (2018). A guide to the california consumer
privacy act of 2018. Available at SSRN 3275571.
Elluri, L., Joshi, K. P., and Kotal, A. (2020). Measuring
semantic similarity across EU GDPR regulation and
cloud privacy policies. In 2020 IEEE International
Conference on Big Data (Big Data), pages 3963–
3978. IEEE.
Erkkilä, T. (2020). Transparency in public administration.
In Oxford research encyclopedia of politics. Oxford.
Hamdani, R. E., Mustapha, M., Amariles, D. R., Trous-
sel, A., Meeùs, S., and Krasnashchok, K. (2021). A
combined rule-based and machine learning approach
for automated GDPR compliance checking. In Pro-
ceedings of the Eighteenth International Conference
on Artificial Intelligence and Law, pages 40–49, São
Paulo, Brazil.
Harkous, H., Fawaz, K., Lebret, R., Schaub, F., Shin, K. G., and Aberer, K. (2018). Polisis: Automated analysis
and presentation of privacy policies using deep learn-
ing. In 27th USENIX Security Symposium, pages 531–
548, Baltimore, MD, USA. USENIX Association.
Honnibal, M. and Montani, I. (2017). SpaCy 2: Natural lan-
guage understanding with Bloom embeddings, convo-
lutional neural networks and incremental parsing. Ac-
cessed: 2023.
Jensen, C. and Potts, C. (2004). Privacy policies as
decision-making tools: an evaluation of online pri-
vacy notices. In Proceedings of the SIGCHI confer-
ence on Human Factors in Computing Systems, pages
471–478, Vienna, Austria.
Mousavi Nejad, N., Scerri, S., and Lehmann, J. (2018).
Knight: Mapping privacy policies to GDPR. In
Knowledge Engineering and Knowledge Manage-
ment: 21st International Conference, EKAW 2018,
Nancy, France, November 12-16, 2018, Proceedings
21, pages 258–272. Springer.
Neupane, S., Tazi, F., Paudel, U., Baez, F. V., Adamjee,
M., De Carli, L., Das, S., and Ray, I. (2022). On the
data privacy, security, and risk postures of iot mobile
companion apps. In IFIP Annual Conference on Data
and Applications Security and Privacy, pages 162–
182. Springer.
Nguyen, T. T., Backes, M., Marnau, N., and Stock, B.
(2021). Share first, ask later (or never?): Studying violations of GDPR's explicit consent in Android apps.
In 30th USENIX Security Symposium (USENIX Secu-
rity 21), pages 3667–3684.
Pan, Y. and Zinkhan, G. M. (2006). Exploring the impact of
online privacy disclosures on consumer trust. Journal
of retailing, 82(4):331–338.
Parnami, A. and Lee, M. (2022). Learning from few exam-
ples: A summary of approaches to few-shot learning.
arXiv preprint arXiv:2203.04291.
Sunkle, S., Kholkar, D., and Kulkarni, V. (2015). Toward
better mapping between regulations and operations of
enterprises using vocabularies and semantic similar-
ity. Complex Systems Informatics and Modeling Quar-
terly, 5:39–60.
Tankard, C. (2016). What the GDPR means for businesses.
Network Security, 2016(6):5–8.
Tunstall, L., Reimers, N., Jo, U. E. S., Bates, L., Korat,
D., Wasserblat, M., and Pereg, O. (2022). Efficient
few-shot learning without prompts. arXiv preprint
arXiv:2209.11055.
Wilson, S., Schaub, F., Dara, A. A., Liu, F., Cherivirala, S.,
Leon, P. G., Andersen, M. S., Zimmeck, S., Sathyen-
dra, K. M., Russell, N. C., et al. (2016). The cre-
ation and analysis of a website privacy policy corpus.
In Proceedings of the 54th Annual Meeting of the As-
sociation for Computational Linguistics, pages 1330–
1340, Berlin, Germany.
Wong, R. Y., Chong, A., and Aspegren, R. C. (2023). Pri-
vacy legislation as business risks: How GDPR and
CCPA are represented in technology companies’ in-
vestment risk disclosures. Proceedings of the ACM on
Human-Computer Interaction, 7(CSCW1):1–26.
Zaeem, R. N. and Barber, K. S. (2020). The effect of the
GDPR on privacy policies: Recent progress and fu-
ture promise. ACM Transactions on Management In-
formation Systems (TMIS), 12(1):1–20.