Graph Algorithms over Homomorphic Encryption for Data Cooperatives

Mark Dockendorf, Ram Dantu and John Long

Department of Computer Science and Engineering, University of North Texas

Keywords:

Homomorphic Encryption, Data Cooperatives, Graphs, Data Oblivious Algorithms, Shortest Path, Minimum

Spanning Tree, Harmonic Centrality, Betweenness Centrality, Random Walk.

Abstract:

“Big data” continues to grow in inﬂuence with few competitors able to challenge them. In order to slow the

growth of and eventually replace these “data silos”, we must enable competition from alternative sources that

respect users’ privacy, such as data cooperatives. In our previous work, we proposed an architecture for a

privacy-preserving data cooperative that relies on homomorphic encryption (HE) to ensure data privacy and

demonstrated ring-based BFS, degree centrality, and farness centrality over HE graph data. In this paper

we expand our suite of HE graph algorithms to include single-source shortest-path, all-pairs shortest-path,

minimum spanning tree, harmonic centrality, random walk, and betweenness centrality over HE graph data.

These graph analysis algorithms support the core service of a data cooperative: to provide data and insights

(or aggregates) to the service of the cooperative’s clients (researchers, companies, governments, etc.) while

maintaining the privacy of their users.

1 INTRODUCTION

Given the advantage that aggregated data gives the or-

ganizations that can afford it, the practice of amass-

ing large amounts of data will never go away entirely.

As long as organizations (companies, governments,

etc.) amass a net positive outcome from violating

their users’ privacy, they will continue to do so. Tar-

geted advertising’s success (from the advertisement

platform’s perspective) is a prime example: targeted

advertising tends to generate 4.5 times the clicks of

traditional advertising (Farahat, 2013). The primary

competition to these massive, privately-held data col-

lections (a.k.a “data silos”) are data cooperatives.

Data cooperatives, sometimes referred to as data

unions, function similarly to a credit union, except in-

stead of participants investing money, they invest their

data (Pentland and Hardjono, 2020). Participants do

not typically earn monetary interest from their invest-

ment; rather, participants in the data cooperative have

greater control over how their data is used and gain

other, non-monetary beneﬁts. This does not mean that

it will not work for companies and governments, but

rather the data will be used in accordance with the will

of the participants: a mutualistic relationship rather

than a parasitic monopoly. The ability for your fa-

vorite coffee shop to know the best spot to add their

next location serves both the company’s bottom line

and your convenience. The ability for a government

to better route public transportation, better schedule

infrastructure maintenance, and more based on aggre-

gated data works for the beneﬁt of both the govern-

ment ofﬁcials and the people of the community. Data

cooperatives focus on serving both the privacy needs

of users and the data needs of companies, researchers,

governments, and more.

Data cooperatives can provide similar services to

those of their “data silo” counterparts. For exam-

ple, in a data cooperative-backed targeted advertising

service, participants would be able to control if the

service makes inferences from their history, past pur-

chases, demographic data, and/or their stated interests

as well as if the aforementioned data is uploaded to

the data cooperative. In the spirit of mutual beneﬁt

for data use, participants could specify their interests

and block certain types of advertisements. This would

allow recovering addicts to block advertisements in-

volving their vice, and those that have had traumatic

experiences would be able to block subject matters

that remind them of said experience.

A real-world example of an existing data coop-

erative is HealthBank. HealthBank stores and ana-

lyzes personal health data, connecting patients, their

doctors, pharmaceutical companies, and researchers

in the common goal of improving health and well-

being for both the patients personally and humanity

Dockendorf, M., Dantu, R. and Long, J.

Graph Algorithms over Homomorphic Encryption for Data Cooperatives.

DOI: 10.5220/0011277000003283

In Proceedings of the 19th International Conference on Security and Cryptography (SECRYPT 2022), pages 205-214

ISBN: 978-989-758-590-6; ISSN: 2184-7711

205

as a whole.

Data cooperatives allow the sharing and use of ag-

gregated personal data, but have the same inherent

ﬂaw as any other organization collecting data: users’

privacy is not inherently secured. The potential for

data misuse exists in a data cooperative just as it ex-

ists in “data silos”: a data cooperative may one day

decide to violate users’ privacy by allowing a myriad

of so-called “legitimate interests” to peek at the data.

To remove this possibility, your data needs to be en-

crypted.

Homomorphic encryption (HE) enables the shar-

ing of and computation over private data without dis-

closing the data itself to a third party. A toy example

would be a client using BFV to encrypt two integers,

x → E(x) and y → E(y), and ofﬂoading the product

operation to a 3rd party. The 3rd party would follow

the BFV procedure for multiplication and return E(z),

the encrypted product. Despite not knowing the clear-

text values for x and y, the third party will successfully

compute E(z), which will be decrypted by the client,

revealing z in cleartext. HE has already been applied

to cloud computing and relational databases.

With advancements in HE schemes BFV (Fan

and Vercauteren, 2012) (Iliashenko and Zucca, 2021),

TFHE (Chillotti et al., 2019), HEAAN (a.k.a.

CKKS) (Cheon et al., 2017), and more recently,

CHIMERA(Boura et al., 2020) and PEGASUS (jie

Lu et al., 2021), which enable conversion between

BFV, HEAAN, and TFHE, new possibilities in per-

formant fully homomorphic encryption (FHE) have

arisen. Where previously impossible, division of arbi-

trary ciphertexts is now a reality (via a CHIMERA or

PEGASUS scheme). By using homomorphic encryp-

tion within graph algorithms, a HE graph database can

be created that allows users to share their data in a way

that works for them through a data cooperative with a

greater level of security compared to a cleartext based

approach.

2 MOTIVATION

Graph analysis is core to solving many optimization

problems, and graph databases allow more ﬂexibility

in both data types and data relationships than tradi-

tional relational databases. One goal of a data co-

operative is to rapidly adapt to changing data types,

so a database with a set schema is not ideal. Graph

databases are typically schema-less, and this ﬂexibil-

ity allows them to adapt well to a wide variety of

data and changes in relationships between data. This

means that a graph database more easily adapts to the

types of data that are likely to be used in a data coop-

erative (Parra-Moyano et al., 2020). Graph databases

allow for well-established graph analysis algorithms,

such as page rank, label propagation, shortest path,

centralities, and others, to be used to gain insight on

their data. Thus, the graph structure lends itself to the

purposes of a data cooperative: (1) to allow partic-

ipants to contribute the data they want to invest–this

means different users will contribute different types,

amounts, and levels of detail in their data and (2) pro-

vide analytics over the data made available by par-

ticipants to meet the needs of researchers, companies,

governments, and more.

Homomorphic encryption can be used to share a

third parties’ computational power without leaking

any private data, or for use in multiparty computa-

tion in the case where a collection of parties want to

perform collaborative computations using their aggre-

gate data but not leak any raw data to their counter-

parts. These recent advancements in fully homomor-

phic encryption have lead to the possibility of more

performant arbitrary operations on encrypted data.

FHE enables the creation of data cooperatives

where privacy of participants is not a simple promise:

it is mathematically provable (Lyubashevsky et al.,

2012). Using multiparty HE (Mouchet et al., 2021)

or multi-key HE (L

opez-Alt et al., 2012) and work-

ing only on encrypted data, the chance of a successful

data breach can be greatly reduced.

2.1 Problem Deﬁnition

Data cooperatives are not inherently a panacea to

privately-held mass data collections. A textbook data

cooperative has the same two fundamental ﬂaws as

privately-held data collections: (1) the holding entity

can peruse the personal data of users and (2) a sin-

gle data breach can expose the personal data of thou-

sands to millions of users as was the case when Face-

book was breached (Holmes, 2021). Both of these

fundamental ﬂaws arise from the same problem: the

holding entity stores and operates on data in cleart-

ext. A solution to this problem is fully homomorphic

encryption.

To enable graph-based data cooperatives to work

under fully homomorphic encryption schemes, cur-

rent graph algorithms must be adapted to work with-

out control ﬂow variance. Speciﬁcally, the control

ﬂow of the algorithm must be blind to the data be-

ing operated on. This denies use of loops that are not

range-based and data-conditional branching in control

ﬂow. However, conditional storage (ie. multiplexers)

or HE comparison (Cheon et al., 2019) can still be

used, as they do not alter control ﬂow.

SECRYPT 2022 - 19th International Conference on Security and Cryptography

206

2.2 Our Contributions

While some graph algorithms have been demon-

strated over homomorphic encryption (Anagreh et al.,

2021a) (Aly et al., 2013) (Meng et al., 2015) (Wang

et al., 2017) (Zhang et al., 2020) (Dockendorf et al.,

2021), ours tend to differ from the current solutions

in the artifacts that they yield. Minimum spanning

tree has been explored using oblivious parallel RAM

(Laud, 2014) and a homomorphic version of Prim’s

algorithm (Anagreh et al., 2021b). Our HE MST im-

plementation uses a Kruskal’s-style approach to min-

imum spanning tree, which yields signiﬁcantly differ-

ent artifacts when the graph in question is not con-

nected: Prim’s yields a tree of the minimum for-

est that contains the starting vertex, while Kruskal’s

yields an entire minimum forest over all vertices.

Both our HE Bellman-Ford and HE Floyd-Warshall

enable re-constructable paths in addition to generat-

ing a distance vector or matrix respectively. In this

paper, we adapt Bellman-Ford, Floyd-Warshall, heap-

sort, Kruskal’s algorithm, harmonic centrality, ran-

dom walk, and betweenness centrality to work in a

HE scheme.

We have previously demonstrated ring-based

BFS, degree centrality, and farness centrality over HE

graph data (Dockendorf et al., 2021). In this paper,

we demonstrate:

Parallel HE Floyd-Warshall with Reconstructable

Paths. We offer a homomorphic Floyd-Warshall with

up to O(V

) parallelism (which may use individual ci-

phertexts or row-packed ciphertexts) that also creates

reconstrucable shortest paths, where (Anagreh et al.,

2021a) calculated only distance. Our Floyd-Warshall

is also trivially adaptable to packed HEAAN cipher-

texts and we provide pseudocode to do so.

HE Minimum Spanning Forest. We offer a homo-

morphic minimum spanning tree algorithm, homo-

morphic Kruskal’s algorithm, that has unique artifacts

compared to existing work (Anagreh et al., 2021b)

when the graph is disconnected. When G is discon-

nected, our HE Kruskal’s returns a minimum forest

over all connected components.

HE Harmonic Centrality. To our knowledge, we are

the ﬁrst to demonstrate a harmonic centrality calcula-

tion over a HE graph.

HE Betweenness Centrality. To our knowledge, we

are the ﬁrst to demonstrate homomorphic between-

ness centrality calculation over a HE graph.

HE Random Walk. We demonstrate homomorphic

random walk for HE graphs using a random 1-hot

vector encoding. This encoding produces one vertex

per vector in a matrix that is |V | by s, where s is the

number of steps in the random walk.

3 ARCHITECTURE

Figure 1: Our research focuses almost exclusively on level 2

and 3 operations. These upper-level operations focus on en-

abling graph algorithms over HE graph data. Parts or results

of level 2 operations are used for level 3 operations. For ex-

ample, betweenness uses parts of shortest path algorithms

and harmonic centrality uses the result of all-pairs-shortest-

path as its input. Atomic operations are assumed to be pro-

vided by the underlying homomorphic encryption schemes

(BFV, HEAAN, CHIMERA, PEGASUS, etc.), creating a

pseudo-API that we use to implement more complex func-

tionality. Some level 1 operations are also provided by cer-

tain HE data types. Where level 1 operations are not inher-

ent to the HE scheme, we create wrappers that implement

the functionality. This layered approach allows us to focus

on graph algorithms over HE graph data and leave the core

functionality and optimization of HE schemes to other re-

search teams.

The project follows a modular system, with arith-

metic operations being deﬁned in HEﬁxed and HE-

double (TFHE (Chillotti et al., 2019) bit-based im-

plementations), graph algorithms that use generic op-

erators deﬁned in HEgraph, and simple experiment

drivers as executable programs. HEgraph is a library

that contains homomorphic versions of BFS, degree

centrality, page rank, Bellman-Ford, Floyd-Warshall,

Kruskal’s (and its accompanying heapsort), random

walk, harmonic centrality, and betweenness centrality

as well as some trivial graph manipulations. An ex-

ample of one such trivial graph manipulation is con-

verting adjacency matrices to direct-distance matri-

ces: hollow matrices with all zeroes that are not on

the diagonal converted to “inﬁnity”. Direct distance

is used by several of our algorithms; its pseudocode

follows.

Direct Distance

Input: G, the encrypted adjacency matrix

Output: M, the encrypted direct-distance matrix

let M be a matrix of dimension |V| by |V|

for all i := 0 to |V|-1:

for all j := 0 to |V|-1:

if (i = j):

Graph Algorithms over Homomorphic Encryption for Data Cooperatives

207

M[i,j] := E(0)

else:

M[i,j] := (G[i,j] != E(0)) ?

G[i,j] : E(inf)

While HEﬁxed uses procedural circuits (ie. ﬂow

adder), HEdouble uses imported Bristol fashion cir-

cuits for IEEE-754 64-bit ﬂoating point values. We

implemented a circuit executor that used TFHE

boolean primitives and imported bristol fashion cir-

cuits to create an interface that provided all arithmetic

operations over IEEE-754 values.

Experiment drivers are trivial programs that:

1. import graphs or generate random graphs

2. invoke HEgraph’s encryption over the graph

3. time the duration of a chosen graph algorithm

4. decrypt the result

5. verify the result (using boost::graph where possi-

ble)

6. (optionally) display the result to the tester

Figure 2: Overall system architecture. The libhegraph con-

tains the primary contribution of this paper. Libhegraph

was written in a primitive-agnostic fashion, allowing the

numerical types library (HEﬁxed/HEdouble) to be swapped

for other implementations with minimal code modiﬁcation.

Blocks in yellow were written by us, but do not contain sig-

niﬁcant novelty and red blocks are works from other parties.

The interface between experiment drivers

and HEgraph is comprised of graph-level oper-

ations. The interface between HEgraph and the

HEﬁxed/HEdouble libraries is arithmetic operations,

which HEgraph uses to implement graph operations.

HEﬁxed/HEdouble interface with TFHE via boolean

logic to implement their arithmetic operations.

HEgraph can be conﬁgured to use HEAAN or BFV

schemes on algorithms that support it.

For the results presented in this paper, we used

exclusively HEﬁxed and HEdouble so that all results

would be inter-operable and could be used as part of

a query system for HE data. For example, the result

of homomorphic Kruskal’s could be packed back into

a graph structure, allowing all edges of a given vertex

that are part of the MST to be selected via binary ma-

trix multiply, a procedure described in our previous

paper (Dockendorf et al., 2021). This interoperability

of results is key to creating a query system for data

cooperatives using graph algorithms.

While the performance of the demonstrated

scheme is low, the bit-based FHE used can be eas-

ily replaced with a CHIMERA (Boura et al., 2020) or

PEGASUS (jie Lu et al., 2021) scheme for improved

performance. This modularity arises from HEgraph

using exclusively arithmetic as its interface to FHE

primitive libraries.

The overall ﬂow of events is the same for all graph

algorithms tested. Since arithmetic operations are iso-

lated in a separate library from graph algorithms, our

TFHE-based arithmetic library can be exchanged for

any other arithmetic library that satisﬁes the needs of

the algorithm(s) in question.

Figure 3: Event ﬂow between various components. While

the details of the algorithms vary, all follow the same over-

arching pattern. The experiment driver loads graph data,

invokes and times HEgraph’s operations, and compares re-

sults to known correct values. HEgraph contains all graph

algorithm logic and relies on external libraries for arith-

metic implementations. HEﬁxed/HEdouble provide arith-

metic operations over TFHE bits.

4 EXPERIMENTAL SETUP

Experiments were run using the following software

environment:

• Ubuntu 20.04.3 LTS operating system

• Compiler toolchain LLVM 10.0 (from Ubuntu

repository)

SECRYPT 2022 - 19th International Conference on Security and Cryptography

208

• TFHE version 1.1 (from Github repository)

(Chillotti et al., 2016)

• FFTW-3 version 3.3.8 for FFT implementation

(from Ubuntu repository)

• Linux kernel 5.4 (from Ubuntu repository)

As all data ﬁt comfortably in memory, no pages

were swapped to disk, and timing did not include

reading data from disk or parsing; storage is not a

factor in the results. Experiments for Bellman-Ford,

Floyd-Warshall, heapsort, and Kruskal’s were run on

the following hardware environment:

• CPU: AMD 3960X (24C, 48T)

• Memory: 64 (16x4) GB ECC DDR4-3200

Experiments for harmonic centrality, random

walk, and betweenness centrality were run on the fol-

lowing hardware environment:

• CPU: AMD 3900X (12C, 24T)

• Memory: 32 (16x2) GB DDR4-3200

5 RESULTS

All experiments were performed with a bit-

constructed TFHE schemes; the complexities shown

in this paper are worst-case as no ciphertext packing

was used to allow SIMD. Homomorphic heapsort,

Bellman-Ford, Floyd-Warshall, and Kruskal’s were

performed using a ﬁxed-point system. Random

walk, betweenness centrality, and harmonic centrality

were performed using a TFHE ﬂoating-point scheme

created by importing Bristol fashion circuits.

All algorithms explored here can be performed

on TFHE and CHIMERA/PEGASUS (Boura et al.,

2020)(jie Lu et al., 2021) schemes. Heapsort,

Bellman-Ford, Floyd-Warshall, and Kruskal’s are

also valid for BFV/BGV and HEAAN schemes due

to not needing division by ciphertext values.

Before exploring graph algorithms, a supporting

algorithm is needed: speciﬁcally, an efﬁcient sorting

algorithm.

5.1 Homomorphic Heapsort

The primary disadvantage when sorting homomor-

phic values is that the best-case growth complexity

that can be obtained from a particular sort is the worst-

case complexity; that is, if the sort can even be con-

verted to an HE algorithm. This is due to all unstable

sorts having to be made data-oblivious: perform the

same operation regardless of the comparison results.

Heapsort has the advantages of being a Θ(nlog(n))

sort and using only swap/no-swap based on compar-

isons of homomorphic values. As a result, cleartext

heapsort does not signiﬁcantly differ with its homo-

morphic counterpart and can be made stable using

min/max or a multiplexer.

For homomorphic heapsort to work, a comparison

and conditional swap is needed. This is possible on

BFV, BGV, HEAAN, and TFHE (bitwise) schemes.

While homomorphic quicksort, mergesort, and inser-

tion sort have been shown in previous work (Chat-

terjee and Sengupt, 2015), heapsort was not demon-

strated. This heapsort deomonstrated in 4 uses a Θ(b)

(where b is number of bits) compare-and-swap oper-

ator in a TFHE integer scheme.

Figure 4: Homomorphic heapsort growth complexity is lin-

earithmic with very small variance. This is expected as

cleartext heapsort is also linearithmic and the only modi-

ﬁcations required are a homomorphic comparison and con-

ditional swap operation. Besides this minor change, the al-

gorithm’s pseudocode is unchanged.

5.2 Homomorphic Bellman-Ford

Bellman-Ford is an alternative to Dijkstra’s algorithm

that uses a dynamic-programming approach. Since it

uses only storage based on conditionals does not al-

ter control ﬂow based on the smallest value (as Dijk-

stra’s does), it is an ideal algorithm to transform to use

to solve the homomorphic single-source shortest-path

problem (SSSP).

Homomorphic Bellman-Ford’s pseudocode is

nearly identical to cleartext Bellman-Ford other than

2 major differences. The ﬁrst major difference is

that the direct-distance matrix is computed, making M

from G, a hollow matrix with no zeroes other than on

the diagonal. This change will cause all ’0’ entries in

the adjacency matrix to become inﬁnity; this step can

be skipped if such a matrix was pre-computed. The

second major difference comes from the use of an op-

eration that may take the form of a conditional store

or multiplexer that makes use of a homomorphic com-

parison operator (Cheon et al., 2019) (Iliashenko and

Zucca, 2021). Homomorphic Bellman-Ford requires

only addition and compare-and-swap operators: it is

Graph Algorithms over Homomorphic Encryption for Data Cooperatives

209

valid for BFV, BGV, HEAAN, and TFHE (assuming

an addition circuit is provided) schemes.

Unlike Floyd-Warshall, a multi-threaded imple-

mentation of Bellman-Ford would offer only O(V )

parallelism. Furthermore, Bellman-Ford offers short-

est paths for only a single vertex, while having the

same growth complexity as Floyd-Warshall. With

better alternative optimizations, multi-threaded ho-

momorphic Bellman-Ford was not implemented.

Figure 5: Bellman-Ford growth complexity is cubic with

very small variance. No additional complexity is added due

to HE. Parallelism is possible, but was not implemented due

to Floyd-Warshall having similar complexity while produc-

ing shortest paths for all pairs.

5.3 Homomorphic Floyd-Warshall

Floyd-Warshall is a dynamic programming solution

to the all-pairs shortest-path problem (APSP). The

time complexity of single-threaded Floyd-Warshall is

O(V

); this bound remains unchanged in the homo-

morphic implementation.

Multi-threaded Floyd-Warshall takes some addi-

tional setup time as it must ﬁrst allocate a second dis-

tances matrix and predecessors matrix. Since, in the

worst case, Floyd-Warshall must reach the last ver-

tex every loop in order to get the new shortest path

for a vertex, the 2 inner loops can collapsed and run

completely in parallel assuming that the previous re-

sult is held constant. This provides O(V

) paral-

lelism. Swapping previous and current results every

outer loop continues until the outer loop of Floyd-

Warshall completes, at which time the distance and

predecessor matrix that were last written to are re-

turned. Homomorphic Floyd-Warshall can be shown

to be data-equivalent to its cleartext counterpart sim-

ply by replacing the ternaries with if-else statements

and performing unnecessary-statement eliminations.

Parallel Homomorphic Floyd-Warshall

Input: G, the encrypted adjacency matrix

Output: D, encrypted distance matrix, and N, the en-

crypted next-step matrix for reconstructing paths

D1, D2, N1, N2 are |V| by |V| matrices

for every possible edge in G[i,j]:

if i = j:

D1[i,j] := E(0)

N1[i,j] := E(j)

else:

D1[i,j] := (G[i,j] > E(0)) ? G[i,j] : E(inf)

N1[i,j] := (G[i,j] > E(0)) ? E(j) : E(null)

for k := 0 to |V|-1:

#parallel for collapse(2)

for i := 0 to |V|-1:

for j := 0 to |V|-1:

D2[i,j] := (D1[i,k] + D1[k,j] < D1[i,j]) ?

D1[i,k] + D1[k,j] : D1[i,j]

N2[i,j] := (D1[i,k] + D1[k,j] < D1[i,j]) ?

N1[i,k] : N1[i,j]

pointer swap D1 <-> D2

pointer swap N1 <-> N2

return D1, N1

Note that the matrices may use row-packed

BFV/HEAAN ciphertexts in conjunction with homo-

morphic comparison (Cheon et al., 2019) (Iliashenko

and Zucca, 2021) to parallelize the j loop. This al-

ternative would create O(V ) thread parallelism with

O(V ) SIMD parallelism through packed ciphertexts.

The adapted pseudocode for the main loop follows.

Parallel Packed HE Floyd-Warshall

Input: G, the encrypted adjacency matrix with row-

packed ciphertexts

Output: D, encrypted distance matrix, and N, the en-

crypted next-step matrix for reconstructing paths

D1, D2, N1, N2 are |V| by |V| matrices

// direct distance omitted

for k := 0 to |V|-1:

#parallel for

for i := 0 to |V|-1:

// vector with |V| instances of D1[i,k]

d := E(vector with 1-hot at index k)

n := d * N1[i]

d := d * D1[i]

d := d.sum_all_rotations()

n := n.sum_all_rotations()

D2[i] := (d + D1[k] < D1[i]) ?

d + D1[k] : D1[i]

N2[i] := (d + D1[k] < D1[i]) ?

n : N1[i]

pointer swap D1 <-> D2

pointer swap N1 <-> N2

return D1, N1

The above packed ciphertext version could yield

better growth complexity, assuming sum

otations

or a similar operation that propagates the value from

a single slot into all slots can be performed in O(1)

or O(log(V )) bootstrappings. This is similar in

concept to the Floyd-Warshall found in (Anagreh

et al., 2021a), but does not offer interoperability with

SECRYPT 2022 - 19th International Conference on Security and Cryptography

210

TFHE-only results from other sections of this paper,

so it was not used in HEgraph. We plan to adopt

CHIMERA, PEGASUS, or a similar library as a com-

mon HE scheme for all algorithms presented in this

paper.

Figure 6: Floyd-Warshall growth complexity retains its

O(V

) cleartext growth complexity when performed over

HE data when not using packed ciphertexts.

Figure 7: Multi-threaded Floyd-Warshall without packed

ciphertexts is also O(V

), but scales well with hardware

parallelism: about 23x faster on a 24-core machine.

5.4 Homomorphic Kruskal

Kruskal’s algorithm builds a minimum spanning for-

est from a graph (a single tree if the graph is con-

nected). Kruskal’s uses a union-ﬁnd data structure

with each vertex initially in its own set. Kruskal’s

then proceeds to add edges between vertices of differ-

ent sets from lowest weight to highest, merging sets

whenever an edge is added.

The major difference between other homomorphic

minimum spanning tree algorithms and ours occurs

when the graph is not connected. When the graph

in question is not connected, our algorithm returns a

minimum spanning forest of the graph: a set of trees

that are a MST for the vertices they span. The unique

artifact for this is that any edge added to the graph

that would link two trees of the returned forest will be

a part of the new minimum spanning forest regardless

of its weight.

The minimum spanning forest for any graph has

at most V − 1 edges. Since execution path cannot

change based on homomorphic data, V − 1 edges will

always be returned, but some may be marked as in-

valid (INT MAX) and will always be at the end of the

list. Homomorphic Kruskal assumes the input graph

is undirected and is valid for BFV, BGV, and TFHE

schemes.

Since we return the top |V | − 1 entries regardless

of whether the graph is connected or not, there may

be edges returned as such: ∞ → ∞ with ∞ cost. We

call ∞ (or −1) the false vertex because it will never

be indexed when the tree is loaded back into a graph

structure. Thus, these inﬁnite weight self loops on the

false vertex will never affect a computational outcome

on the resultant graph.

Homomorphic Kruskal’s Algorithm

Input: G, the encrypted adjacency matrix

Output: edges, encrypted list of edges in tuple form

edges is a list of tuple (w,i,j)

edges := all G[i,j] where j > i,

w := (G[i,j]) ? G[i,j] : E(inf)

make_sets(0..V-1)

sort(edges by w)

foreach edge in edges:

s1 := find_set(edge.i)

s2 := find_set(edge.j)

sn := max(s1,s2)

union(edge.i,edge.j,sn)

edge := (s1 != s2) ?

edge : (E(inf),E(inf),E(inf))

sort(edges by w)

return edges[0..V-1)

Copy Graph’s Upper-right Values Copy the upper

right of the matrix (not including diagonal) and as-

semble tuples; note that V

(row) will always be less

than V

(column) due to this. If an edge does not ex-

ist in the adjacency matrix, the edge weight is set to

inﬁnity.

Sort Copied Edges by Weight Running a sort on

the edges ensures that when iterating through them,

smaller edges are always encountered ﬁrst.

Repeated Union-ﬁnd Iterate through the sorted edges

performing ﬁnd on each vertex in each edge. Union

the two vertices’ sets (this results in no change if the

vertices are already in the same set). If the vertices are

already part of the same set, set the weight and vertex

indices on the edge to inﬁnity.

Sort Edges Again This causes all edges that were

not part of the minimum spanning forest (and sub-

sequently had their weights set to inﬁnity) to fall to

lower positions. Performing union-ﬁnd over all ver-

tices and sparing only edges that joined two disjoint

sets will spare a maximum of V − 1 edges. These

edges are part of the minimum spanning tree/forest.

Graph Algorithms over Homomorphic Encryption for Data Cooperatives

211

Return Top V − 1 Edges After the second sort, a max-

imum of V − 1 edges remain with valid values. Users

of this data need to understand that not all edges will

be valid if the graph was disconnected. When a graph

is disconnected, Kruskal’s returns a minimum span-

ning forest. All invalid edges will be at the end of the

list, so after the ﬁrst invalid value is encountered, all

subsequent entries may be ignored.

Figure 8: As the homomorphic algorithm uses a naive ap-

proach to performing Kruskal and TFHE-based ciphertexts,

complexity is O(V

). We are currently exploring a more

efﬁcient implementation. Using CHIMERA or PEGASUS

will also improve performance.

5.5 Homomorphic Betweenness

Centrality

We measured the execution time of betweenness cen-

trality on different sized graphs and previously we de-

termined that the runtime complexity of data blind be-

tweenness centrality is O(N

). Figure 9 shows raw

execution time, which is on the order of hours for a

relatively small graph and ﬁgure 10 which shows the

5th root of the runtime vs the number of vertices. We

are currently exploring a more efﬁcient implementa-

tion that uses dynamic programming to improve over

the naive approach.

Homomorphic betweenness centrality is valid

only on TFHE and CHIMERA/PEGASUS-style HE

schemes as division by HE ciphertexts is required.

Figure 9: Naive homomorphic betweenness centrality (our

current algorithm) has a very long runtime: taking hours to

complete, even on small graphs.

Figure 10: Linearizing the runtime of betweenness central-

ity to the 5th root shows a strait line and validates our anal-

ysis of O(V

5.6 Homomorphic Harmonic Centrality

For harmonic centrality, we assume a pre-computed

distance matrix is present (from homomorphic Floyd-

Warshall, etc.). The resultant runtime is linear with

respect to the number of vertices in the graph. If

the distance matrix is not pre-computed, complexity

for the ﬁrst harmonic centrality is O(V

) and all sub-

sequent calls are O(V ), assuming the graph has not

changed and the distance matrix is cached. Harmonic

centrality is faster than betweenness centrality as it

takes under 11 seconds to compute the harmonic cen-

trality for any given vertex in a graph of 50 vertices

using our TFHE-based HEdouble. Harmonic central-

ity requires inversion of ciphertexts, thus it is not valid

on standard BFV or HEAAN schemes, but is valid for

TFHE or CHIMERA/PEGASUS-based HE schemes.

Figure 11: With a pre-computed distance matrix (from ho-

momorphic Floyd-Warshall, etc.), harmonic centrality has

O(V ) growth complexity.

5.7 Homomorphic Random Walk

Random walk selects a random neighbor, adds it to

a list and moves onto that neighbor: this is repeated

s times to form a random path with s + 1 vertices.

Homomorphic random walk entails selecting a vec-

tor of all vertices and creating a random 1-hot encod-

ing from among the current vertex’s neighbors. Per-

SECRYPT 2022 - 19th International Conference on Security and Cryptography

212

forming a matrix-vector multiply with the transposed

weighted adjacency matrix results in the next vector

in the series.

It takes on the order of 20 minutes to do a random

walk of two steps on a graph of size 50 using a ﬂoat-

ing point TFHE-based scheme. In ﬁgure 12, we can

see a growth of O(V

∗ s). The slow growth is due to

parallelization that was done on the matrix multipli-

cation.

Select random is a supporting function for random

walk, so its pseudocode has been included as well.

Select Random

Input: v, an encrypted vector

Output: v out, encrypted 1-hot vector from among

non-zeros in v

r := E(random in range [0,1))

r := r * (sum of elements in v)

v_out := zero initialized vector with |V| elements

flag := !(sum > 0)

for i := 0 to |V|-1:

r := r - v[i]

v_out[i] := (r < E(0) && !flag) ? E(1) : E(0)

flag := r < E(0) | flag

return v_out

Homomorphic Random Walk

Input: G, the encrypted weighted adjacency matrix

Output: V, encrypted list vectors

V := empty list of vectors

vec := zero vector

vec[v] := 1

for i := 0 to s-1:

vec := G * vec

vec := select_random(vec)

V.append(vec)

return V

Homomorphic random walk generates a list of

1-hot encoded vectors that is a result of doing a

weighted random selection of outgoing edges and

traversing them s times starting from vertex v. The

edges of the supplied graph can have arbitrary

weights. This implementation does not prohibit “pac-

ing” (alternating between 2 vertices), but could be

modiﬁed to do so.

Clearly, since r ∗sum < sum when r ε [0, 1), r will

eventually go negative if the values that were added to

get sum are subtracted from r. When r goes negative,

f lag will be true for all subsequent loops. The f lag

prevents multiple values from being set to 1, so vec

must be 1-hot from among nonzero values in vec.

Since P appends only 1-hot encoded vectors cho-

sen from among the outgoing edges of the previously

selected vertex, the hot elements of P form a path.

The randomness of this path depends on the RNG

used; therefore, this algorithm could be vulnerable to

malicious or lazy 3rd party random number selection.

Figure 12: Homomorphic random walk has a growth com-

plexity of O(V

∗ s), where s is the number of steps in the

walk. This experiment performed 2 steps (3 total vertices)

in a 50-vertex graph.

6 CONCLUSION

This collection of HE graph algorithms serves as a

core component of data cooperatives that decide to

use homomorphic encryption to protect their partic-

ipants’ data. We used a modular design that allows

a library providing basic HE arithmetic data types to

be swapped out as future advancements are made in

performant fully homomorphic encryption.

In this paper, we demonstrated numerous com-

mon graph algorithms adapted to work on HE graphs.

We demonstrated these algorithms on graphs with

individually packed ciphertexts in order to form di-

rect comparisons to the growth rates of their cleart-

ext counterparts. With packed ciphertexts, certain al-

gorithms, especially Bellman-Ford, Floyd-Warshall,

harmonic centrality, and random walk, will be much

faster as bootstrapping time tends to dominate calcu-

lation time in HE schemes.

Our homomorphic Floyd-Warshall goes beyond

shortest distance, creating re-constructable paths un-

der homomorphic encryption. Our homomorphic

Kruskal’s creates a minimum spanning forest when

the graph is not connected and can be further sped up

using parallel sorting.

We demonstrate betweenness centrality over HE

graphs, which requires division of ciphertexts in the

second to last step (immediately before the ﬁnal sum-

mation). As all prior steps use operations supported

by BFV and HEAAN, faster calculation of between-

ness centrality will be made possible with dynamic

programming as well as using a HEAAN or BFV

to TFHE conversion immediately before the division

step instead of a purely-TFHE implementation; these

are left to future work.

Graph Algorithms over Homomorphic Encryption for Data Cooperatives

213

This work enables secured third-party analysis

of graph data from numerous sources and the cre-

ation and use of graph structures in HE data cooper-

atives. Applications for these HE graph algorithms

include privacy-preserving contact tracing, privacy-

preserving city planning, privacy-preserving cooper-

ative cyber-defense, and much more (Pentland and

Hardjono, 2020).

ACKNOWLEDGMENT

We thank National Security Agency for the partial

support through grants H98230-20-1-0329, H98230-

20-1-0403, H98230-20-1-0414, and H98230-21-1-

0262.

REFERENCES

Aly, A., Cuvelier, E., Mawet, S., Pereira, O., and Van Vyve,

M. (2013). Securely solving simple combinatorial

graph problems. volume 7859, pages 239–257.

Anagreh, M., Laud, P., and Vainikko, E. (2021a). Parallel

privacy-preserving shortest path algorithms. Cryptog-

raphy, 5(4).

Anagreh, M., Vainikko, E., and Laud, P. (2021b). Paral-

lel privacy-preserving computation of minimum span-

ning trees.

Boura, C., Gama, N., Georgieva, M., and Jetchev, D.

(2020). Chimera: Combining ring-lwe-based fully ho-

momorphic encryption schemes. Journal of Mathe-

matical Cryptology, 14(1):316–338.

Chatterjee, A. and Sengupt, I. (2015). Searching and sorting

of fully homomorphic encrypted data on cloud.

Cheon, A. H., Kim, D., and Ki, D. (2019). Efﬁcient homo-

morphic comparison methods with optimal complex-

ity.

Cheon, J. H., Kim, A., Kim, M., and Song, Y. (2017). Ho-

momorphic encryption for arithmetic of approximate

numbers. In Takagi, T. and Peyrin, T., editors, Ad-

vances in Cryptology – ASIACRYPT 2017, pages 409–

437, Cham. Springer International Publishing.

Chillotti, I., Gama, N., Georgieva, M., and Izabach

ene, M.

(2019). Tfhe: Fast fully homomorphic encryption

over the torus. Journal of Cryptology, 33:34–91.

Chillotti, I., Gama, N., Georgieva, M., and Izabach

ene, M.

(August 2016). TFHE: Fast fully homomorphic en-

cryption library. https://tfhe.github.io/tfhe/.

Dockendorf, M., Dantu, R., Morozov, K., and Bhowmick,

S. (2021). Investing data with untrusted parties using

he. In SECRYPT.

Fan, J. and Vercauteren, F. (2012). Somewhat practical fully

homomorphic encryption. Cryptology ePrint Archive,

Report 2012/144. https://ia.cr/2012/144.

Farahat, A. (2013). How effective is targeted advertising?

In 2013 American Control Conference, pages 6014–

6021.

Holmes, A. (2021). 533 million facebook users’ phone

numbers and personal data have been leaked online.

Insider.

Iliashenko, I. and Zucca, V. (2021). Faster homomorphic

comparison operations for bgv and bfv. Proceedings

on Privacy Enhancing Technologies, pages 246–264.

jie Lu, W., Huang, Z., Hong, C., Ma, Y., and Qu, H.

(2021). Pegasus: Bridging polynomial and non-

polynomial evaluations in homomorphic encryption.

2021 IEEE Symposium on Security and Privacy (SP),

pages 1057–1073.

Laud, P. (2014). Privacy-preserving minimum spanning

trees through oblivious parallel ram for secure mul-

tiparty computation.

opez-Alt, A., Tromer, E., and Vaikuntanathan, V. (2012).

On-the-ﬂy multiparty computation on the cloud via

multikey fully homomorphic encryption. Proceedings

of the Annual ACM Symposium on Theory of Comput-

ing.

Lyubashevsky, V., Peikert, C., and Regev, O. (2012). On

ideal lattices and learning with errors over rings.

Cryptology ePrint Archive, Report 2012/230. https:

//ia.cr/2012/230.

Meng, X., Kamara, S., Nissim, K., and Kollios, G. (2015).

Grecs: Graph encryption for approximate shortest dis-

tance queries. In CCS 2015 - Proceedings of the 22nd

ACM SIGSAC Conference on Computer and Com-

munications Security, Proceedings of the ACM Con-

ference on Computer and Communications Security,

pages 504–517. Association for Computing Machin-

ery.

Mouchet, C., Troncoso-Pastoriza, J. R., Bossuat, J.-P., and

Hubaux, J.-P. (2021). Multiparty homomorphic en-

cryption from ring-learning-with-errors. Proceedings

on Privacy Enhancing Technologies, 2021:291 – 311.

Parra-Moyano, J., Schmedders, K., and Pentland, A.

(2020). 3. shared data: Backbone of a new knowledge

economy. In Building the New Economy. 0 edition.

https://wip.mitpress.mit.edu/pub/yvy3qigg.

Pentland, A. and Hardjono, T. (2020). 2. data cooper-

atives. In Building the New Economy. 0 edition.

https://wip.mitpress.mit.edu/pub/pnxgvubq.

Wang, Q., Ren, K., Du, M., Li, Q., and Mohaisen, A.

(2017). Secgdb: Graph encryption for exact short-

est distance queries with efﬁcient updates. In Kiayias,

A., editor, Financial Cryptography and Data Security,

pages 79–97, Cham. Springer International Publish-

ing.

Zhang, C., Zhu, L., Xu, C., Sharif, K., Zhang, C., and Liu,

X. (2020). Pgas: Privacy-preserving graph encryption

for accurate constrained shortest distance queries. In-

formation Sciences, 506:325–345.

SECRYPT 2022 - 19th International Conference on Security and Cryptography

214