Automated Search for Leaked Private Keys on the Internet:

Has Your Private Key Been Pwned?

Henry Hosseini, Julian Rengstorf and Thomas Hupperich

Department of Information Systems, University of Münster, Germany

Keywords:

Public Key Authentication, Leakage Detection, Security Services.

Abstract:

Public key authentication is widely used as an alternative to password-based credentials, enabling remote login

with a generated key pair consisting of a private key and a public key. Like passwords, private keys are

required to remain conﬁdential to prevent unauthorized access to resources. These secrets might become

subject to theft or publicly exposed unintentionally by the key’s owner. In such cases, the keys are deemed

compromised and need to be revoked and abandoned instantaneously. Unfortunately, it is rarely possible for

users to know whether their secret keys have been publicly exposed.

To close this gap, we introduce a private key leakage checker titled KeyPwned that crawls the Internet for exposed

authentication keys. We present a continuously updated database of leaked keys’ ﬁngerprints discovered on

websites or in source code repositories. For community-driven enhancement, we allow suggestions of URLs to

scan for additional leaked keys, following our standardized process. We furthermore offer users a registration

with their public keys to be notiﬁed if we detect leakage of their corresponding private key. KeyPwned is

designed to run as a service following common software design standards, empowering users to verify their

keys’ conﬁdentiality and take action if a private key has been exposed.

1 INTRODUCTION

Authentication methods aside from passwords are

used to increase authentication security and dimin-

ish the risk of data leakage. A prominent example

based on asymmetric cryptography is the SSH pro-

tocol, which facilitates public key authentication as

a replacement for password-based authentication. In

contrast to password-based authentication, public key

authentication relies on a stored private key ﬁle on

the local machine (Lonvick and Ylonen, 2006). Al-

though this authentication mechanism is considered

more secure since keys cannot be guessed like pass-

words, private key ﬁles might be leaked accidentally

or stolen from a machine. Consequently, for this au-

thentication method to be secure, it is essential to keep

private keys secret and the exposure of private keys is

deemed a threat to the conﬁdentiality of personal data,

one of the most important goals of information secu-

rity (Avizienis et al., 2004). Thus, it is crucial to know

whether an authentication key has been leaked.

For password-based login credentials, there exists

an approach named HaveIBeenPwned (Hunt, 2022),

which has recently gained the support of the Federal Bureau of Investigation and become open source (Hunt, 2021).

The project’s website allows checking whether given credentials appear in known data breaches. With over 11 billion compromised accounts as of October

2021, this service makes an essential contribution to

security by providing a powerful service that checks

the conﬁdentiality of users’ credentials. However, the

service of HaveIBeenPwned does not take key pair

leakage into account. Our work attempts to close this

gap by providing key authentication users a means

of verifying their private keys’ conﬁdentiality. We

propose an implementation for a private key leakage

checker capable of reporting whether a private SSH

key has been publicly exposed. The software developed for this purpose is titled KeyPwned, in acknowledgment of HaveIBeenPwned.

KeyPwned has been implemented as a public website (https://keypwned.uni-muenster.de/) on which users can enter SSH fingerprints to check if they occur in the database. By storing the SSH fingerprint of a private key as a public identifier instead of the key itself, confidentiality is ensured. To build up the initial database of KeyPwned, we searched for leaked authentication keys manually and derived their fingerprints. The database of leaked keys is extensible, as presumably more keys will become exposed over time. Therefore, KeyPwned allows users to point out URLs that are then searched for leaked keys. We follow a community-driven, collaborative approach to keep the database as comprehensive as possible by letting users point at potential key leakages. These potential key leakages are verified, along with other precautions, so that malicious users cannot submit arbitrary input that may corrupt the database. Thus, our approach follows an advanced, guided search strategy.

Hosseini, H., Rengstorf, J. and Hupperich, T.
Automated Search for Leaked Private Keys on the Internet: Has Your Private Key Been Pwned?
DOI: 10.5220/0011308000003266
In Proceedings of the 17th International Conference on Software Technologies (ICSOFT 2022), pages 649-656
ISBN: 978-989-758-588-3; ISSN: 2184-2833

To ensure high software quality, we apply a prod-

uct quality model as deﬁned in the ISO/IEC 25010

standard (ISO/IEC, 2011). By developing and evaluating KeyPwned based on this standard, high-quality software and secure utilization by users are ensured.

Our approach raises the question of when exactly a key may be deemed leaked. As there is no ultimate

deﬁnition, for the scope of our work we consider a

private key as leaked if it has become publicly avail-

able on the Internet without protection, regardless of

the duration it has been available.

In summary, the main contributions of our work are

as follows:

• We built a database of over 240,000 leaked private

key ﬁngerprints.

• We introduce KeyPwned as a checking service for

private key leakage, allowing users to test if their

private keys have been publicly exposed.

• We perform a standardized product quality assess-

ment of our approach based on ISO/IEC 25010.

• We allow a community-driven database extension.

• We offer a public key registration for notiﬁcations

if the corresponding private key has been exposed.

The next section of this paper describes the soft-

ware design of KeyPwned and the search for publicly

available keys. Section 3 then evaluates KeyPwned and describes the collected data.

We discuss our solution in Section 4 and provide an

overview of related work in Section 5. The ﬁnal sec-

tion summarizes the main ﬁndings of this work.

2 SOFTWARE DESIGN

The proposed service software comes in two parts.

The ﬁrst part, named private key leak checker, pro-

vides the databases and interfaces necessary for users

to check their private keys, suggest websites for au-

tomated crawling, and register for key exposure no-

tiﬁcations. The second part addresses the search for

leaked keys, extracting and processing their informa-

tion for our database.

2.1 Private Key Leak Checker

Three user interfaces form the central part of inter-

action with KeyPwned, described in the following.

The request interface allows the submission of one or

more key ﬁngerprints to receive feedback on whether

the private keys were leaked in a data breach incor-

porated in the database or if they are presumably still

safe to use. The request interface is complemented

with a report interface to enter a list of URLs that may

contain private keys. The software retrieves all possi-

ble keys from these URLs automatically and extends

the database. As a third user interface, the notiﬁca-

tion interface provides the users with the possibility

to register their ﬁngerprints with their email addresses

to receive notiﬁcations in case of leakage.

To initially populate the database with leaked pri-

vate keys, we collected website URLs by search terms

indicating the presence of private key ﬁles. For this

purpose, the Google Hacking Database (GHDB), sev-

eral websites hosting plain text ﬁles and source code,

as well as a public GitHub activity dataset are used.

Regular expressions are tailored to extract all private

key data from downloaded web pages, and their SHA-

256 as well as MD5 ﬁngerprints are calculated and

stored. Note that for conﬁdentiality and ethical rea-

sons, the extracted private keys are not stored them-

selves. As to password-protected private keys, the ﬁn-

gerprints cannot be calculated without the password.

To illustrate the interaction and architecture of the

application, we apply the C4 model (Brown, 2021) for

visualizing the software architecture.

2.1.1 Request Interface

The main interaction point is a web interface that con-

tains elements such as a text input area, a submit but-

ton, and a section with FAQs to explain the applica-

tion. Users can interact with the application by gen-

erating the ﬁngerprints of their private keys on their

local machines and copying these ﬁngerprints into the

text input area on the website. A request containing

the inserted fingerprints is sent to the application’s back-end by clicking the submit button. The fingerprint is looked up in the database to find all possible matches. A short manual section includes helpful notes in the form of FAQs about how the application works, how private key fingerprints can be generated, the format in which the application expects them, and what measures should be taken in case of a leak.
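The two fingerprint formats mentioned above can be illustrated with a minimal sketch. It assumes the input is a single OpenSSH public key line of the form "<type> <base64 blob> [comment]"; the function name ssh_fingerprints and the returned dictionary keys are hypothetical and not taken from KeyPwned.

```python
import base64
import hashlib


def ssh_fingerprints(public_key_line: str) -> dict:
    """Return the SHA-256 and MD5 fingerprints of one OpenSSH public key line.

    Hypothetical helper for illustration; not part of KeyPwned.
    """
    # An OpenSSH public key line looks like "<type> <base64 blob> [comment]".
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1], validate=True)

    # SHA-256 form: "SHA256:" followed by the unpadded base64 digest.
    digest = hashlib.sha256(blob).digest()
    sha256 = base64.b64encode(digest).decode("ascii").rstrip("=")

    # MD5 form: "MD5:" followed by colon-separated lowercase hex pairs.
    md5_hex = hashlib.md5(blob).hexdigest()
    md5 = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))

    return {"sha256": "SHA256:" + sha256, "md5": "MD5:" + md5}
```

On a local machine, `ssh-keygen -lf <keyfile>` prints the SHA-256 form and `ssh-keygen -E md5 -lf <keyfile>` prints the MD5 form, so users do not need custom code to obtain the values they submit.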

2.1.2 Report Interface

As a second part of the application, the report interface allows reporting URLs containing leaked private keys to extract and import these keys into the

database. The interface allows entering multiple

URLs into a text ﬁeld. A web crawler fetches the web

content of the reported URLs nightly and searches for

private keys that can be converted into ﬁngerprints for

the database. The URLs are automatically checked

with the Google Safe Browsing Lookup API to pre-

vent our tool from visiting unsafe web resources. To

verify the public accessibility of keys, we do not directly import fingerprints of leaked keys reported by users into the database. Instead, users may point us to URLs containing leaked keys. This process ensures a unified procedure and that the fingerprints in the database remain valid.
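The URL-only policy of the report interface can be sketched as follows. This is an illustration under assumptions, not the KeyPwned implementation: the function names, the in-memory set standing in for the URL database, and the choice to drop URL fragments are all hypothetical. The Google Safe Browsing check is not shown.

```python
from typing import List, Optional, Set
from urllib.parse import urlsplit


def normalize_reported_url(raw: str) -> Optional[str]:
    """Return a cleaned URL, or None if the submission must be rejected."""
    candidate = raw.strip()
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return None
    # Accept only absolute web URLs. Raw key material, fingerprints,
    # file paths, and other schemes are rejected.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    # Assumption: the fragment is irrelevant for fetching the page.
    return parts._replace(fragment="").geturl()


def queue_reports(text_field: str, queue: Set[str]) -> List[str]:
    """Queue every valid, not yet queued URL from a multi-line text field."""
    added = []
    for line in text_field.splitlines():
        url = normalize_reported_url(line)
        if url is not None and url not in queue:
            queue.add(url)
            added.append(url)
    return added
```

Because the queue is a set that is only processed by the nightly crawl, reporting the same URL repeatedly results in a single fetch.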

2.1.3 Notiﬁcation Interface

To provide users with the possibility to get notiﬁed

in case of key leakage, we designed a third publicly

accessible interface in which users may register their

key ﬁngerprints along with an email address. The

registration process involves the following challenge-

response protocol. First, the user provides their pub-

lic key and email address. Next, the user receives a

generated nonce via the provided email address. The

user needs to sign this nonce digitally using the corre-

sponding private key of the uploaded public key. Fi-

nally, the resulting signed nonce must be entered in

the user interface to conﬁrm that the user owns the

private key and the email address. The uploaded pub-

lic key is automatically discarded after the registration

process, saving only ﬁngerprint and email address.
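The registration steps above can be outlined in a short sketch. Everything named here is an assumption for illustration: the Challenge class, the 15-minute validity window, and in particular verify_signature, which is a placeholder for a real SSH signature check and is not specified in this paper.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed validity window; the paper does not state a nonce lifetime.
NONCE_LIFETIME_SECONDS = 15 * 60


@dataclass(frozen=True)
class Challenge:
    email: str
    public_key: str
    nonce: str
    expires_at: float


def issue_challenge(email: str, public_key: str) -> Challenge:
    """Steps 1 and 2: take key and address, create the nonce to be emailed."""
    nonce = secrets.token_urlsafe(32)
    return Challenge(email, public_key, nonce, time.time() + NONCE_LIFETIME_SECONDS)


def confirm_registration(
    challenge: Challenge,
    signed_nonce: bytes,
    verify_signature: Callable[[str, bytes, bytes], bool],
    now: Optional[float] = None,
) -> bool:
    """Steps 3 and 4: accept only an unexpired, correctly signed nonce."""
    current = time.time() if now is None else now
    if current > challenge.expires_at:
        return False
    # verify_signature(public_key, message, signature) is a hypothetical
    # callback standing in for an actual SSH signature verification.
    return verify_signature(challenge.public_key, challenge.nonce.encode(), signed_nonce)
```

After a successful confirmation, only the fingerprint and the email address would be stored, and the uploaded public key would be discarded, as described above.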

2.1.4 Software Architecture

From an architectural point of view, a scalable so-

lution is advisable because it is impossible to deter-

mine the future required performance with certainty

as the platform grows. Therefore, the software ar-

chitecture is designed following the service-oriented

architecture (SOA) pattern due to its ﬂexibility and

popularity. In the realization, this means that services

are deﬁned, modeled, and implemented as containers.

The Container diagram of KeyPwned is depicted in

Figure 1 and shows the applied technologies in each

container. Users can interact with the system by ﬁnd-

ing out about their private keys’ conﬁdentiality status,

by reporting URLs pointing to newly leaked private

keys, or by conducting the registration of their pri-

vate key ﬁngerprint. The software system of KeyP-

wned consists of six separate containers. The three

databases contain a) stored key ﬁngerprints retrieved

from publicly accessible sources, b) reported URLs

pointing to key leaks, and c) registered data of users

wishing to get informed about leakage of their keys.

Details on the database setup are given in subsubsec-

tion 2.1.5. The interfaces are implemented as sepa-

rate containers based on the user interaction possi-

bilities with KeyPwned. The Downloader container

is used for downloading new keys that were reported

through the report interface. The Notiﬁer container is

a service sending email notiﬁcations to the owners of

leaked keys that were registered through the notiﬁca-

tion interface. All containers use database connection

clients for interacting with the database container.

Figure 1: C4 Container diagram showing the software architecture of KeyPwned. (Diagram: the User [Person] reaches the Request, Report, and Notification Interfaces [Containers: TypeScript & Next.js] via HTTPS to check key fingerprints, report URLs with leaked keys, and register. Via MongoClient, the Request Interface reads fingerprints, the Report Interface inserts URLs, and the Notification Interface inserts email addresses. The Downloader [Container: Node.js] reads URLs and inserts fingerprints; the Notifier [Container: Node.js] reads email addresses and sends email notifications to users if their private keys get leaked. The Database [Container: MongoDB] holds three databases storing private key fingerprints, reported URLs, and email addresses.)

The software is deployed using Docker contain-

ers that can be distributed consistently and platform-

independently. The application is developed in Type-

Script, and runs in a Node.js environment. The tech-

nology and language decisions are based on maintain-

ability being a major aspect of software quality.

On the front end, the React-based Next.js framework is used. The Next.js framework adds valuable features to the React application, allowing the pre-rendering of pages at build time through static site generation (SSG) or rendering at request time using server-side rendering (SSR). An Express web server is used for securing traffic using HTTPS, protecting the page from DoS attacks, and defining custom behavior for API requests.

2.1.5 Database Setup

Three databases store the fingerprints of leaked private keys, the reported queued URLs pointing to potential leaks, and the registered key fingerprints for

notiﬁcation purposes. The ﬁngerprint database is

ﬁlled with initial data as described in subsection 2.2.

We selected the document-oriented database soft-

ware MongoDB due to its ﬂexibility and scalability. A

database is created, containing a collection that stores

the ﬁngerprint documents. Similar design concepts

are applied for the databases of reported URLs and

registered ﬁngerprints.

Newly added private keys are downloaded and

converted to fingerprints. If the fingerprint calculation fails, the key is considered invalid and discarded, so no invalid keys can be added to the database. Storing fingerprints not only ensures key privacy, as it is not possible to exploit our service for retrieving private keys, but also improves the scalability of our approach, as it is more efficient than storing the found keys.

Both SHA-256 and MD5 ﬁngerprint formats are

stored in the database to enable users to request infor-

mation for either format. The source domain, format-

ted including subdomains, and the date of retrieval are

saved as these are displayed to the user who looks up a

particular key. The full URL of the key source is only

saved for documentation purposes and is not shown to

the user to preserve conﬁdentiality.
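A fingerprint document with the fields described above might look like the following sketch. The class name, the field names, and the public_view method are assumptions for illustration; the actual MongoDB schema of KeyPwned is not given in this paper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class FingerprintDocument:
    sha256: str             # SHA-256 fingerprint
    md5: str                # MD5 fingerprint
    source_domain: str      # host name including subdomains
    retrieved_at: datetime  # date of retrieval
    source_url: str         # full URL, kept for documentation only

    def public_view(self) -> dict:
        """Fields shown to a user who looks up a key; the full URL is omitted."""
        return {
            "sha256": self.sha256,
            "md5": self.md5,
            "source_domain": self.source_domain,
            "retrieved_at": self.retrieved_at.date().isoformat(),
        }


def make_document(sha256: str, md5: str, source_url: str,
                  retrieved_at: Optional[datetime] = None) -> FingerprintDocument:
    """Derive the displayed domain from the full source URL."""
    host = urlsplit(source_url).hostname or ""
    when = retrieved_at or datetime.now(timezone.utc)
    return FingerprintDocument(sha256, md5, host, when, source_url)
```

Keeping the full URL out of the public view means that a lookup reveals where a key was found only at domain granularity.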

2.2 Key Retrieval

The workﬂow to build the initial database of leaked

private keys encompasses three major steps including

the search, extraction, and processing of private keys,

as depicted in Figure 2 and described in the following.

2.2.1 Search

To build our initial database of publicly accessible pri-

vate keys, we used Google Search and Google Big-

Query. The search is assisted by three main types of

sources that are described in the following.

First, we used the Google Hacking Database (GHDB, https://www.exploit-db.com/google-hacking-database) and found suitable dorks for discovering publicly available RSA private key files.

Second, in addition to RSA, we extended our search to the DSA, ECDSA, and Ed25519 algorithms. Using the search flag site: allows for a deeper search on specific websites. We exploited this feature to find private keys on three more websites known for sharing plain text files and source code snippets, namely Pastebin, GitHub Gist, and Searchcode.

Third, we used Google BigQuery to search for leaked private keys. One of the publicly available datasets on this platform includes a complete snapshot of more than 2.8 million open-source GitHub repositories that can be filtered using SQL queries with regular expressions.

2.2.2 Extraction

In the next step, the private keys need to be extracted

and temporarily stored before being converted to ﬁn-

gerprints. As Figure 2 shows, this process varies

depending on the data sources. Regarding the web

pages found via Google Search queries to contain pri-

vate keys, these were extracted from the page con-

tents using regular expressions. These regular expres-

sions are inspired by the ones that Meli et al. used

to ﬁnd private keys on GitHub (Meli et al., 2019).

URLs reported via the report interface of KeyPwned

are processed in the same way. For the approach us-

ing Google BigQuery, this step can be omitted, as the

private keys are directly extracted using SQL queries.
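Regular-expression-based extraction of this kind can be sketched as follows. The pattern is an assumption written for illustration; it is neither the expression used in KeyPwned nor the one published by Meli et al. It matches PEM-style blocks with matching BEGIN and END labels and skips blocks that carry the legacy PEM encryption header.

```python
import re
from typing import List

# Illustrative pattern (an assumption): a PEM block whose END label
# repeats the BEGIN label, for a handful of common private key labels.
PRIVATE_KEY_BLOCK = re.compile(
    r"-----BEGIN (?P<label>(?:RSA |DSA |EC |OPENSSH )?)PRIVATE KEY-----"
    r"(?P<body>.*?)"
    r"-----END (?P=label)PRIVATE KEY-----",
    re.DOTALL,
)


def extract_private_keys(page_text: str) -> List[str]:
    """Return all private key blocks in the text that are not marked encrypted."""
    keys = []
    for match in PRIVATE_KEY_BLOCK.finditer(page_text):
        # Legacy PEM files announce passphrase protection with this header.
        # Passphrase-protected keys in the OpenSSH format are not detected
        # here, since their protection is encoded inside the base64 body.
        if "Proc-Type: 4,ENCRYPTED" in match.group("body"):
            continue
        keys.append(match.group(0))
    return keys
```

Blocks that pass this filter would still need to be parsed in the processing step, since a matching header does not guarantee a valid key.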

2.2.3 Processing

Each extracted private key that was not password-protected was temporarily saved, converted to fingerprints, and inserted into the fingerprints collection along with its originating URL. This processing procedure

slightly differs depending on the source. There-

fore, customized code is developed per data source

to calculate the ﬁngerprints and check whether an ex-

tracted private key is valid. For the calculation of the

SSH ﬁngerprints, the Python library cryptography is

used (Python Cryptographic Authority, 2022). Ad-

ditionally, further processing is performed to acquire

the other required database ﬁelds such as source do-

main and timestamp of key retrieval.

3 EVALUATION

We assess the quality of the software created for our

service, and evaluate the data collection process and

the resulting database containing ﬁngerprints of pub-

licly exposed authentication keys.

3.1 Quality Assessment

The design choices of KeyPwned are based on the

requirements of the software product quality model

given in ISO/IEC 25010. Our software quality as-

sessment was guided by examining each of the eight

characteristics of this product quality model.

Evaluating the first characteristic, functional suitability, showed that the software fulfills its function of providing users with the possibility to check their private keys against a database of leaked keys and provides the correct results for any input. However, the database can never guarantee completeness, as leaked private keys in the wild might not have been discovered and added to the database. In order to extend the database in the future, a report interface is provided that allows users to report new leaks for adding new private keys to the database.

Figure 2: Workflow for the retrieval of private keys and conversion into SSH fingerprints. (Diagram labels: the sources Google Hacking Database, Text Files and Source Code, and Public GitHub Activity; the steps Saving URLs, Saving Pages, and Extracting Keys under Extraction; the step Converting Keys to Fingerprints under Processing; the output Fingerprints.)

For assessing the second characteristic, perfor-

mance efﬁciency, we performed a comprehensive

study to measure the response times of different input

sizes. The test results show that the average response

times for sampled ﬁngerprints from the database and

the response times for randomly generated ﬁnger-

prints do not differ signiﬁcantly.
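A comparison of this kind can be approximated with a small timing harness. The sketch below is an assumption-laden stand-in: it times membership tests on an in-memory set rather than HTTP requests against the deployed service, and the function name and repeat count are hypothetical.

```python
import statistics
import time
from typing import List, Set


def mean_lookup_seconds(database: Set[str], queries: List[str], repeats: int = 5) -> float:
    """Average time per lookup over several repeated passes."""
    if not queries:
        raise ValueError("queries must not be empty")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for fingerprint in queries:
            _ = fingerprint in database  # stand-in for one service request
        timings.append((time.perf_counter() - start) / len(queries))
    return statistics.mean(timings)
```

Calling the function once with fingerprints sampled from the database and once with randomly generated ones yields the two averages whose difference is of interest.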

As for the third characteristic, compatibility, the

assessment showed that the application could be used

ﬂexibly by other software, ensuring coexistence and

interoperability. Other applications may use the

database and the web application features through

REST API endpoints.

Evaluating the fourth characteristic, usability,

highlighted that the functional appropriateness of the

application could be recognized by keeping the UI

minimalistic. A classless CSS framework was ap-

plied, which provides a simple and accessible inter-

face. The website also provides a simple manual to

improve learnability.

Regarding reliability as the fifth characteristic, the website employs modern database and container virtualization software as well as automatic backups every other week. Furthermore, we prevent the exploitation of our service as a DDoS relay by process design: users may suggest URLs for crawling, and these suggestions are gathered once a day and then scanned in batch. This way, a user may suggest a URL several times, but our system does not act on user command immediately, which could otherwise result in flooding other websites and servers.

The sixth characteristic, security, is critical to the success of the application, as it deals with confidential data. The confidentiality of the user’s private keys is

protected by ensuring that only the key ﬁngerprint and

not the plain text private key is transmitted. The report

interface for adding new private keys to the database

is protected by allowing only URLs to be reported.

Private key ﬁles without a veriﬁable source are not

considered for the database. For legally securing the

application, the IP addresses of requests are stored to

hold users accountable for any illegal activity.

Assessing the seventh characteristic, maintain-

ability, highlighted that it is particularly important

for further extensions to the application. Choosing

a modular software design with containers and out-

sourced components ensures reusability. The choice

of TypeScript as a language has a strongly positive

impact on modiﬁability, as post-modiﬁcation bugs are

noticed during development.

As the last characteristic, the portability of the

application was assessed. Since the virtual machine currently used for the proof-of-concept may not be sufficient in terms of resources in the future, the software was designed with the aim of being transferable to a more powerful system without affecting the functionality. By implementing all services as Docker containers, it is ensured that the application can easily be adapted to changing hardware requirements and transferred to a new environment.

In summary, the results of the assessment indicate

that KeyPwned is an application developed with con-

sideration for high software quality standards. It was

designed so that scaling and further enhancements can

be realized without harming the current functionality.

3.2 Data Collection

As a ﬁrst step, we performed search queries, as de-

scribed in subsection 2.2. Private keys in ﬁve different

formats were extracted from these pages. This search

covered public key algorithms, including RSA, DSA,

ECDSA, and Ed25519.

The ﬁrst three of the source datasets presented in

Table 1 are based on queries from the GHDB. In to-

tal, 618 keys were extracted from the pages that were

found using these queries. Duplicates were removed.

Random inspection of the URLs and contents of these

pages showed that the keys are found on various

domains, such as privately hosted GitLab instances

of universities and organizations or servers exposing

their entire content, including the .ssh folder.


Table 1: Number of keys per source, showing the total number and unique keys found within the respective dataset.

Source          Results    Unique Keys
GHDB #6337           52             11
GHDB #3888          321            117
GHDB #4455          245             73
Pastebin            138             85
GitHub Gist         385            226
Searchcode        2,646            880
GitHub          238,162         12,743
Total           241,949         14,135

As opposed to the queries from the GHDB, the

second block of queries is targeted towards speciﬁc

domains, namely Pastebin, GitHub Gist and Search-

code. In total, another 3,169 keys were discovered on

these pages and deduplicated accordingly.

The third and by far the largest number of keys

was found in the public GitHub activity dataset on

Google BigQuery. Here, 238,162 keys were extracted

in the process. Although the results contain many du-

plicates and only 12,743 keys are unique, the high

number of repositories containing secrets conﬁrms

the ﬁndings of related work, which emphasizes the

prevalence of the issue of secret leakage in public

source code repositories (see section 5).

3.3 Leakage Database

By retrieving publicly available private keys, we build

up the initial database of KeyPwned to evaluate the

feasibility of checking private keys for leakage. As

of October 2021, the KeyPwned database contains

14,135 unique ﬁngerprints of leaked private keys, as

presented in Table 1. The study demonstrates that the static GitHub activity snapshot is a promising source of secrets, with over 90 % of the final database stemming from GitHub. It implies that secret leakage is a widespread problem in public source code repositories. A variety of public key algorithms and key lengths were detected in the process. Table 2 provides an overview of the most frequent types of keys. Interested users can query the confidentiality status of their fingerprints on a dedicated publicly available website.

Table 2: Overview of the number of keys per public key algorithm and key length.

Algorithm and Length    Unique Keys
RSA-2048                      6,493
RSA-1024                      2,774
ECDSA-256                     1,383
RSA-512                         698
RSA-4096                        470
DSA-1024                        328
other                         1,989
Total                        14,135
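The totals and the GitHub share reported in this section can be cross-checked with a few lines of arithmetic. The numbers are copied from Table 1; the variable and function names are only illustrative.

```python
# (results, unique keys) per source, copied from Table 1.
TABLE_1 = {
    "GHDB #6337": (52, 11),
    "GHDB #3888": (321, 117),
    "GHDB #4455": (245, 73),
    "Pastebin": (138, 85),
    "GitHub Gist": (385, 226),
    "Searchcode": (2646, 880),
    "GitHub": (238162, 12743),
}


def totals(table: dict) -> tuple:
    """Sum the results and the unique keys over all sources."""
    results = sum(found for found, _ in table.values())
    unique = sum(uniq for _, uniq in table.values())
    return results, unique


def unique_share(table: dict, source: str) -> float:
    """Fraction of all unique keys that stems from one source."""
    _, unique = totals(table)
    return table[source][1] / unique
```

With these figures, totals(TABLE_1) gives 241,949 results and 14,135 unique keys, and unique_share(TABLE_1, "GitHub") is 12,743 / 14,135, or roughly 0.90, which matches the statement that over 90 % of the final database stems from GitHub.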

4 DISCUSSION

Real-world Application. The use cases of authentication key pairs, e.g., logging into servers, suggest that their users are mainly IT professionals, administrators, and developers. This user group is smaller than the group of average users, who would only check their usernames and passwords for data leaks. Consequently, KeyPwned finds application among tech-savvy users who would check the confidentiality of their private keys.

Impact. IT professionals, administrators, and de-

velopers are usually responsible for operating IT re-

sources, including protecting conﬁdential or personal

data. This raises the importance of their authentica-

tion credentials’ security, including their correspond-

ing private keys, since a compromised admin account

would endanger the security of user data. Hence,

KeyPwned indirectly contributes to the security of

users as well. As there have been several studies on

the leakage of private keys, the complementary re-

quirement of a trustworthy database of these leaked

private keys seems to be essential. We believe that

it provides a signiﬁcant contribution to the security

world and closes an existing gap. Still, a longitudinal

study to acquire the target group’s opinion and feed-

back about our service is to be conducted.

Ethics. Some of the over 240,000 downloaded private

keys may be invalid, unused, or only used for testing

purposes and not necessarily of high-security impor-

tance. However, this could only be tested by perform-

ing brute-force login attempts with discovered keys.

We did not attempt to use any of the exposed private keys, e.g., to authenticate at publicly reachable

services. Furthermore, for the ﬁnal database, only

the ﬁngerprints of private keys are kept to decrease

the risk of a potential data leak at our side and ensure

general anonymity regarding leaked keys.

Disclosure Process. Our approach follows the prin-

ciple of self-check, as users may check the conﬁden-

tiality of keys themselves. The more informative op-

tion is the registration of key ﬁngerprints and auto-

mated notiﬁcation in case of leakage. This practice

requires an upfront registration and storage of personal data, i.e., an email address for each key fingerprint, and hence forgoes anonymity.

Future Improvements. The current work focuses on

SSH keys as the most prevalent type of authentication

keys. However, the implementation allows abstrac-

tion and arbitrary key formats with public informa-

tion in the keys’ headers. Taking more key types into

account is a planned enhancement to achieve a well-

sorted database with a larger coverage.

The extension of the key ﬁngerprints’ database

currently relies on our own automated scans and user-

driven suggestions. In the future, this could be en-

hanced by direct contributions of leaked keys’ ﬁnger-

prints to the database. On the one hand, such a man-

ual insertion of leaked keys’ ﬁngerprints as a bulk by

trusted third parties could improve the coverage of our

service as system administrators could directly report

leaked keys, even if the keys are not yet observed on

the Internet. On the other hand, it is important to keep

a uniﬁed procedure and verify key leakage to ensure

the database’s validity.

We currently do not consider whether a leaked key

is still in use for authentication, but only register if it

has been exposed publicly. For gaining more insight

on the key leakage threat, a future enhancement could

be an extension for registered users to ﬂag their keys

as abandoned or revoked. This way, it would be ap-

parent if a leaked key is still in use, posing a security

risk, or this risk has been mitigated already.

Ultimately, a key testing API would allow integra-

tion in authentication services. During a key-based

authentication, an automated check of whether the

used key has been publicly exposed would be per-

formed using this API. If so, users and administrators

can be made aware and revoke leaked keys immedi-

ately. This way, our service would contribute to other

services’ security directly.

5 RELATED WORK

Several studies have examined the problem of secret

leakage in public source code repositories. The word

secret is used as a collective term for all kinds of cre-

dentials, including tokens, usernames, passwords, and

private keys, among others. Sinha et al. (Sinha et al.,

2015) focused on the problem of leakage of API to-

kens and suggested methods to prevent and handle

key data leakage. The study by Meli et al. (Meli

et al., 2019) in 2019 characterizes the prevalence and

extent of secret leakage in public GitHub reposito-

ries. Based on a snapshot and six-month recording of

newly committed ﬁles, their study showed that over

100,000 repositories were affected by secret leakage,

and thousands of new credentials got leaked daily.

These studies illustrate the still relevant issue of se-

cret leakage in public source code repositories.

GitHub has taken measures by including scan-

ning services to detect secrets in repositories (GitHub,

2021). Currently, this service notiﬁes service

providers, e. g., cloud providers, of leakage of issued

secret authentication tokens. Nonetheless, the GitHub

scanning approach does not automatically scan for all

secrets. The user needs to deﬁne custom patterns to

scan for other types of secrets that would include not

only API keys but also SSH private keys, client se-

crets, or generic passwords, as the deﬁnition of secret

by Saha et al. (Saha et al., 2020) suggests.

In addition to the commercial secret scanning solution

of GitHub, a variety of open-source tools exist that are

designed to scan single code repositories for potential

secrets. Three of the most popular ones are truffleHog

(Ayrey, 2018), git-secrets (AWS Labs, 2019),

and gitrob (Henriksen, 2018), differing in their search

mechanisms (regular expressions, entropy checks,

and ﬁle extensions) and purposes (leak prevention or

detection). In contrast to these tools, shhgit (Price,

2019) is not targeted towards speciﬁc repositories but

scans the whole space of GitHub, GitHub Gist, GitLab,

and Bitbucket repositories in real time. The tool

was turned into a commercial solution in 2021. Another

commercial solution for detecting leaked secrets

is GitGuardian, which offers free services to small

teams and public repositories listed as GitHub organizations.

While their solution can detect leaked secrets

instantaneously, it is not open source and does

not provide the ability to upload fingerprints directly

to check their confidentiality status. Despite these tools,

it is still feasible to find secrets in public source code

repositories.
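To make the search mechanisms mentioned above more concrete, the following sketch combines a regular expression for PEM private key headers with a Shannon entropy check on long tokens. The pattern, the token alphabet, and the threshold values are illustrative assumptions and are not taken from any of the tools named above.

```python
import math
import re
from collections import Counter

# Assumed pattern: PEM header that introduces common private key formats.
PRIVATE_KEY_HEADER = re.compile(
    r"-----BEGIN (?:RSA |DSA |EC |OPENSSH |ENCRYPTED )?PRIVATE KEY-----"
)


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_candidates(content: str, min_length: int = 20,
                    threshold: float = 4.0):
    """Return (reason, snippet) pairs for suspicious spots in content."""
    # Regular expression check: explicit private key headers.
    hits = [("private-key-header", m.group(0))
            for m in PRIVATE_KEY_HEADER.finditer(content)]
    # Entropy check: long tokens whose characters look close to random.
    # min_length and threshold are hypothetical tuning values.
    for token in re.findall(r"[A-Za-z0-9+/=_-]{%d,}" % min_length, content):
        if shannon_entropy(token) >= threshold:
            hits.append(("high-entropy", token))
    return hits
```

The regular expression yields few false positives but only finds secrets with a known format, whereas the entropy check also flags unknown token formats at the cost of more false positives.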

To prevent credential stufﬁng attacks by reducing

the number of active credentials leaked, the current

best-practice approach lets users check if their login

credentials appear in known data breaches. For this

purpose, several services for checking compromised

credentials have been developed (Li et al., 2019). The

service HaveIBeenPwned (Hunt, 2022) was launched

by Troy Hunt in 2013 and includes a large database

of over 10 billion compromised accounts and nearly

500 websites that suffered from a data breach. This

service allows users to check an email address or

password in real time against the database and acquire

information on their exposure in any known

data breach. Moreover, it is possible to sign up for

an email notiﬁcation service that notiﬁes users when

their credentials are leaked in a data breach. Addi-

tionally, the service of HaveIBeenPwned is provided

as a public API integrated into password managers.

In a similar approach, the German Hasso Plattner Institute

(HPI) has developed a credential checking service

called HPI Identity Leak Checker, which, in contrast

to HaveIBeenPwned, does not provide a result

in real time but notifies the user via email to increase

confidentiality (Hasso Plattner Institute, 2021).

In reaction to data breaches, Google has integrated

Google Password Checkup (Thomas et al., 2019) into

the Google Chrome browser’s password manager, and

Apple released a similar feature for its built-in pass-

word manager iCloud Keychain.

Overall, credential leak checkers provide a substantial

benefit, yet existing services focus

on password-based authentication. As

public-key authentication is a standard authentication

method, there is a need for a similar service dedicated

to private keys.

6 CONCLUSION

Leaked authentication keys are a threat to security and

should be revoked immediately. To act fast, it is essential

to find out as soon as possible whether a private key has

been publicly exposed. We have demonstrated

that scanning the Internet for leaked keys is one way

to achieve awareness regarding key leakage. After

building an initial database of publicly available secret

keys, we implemented a service that lets users check

their keys; administrators may also use this service

to test their clients’ keys. We store only

the fingerprints of discovered keys. The quality of our

implementation was measured against common standards.

With this work, we aim to establish a collaboratively built

database of private authentication keys that are deemed

insecure because they have been revealed on the Internet.

To this end, the dataset can be extended by submitting

URLs, which we then scan for leaked keys. We

plan to continue this service and make its final implementation

available after publishing this work to

allow a community-driven, ongoing extension of the

dataset and to keep it up to date so that users may check

their keys regularly.

REFERENCES

Avizienis, A., Laprie, J.-C., Randell, B., and Landwehr, C.

(2004). Basic concepts and taxonomy of dependable

and secure computing. IEEE Transactions on Dependable

and Secure Computing, 1(1):11–33.

AWS Labs (2019). awslabs/git-secrets: Prevents you from

committing secrets and credentials into git reposito-

ries. https://github.com/awslabs/git-secrets. (Ac-

cessed on 27/05/2022).

Ayrey, D. (2018). TruffleHog. https://github.com/dxa4481/truffleHog.

(Accessed on 27/05/2022).

Brown, S. (2021). The C4 model for visualising software

architecture. https://c4model.com/. (Accessed on

27/05/2022).

GitHub (2021). GitHub Docs: About secret scanning.

https://docs.github.com/en/code-security/secret-security/about-secret-scanning.

(Accessed on 27/05/2022).

Hasso Plattner Institute (2021). Identity Leak Checker.

https://sec.hpi.de/ilc/. (Accessed on 27/05/2022).

Henriksen, M. (2018). michenriksen/gitrob: Reconnaissance

tool for GitHub organizations.

https://github.com/michenriksen/gitrob. (Accessed on 27/05/2022).

Hunt, T. (2021). Pwned Passwords, Open Source in the

.NET Foundation and Working with the FBI.

https://www.troyhunt.com/pwned-passwords-open-source-in-the-dot-net-foundation-and-working-with-the-fbi/.

(Accessed on 27/05/2022).

Hunt, T. (2022). Have I Been Pwned: Check if your email

has been compromised in a data breach.

https://haveibeenpwned.com/. (Accessed on 27/05/2022).

ISO/IEC (2011). ISO/IEC 25010:2011 Systems and soft-

ware engineering — Systems and software Quality

Requirements and Evaluation (SQuaRE) — System

and software quality models.

Li, L., Pal, B., Ali, J., Sullivan, N., Chatterjee, R., and Ris-

tenpart, T. (2019). Protocols for Checking Compro-

mised Credentials.

Lonvick, C. M. and Ylonen, T. (2006). The Secure Shell

(SSH) Authentication Protocol. RFC 4252.

Meli, M., McNiece, M. R., and Reaves, B. (2019). How Bad

Can It Git? Characterizing Secret Leakage in Public

GitHub Repositories. In NDSS.

Price, P. (2019). eth0izzle/shhgit: Ah shhgit! Find GitHub

secrets in real time. https://github.com/eth0izzle/shhgit/.

(Accessed on 27/05/2022).

Python Cryptographic Authority (2022). Cryptography.

https://cryptography.io/. (Accessed on 27/05/2022).

Saha, A., Denning, T., Srikumar, V., and Kasera, S. K.

(2020). Secrets in Source Code: Reducing False Pos-

itives using Machine Learning. In 2020 International

Conference on COMmunication Systems NETworkS

(COMSNETS), pages 168–175. ISSN: 2155-2509.

Sinha, V. S., Saha, D., Dhoolia, P., Padhye, R., and Mani, S.

(2015). Detecting and Mitigating Secret-Key Leaks

in Source Code Repositories. In 2015 IEEE/ACM

12th Working Conference on Mining Software Repos-

itories, pages 396–400. IEEE.

Thomas, K., Pullman, J., Yeo, K., Raghunathan, A., Kelley,

P., Invernizzi, L., Benko, B., Pietraszek, T., Patel, S.,

Boneh, D., and Bursztein, E. (2019). Protecting ac-

counts from credential stufﬁng with password breach

alerting. In USENIX Security Symposium. Google

LLC.

ICSOFT 2022 - 17th International Conference on Software Technologies
