Introducing a Verified Authenticated Key Exchange Protocol over Voice Channels for Secure Voice Communication

Piotr Krasnowski (1,2), Jerome Lebrun and Bruno Martin

(1) Univ. Côte d'Azur, I3S-CNRS, Sophia Antipolis, France

(2) BlackBoxSécu, Sophia Antipolis, France

Keywords: Authenticated Key Exchange, Secure Voice Communications, Data over Voice, Vocal Verification, Crypto Phone, Tamarin Prover, Formal Protocol Verification.

Abstract:

The increasing need for secure voice communication is leading to new ideas for securing voice transmission. This work relates to a relatively new concept of sending encrypted speech as pseudo-speech in the audio domain over existing civilian voice communication infrastructure, like 2G-4G networks and VoIP. Such a setting is more universal compared to military “Crypto Phones” and can be opened for public evaluation. Nevertheless, secure communication requires a prior exchange of cryptographic keys over voice channels, without reliance on any Public Key Infrastructure (PKI).

This work presents the first formally verified authenticated key exchange (AKE) over voice channels for secure military-grade voice communications. It describes the operational principles of the novel communication system and lists its security requirements. The voice channel characteristics in the context of AKE protocol execution are thoroughly explained, with a strong emphasis on differences from classical store-and-forward data channels. In particular, a robust protocol has been designed specifically for voice channels, with double authentication based on signatures and Short Authentication Strings (SAS). The protocol is detailed and analyzed in terms of fundamental security properties and successfully verified in a symbolic model using Tamarin Prover.

1 INTRODUCTION

An increasing concern about privacy violation in voice communications has motivated the development of secure voice over IP (VoIP) communicators, with Telegram and Signal being the iconic examples. However, these applications are inherently insecure against spying malware installed on the smartphone (Scott-Railton et al., 2017). In parallel, military-grade applications requiring higher protection rely on dedicated hardware, most commonly in the form of Crypto Phones. These closed and unverifiable solutions suffer from high costs and low flexibility, as encrypted phones typically allow communications exclusively over one kind of voice channel, like GSM.

The mentioned limitations encourage the search for open solutions complementary to Crypto Phones, combining flexibility and high protection provided by specialized hardware. A new idea, depicted in Fig. 1, is based on voice encryption in the audio domain. The speech is recorded by (a) the headset’s microphone and then forwarded to (b) the encryption device (here called the Crypto Box). The Crypto Box processes the speech and enciphers vocal parameters of the signal. The encrypted speech, in the form of a data stream shaped into a pseudo-speech audio signal, is transmitted by (c) the audio link to the audio input of (d) the phone and sent through 2G-4G networks or VoIP. Finally, the received pseudo-speech is deciphered by the paired Crypto Box on the other side of the channel.

Figure 1: Encrypted voice over voice channel scheme.

Signal: https://signal.org

Telegram: https://core.telegram.org

Krasnowski, P., Lebrun, J. and Martin, B. Introducing a Verified Authenticated Key Exchange Protocol over Voice Channels for Secure Voice Communication. In Proceedings of the 6th International Conference on Information Systems Security and Privacy (ICISSP 2020), pages 683-690. DOI: 10.5220/0009156506830690. ISBN: 978-989-758-399-5; ISSN: 2184-4356.

In such a setting, voice encryption is performed outside of the phone, hence protecting against audio-recording malware. To limit the risk of system corruption, the Crypto Box has only analog input/output interfaces to the headset and to the phone. However, for security reasons, it is necessary that other analog inputs of the phone (particularly the built-in microphone) are blocked by a special case or removed.

From the system perspective, two Crypto Boxes are the end-points of a secured voice domain. Everything in between, including the mobile phones themselves, is an element of a communication infrastructure that enables voice transmission. The framework adds a new layer of security, protecting against spying malware installed on the phone. Since all communication between encrypting devices is done purely in the analog domain, the selection of the specific voice communication technology is a secondary issue. Compatibility with most vocal communication methods, like VoIP applications or 2G-4G networks, significantly widens the range of usability scenarios. The described setting, which is not intended for daily usage, is of great interest for business, diplomatic and military services that require secure communications in an unreliable environment and without access to a confidential communication infrastructure.

The major motivation in our approach is to secure voice communication even with untrusted phones, as these should not be actively involved in the setup of a secure connection or store sensitive data. Instead, the trust is given to Crypto Box manufacturers, who are responsible for software implementation and update policy. Nevertheless, the open framework enables various hardware solutions, including combining the phone and the Crypto Box into one device.

Producing encrypted speech in real time appears to be quite technically challenging. Firstly, the recorded speech is encoded into vocal parameters in a similar manner as during speech compression. Then, the speech parameters are encrypted and mapped onto the audio waveform. This technique, called Data over Voice (DoV), proved its feasibility in practical scenarios (Katugampala et al., 2004; Shahbazi et al., 2009; Dhananjay et al., 2010; Biancucci et al., 2013). However, since voice channels are designed to carry voice signals without much loss of perceptual quality, which is a different goal than the transmission of data, the achievable bitrate for DoV is typically at most 2 kbps. Even in the case of modern digital VoIP applications, the received voice is heavily distorted compared to the input signal, so the transmission resembles communication over a highly distortive analog channel. Sending encrypted voice under such constraints is possible thanks to strong error correction and voice compression by coders like MELP or Codec2.

Secure speech enciphering requires a prior ex-

change of session keys between the Crypto Boxes.

Due to system requirements, the key exchange can

only be made through the same point-to-point voice

channel, which gives no practical possibility of

adding an online trusted third party (TTP) or a certiﬁ-

cate authority (CA). Such a limitation is a big concern

for users’ authentication.

Research on secure key exchange between two honest parties without any TTP led to the creation of standards suitable for VoIP applications: an extension of the Real-Time Transport Protocol called ZRTP (Callas et al., 2011), and the Multimedia Internet KEYing (MIKEY) protocol (Arkko et al., 2004). ZRTP is especially interesting in the context of this work, because it provides an authentication mechanism in the absence of any Public Key Infrastructure (PKI) or pre-shared secret. In these situations, authentication is based on vocally comparing Short Authentication Strings (SAS). Unfortunately, with three modes of operation and extensive negotiation signaling, even ZRTP seems to be overly complex for communication over voice channels. Moreover, neither protocol puts sufficient emphasis on resistance to strong message distortion or desynchronization in a low-bandwidth environment.

To the best of the authors’ knowledge, this is the first paper focusing on authenticated key exchange (AKE) protocols over voice channels. The work aims at giving an understanding of the very specific channel constraints, leading to a protocol highly adapted to voice channel characteristics and system requirements. The protocol provides double authentication in a single mode of operation, by signatures and vocal comparison of SAS. In addition, it is flexible enough to support authentication of users who have not yet shared their signing public keys with each other, using SAS-only authentication or unilateral signature authentication. Finally, the same protocol can be used to authenticate the exchange of signing public keys.

2 SYSTEM REQUIREMENTS

The need for hardware-based voice encryption is a re-

sponse to an increased risk of being intercepted. Thus,

a cryptographic scheme should reﬂect higher require-

ments for secrecy and authentication. The ﬁrst con-

cern is recording and analyzing the network trafﬁc

by omnipresent passive eavesdroppers. Active adver-

saries controlling the network are more likely to block

or distort communication, which is technically very

simple. However, a powerful and knowledgeable ad-

versary who is able to analyze and synthesize a com-

patible pseudo-speech may try to modify a message or

insert his own. Finally, in critical situations, the en-

crypting device could be hijacked in order to extract

long-term keys. On the other hand, in our work we

assume that the encryption device does not allow any

intrusion into its internal memory during the opera-

tion, so all ephemeral data stored on the device (and

deleted after each protocol run) is considered secure.

The design process of the protocol is motivated by the anticipated user experience. However, due to severe constraints of the voice channel characteristics, the biggest challenges are related to protocol complexity, synchronization and robustness. A major bottleneck is a large message round-trip time, around 2 seconds long, which makes the whole protocol run-time prohibitively long even for simple protocols. Another limitation is a very small bandwidth, implying a reduction of the message size. Moreover, the protocol has to be robust against fading and signal distortion, requiring a significant simplification of signaling and strong error correction mechanisms. Finally, in order to decrease battery power consumption, cryptographic operations should be rather lightweight and optimized. In an implementation, relying on popular and verified network security libraries, like OpenSSL or NaCl, could be a strong practical advantage.
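To give a feel for the bandwidth constraint, the following minimal sketch estimates how long a single message occupies the channel. The 2 kbps figure is the upper bound on the DoV bitrate quoted in the Introduction; the code rate of 1/2 and the 512-bit example payload are purely illustrative assumptions, not values taken from the protocol specification.

```python
def airtime_seconds(payload_bits: int, bitrate_bps: float = 2000.0,
                    code_rate: float = 0.5) -> float:
    """Time needed to push one message through a Data-over-Voice link.

    bitrate_bps: raw DoV bitrate (the text quotes at most 2 kbps).
    code_rate:   fraction of transmitted bits that carry payload after
                 error-correction coding (hypothetical value).
    """
    if not 0.0 < code_rate <= 1.0:
        raise ValueError("code_rate must be in (0, 1]")
    coded_bits = payload_bits / code_rate
    return coded_bits / bitrate_bps


# Example: a 256-bit public key plus a 256-bit signature (512 payload bits).
example = airtime_seconds(512)
print(f"{example:.3f} s")  # prints "0.512 s"
```

Under these assumptions, even a two-field message takes about half a second of air time before any propagation or processing delay, which illustrates why message sizes and the number of round trips have to be kept small.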

Adaptation to hardware and channel constraints

should not lead to signiﬁcant relaxation of the security

level. It will be detailed that the key exchange proto-

col provides strong mutual agreement on the parame-

ters used for the derivation of the session key, putting

a special emphasis on preventing Man-In-The-Middle

(MITM) attacks and achieving Perfect Forward Se-

crecy (PFS). The crucial property of the protocol is to

enable the authentication of peers, no matter if they

share a common secret or not.

A successful and fast key exchange is an indicator of sufficiently good channel conditions that allow comfortable communication. Each received message can be used to effectively estimate channel characteristics and to improve decoding capabilities.

3 PROTOCOL DESCRIPTION

This section presents the symbolic model of the au-

thenticated key exchange protocol over voice chan-

nels and provides a brief explanation.

3.1 Preliminaries

Let us describe the key exchange between honest

users Alice and Bob who know each other, without

any legitimate trusted third party participating. The

operational framework requires that Alice and Bob

ﬁrst need to establish a non-encrypted voice connec-

tion with a preferred voice application. Then, they can

initiate a secure communication. The system model

assumes that identity information used to make a call

(phone number, user account, credentials etc.) is inde-

pendent from the authentic user identity and from the

identiﬁcation number of the voice encryption hard-

ware. Only one running session at a time is possi-

ble since each device cannot process more than one

message simultaneously. Therefore, several kinds of

Denial-of-Service (DoS) attacks, when the adversary

tries to send multiple messages to a recipient, are not

effectively different than distorting or blocking the

channel.

In highly unreliable channels like voice channels, Alice and Bob are never sure of message delivery. Thus, several synchronization techniques are needed, such as repeat requests, retransmissions and time-outs. For simplicity and due to space limitations, most details on synchronization are omitted here. Additionally, thanks to strong error-detection coding, users are able to detect random channel errors and differentiate them from intentional malicious manipulations.
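Since the synchronization details are omitted, the sketch below only illustrates the general idea of error detection combined with retransmission. It is an assumption-laden toy: CRC-32 stands in for the unspecified error-detection code, the channel model flips at most one bit per frame, and all function names are hypothetical. A CRC detects random errors only; recognizing deliberate manipulation relies on the cryptographic checks of the protocol itself.

```python
import random
import zlib


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum so the receiver can detect channel errors."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    if len(data) < 4:
        return None
    payload, received = data[:-4], int.from_bytes(data[-4:], "big")
    return payload if zlib.crc32(payload) == received else None


def noisy_channel(data: bytes, rng: random.Random, flip_prob: float) -> bytes:
    """Toy channel: with probability flip_prob, corrupt one random bit."""
    if data and rng.random() < flip_prob:
        corrupted = bytearray(data)
        pos = rng.randrange(len(corrupted))
        corrupted[pos] ^= 1 << rng.randrange(8)
        return bytes(corrupted)
    return data


def send_with_retries(payload: bytes, rng: random.Random,
                      flip_prob: float = 0.5, max_attempts: int = 5):
    """Retransmit until a frame passes the check or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        received = unframe(noisy_channel(frame(payload), rng, flip_prob))
        if received is not None:
            return received, attempt
    return None, max_attempts
```

A caller would treat a `None` result as a time-out and abort the protocol run, which is indistinguishable from a blocked channel from the user's point of view.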

3.2 Symbolic Model of the Protocol

The proposed protocol, presented in Figure 2, relies on an Ephemeral (Elliptic-Curve) Diffie-Hellman ((EC)DHE) exchange (Hankerson et al., 2005), authenticated by signatures (existentially unforgeable and deterministic) or Short Authentication Strings. Before the protocol starts, Alice and Bob agree on the elliptic curve and the lengths of keys and nonces. Public verification keys should be provided to the recipients in an authenticated way before the communication starts and are stored in the Crypto Box address book. However, in many real scenarios it is not possible to properly provide such a verification key. If the signature cannot be verified by the recipient, the protocol offers vocal verification as an alternative, which authenticates the speakers and the parameters used to derive the current session key.

The protocol interaction consists of several steps:

the setup, the key exchange and authentication, the

protocol acknowledgement and the optional vocal

veriﬁcation. Table 1 contains the glossary of terms

used in the protocol speciﬁcation, along with their bit-

lengths.

Table 1: Glossary. The subscript x stands for the user, A (Alice) or B (Bob).

    Acronym       Definition                           Bits
    ID_x          fixed user identifier                32
    N_x           random and unique nonce
    K             Session Key                          256
    SAS           Short Authentication String
    (R_A, R_B)    Short Authentication String seeds    (128, 32)
    (d_x, Q_x)    secret/public ECDHE key pair         (256, 256)
    (S_x, V_x)    signing/verification key pair        (256, 256)
    Sign_x(·)     signature (signed with S_x)          256
    h_X(·)        hash function                        X

Setup: The negotiation stage has been considerably

simpliﬁed. Participants have to mutually agree

on starting the key exchange procedure, therefore

the actual key exchange protocol is preceded only

by fast and automatic role negotiation in order to

prevent mutual interference or logjams. Then, both

Alice and Bob choose a random private integer d, a

random and unique nonce N, a random value R and

compute a public key Q. Unique nonce guarantees

the uniqueness of the triple (ID, Q, N).

Key Exchange and Authentication: In this stage Alice and Bob exchange values that are used to derive the Session Key (K) and the SAS. Alice sends her public ID, the nonce, the ephemeral public key and the hash, with R_A included. Bob responds with his values, appends R_B, and additionally sends his signature over all sent parameters required for K calculation. Alice answers with her signature over the same data and finally reveals R_A. It is worth noting that the protocol permits a situation when the signature cannot be verified. If any of the recipients did not obtain a verification key corresponding to the sender’s ID, the signature is checked against channel errors but not processed further.
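The commit-then-reveal handling of Alice's seed can be illustrated as below. This is a minimal sketch under stated assumptions: SHA-256 truncated to 128 bits stands in for the unspecified hash, the exact input layout of the hash is a guess (the text only states that the identity, the public key and the seed are involved), and fixed-length fields are assumed so that plain concatenation is unambiguous.

```python
import hashlib
import hmac
import secrets


def commit(identity: bytes, public_key: bytes, seed: bytes) -> bytes:
    """128-bit commitment to the SAS seed (input layout is an assumption)."""
    return hashlib.sha256(identity + public_key + seed).digest()[:16]


def open_commitment(commitment: bytes, identity: bytes,
                    public_key: bytes, seed: bytes) -> bool:
    """Check a revealed seed against the commitment received earlier."""
    expected = commit(identity, public_key, seed)
    return hmac.compare_digest(commitment, expected)


# Alice commits in her first message and reveals the seed in her last one.
id_a = (1).to_bytes(4, "big")        # 32-bit identifier (hypothetical value)
q_a = secrets.token_bytes(32)        # placeholder for the public key
r_a = secrets.token_bytes(16)        # 128-bit seed

c = commit(id_a, q_a, r_a)
assert open_commitment(c, id_a, q_a, r_a)

# A seed that differs in a single bit is rejected.
tampered = bytes([r_a[0] ^ 1]) + r_a[1:]
assert not open_commitment(c, id_a, q_a, tampered)
```

Bob stores the commitment from the first message and runs the equivalent of `open_commitment` only after Alice's final message, so neither side can adapt its seed to the other's.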

Protocol Acknowledgment: When all cryptographic parameters are exchanged, voice encryption can be started. Encryption is initiated after Alice receives Bob’s acknowledgment. The acknowledgment is a confirmation of error-free message reception, so it can be sent unencrypted.

Short Authentication String Comparison: Each participant can request for vocally challenging SAS equality with the peer. SAS comparison is obligatory if any of the users was not able to verify the signature. It is assumed that the comparison process is authenticated - users are able to recognize voice characteristics of the peer (timbre, tempo, etc.). The SAS is displayed on the Crypto Box as a short string of digits or words to be vocally uttered by the users.

Figure 2: Key exchange protocol over voice channels.

          Alice                                                            Bob

    . . . . . . . . . . . . . Unsecured call initiation . . . . . . . . . . . . .
                     ←  vocal agreement on protocol initialization  →
    . . . . . . . . . . . . . . . . . . . Setup . . . . . . . . . . . . . . . . .
     1 :  N_A ←$ Z*             ←  A/B role negotiation  →        N_B ←$ Z*
     2 :  d_A ←$ Z*_256                                           d_B ←$ Z*_256
     3 :  Q_A = d_A G                                             Q_B = d_B G
     4 :  R_A ←$ Z*_128                                           R_B ←$ Z*_32
    . . . . . . . . . . . Key exchange and authentication . . . . . . . . . . . .
     5 :           ID_A, N_A, Q_A, h_128(ID_A ‖ ...)   →
     6 :        ←  ID_B, N_B, Q_B, R_B, Sign_B(▲)
     7 :  Z = d_A Q_B         R_A, Sign_A(▼)  →                   Z = d_B Q_A
     8 :  K = h_256(Z ‖ •)                                        K = h_256(Z ‖ •)
    . . . . . . . . . . . . . . . . Acknowledgment . . . . . . . . . . . . . . . .
     9 :                      ←  ACK
    . . . . . . . . . . SAS comparison over Encrypted Channel . . . . . . . . . .
    10 :  SAS = h(■)           ←  SAS vocal comparison  →         SAS = h(■)

    Symbols (subscripts and any further terms are not recoverable from the source):
      ▲ ≡ 'B' ‖ ID ‖ ID ...
      ▼ ≡ 'A' ‖ ID ‖ ID ...
      • ≡ ID ‖ ID ...
      ■ ≡ R ‖ ID ...
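As an illustration of the last step, the sketch below derives a short decimal code that both devices could display for vocal comparison. The hash function (SHA-256), the input layout, the domain-separation prefix and the 6-digit length are all assumptions made for the example; the paper does not fix them in the text above.

```python
import hashlib


def derive_sas(r_a: bytes, r_b: bytes, id_a: bytes, id_b: bytes,
               digits: int = 6) -> str:
    """Derive a short decimal code from both seeds and both identities."""
    material = b"SAS" + r_a + r_b + id_a + id_b
    value = int.from_bytes(hashlib.sha256(material).digest(), "big")
    # Reduce the 256-bit digest to the requested number of decimal digits.
    return str(value % 10**digits).zfill(digits)


# Both Crypto Boxes compute the code from the same exchanged values.
r_a = bytes(16)                       # placeholder 128-bit seed
r_b = bytes(4)                        # placeholder 32-bit seed
id_a = (1).to_bytes(4, "big")
id_b = (2).to_bytes(4, "big")

sas_alice = derive_sas(r_a, r_b, id_a, id_b)
sas_bob = derive_sas(r_a, r_b, id_a, id_b)
assert sas_alice == sas_bob and len(sas_alice) == 6
```

If an active adversary substituted any of the inputs on one side, the two displayed codes would differ except with small probability, which is what the users detect when reading the code aloud.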

4 FORMAL VERIFICATION

Veriﬁcation of the protocol is performed in a sym-

bolic model, where all cryptographic primitives are

assumed perfect and give the adversary no advantage

(Dolev and Yao, 1983). In the analyzed scenario, it

means that all parties generate truly random numbers,

signatures are unforgeable and ECDH parameters do

not reveal any secret information. Formal symbolic

veriﬁcation can be considered as a ﬁrst step of a proto-

col analysis, paving the way to computational model

veriﬁcation (Goldwasser and Micali, 1984; Blanchet,

2012), in which the adversary gets the power to attack

cryptographic algorithms.

A formal analysis of the proposed protocol in a symbolic model was done with Tamarin Prover (Schmidt, 2012; Meier, 2013; Meier et al., 2013), a powerful and increasingly popular automatic verification tool designed at ETH Zürich. Tamarin is based on a multiset rewriting (MSR) language and supports generic Diffie-Hellman group operations. In addition, Tamarin can model many cryptographic primitives like signatures or hashes and offers an impressive database of examples, which makes the tool suitable for the protocol evaluation.

4.1 Protocol Modeling

Verification by Tamarin requires providing an abstract protocol model, which tries to faithfully express the information relevant from the security perspective while remaining within the scope of feasibility of the analysis. The protocol model code can be found in (Krasnowski, 2019). Several protocol restrictions were relaxed, allowing users to run multiple protocol instantiations at the same time and to “forget” the verification key of the peer. SAS verification is performed by a separate protocol rule which is not obligatory, simulating a realistic case when users simply ignore it. Vocal challenging is modeled as communicating over an authenticated (not secret) channel, that is, a channel which the adversary can intercept but not modify. The last ACK message is omitted from the model.

4.2 Security Properties and Verification Results

The protocol model was checked against the Dolev-

Yao adversary (Dolev and Yao, 1983), having a full

control over the network and with the power to reveal

the long-term secret key of any user (ephemeral data

is considered secure). Evaluation was done in four au-

thentication conﬁgurations: mutual signature authen-

tication between two honest users, unilateral signature

authentication (when only one user can verify the sig-

nature of the peer), vocal veriﬁcation or no authenti-

cation.

Veriﬁcation focused on most critical security

properties: (perfect forward) secrecy and a mutual in-

jective agreement (Lowe, 1997) on the Session Key.

The protocol was also veriﬁed for resilience to reﬂec-

tion attacks (a user cannot accept her own identity as a

peer) and for signing key compromise impersonation

(adversary can impersonate only corrupted users).

Results of protocol verification can be found in Table 2. Protocol configurations involving signature authentication or authenticated SAS comparison are proven to provide perfect forward secrecy and injective agreement. Unilateral signature authentication between two honest users who know each other guarantees the same level of security as mutual signature authentication. Surprisingly, vocal verification does not protect against reflection attack, because the user can trivially compare SAS with herself. Table results indicate the importance of authentication - none of the properties were verified if no authentication was performed.

Table 2: Security properties verified by Tamarin in four authentication scenarios: (a) mutual signature authentication, (b) unilateral signature authentication, (c) SAS vocal verification and (d) no authentication.

    Authentication scenario:         (a)   (b)   (c)   (d)
    Session Key secrecy               ✓     ✓     ✓     ✗
    forward secrecy                   ✓     ✓     ✓     ✗
    injective agreement               ✓     ✓     ✓     ✗
    reflection attack                 ✓     ✓     ✗     ✗
    key compromise impersonation      ✓     ✓     -     -

5 SECURITY CONSIDERATIONS

The following sections explain the protocol characteristics in more detail, providing several justifications and practical recommendations. The discussion starts with an overview of fundamental protocol elements: the choice of public-key cryptography, the role of signatures and of Short Authentication Strings. The next section lists potential protocol weaknesses and possible fixes.

5.1 Discussion

Public Key Agreement versus Symmetric Cryptography: In exceptionally resource-constrained devices, such as IoT sensors or RFID cards, the pursuit of ultra-lightweight key exchange protocols led to a shift from public key encryption towards symmetric encryption techniques (Echevarria et al., 2016; Lee et al., 2014; Baashirah and Abuzneid, 2018). Even the ZRTP protocol offers the possibility of a key exchange in a lightweight preshared mode. In this configuration, two entities share a secret which is used to encrypt or refresh the keying material for the new session. In order to achieve Perfect Forward Secrecy, the long-term secret should be regularly updated, desirably after each successful key exchange run. The update decision has to be mutual, otherwise risking a one-sided update and user desynchronization. Unfortunately, in voice channels such a risk cannot be eliminated, because the last update confirmation message may not be delivered. Decreasing the chance of desynchronization by sending more confirmation messages would negatively affect the protocol run-time. Another solution, based on on-the-fly resynchronization mechanisms, requires either an online server keeping track of all key updates or a costly and potentially insecure “guessing” of the long-term parameters until decryption is successful (Baashirah and Abuzneid, 2018). Finally, as was emphasized before, in some scenarios the exchange of a long-term secret is not possible, limiting the usability of symmetric cryptography. In the light of the above-mentioned reasons and the relatively smaller hardware restrictions compared to IoT sensors, a public-key exchange scheme seems more adequate.
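The desynchronization risk of a pre-shared, regularly updated secret can be demonstrated with a small simulation. The update rule (hashing the previous secret) and the message pattern are hypothetical and chosen only to make the failure mode visible; they are not taken from ZRTP or from the cited schemes.

```python
import hashlib


def ratchet(key: bytes) -> bytes:
    """Derive the next long-term secret from the current one (one-way)."""
    return hashlib.sha256(b"update" + key).digest()


class Party:
    def __init__(self, key: bytes):
        self.key = key

    def update(self) -> None:
        self.key = ratchet(self.key)


def run_session(alice: Party, bob: Party, confirmation_delivered: bool) -> None:
    """Bob updates and confirms; Alice updates only if the confirmation arrives."""
    bob.update()
    if confirmation_delivered:
        alice.update()


alice, bob = Party(b"shared secret"), Party(b"shared secret")

run_session(alice, bob, confirmation_delivered=True)
in_sync_after_first = alice.key == bob.key     # True: both updated once

run_session(alice, bob, confirmation_delivered=False)
in_sync_after_second = alice.key == bob.key    # False: Alice missed the update
```

A single lost confirmation leaves the two parties with different long-term secrets, which is exactly the situation that cannot be ruled out on a voice channel without delivery guarantees.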

Role of Short Authentication Strings: If the key exchange is not interfered with by a third party, both participants obtain the same Short Authentication String. Challenging SAS vocally between honest users has a twofold role. Firstly, it enables authentication of users based on voice identification. Secondly, the inequality of codes may indicate the presence of an active MITM adversary. However, MITM manipulations would be undetected if the adversary is somehow able to influence or precompute the SAS value before the users.

Computation of the code depends on seed values R_A and R_B chosen randomly by honest users. Importantly, Alice and Bob are forced to select seeds before knowing the value of the peer - Alice by sending the hash of R_A in the first message and Bob by revealing R_B before R_A is revealed. Such a construction, inspired by (Pasini and Vaudenay, 2006), prevents adaptive selection of seeds by each party. The same rule applies to the adversary, who cannot predict the SAS value until it is too late. The only hope for him is a random guess with a low probability of success, or an extraction of R_A from the hash sent in the first message by brute-force search. For this last reason, the length of R_A should be considerably larger than that of R_B. On the other hand, the difference of lengths is partially compensated by taking Q_A as an additional input of the hash function. It is worth noting that the SAS value does not have to be confidential, since it plays only an authentication role and cannot be modified without detection.
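The two attack avenues mentioned above can be quantified with a short calculation. The sketch assumes that the SAS behaves like a uniformly random bit string and that a brute-force search over a uniformly chosen seed needs about half of the seed space on average; the 16-bit SAS length in the example is a hypothetical value, while the 128-bit and 32-bit seed lengths follow Table 1.

```python
from fractions import Fraction


def guess_probability(sas_bits: int) -> Fraction:
    """Chance that a blind MITM attempt yields matching SAS values."""
    return Fraction(1, 2**sas_bits)


def expected_bruteforce_trials(seed_bits: int) -> int:
    """Approximate average number of hash evaluations to recover a seed."""
    return 2**(seed_bits - 1)


# A hypothetical 16-bit SAS: one successful attack in 65536 attempts.
p16 = guess_probability(16)

# Recovering the committed 128-bit seed versus a 32-bit seed.
work_r_a = expected_bruteforce_trials(128)
work_r_b = expected_bruteforce_trials(32)
ratio = work_r_a // work_r_b   # 2**96 times more work for the longer seed
```

The gap illustrates why the committed seed needs the longer length: a 32-bit value hidden only by a hash could be searched exhaustively while the protocol is still running, whereas the seed that is revealed in the clear gains nothing from extra length.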

In practice, the security of vocal verification also depends on how users abide by it. The SAS could be represented by a small number of simple pictographs or easily pronounceable words, the same way as in ZRTP, which has the PGP Word List incorporated into its framework (Callas et al., 2011; Zimmermann, 1996). The device should encourage the mutual SAS comparison by indicating a part of the SAS to pronounce and a part to hear from the peer.

Signature-based Authentication: Signatures pro-

vide device authentication and message integrity, sim-

ilarly to message authentication codes (MAC), which

are simpler and easier to compute. Indeed, in some

scenarios choosing hash-based MAC instead of sig-

natures would be sufﬁcient. However, signatures give

wider ﬂexibility, justifying higher computational cost.

A natural advantage of signatures is that they do not

require mutual agreement and secure exchange of a

long-term secret between two parties. Moreover, each

user keeps in memory only one private signing key,

used regardless of the receiver’s identity. In conse-

quence, if the user is corrupted, the adversary should

be able to impersonate only that person.

When one user cannot obtain a verification key due to an insecure environment, it is still possible to achieve unilateral authentication (Boyd and Mathuria, 2003; Maurer et al., 2013; Dodis and Fiore, 2017). One-sided authentication prevents MITM attacks, leaving only two possibilities: both honest users securely exchange a secret or the adversary is an authenticator (Maurer et al., 2013). It naturally implies that if the users want to communicate and they know they can perform unilateral authentication, the adversary cannot interfere undetected in any other way than by preventing the successful exchange. However, the user who failed to authenticate the peer is still obliged to challenge the SAS, because from her perspective it is the only formal way to verify the absence of MITM manipulations.

A signature key management policy, due to the lack of any PKI, has a crucial impact on the system security and usability. This work points out two possible schemes, decentralized and fully centralized, which can be chosen depending on the needs. In a centralized system, keys are managed by an offline central authority, keeping track of all records and being responsible for key distribution and update. In a decentralized case, each user is entitled to generate her own key pair and distribute public keys to specific users in an authenticated way. Following the PGP model, sharing the key can be performed remotely based on speaker identification and vocal authentication of the channel. Thus, the proposed protocol with SAS comparison gives the possibility to authenticate the exchange of signature verification keys.

Identity Protection: In many situations protecting the identity of the user is as important as securing the content of the speech. However, calling anybody over civilian communication networks is always associated with revealing user metadata (e.g. phone number, user credentials, location). Even if the metadata is publicly known, it may be advantageous to at least hide the identity of the encrypting device from passive eavesdroppers.

It is possible to redesign the proposed protocol to attain identity anonymity without changing any other substantial protocol property. The ID and the signatures of Alice and Bob can be sent encrypted with a key derived from the DH secret exchanged during the first message round-trip, in a similar manner as in the Initial Exchange of the IKEv2 standard (Kaufman et al., 2010). The complexity of a protocol providing anonymity would increase, since it would require additional data encryption. It is also important to carefully evaluate the way the encryption key is derived and how it is related to the session key, in order to give no foothold for cryptanalysis.
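One common way to keep such an identity-encryption key unrelated to the session key is to derive both from the shared secret with distinct purpose labels. The sketch below is only an illustration of that design principle: HMAC-SHA-256 is used as a pseudo-random function, and the function name, labels and placeholder secret are hypothetical, not part of the proposed protocol.

```python
import hashlib
import hmac


def derive_key(shared_secret: bytes, label: bytes, context: bytes = b"") -> bytes:
    """Derive a 256-bit key bound to a purpose label (HMAC-SHA-256 as a PRF)."""
    return hmac.new(shared_secret, label + b"\x00" + context,
                    hashlib.sha256).digest()


z = bytes(range(32))   # placeholder for the DH secret Z

# Distinct labels give keys that are computationally unrelated.
identity_key = derive_key(z, b"identity-encryption")
session_key = derive_key(z, b"session-key")
assert identity_key != session_key
```

With this separation, exposure of the key protecting the identities would not by itself reveal the key protecting the speech, and vice versa.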

5.2 Possible Attacks and Threats

Many protocol vulnerabilities stem from the selection of specific cryptographic algorithms, their implementation and, finally, compliance with protocol rules. The biggest threat is posed by real users not respecting the obligation of SAS comparison, which opens a space for MITM attacks.

The capabilities of modern speech synthesizers

which exploit AI techniques to impersonate speaker’s

voice (Gao et al., 2018) question the level of authen-

tication provided by voice recognition. Instead of

breaking the SAS security, the adversary may simply

simulate or replay the speaker pronouncing the code

(Shirvanian et al., 2018). The risk is ampliﬁed by the

fact that the voice sent is highly compressed and thus

signiﬁcantly differs from its real characteristics. For

this reason, it is recommended to extend sequence

comparison by contextual questions, like describing

the last watched movie, or to share personal informa-

tion known only by the peer but not by the adversary.

If honest users are capable of verifying each other's signatures and of achieving strong authentication, the adversary may try a downgrade attack. It can be done simply by modifying the users’ IDs and imposing vocal verification. The problem may be partially solved by displaying the IDs along with the SAS. However, the real solution would be to force signature verification during each protocol run by default.

Finally, the proposed protocol cannot protect

against the consequences of a device being stolen or

misused, giving the responsibility to the manufacturer

to provide strong enough password or biometric pro-

tection. To minimize the negative consequences of a

theft, the device should be protected against physical

tampering, making reverse engineering very difﬁcult.

6 CONCLUSIONS

Our work is the first attempt to resolve the problem of cryptographic key exchange over voice channels for military-grade secure voice communications. It also introduces the challenges related to secure communications over voice channels, such as extremely small bandwidth, no guarantee of message delivery and battery consumption. The paper lists the security requirements imposed on the system, such as protection against interception and MITM attacks, emphasizing the importance of user authentication in the absence of a trusted server. All these concerns and limitations justify the need for a dedicated protocol instead of relying fully on standardized solutions.

We proposed a simplified key exchange protocol between two honest parties, based on the ephemeral elliptic curve Diffie-Hellman (ECDHE) protocol. The protocol offers two ways of authentication: by signatures and by vocally checking the equality of the Short Authentication Strings. A symbolic model of the protocol was analyzed using Tamarin Prover in order to verify crucial security properties such as Perfect Forward Secrecy and mutual agreement on the Session Key. The verification process was explained, pointing out the limitations of symbolic analysis, in particular the model simplifications and the perfect cryptography assumption.

Formal verification was followed by a discussion of the protocol properties, such as the unilateral authentication provided by one-sided signature verification and the role of vocal comparison in preventing MITM attacks. The analysis led to the observation that none of the analyzed techniques provides perfect authentication by itself, so informal identity authentication methods have to be introduced.

Potential vulnerabilities and attacks on the system were also covered in this work. Several propositions and practical solutions regarding key management, proper SAS comparison and identity protection can serve as a guide for engineers working on implementations of key exchange protocols over voice channels or in similar scenarios.

ACKNOWLEDGEMENTS

This work is supported by grant DGA Cifre-Defense

program No 01D17022178 DGA/DS/MRIS.


REFERENCES

Arkko, J., Carrara, E., Lindholm, F., Norrman, K.,

and Naslund, M. (2004). MIKEY: Multimedia

Internet KEYing (Proposed standard RFC3830).

Network Working Group. Retrieved from

https://tools.ietf.org/html/rfc3830.

Baashirah, R. and Abuzneid, A. (2018). Survey on promi-

nent RFID authentication protocols for passive tags.

Sensors, 18(10):3584.

Biancucci, G., Claudi, A., and Dragoni, A. F. (2013). Se-

cure data and voice transmission over GSM voice

channel: Applications for secure communications.

In 2013 4th International Conference on Intelligent

Systems, Modelling and Simulation, pages 230–233.

IEEE.

Blanchet, B. (2012). Security protocol veriﬁcation: Sym-

bolic and computational models. In Proceedings of

the First international conference on Principles of Se-

curity and Trust, pages 3–29. Springer-Verlag.

Boyd, C. and Mathuria, A. (2003). Protocols for authenti-

cation and key establishment, volume 1. Springer.

Callas, J., Johnston, A., and Zimmermann, P. (2011).

ZRTP: Media path key agreement for unicast se-

cure RTP (RFC6189). IETF. Retrieved from

https://tools.ietf.org/html/rfc6189.

Dhananjay, A., Sharma, A., Paik, M., Chen, J., Kuppusamy,

T. K., Li, J., and Subramanian, L. (2010). Hermes:

data transmission over unknown voice channels. In

Proceedings of the sixteenth annual international con-

ference on Mobile computing and networking, pages

113–124. ACM.

Dodis, Y. and Fiore, D. (2017). Unilaterally-authenticated

key exchange. In International Conference on Finan-

cial Cryptography and Data Security, pages 542–560.

Springer.

Dolev, D. and Yao, A. (1983). On the security of public key

protocols. IEEE Transactions on information theory,

29(2):198–208.

Echevarria, J. J., Legarda, J., Larrañaga, J., and Ruiz-de Garibay, J. (2016). lwAKE: A lightweight Authenticated Key Exchange for class 0 devices. International Journal of Distributed Sensor Networks, 12(5):6236494.

Gao, Y., Singh, R., and Raj, B. (2018). Voice imperson-

ation using generative adversarial networks. In 2018

IEEE International Conference on Acoustics, Speech

and Signal Processing (ICASSP), pages 2506–2510.

IEEE.

Goldwasser, S. and Micali, S. (1984). Probabilistic en-

cryption. Journal of computer and system sciences,

28(2):270–299.

Hankerson, D., Menezes, A. J., and Vanstone, S. (2005).

Guide to elliptic curve cryptography. Computing Re-

views, 46(1):13.

Katugampala, N., Al-Naimi, K., Villette, S., and Kondoz,

A. (2004). Real time data transmission over GSM

voice channel for secure voice & data applications.

In 2nd IEE Secure Mobile Communications Forum:

Exploring the Technical Challenges in Secure GSM

and WLAN. IET.

Kaufman, C., Hoffman, P., Nir, Y., and Eronen, P. (2010).

Internet Key Exchange Protocol Version 2 (IKEv2)

(RFC7296). IETF.

Krasnowski, P. (2019). Tamarin code of the key

exchange over voice channels. Available at

https://github.com/PiotrKrasnowski/AKE over Voice.

Lee, J.-Y., Lin, W.-C., and Huang, Y.-H. (2014). A

lightweight authentication protocol for internet of

things. In 2014 International Symposium on Next-

Generation Electronics (ISNE), pages 1–2. IEEE.

Lowe, G. (1997). A hierarchy of authentication speciﬁca-

tions. In Proceedings 10th Computer Security Foun-

dations Workshop, pages 31–43. IEEE.

Maurer, U., Tackmann, B., and Coretti, S. (2013). Key Ex-

change with Unilateral Authentication: Composable

security deﬁnition and modular protocol design. IACR

Cryptology ePrint Archive, 2013:555.

Meier, S. (2013). Advancing automated security protocol

veriﬁcation. PhD thesis, ETH Zurich.

Meier, S., Schmidt, B., Cremers, C., and Basin, D.

(2013). The TAMARIN prover for the symbolic anal-

ysis of security protocols. In International Confer-

ence on Computer Aided Veriﬁcation, pages 696–701.

Springer.

Pasini, S. and Vaudenay, S. (2006). SAS-based authenti-

cated key agreement. In Public Key Cryptography -

PKC 2006, pages 395–409. Springer Berlin Heidel-

berg.

Schmidt, B. (2012). Formal analysis of key exchange proto-

cols and physical protocols. PhD thesis, ETH Zurich.

Scott-Railton, J., Marczak, B., Razzak, B. A., Crete-

Nishihata, M., and Deibert, R. (2017). Reckless ex-

ploit: Mexican journalists, lawyers, and a child tar-

geted with NSO spyware. Technical report.

Shahbazi, A., Rezaie, A. H., Sayadiyan, A., and

Mosayyebpour, S. (2009). A novel speech-like sym-

bol design for data transmission through GSM voice

channel. In 2009 IEEE International Symposium on

Signal Processing and Information Technology (IS-

SPIT), pages 478–483. IEEE.

Shirvanian, M., Saxena, N., and Mukhopadhyay, D. (2018).

Short voice imitation man-in-the-middle attacks on

Crypto Phones: Defeating humans and machines.

Journal of Computer Security, 26(3):311–333.

Zimmermann, P. (1996). PGPfone Owner’s Man-

ual. Pretty Good Privacy. Retrieved from

http://web.mit.edu/network/pgpfone/manual/.

ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy
