
payment card, or user device (although he may need
to consult his user device to verify m, depending on
how it has been implemented). Furthermore, since
the user is identified as part of the process, any rele-
vant status checks can be made automatically against
the information held on record. This means that the
user does not need to carry loyalty cards, discount
coupons, or proof of age or membership—as these
can all be applied upon identification.
For security, we have assumed that seeded ran-
dom string generators on the user device and the ver-
ifier generate a fresh m every minute. To improve us-
ability, an implementation of the scheme might con-
sider ways to increase the size of the interval between
changes of m to free the user from the requirement
that he must carry a user device during the authentication
phase. This could include the use of message templates,
along with shapes or colours to increase the message space,
so that what needs to be memorised is more user-friendly
than a random string. For
example, the system might allow the user to create or
select a rule, valid for a day, that describes the structure
of an expected message. The verifier would randomly generate,
for every transaction, a fresh string that satisfies the rule,
so that the user only needs to verify that it fits the
template (e.g., ‘a valid 5-letter word
followed by a green triangle’). The user would
memorise the rule before a shopping session and would not
need to consult his user device again. The adversary
could replay such a message to perform a phishing at-
tack, but not at scale, so the gains in usability may be
worth the risks to security.
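The two ideas in this paragraph can be sketched as follows. This is a minimal illustration, not the scheme's specified construction: the HMAC-based derivation, the function names (derive_m, generate_for_rule, fits_rule), the alphabet, the word list, and the dictionary layout of a rule are all assumptions made for the example. The first function derives m from a shared seed and the current time interval, so lengthening interval_s lengthens the validity of m; the other two show a verifier generating a message that satisfies a rule and a user-side check against that rule.

```python
import hashlib
import hmac
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # assumed character set for m


def derive_m(seed: bytes, unix_time: int, interval_s: int = 60, length: int = 6) -> str:
    """Derive the string m for the interval that contains unix_time.

    Both sides hold the same seed, so both compute the same m without
    communicating. HMAC-SHA256 is an assumed stand-in for the seeded generator.
    """
    counter = unix_time // interval_s
    digest = hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    # Map digest bytes onto the alphabet (slight modulo bias, acceptable in a sketch).
    return "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])


WORDS = ["apple", "bread", "chair", "delta", "eagle"]  # hypothetical word list


def generate_for_rule(rule: dict) -> dict:
    """Verifier side: pick a fresh random message that satisfies the rule."""
    candidates = [w for w in WORDS if len(w) == rule["word_length"]]
    return {
        "word": secrets.choice(candidates),
        "colour": rule["colour"],
        "shape": rule["shape"],
    }


def fits_rule(message: dict, rule: dict) -> bool:
    """User side: check a displayed message against the memorised rule."""
    return (
        message["word"] in WORDS
        and len(message["word"]) == rule["word_length"]
        and message["colour"] == rule["colour"]
        and message["shape"] == rule["shape"]
    )
```

With interval_s=60 the value of m changes every minute, as assumed in the security analysis; with interval_s=86400 it stays fixed for a day, which trades replay resistance for usability in the way described above.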
Asymmetrical Channel. The visual channel enables
asymmetric communication, as the capabilities re-
quired for sending information are different to those
required for receiving it. Each party can either display
to or read from the channel depending on its capabilities.
This constrains what the parties can do to one another,
and parts of the system can be restricted to unidirectional
communication. Our scheme leverages this property in Steps 1
and 2, where the user presents his biometric trait and
the terminal can only read it, and in Steps 10 and 11,
where the terminal displays m′ and the user can only
choose whether or not to verify it.
Contextual Awareness. The capabilities that can be
used to read information from the visual channel can
also collect incidental information from the surround-
ing environment. Depending on its position, the cam-
era on the terminal can capture additional information
around the user that could be used to facilitate ad-
vanced fraud detection techniques, such as verifying
that the terminal is operating in the expected environment.
An implementation of the scheme might leverage this property
by passing an image of the scene to the verifier. Expected
objects, markers, or lighting effects could be placed in the
environment as a form of signature. Alternatively, a clock could
be placed in the environment so that the time captured in the
image can be extracted and cross-checked against the timestamps
used as nonces, strengthening the assertion of
freshness with an independent factor. A sophisticated
adversary could still fabricate the entire environment,
but each step would increase the effort required of the
adversary and present a potential point of failure for
an attack.
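The clock cross-check could look roughly like the sketch below. It assumes the time shown on the clock has already been extracted from the image by some separate step (not shown), and that the verifier compares it with the timestamp used as a nonce within a tolerance window. The function name clock_matches_nonce and the 30-second default tolerance are illustrative assumptions, not values from the scheme.

```python
from datetime import datetime, timedelta, timezone


def clock_matches_nonce(
    clock_reading: datetime,
    nonce_timestamp: datetime,
    tolerance: timedelta = timedelta(seconds=30),  # assumed window
) -> bool:
    """Return True if the time read from the scene agrees with the nonce timestamp.

    clock_reading is the time extracted from the captured image;
    nonce_timestamp is the timestamp used as a nonce in the protocol run.
    """
    return abs(clock_reading - nonce_timestamp) <= tolerance


# Example: a reading 12 seconds away from the nonce timestamp is accepted.
nonce = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
reading = nonce + timedelta(seconds=12)
accepted = clock_matches_nonce(reading, nonce)
```

A mismatch would not prove an attack on its own, but it gives the verifier an independent signal that a captured image may have been replayed.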
Privacy Risk Mitigation. The use of a vi-
sual channel—especially when collecting peripheral
information—poses a risk to the privacy of the user.
Any images sent to the verifier should have their util-
ity weighed against their potential impact on privacy.
Countermeasures to mitigate privacy leakage from
images include reducing the resolution and blurring
unnecessary details before sending. To protect the
biometric data of the user, biometric traits should only
be processed locally on the terminal and should be ob-
scured from any images sent to the verifier.
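The countermeasures above can be illustrated with a small sketch that runs on the terminal before an image is sent. It assumes a grey-scale image held as a list of pixel rows; block averaging stands in for resolution reduction and blurring, and a filled rectangle stands in for obscuring the biometric trait. The names downsample and obscure_region, and the bounding-box convention, are assumptions for the example; a real implementation would more likely use an image-processing library.

```python
from typing import List, Tuple

Image = List[List[int]]  # grey-scale pixel rows, values 0-255


def downsample(image: Image, factor: int) -> Image:
    """Reduce resolution by averaging each factor x factor block of pixels.

    Rows and columns that do not fill a whole block are dropped.
    """
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - height % factor, factor):
        row = []
        for left in range(0, width - width % factor, factor):
            block = [
                image[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def obscure_region(image: Image, box: Tuple[int, int, int, int], fill: int = 0) -> Image:
    """Return a copy with the box (left, top, right, bottom) overwritten by fill.

    The right and bottom bounds are exclusive. The box would cover the
    biometric trait, for example the user's face.
    """
    left, top, right, bottom = box
    return [
        [fill if top <= y < bottom and left <= x < right else value
         for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]
```

Obscuring the trait before downsampling ensures that no averaged remnant of the biometric region survives in the image sent to the verifier.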
6 RELATED WORK
Identification-based Systems. With regard to the use
of biometric identification as part of an authentica-
tion system, some payment providers have trialled the
technique with the promise of improved convenience
for the user. Smile-to-Pay (Lee, 2017), developed by
Ant Financial for AliPay, uses a 3D camera to cap-
ture the user’s facial likeness, perform liveness de-
tection, and identify the user within 2 seconds. The
system then sends a verification request to the user’s
smartphone that requires a timely response to verify
the match. Biometric Checkout Program (Mastercard,
2022), developed by Mastercard, operates in a similar
manner, allowing the user to identify himself to the
terminal over a visual channel using either his face
or palm. Both of these systems require a user-to-
verifier connexion to verify the match. To the best of
our knowledge, we are the first to propose the use of
biometric identification to facilitate mutual authenti-
cation and to do so without requiring a user-to-verifier
connexion.
Visual Channel. With regard to the use of a visual
channel, some existing mobile payment systems have
explored the use of a QR code to pass information
between a user device and the terminal. In Yoyo Wal-
let (Yoyo, 2017), the user must first authenticate to
a smartphone application using a PIN and can then
access a QR code that contains tokenised payment in-