Speeding Up the Computation of Elliptic Curve Scalar Multiplication based on CRT and DRM
Mohammad Anagreh (1, 2), Eero Vainikko (1) and Peeter Laud (2)
(1) Institute of Computer Science, University of Tartu, J. Liivi 2, Tartu, Estonia
(2) Cybernetica, Mäealuse 2/1, Tallinn, Estonia
Keywords: ECC, Parallel Computing, CRT, DRM.
Abstract:
In this paper, we study parallel implementations of elliptic curve scalar multiplication over prime fields using signed binary representations. Our implementation speeds up the calculation of scalar multiplication compared with the standard sequential case. We introduce parallel algorithms for computing elliptic curve scalar multiplication in which the scalar is represented using the Complementary Recoding Technique (CRT) or the Direct Recoding Method (DRM). Both implementations of the proposed algorithms show speed-ups of up to 60% compared with the execution time of the sequential versions of the algorithms. We find that ECC-DRM is faster than ECC-CRT in both the parallel and the sequential versions.
1 INTRODUCTION
Elliptic curve cryptosystems (ECC) were independently proposed by Koblitz (Koblitz, 1987) and Miller (Miller, 1986). They are widely used in many cryptographic primitives and protocols, such as asymmetric encryption, digital signatures and key exchange. One of the most important advantages of ECC is its suitability for environments with limited memory, such as portable devices, because it uses shorter keys. ECC offers a high level of security with shorter keys than other existing algorithms such as RSA (Rivest et al., 1978). The minimum ECC key size is 160 bits, which provides the same security level as the standard 1024-bit RSA key size (Gura et al., 2004). Computing the scalar multiplication is an expensive operation in an elliptic curve cryptosystem. Elliptic curve scalar multiplication is the operation of repeatedly adding an EC point to itself along the elliptic curve, d times: Q = dP, where P = (x, y) is a given point on the elliptic curve. Multiplication algorithms typically work on the binary representation of d. Therefore, many researchers have focused on speeding up the calculation of scalar multiplication, either by proposing new related representations, such as signed binary representations, or by improving the calculation method itself, for example through parallel computation. The Hamming weight (HW) of a (signed) binary representation of d is the number of non-zero digits in it. The number of doubling operations in an elliptic curve scalar multiplication depends on the length n of the representation of d, and the number of addition operations depends on its Hamming weight.
Reducing the number of non-zero digits in the representation of the scalar d reduces the number of addition operations in ECC scalar multiplication. Therefore, representations with a lower HW are preferred. Several researchers have proposed methods to convert the binary representation into a signed binary representation in order to reduce the Hamming weight of the representation of d. These representations include the Mutual Opposite Form (MOF) (Okeya et al., 2004), the Joint Sparse Form (JSF) (Solinas, 2001) and the Non-Adjacent Form (NAF) (Booth, 1951). In this paper, we consider the Complementary Recoding Technique (CRT) (Balasubramaniam and Kathikeyan, 2007), which was improved by the Direct Recoding Method (DRM) (HK and Sanghi, 2010) and other methods (Huang et al., 2010). In addition, several methods have been proposed to accelerate the calculation of ECC scalar multiplication by parallel computing (Azarderakhsh and Reyhani-Masoleh, 2015) (Asif and Kong, 2017) (Gutub, 2010).
In this paper, we propose algorithms that accelerate the computation of elliptic curve scalar multiplication by parallelizing the scalar multiplication algorithm. The proposed algorithms combine the Add-Subtract Scalar Multiplication Algorithm with the transformation of the scalar d from its binary representation to a signed binary representation. One of our algorithms uses the Complementary Recoding Technique (CRT), while the other is based on the Direct Recoding Method (DRM). For both representations, we consider different ways of scheduling the computation on two processors. Our implementation of the two algorithms shows that the proposed methods are faster than the sequential calculation of ECC scalar multiplication.

Anagreh, M., Vainikko, E. and Laud, P.
Speeding Up the Computation of Elliptic Curve Scalar Multiplication based on CRT and DRM.
DOI: 10.5220/0009129501760184
In Proceedings of the 6th International Conference on Information Systems Security and Privacy (ICISSP 2020), pages 176-184
ISBN: 978-989-758-399-5; ISSN: 2184-4356
Copyright © 2022 by SCITEPRESS Science and Technology Publications, Lda. All rights reserved
This paper is organized as follows: Section 2 briefly presents the preliminaries. Section 3 reviews related work, and Section 4 presents the proposed algorithms. Section 5 describes the experiments and presents the results. The last section concludes the paper and discusses future work.
2 PRELIMINARIES
2.1 Elliptic Curves over Prime Fields $\mathbb{F}_p$
In this paper, we focus on curves over prime fields $\mathbb{F}_p$. These curves are defined by the cubic equation given in Equation (1), with Cartesian coordinate variables $(x, y)$ and coefficients $a, b$ that are elements of $\mathbb{F}_p$. All values can be considered integers computed modulo the prime number $p$. The cubic equation with coefficients $(a, b)$ and variables $(x, y)$ for elliptic curves over $\mathbb{F}_p$ is the following:
$$y^2 = (x^3 + ax + b) \bmod p \tag{1}$$
Let the points $P = (x_1, y_1)$ and $Q = (x_2, y_2)$ lie on the elliptic curve over $\mathbb{F}_p$ defined by the coefficients $(a, b)$, and let $O$ be the point at infinity. The rules for the addition operation on the EC are as follows:
$$P + O = P \tag{2}$$
Given points $P$ and $Q$, if $x_1 = x_2$ and $y_2 = -y_1$, then
$$P + Q = O \tag{3}$$
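In general, $R = P + Q$, where the result $R = (x_3, y_3)$ is defined as follows:
$$x_3 = (\lambda^2 - x_1 - x_2) \bmod p \tag{4}$$
$$y_3 = (\lambda(x_1 - x_3) - y_1) \bmod p \tag{5}$$
$$\lambda = \begin{cases} \dfrac{y_2 - y_1}{x_2 - x_1} \bmod p, & \text{if } P \neq Q \\[2ex] \dfrac{3x_1^2 + a}{2y_1} \bmod p, & \text{if } P = Q \end{cases} \tag{6}$$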
In summary, for any two points $P$, $Q$ on a given elliptic curve, there are two main operations. The operation $R = P + Q$ with $P \neq Q$ is called point addition, and $R = 2P$ is called point doubling. The addition operation has 5 sub-operations: 2 squarings, 2 multiplications and 1 inversion. Consequently, for a non-negative integer $d$, the scalar point multiplication $Q = dP$ on the elliptic curve can be defined through repeated doubling and addition operations, as illustrated in Figure 1.
Figure 1: Adding and doubling points on EC.
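To make Equations (1)-(6) concrete, the following C++ sketch implements affine point addition and doubling over a toy prime field. The curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ and the point $P = (5, 1)$ are hypothetical illustration parameters, far too small for real use, and the names `Point`, `add` and `on_curve` are likewise our own; a real implementation would use a big-integer library for primes of 160 bits and more.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy parameters: y^2 = x^3 + 2x + 2 over F_17.
// Real ECC uses primes of 160 bits or more and a big-integer library.
constexpr std::int64_t kP = 17;  // field prime p
constexpr std::int64_t kA = 2;   // curve coefficient a
constexpr std::int64_t kB = 2;   // curve coefficient b

struct Point {
    std::int64_t x = 0, y = 0;
    bool inf = true;  // true represents the point at infinity O
};

bool operator==(const Point& P, const Point& Q) {
    return (P.inf && Q.inf) || (!P.inf && !Q.inf && P.x == Q.x && P.y == Q.y);
}

// Reduce v into the range [0, p).
std::int64_t mod(std::int64_t v) {
    v %= kP;
    return v < 0 ? v + kP : v;
}

// Modular inverse by Fermat's little theorem: v^(p-2) mod p, for v != 0 mod p.
std::int64_t inv(std::int64_t v) {
    std::int64_t r = 1, b = mod(v);
    assert(b != 0);
    for (std::int64_t e = kP - 2; e > 0; e >>= 1, b = b * b % kP)
        if (e & 1) r = r * b % kP;
    return r;
}

// Equation (1): does P satisfy y^2 = x^3 + ax + b (mod p)?
bool on_curve(const Point& P) {
    return P.inf || mod(P.y * P.y) == mod(P.x * P.x * P.x + kA * P.x + kB);
}

// Point addition and doubling following Equations (2)-(6).
Point add(const Point& P, const Point& Q) {
    if (P.inf) return Q;                                    // O + Q = Q
    if (Q.inf) return P;                                    // P + O = P, Eq. (2)
    if (P.x == Q.x && mod(P.y + Q.y) == 0) return Point{};  // P + (-P) = O, Eq. (3)
    const std::int64_t lambda =
        (P.x == Q.x) ? mod(3 * P.x * P.x + kA) * inv(2 * P.y) % kP  // doubling, P = Q
                     : mod(Q.y - P.y) * inv(Q.x - P.x) % kP;        // addition, P != Q
    const std::int64_t x3 = mod(lambda * lambda - P.x - Q.x);       // Eq. (4)
    const std::int64_t y3 = mod(lambda * (P.x - x3) - P.y);         // Eq. (5)
    return Point{x3, y3, false};
}
```

With these parameters, `add(P, P)` gives $2P = (6, 3)$ and adding $P$ once more gives $3P = (10, 6)$, both of which can be checked by hand against Equations (4)-(6).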
2.2 Signed Binary Representation
A signed binary representation of $d$ is a vector $(d_0, \ldots, d_{n-1})$, where $\sum_{i=0}^{n-1} 2^i d_i = d$ and each $d_i$ is an element of $\{-1, 0, 1\}$. Aiming to reduce the Hamming weight of the representation, a number of different signed binary representations have been proposed, including MOF (Okeya et al., 2004), NAF (Booth, 1951), CRT (Balasubramaniam and Kathikeyan, 2007), DRM (HK and Sanghi, 2010) and others.

In the following, we write the digits as $1$, $0$ and $\bar{1}$, where $\bar{1} = -1$.
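The definition can be checked mechanically. This hypothetical C++ helper stores a signed binary representation least-significant digit first (so `digits[i]` is $d_i$), evaluates $\sum_i 2^i d_i$ and counts the Hamming weight. It uses 64-bit integers, so it only covers small illustrative scalars.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Signed binary representation, least-significant digit first:
// digits[i] = d_i in {-1, 0, 1}, and d = sum_i 2^i * d_i.
using SignedDigits = std::vector<int>;

// Evaluate sum_i 2^i d_i with Horner's rule (most significant digit first).
// 64-bit arithmetic: only suitable for short toy representations.
std::int64_t evaluate(const SignedDigits& digits) {
    std::int64_t value = 0;
    for (std::size_t i = digits.size(); i-- > 0;) {
        assert(digits[i] >= -1 && digits[i] <= 1);
        value = 2 * value + digits[i];
    }
    return value;
}

// Hamming weight: the number of non-zero digits (1 or -1).
int hamming_weight(const SignedDigits& digits) {
    int weight = 0;
    for (int di : digits) weight += (di != 0) ? 1 : 0;
    return weight;
}
```

For example, the digits {0, 0, 0, -1, 0, 0, 0, 0, 1} evaluate to 256 - 8 = 248 with Hamming weight 2, whereas the plain binary digits of 248 have Hamming weight 5.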
2.2.1 Complementary Recoding Technique (CRT)
CRT is a technique for converting a number to a canonical signed binary representation that reduces the Hamming weight (Balasubramaniam and Kathikeyan, 2007). If $d$ denotes an $n$-bit integer as well as its (usual) binary representation, then its CRT representation is
$$\sum_{i=0}^{n} 2^i d_i = (1\underbrace{0\cdots0}_{n})_2 - \hat{d} - 1 = 2^n - \hat{d} - 1,$$
where $(10\cdots0)_2 = 2^n$ has $n+1$ bits and $\hat{d} = 2^n - 1 - d$ denotes the binary complement of $d$. This conversion is simple and efficient, and has low time complexity compared with other methods (HK and Sanghi, 2010).
Example 1: Let $d = 7327$; its binary representation is $(1110010011111)_2$. Converting the binary representation to a signed binary representation by applying CRT gives
$$d = \sum_{i=0}^{13} 2^i d_i = 2^{13} - \hat{d} - 1 = (10000000000000)_2 - (0001101100000)_2 - 1 = (1000\bar{1}\bar{1}0\bar{1}\bar{1}0000\bar{1})_2.$$
Indeed, converting the signed binary representation to decimal, we get $(1000\bar{1}\bar{1}0\bar{1}\bar{1}0000\bar{1})_2 = 8192 - 512 - 256 - 64 - 32 - 1 = 7327 = d$.
The Hamming weight of the binary representation of 7327 is 9, while the Hamming weight of its CRT signed binary representation is 6. A smaller Hamming weight reduces the number of operations needed to calculate the EC scalar multiplication.
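A minimal C++ sketch of the CRT formula above, for 64-bit scalars: it places a 1 at position $n$, negates each bit of $\hat{d}$, and subtracts the final 1 at position 0. Putting that final $-1$ directly on digit 0 only yields a valid digit when $\hat{d}$ is even, that is, when $d$ is odd (as in Example 1), so this sketch assumes odd $d$. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using SignedDigits = std::vector<int>;  // least-significant digit first

// Number of bits n in the ordinary binary representation of d.
int bit_length(std::uint64_t d) {
    int n = 0;
    for (; d != 0; d >>= 1) ++n;
    return n;
}

// CRT (Section 2.2.1): d = 2^n - d_hat - 1, with d_hat = 2^n - 1 - d.
// Digit n is 1, digit i < n is minus bit i of d_hat, and the final -1 is
// added to digit 0. That digit stays in {-1, 0, 1} only if d_hat is even,
// i.e. if d is odd, so this sketch assumes odd d.
SignedDigits crt_recode(std::uint64_t d) {
    assert(d % 2 == 1);
    const int n = bit_length(d);
    assert(n < 63);  // keep 2^n representable in this 64-bit toy
    const std::uint64_t d_hat = ((std::uint64_t{1} << n) - 1) - d;  // n-bit complement
    SignedDigits digits(n + 1, 0);
    digits[n] = 1;  // the leading (100...0)_2 = 2^n
    for (int i = 0; i < n; ++i)
        digits[i] = -static_cast<int>((d_hat >> i) & 1);  // subtract d_hat digit-wise
    digits[0] -= 1;  // subtract the final 1 (digit 0 was 0 because d_hat is even)
    return digits;
}

std::int64_t evaluate(const SignedDigits& digits) {  // sum_i 2^i d_i
    std::int64_t value = 0;
    for (std::size_t i = digits.size(); i-- > 0;) value = 2 * value + digits[i];
    return value;
}

int hamming_weight(const SignedDigits& digits) {  // number of non-zero digits
    int weight = 0;
    for (int di : digits) weight += (di != 0) ? 1 : 0;
    return weight;
}
```

Calling `crt_recode(7327)` reproduces Example 1: 14 digits, value 7327 and Hamming weight 6.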
2.2.2 Direct Recoding Method (DRM)
DRM is another method for converting to a signed binary representation (HK and Sanghi, 2010). It is based on CRT but has lower time complexity, because it uses only a single operation: bitwise subtraction with $0 - 1 = \bar{1}$. DRM also generally produces a smaller Hamming weight than CRT (HK and Sanghi, 2010).
The procedure for converting $d$ to a signed binary representation using DRM is as follows. Let $p$ be the integer satisfying $2^p \geq d > 2^{p-1}$ (here $p$ is an exponent, not the field prime). Then $d = (2^p)_2 - (2^p - d)_2$, where subtracting the bit 1 from the bit 0 results in $\bar{1}$.
Example 2: Let $d = 248$. The binary representation of $d$ is $(11111000)_2$. The binary representation is converted to a signed binary representation by applying DRM as follows: $2^8 = (100000000)_2$ and $2^8 - 248 = (1000)_2$. Then
$$d = (100000000)_2 - (1000)_2 = (10000\bar{1}000)_2.$$
Indeed, converting the signed binary representation $(10000\bar{1}000)_2$ obtained by DRM to decimal gives $d = 256 - 8 = 248$. The Hamming weight of the binary representation of 248 is 5, while the Hamming weight of its DRM signed binary representation is 2, so the conversion brings savings in the calculation of the EC scalar multiplication. Note that the CRT representation of 248, $2^8 - (00000111)_2 - 1$, has Hamming weight 5.
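The DRM procedure translates into a few lines of C++. In this hypothetical sketch, `drm_recode` finds the smallest $p$ with $2^p \geq d$, computes $k = 2^p - d$, and turns every 1 bit of $k$ into a $\bar{1}$ digit; because $k < 2^{p-1}$ whenever $d > 2^{p-1}$, the $\bar{1}$ digits always stay below the leading 1. Scalars are limited to 64-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using SignedDigits = std::vector<int>;  // least-significant digit first

// DRM (Section 2.2.2): pick p with 2^p >= d > 2^(p-1), then
// d = (2^p)_2 - (2^p - d)_2, where subtracting a 1 bit from a 0 bit gives -1.
SignedDigits drm_recode(std::uint64_t d) {
    assert(d >= 1 && d <= (std::uint64_t{1} << 62));  // 64-bit toy range
    int p = 0;
    while ((std::uint64_t{1} << p) < d) ++p;              // smallest p with 2^p >= d
    const std::uint64_t k = (std::uint64_t{1} << p) - d;  // k < 2^(p-1)
    SignedDigits digits(p + 1, 0);
    digits[p] = 1;                                        // the leading (100...0)_2
    for (int i = 0; i < p; ++i)
        digits[i] = -static_cast<int>((k >> i) & 1);      // each 1 bit of k becomes -1
    return digits;
}

std::int64_t evaluate(const SignedDigits& digits) {  // sum_i 2^i d_i
    std::int64_t value = 0;
    for (std::size_t i = digits.size(); i-- > 0;) value = 2 * value + digits[i];
    return value;
}

int hamming_weight(const SignedDigits& digits) {  // number of non-zero digits
    int weight = 0;
    for (int di : digits) weight += (di != 0) ? 1 : 0;
    return weight;
}
```

`drm_recode(248)` yields the nine digits of Example 2 with Hamming weight 2, and `drm_recode(7327)` evaluates back to 7327 with Hamming weight 6.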
2.3 ECC Scalar Multiplication
Scalar multiplication is one of the main operations in ECC. It is built from two main operations: the addition of points and the doubling of a point. The scalar $d$ is an integer that has to be represented in (signed) binary. Each occurrence of a digit 1 in the representation corresponds to an addition of two points; for a binary representation there are, on average, approximately $n/2$ such additions in a scalar multiplication. The number of doubling operations, on the other hand, is $n - 1$. In the case of a signed binary representation, the third digit value, $\bar{1}$, is processed by a point subtraction. Algorithm 1 is an Adding-Subtracting Scalar Multiplication Algorithm, which computes the elliptic curve scalar multiplication for a scalar $d = \sum_{i=0}^{n-1} 2^i d_i$ represented either in binary ($d_i \in \{0, 1\}$) or in signed binary ($d_i \in \{-1, 0, 1\}$).
Algorithm 1: Adding-Subtracting Scalar Multiplication.
Data: Point on EC P, a string of signed bits (d_0, ..., d_{n−1})
Result: Q = dP
begin
    Q ← O, R ← P
    for i = 0 to n − 1 do
        if (d_i = 1) then
            Q ← Q + R
        else if (d_i = −1) then
            Q ← Q − R
        end
        R ← 2R
    end
    return Q
end
The example below shows how to compute the ECC scalar multiplication for a small scalar $d$.
Example 3: Computing the ECC scalar multiplication for $d = 115$.
First, convert the integer $d$ to binary: $d = (1110011)_2$. Then compute the ECC scalar multiplication based on the scalar $d$ from right to left, as illustrated in Figure 2.
Figure 2: Finding ECC Scalar Multiplication.
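Algorithm 1 can be sketched in C++ on a toy curve, $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with $P = (5, 1)$ (hypothetical parameters chosen only so that results can be checked by hand). Digits are passed least-significant first, so the loop mirrors the right-to-left scan of Example 3, and subtraction is implemented as addition of the negated point $(x, -y)$. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy curve: y^2 = x^3 + 2x + 2 over F_17.
constexpr std::int64_t kP = 17, kA = 2;

struct Point { std::int64_t x = 0, y = 0; bool inf = true; };  // inf: point at infinity O

bool operator==(const Point& P, const Point& Q) {
    return (P.inf && Q.inf) || (!P.inf && !Q.inf && P.x == Q.x && P.y == Q.y);
}
std::int64_t mod(std::int64_t v) { v %= kP; return v < 0 ? v + kP : v; }
std::int64_t inv(std::int64_t v) {  // Fermat: v^(p-2) mod p, v != 0
    std::int64_t r = 1, b = mod(v);
    for (std::int64_t e = kP - 2; e > 0; e >>= 1, b = b * b % kP)
        if (e & 1) r = r * b % kP;
    return r;
}
Point add(const Point& P, const Point& Q) {  // affine addition and doubling
    if (P.inf) return Q;
    if (Q.inf) return P;
    if (P.x == Q.x && mod(P.y + Q.y) == 0) return Point{};
    const std::int64_t l = (P.x == Q.x) ? mod(3 * P.x * P.x + kA) * inv(2 * P.y) % kP
                                        : mod(Q.y - P.y) * inv(Q.x - P.x) % kP;
    const std::int64_t x3 = mod(l * l - P.x - Q.x);
    return Point{x3, mod(l * (P.x - x3) - P.y), false};
}
Point negate(const Point& P) { return P.inf ? P : Point{P.x, mod(-P.y), false}; }

// Algorithm 1, scanning the digits d_0, ..., d_{n-1} from right to left.
Point scalar_mult(const Point& P, const std::vector<int>& digits) {
    Point Q{};    // Q <- O
    Point R = P;  // R <- P
    for (int di : digits) {
        assert(di >= -1 && di <= 1);
        if (di == 1) Q = add(Q, R);                // Q <- Q + R
        else if (di == -1) Q = add(Q, negate(R));  // Q <- Q - R
        R = add(R, R);                             // R <- 2R
    }
    return Q;
}
```

For Example 3, the binary digits of 115, least significant first, are {1, 1, 0, 0, 1, 1, 1}; for such small scalars the result can be validated against adding $P$ to itself 115 times.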
3 RELATED WORK
Many researchers have worked on improving ECC by speeding up the calculation of the scalar multiplication. Scalar multiplication can be improved by refining existing algorithms or proposing new related ones. Applying signed binary representation algorithms when computing the scalar multiplication is an efficient way to reduce the number of non-zero digits in the key. The Hamming weight is therefore a major factor in reducing the number of addition operations when computing the scalar multiplication.
In 1951, Booth proposed a new scalar representation called the signed binary representation. There are many methods to represent integers in signed binary, such as NAF, JSF and MOF. In 2003, Chang et al. (Chang et al., 2003) proposed a new method to compute general multiplication, which results from using NAF, MOF and JSF. Several researchers have proposed methods for computing the scalar multiplication in parallel using binary or signed binary representations.
Anagreh et al. (Anagreh et al., 2014) proposed a parallel method to compute scalar multiplication based on the mutual opposite form (MOF). They derived a new algorithm that combines the Adding-Subtracting Scalar Multiplication Algorithm and MOF. They used two processors to perform the parallel calculation: the method computes the doubling operations on one processor and the addition operations on the other processor at the same time. The proposed method computes the scalar multiplication without performing the MOF conversion explicitly; instead, it compares bits of the given bit-string d to decide whether the second processor has to add or subtract the doubled point in the case of the non-zero digits $\{1, \bar{1}\}$. The method is reported to be 90% faster than the sequential version of ECC scalar multiplication with MOF.
Negre et al. (Negre and Robert, 2015) proposed a new parallel approach for computing the scalar multiplication. They split the scalar multiplication based on NAF into two parts for the prime field $\mathbb{F}_p$ and three parts for the binary field $\mathbb{F}_{2^m}$. In their method, the doubling and addition (or subtraction) operations of each part are performed in a separate thread. In the case of prime fields, the operations of the scalar multiplication are split into two parts based on representing $d$ as $d = k_1 + 2^s k_2$. The first part, $Q_1 = k_1 P$, is computed in the first thread, and the second part, $Q_2 = 2^s k_2 P$, in the second thread. The scalar multiplication is then obtained by adding the two points: $Q = Q_1 + Q_2$. The proposed method reduced the computation time of the scalar multiplication by at least 10%.
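The decomposition $d = k_1 + 2^s k_2$ is a split of the binary expansion of $d$ into its low $s$ bits ($k_1$) and its remaining high bits ($k_2$). The hypothetical C++ helper below shows only this split, for 64-bit scalars; it works on the plain binary value of $d$ rather than on a NAF representation, and the choice of $s$ is left to the caller, since this section does not state how Negre and Robert choose it. Each thread could then compute its own part, $Q_1 = k_1 P$ or $Q_2 = 2^s k_2 P$, before the final addition.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split d = k1 + 2^s * k2 with 0 <= k1 < 2^s: k1 holds the low s bits of d
// and k2 the remaining high bits. 64-bit toy version, so 0 < s < 64.
std::pair<std::uint64_t, std::uint64_t> split_scalar(std::uint64_t d, unsigned s) {
    assert(s > 0 && s < 64);
    const std::uint64_t k1 = d & ((std::uint64_t{1} << s) - 1);  // low s bits
    const std::uint64_t k2 = d >> s;                              // high bits
    return {k1, k2};
}
```

For instance, splitting $d = 7327$ at $s = 7$ gives $k_1 = 31$ and $k_2 = 57$, since $57 \cdot 2^7 + 31 = 7296 + 31 = 7327$.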
Robert (Robert, 2014) proposed a software implementation of ECC scalar multiplication. The proposed method uses two threads to perform the parallel calculation, and four threads for various elliptic curves over prime fields $\mathbb{F}_p$. Two algorithms are used in this work: double-and-add and half-and-add. The doubling operations are placed in one thread (the producer), while the additions and subtractions are placed in another thread (the consumer). A single mutex, used only at the beginning of the computation, keeps mutex synchronization to a minimum: it keeps the consumer inactive at the beginning of the processing while the producer performs the doubling operations. The method can violate read-after-write dependencies. Such memory violations may occur when the first batch of points computed before the mutex is released is too small, or when the binary or NAF representation of the scalar contains a long sequence of zeros. The reported error rate is limited to less than 1%, which is still not acceptable. To eliminate this problem, a loop counter stored in global memory is used. This adds an extra operation to the scheme, which reduces the execution-time gain of the parallel version. The NAF conversion is not part of the parallel section. The reported improvement reaches 15% compared with the sequential version.
Phalakarn et al. (Phalakarn et al., 2018) proposed a new representation for right-to-left parallel elliptic curve scalar multiplication. Their mathematical model reduces the calculation time of ECC scalar multiplication: the authors proposed algorithms that generate representations which reduce the execution time of the scheme. Three processors are used to perform the whole calculation. Two processors perform the doublings of P and Q, and the third processor performs the addition operations using two binary representations m and n. The cost of communication between the processors in this model is still an open issue and may increase the time complexity, since it is an extra operation.
Anagreh et al. (Anagreh et al., 2019) introduced an algorithm to compute ECC scalar multiplication based on the NAF representation. They used two processors to perform the whole calculation in parallel. The first processor performs the doubling operations, while the second processor performs the NAF conversion and the addition or subtraction operations at the same time. Shared memory is used to transmit the doubled points from the first processor to the second processor. The second processor performs the NAF conversion before starting the addition and subtraction operations. This method eliminates the use of mutexes, as the consumption of doubled points by the second processor does not overtake their production by the first processor. The reported result is 60% faster than the standard version of the ECC calculation based on NAF.
4 PARALLEL ALGORITHM
It is desirable to reduce the execution time of the scalar multiplication by applying efficient methods. In this work, we propose and compare two parallel algorithms for calculating the scalar multiplication based on signed binary representations. We derive both algorithms by combining the Add-Subtract Scalar Multiplication Algorithm with a method for converting the binary representation of the scalar to a signed binary representation; the conversion methods we use are CRT and DRM. The first algorithm is based on circular buffers and the second on the delayed consumption of doubled points. The first algorithm optimizes the inter-processor communication costs, while the second optimizes the synchronization costs.
4.1 Algorithm based on Circular Buffers
In our first parallel algorithm, we use a circular buffer to transmit the processed data between the two processors. The circular buffer is shared memory: both processors can access it at any time to read and write. Processor-1 writes the doubled point $R$ and the digit $d_i$ to a specific location in the circular buffer, and Processor-2 reads the doubled point and the digit $d_i$ from the circular buffer to perform the addition or subtraction operations. The circular buffer has two pointers, front and rear, to organize the reading and writing operations. In each iteration, writing takes place at the location pointed to by the front pointer (Push_front), and reading takes place at the location pointed to by the rear pointer (Pull_rear), where the front pointer stays ahead of the rear pointer (front > rear) for all writing and reading operations. Keeping reads and writes ordered in this way is essential to avoid any corruption of the calculation. In addition, we use two predicates, is-full() and is-not-empty(), to check the state of the circular buffer before performing a writing or reading operation. If the circular buffer is full, Processor-1 keeps cycling without performing any operation until a location becomes free, and then writes the point and digit into the empty location. Processor-2 uses the second predicate before performing the addition or subtraction operations. The number of doubling operations performed by Processor-1 depends on the number of digits $n$ of the scalar $d$, while the number of writing operations by Processor-1, and hence of reading operations by Processor-2, depends on the number of non-zero digits $\{1, \bar{1}\}$ in the scalar $d$.
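The following is a minimal single-producer/single-consumer circular buffer in C++, sketched under our own assumptions (the paper does not give its buffer code, and its implementation uses OpenMP rather than C++ atomics). Instead of wrapping the front and rear pointers, it keeps two ever-increasing counters, so front - rear is the number of occupied slots and front >= rear always holds; the slot index is the counter modulo the capacity. `is_full()` and `is_not_empty()` play the role of the two predicates described above, and release/acquire ordering makes a pushed item visible to the consumer before it can observe the updated front counter.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer/single-consumer circular buffer (hypothetical sketch).
// front_ counts pushes and rear_ counts pulls; both only increase, so
// front_ - rear_ is the number of occupied slots and front_ >= rear_.
template <typename T, std::size_t Capacity>
class CircularBuffer {
public:
    bool is_full() const {
        return front_.load(std::memory_order_acquire) -
                   rear_.load(std::memory_order_acquire) == Capacity;
    }
    bool is_not_empty() const {
        return front_.load(std::memory_order_acquire) !=
               rear_.load(std::memory_order_acquire);
    }
    // Producer only (Processor-1), after is_full() has returned false.
    void push(const T& item) {
        assert(!is_full());
        const std::size_t f = front_.load(std::memory_order_relaxed);
        slots_[f % Capacity] = item;
        front_.store(f + 1, std::memory_order_release);  // publish the slot
    }
    // Consumer only (Processor-2), after is_not_empty() has returned true.
    T pull() {
        assert(is_not_empty());
        const std::size_t r = rear_.load(std::memory_order_relaxed);
        T item = slots_[r % Capacity];
        rear_.store(r + 1, std::memory_order_release);  // hand the slot back
        return item;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> front_{0};
    std::atomic<std::size_t> rear_{0};
};
```

Processor-1 would spin on `is_full()` before each `push`, and Processor-2 on `is_not_empty()` before each `pull`; the sketch supports exactly one pushing thread and one pulling thread.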
We apply a task decomposition strategy in our parallel implementation of the scalar multiplication, Algorithm 2, which uses two processors. Processor-1 is responsible for three subtasks; see the Processor-1 section of Algorithm 2.
Algorithm 2: Parallel Scalar Multiplication based on circular buffers and signed binary representations.
Data: Integer d, Point in EC P
Result: Q = dP, based on a signed binary representation
begin
    Processor 1: signed binary conversion, doubling operations
    begin
        R ← P
        REP = Convert_to_signed_binary(d)
        for i = 0 to n − 1 do
            repeat
            until ¬buffer_is_full()
            if REP_i ≠ 0 then
                Push(R, REP_i)
            end
            R ← 2R
        end
        Push(0, 0)
    end
    Processor 2: addition operations
    begin
        Q ← O
        repeat
            if buffer_is_not_empty() then
                Pull(R, d_i)
                if d_i = 1 then
                    Q ← Q + R
                else if d_i = −1 then
                    Q ← Q − R
                end
            end
        until d_i = 0
        return Q
    end
end
The first task is the conversion of the scalar $d$ to a signed binary representation with digits $\{1, 0, \bar{1}\}$, using one of the conversion algorithms discussed in Sec. 2.2. In our experiments, we have considered the CRT and DRM representations. The second task is calculating the doubling operations on the elliptic curve, starting from the given point $P = (x, y)$, according to the number of digits $n$ of the scalar $d$: the doubling function is called $n$ times, where $n$ is the number of digits in the signed binary representation, regardless of whether the digit is $1$, $0$ or $\bar{1}$. The last task performed by Processor-1 is writing the doubled point $R$ and the digit $REP_i$ (for each non-zero digit) into an empty location of the circular buffer. As explained above, the circular buffer is shared memory, and both processors can access the shared data for reading or writing. To indicate that no more points will be pushed into the buffer, Processor-1 finishes by pushing the pair (0, 0).
Processor-2 is responsible for three subtasks as well; see the Processor-2 section of Algorithm 2. The first task is reading the doubled point $R$ and the digit $d_i$ from the circular buffer. Note that each doubled point is stored in the circular buffer together with its digit $d_i$, which preserves the correspondence with the sequence of doubling operations $P, 2P, 4P, 8P, \ldots, 2^{n-1}P$. The second task is performing the addition operations for the non-zero digits of the scalar $d$: if the digit $d_i$ is $1$, Processor-2 performs an addition. If the digit $d_i$ is $\bar{1}$, Processor-2 performs a subtraction, which is its third task. The results of the additions and subtractions are accumulated in $Q$, which holds the final result of the EC scalar multiplication. The circular buffer organizes the transmission of data between the two processors throughout the scheme. The data to be transmitted from Processor-1 to Processor-2 is located in shared memory: Processor-1 writes to the circular buffer, while Processor-2 reads the stored data from it. Every time Processor-1 is about to write to the circular buffer, it has to check that the buffer is not full, i.e., that there is an empty location for the doubled point $R$ and the digit $d_i$.
If the circular buffer is full, Processor-1 keeps looping until a location becomes available. Processor-2, in turn, repeatedly checks whether Processor-1 has stored new data in the circular buffer; if so, it reads the data and performs the addition or subtraction operation according to the digit of the scalar $d$.
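Putting the pieces together, the C++ sketch below mirrors Algorithm 2 with two `std::thread`s standing in for Processor-1 and Processor-2 (the paper's implementation uses OpenMP; threads are our substitution). To stay self-contained it assumes a toy curve ($y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$, hypothetical parameters), DRM as the signed binary conversion, a four-slot circular buffer built from two atomic counters, and an item with digit 0 as the (0, 0) end marker. As a slight simplification of Algorithm 2, which waits at the top of every iteration, Processor-1 here waits only when it actually needs to push. All names are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// --- Toy curve y^2 = x^3 + 2x + 2 over F_17 (hypothetical parameters) ---
constexpr std::int64_t kP = 17, kA = 2;
struct Point { std::int64_t x = 0, y = 0; bool inf = true; };  // inf: point at infinity O
bool operator==(const Point& P, const Point& Q) {
    return (P.inf && Q.inf) || (!P.inf && !Q.inf && P.x == Q.x && P.y == Q.y);
}
std::int64_t mod(std::int64_t v) { v %= kP; return v < 0 ? v + kP : v; }
std::int64_t inv(std::int64_t v) {  // Fermat: v^(p-2) mod p, v != 0
    std::int64_t r = 1, b = mod(v);
    for (std::int64_t e = kP - 2; e > 0; e >>= 1, b = b * b % kP)
        if (e & 1) r = r * b % kP;
    return r;
}
Point add(const Point& P, const Point& Q) {  // affine addition and doubling
    if (P.inf) return Q;
    if (Q.inf) return P;
    if (P.x == Q.x && mod(P.y + Q.y) == 0) return Point{};
    const std::int64_t l = (P.x == Q.x) ? mod(3 * P.x * P.x + kA) * inv(2 * P.y) % kP
                                        : mod(Q.y - P.y) * inv(Q.x - P.x) % kP;
    const std::int64_t x3 = mod(l * l - P.x - Q.x);
    return Point{x3, mod(l * (P.x - x3) - P.y), false};
}
Point negate(const Point& P) { return P.inf ? P : Point{P.x, mod(-P.y), false}; }

// --- DRM recoding, least-significant digit first (assumes 1 <= d <= 2^62) ---
std::vector<int> drm_recode(std::uint64_t d) {
    assert(d >= 1 && d <= (std::uint64_t{1} << 62));
    int p = 0;
    while ((std::uint64_t{1} << p) < d) ++p;
    const std::uint64_t k = (std::uint64_t{1} << p) - d;
    std::vector<int> digits(p + 1, 0);
    digits[p] = 1;
    for (int i = 0; i < p; ++i) digits[i] = -static_cast<int>((k >> i) & 1);
    return digits;
}

// --- Algorithm 2: conversion + doubling on one thread, additions on another ---
Point parallel_scalar_mult(const Point& P, std::uint64_t d) {
    constexpr std::size_t kCap = 4;              // hypothetical buffer capacity
    struct Item { Point R; int digit; };         // (doubled point, digit d_i)
    std::array<Item, kCap> slots{};
    std::atomic<std::size_t> front{0}, rear{0};  // Push_front / Pull_rear counters

    std::thread processor1([&] {
        Point R = P;
        for (int digit : drm_recode(d)) {        // REP = Convert_to_signed_binary(d)
            if (digit != 0) {
                while (front.load() - rear.load() == kCap) {}  // wait while full
                slots[front.load() % kCap] = Item{R, digit};   // Push(R, REP_i)
                front.fetch_add(1);
            }
            R = add(R, R);                       // R <- 2R
        }
        while (front.load() - rear.load() == kCap) {}
        slots[front.load() % kCap] = Item{Point{}, 0};  // Push(0, 0): end marker
        front.fetch_add(1);
    });

    Point Q{};                                   // Q <- O
    std::thread processor2([&] {
        for (;;) {
            while (front.load() == rear.load()) {}        // wait until not empty
            const Item item = slots[rear.load() % kCap];  // Pull(R, d_i)
            rear.fetch_add(1);
            if (item.digit == 0) break;                   // end marker reached
            Q = add(Q, item.digit == 1 ? item.R : negate(item.R));  // Q <- Q +/- R
        }
    });
    processor1.join();
    processor2.join();
    return Q;
}
```

Comparing the result with repeated addition of $P$ for small scalars is an easy way to validate the pipeline. Busy-waiting, as in the pseudocode, keeps both threads spinning, which is reasonable for two dedicated cores but wasteful otherwise.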
4.2 Algorithm based on Delayed Consumption
Compared to Alg. 2, the proposed Algorithm 3 moves the task of the signed binary conversion of $d$ from Processor-1 to Processor-2. Hence, Processor-1 only computes the point doublings. These are stored in the array $R = (R_0, \ldots, R_{n-1})$, which has to be kept in shared memory. All points of $R$ are computed, regardless of whether $d_i$ is zero or non-zero.

Processor-2 reads the elements of $R$ and either adds them to or subtracts them from the accumulated value $Q$, according to the signed binary representation of the scalar $d$. Processor-2 first performs the signed binary conversion while Processor-1 performs the doubling operations and saves each $R_i$ in the shared array. Once Processor-2 has finished the conversion, it starts reading the $R_i$ to perform the addition and subtraction operations.
Algorithm 3: Parallel Scalar Multiplication based on the delayed consumption of doubled points.
Data: Integer d, Point in EC P
Result: Q = dP, based on a signed binary representation
begin
    Processor 1: doubling operations
    begin
        R_0 ← P
        for i = 1 to n − 1 do
            R_i ← 2R_{i−1}
        end
    end
    Processor 2: signed binary conversion, addition operations
    begin
        Q ← O
        (d_0, ..., d_{n−1}) = Convert_to_signed_binary(d)
        for i = 0 to n − 1 do
            if d_i = 1 then
                Q ← Q + R_i
            else if d_i = −1 then
                Q ← Q − R_i
            end
        end
        return Q
    end
end
Again, in our experiments, we have considered
both the CRT and DRM conversion methods in order
to compute a signed binary representation of d.
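Algorithm 3 can be sketched the same way. In this hypothetical C++ version, Processor-1 fills the shared array R with all doublings, and Processor-2 first performs the DRM conversion and only then consumes R in order. The pseudocode appears to rely on the conversion delay alone to keep Processor-2 behind Processor-1; in C++ an unsynchronized read would be a data race, so the sketch adds an atomic counter `ready` that Processor-2 checks before reading each $R_i$. The array is sized one digit longer than the bit length of $d$ to hold the extra leading digit that CRT or DRM may produce. The toy curve ($y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$) and all names are illustrative assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// --- Toy curve y^2 = x^3 + 2x + 2 over F_17 (hypothetical parameters) ---
constexpr std::int64_t kP = 17, kA = 2;
struct Point { std::int64_t x = 0, y = 0; bool inf = true; };  // inf: point at infinity O
bool operator==(const Point& P, const Point& Q) {
    return (P.inf && Q.inf) || (!P.inf && !Q.inf && P.x == Q.x && P.y == Q.y);
}
std::int64_t mod(std::int64_t v) { v %= kP; return v < 0 ? v + kP : v; }
std::int64_t inv(std::int64_t v) {  // Fermat: v^(p-2) mod p, v != 0
    std::int64_t r = 1, b = mod(v);
    for (std::int64_t e = kP - 2; e > 0; e >>= 1, b = b * b % kP)
        if (e & 1) r = r * b % kP;
    return r;
}
Point add(const Point& P, const Point& Q) {  // affine addition and doubling
    if (P.inf) return Q;
    if (Q.inf) return P;
    if (P.x == Q.x && mod(P.y + Q.y) == 0) return Point{};
    const std::int64_t l = (P.x == Q.x) ? mod(3 * P.x * P.x + kA) * inv(2 * P.y) % kP
                                        : mod(Q.y - P.y) * inv(Q.x - P.x) % kP;
    const std::int64_t x3 = mod(l * l - P.x - Q.x);
    return Point{x3, mod(l * (P.x - x3) - P.y), false};
}
Point negate(const Point& P) { return P.inf ? P : Point{P.x, mod(-P.y), false}; }

// --- DRM recoding, least-significant digit first (assumes 1 <= d <= 2^62) ---
std::vector<int> drm_recode(std::uint64_t d) {
    assert(d >= 1 && d <= (std::uint64_t{1} << 62));
    int p = 0;
    while ((std::uint64_t{1} << p) < d) ++p;
    const std::uint64_t k = (std::uint64_t{1} << p) - d;
    std::vector<int> digits(p + 1, 0);
    digits[p] = 1;
    for (int i = 0; i < p; ++i) digits[i] = -static_cast<int>((k >> i) & 1);
    return digits;
}

// --- Algorithm 3: Processor-1 doubles into a shared array; Processor-2
// converts d first and then consumes the doubled points in order ---
Point delayed_scalar_mult(const Point& P, std::uint64_t d) {
    int bits = 0;
    for (std::uint64_t t = d; t != 0; t >>= 1) ++bits;
    const std::size_t n = static_cast<std::size_t>(bits) + 1;  // room for the extra leading digit
    std::vector<Point> R(n);                  // shared array R_0, ..., R_{n-1}
    std::atomic<std::size_t> ready{0};        // number of R_i already written

    std::thread processor1([&] {              // Processor 1: doublings only
        R[0] = P;                             // R_0 <- P
        ready.store(1);
        for (std::size_t i = 1; i < n; ++i) {
            R[i] = add(R[i - 1], R[i - 1]);   // R_i <- 2 R_{i-1}
            ready.store(i + 1);               // publish R_i
        }
    });

    Point Q{};                                // Q <- O
    std::thread processor2([&] {              // Processor 2: conversion, then additions
        const std::vector<int> digits = drm_recode(d);  // runs while doublings proceed
        assert(digits.size() <= n);
        for (std::size_t i = 0; i < digits.size(); ++i) {
            while (ready.load() <= i) {}      // never read R_i before it is written
            if (digits[i] == 1) Q = add(Q, R[i]);
            else if (digits[i] == -1) Q = add(Q, negate(R[i]));
        }
    });
    processor1.join();
    processor2.join();
    return Q;
}
```

Swapping `drm_recode` for a CRT conversion would only change the conversion step; the consumption loop already skips zero digits, as the text requires.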
5 EXPERIMENTAL EVALUATION
5.1 Algorithm based on Circular Buffers
In summary, the proposed method derives a new algorithm, given in Algorithm 2, that combines two algorithms, Add-Subtract Scalar Multiplication and a method for computing a signed binary representation of the scalar, and performs this combined algorithm in parallel. We implemented the algorithm with either the CRT or the DRM method, in two versions of the code: parallel and sequential. The evaluation of the algorithm compares the parallel and sequential versions for both the CRT and the DRM method.
As with almost all parallel applications, it is important to produce the best sequential code before starting to parallelize it. A task decomposition strategy is used to divide the work between two processors. Both the sequential and parallel codes are written in Visual C++.Net. We use the OpenMP library supported in the Visual C++.Net package to write the parallel section of the parallel version. We also used the ttmath library for C++ to represent big integers (of 1024 bits or more). Both versions (parallel and sequential) were tested on an Intel Core i5 7th-Gen machine running Windows 10. Each key size was run 10 times, and the average execution time is reported for all key sizes, as shown in Figures 3 and 4.
In the implementation, we tested six different key sizes for both algorithms, in both the parallel and the sequential case: 160, 192, 224, 256, 384 and 521 bits. We generated a random big integer for each key size used in the implementation. Each number is used in both the parallel and the sequential version, so that both versions perform the same number of addition and subtraction operations.
Figure 3: Execution time for Algorithm 2 using CRT.
The execution times of the serial and parallel versions are shown in the figures for the different ECC key sizes. For the CRT encoding method, the difference between the serial and parallel times is largest for the 521-, 192- and 160-bit key sizes, as shown in Figure 3. The speed-up reaches 60% compared with the serial version for the same key size.
Figure 4: Execution time for Algorithm 2 using DRM.
For the DRM encoding method, the difference between the serial and parallel execution times is significant for the 192- and 160-bit key sizes. The speed-up is 60% compared with the execution time of the serial version for the same key size.
The tests use a randomly generated key to perform the scalar multiplication. For each key size, the same key is used for the calculation of the scalar multiplication in both the parallel and the serial version.
Figure 5: Speed up and Efficiency for CRT.
The number of non-zero digits in the key affects the calculation of the ECC scalar multiplication: each occurrence of the digit $1$ or $\bar{1}$ means that Processor-2 performs an addition or a subtraction. Because a signed binary representation is used, the average proportion of non-zero digits in the key is around 50% or less.
Figure 6: Speed up and Efficiency for DRM.
The execution time of one addition (or subtraction) is about 2.5 times that of one doubling, so addition is much more costly than doubling. Therefore, even though only around 50% of the digits in the key are non-zero, Processor-1 does not necessarily perform more work than Processor-2. It is important to note that one addition has 5 sub-operations, namely 2 squarings, 2 multiplications and 1 inversion, which make it an expensive operation compared with doubling. Performing the whole calculation of the scalar multiplication in this way therefore provides a degree of load balancing. The efficiency of the whole calculation for the different key sizes is around 70% to 80%; see Figures 5 and 6.
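The load-balancing argument above can be made explicit with a rough cost model. Assume, as reported here, that one addition costs about $2.5\,T_D$, where $T_D$ is the time of one doubling; ignore conversion and communication costs; and use the standard definitions of speed-up $S$ and efficiency $E$ for $N_{\mathrm{proc}}$ processors (the paper does not state its formulas, so these definitions are an assumption). For an $n$-digit representation with Hamming weight $h \approx 0.5\,n$:

```latex
\begin{aligned}
T_{\mathrm{P1}} &\approx n\,T_D, \qquad T_{\mathrm{P2}} \approx 2.5\,h\,T_D \approx 1.25\,n\,T_D,\\
S_{\max} &\approx \frac{T_{\mathrm{P1}} + T_{\mathrm{P2}}}{\max(T_{\mathrm{P1}},\, T_{\mathrm{P2}})} \approx \frac{2.25}{1.25} = 1.8,\\
S &= \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}}, \qquad E = \frac{S}{N_{\mathrm{proc}}}.
\end{aligned}
```

Under these assumptions, with $N_{\mathrm{proc}} = 2$, an efficiency of 0.8 corresponds to $S = 1.6$, i.e., a parallel version about 1.6 times as fast as the sequential one, which stays below the idealized bound of 1.8.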
5.2 Algorithm based on Delayed Consumption
In Alg. 2, Processor-1 has to find a signed binary representation of $d$ and perform the doubling operations, and then, for the non-zero digits of the signed binary representation, save the doubled points in the circular buffer. The doubled points saved in the circular buffer are read by Processor-2 to perform the addition (or subtraction) operations. In Alg. 3, the signed binary representation is computed by Processor-2. In this method, Processor-1 has to perform the doubling operations and save all doubled points in shared memory, regardless of whether the corresponding digit in the signed binary representation is zero or non-zero. The number of writing operations to shared memory is therefore equal to the length of the scalar $d$. Processor-2 has to read all saved points from shared memory and decide whether each doubled point should be added or subtracted, based on the CRT or DRM representation: if the digit is $1$, it performs an addition; if the digit is $\bar{1}$, it performs a subtraction; and if the digit is $0$, it skips the point and continues reading. Figure 7 shows the benchmarking results of Alg. 3 for both the CRT and the DRM implementation. In general, the results show that the second algorithm is less efficient than the first, especially for small key sizes.
Figure 7: Execution time for the second method.
Figure 8: Execution time for CRT and DRM.
DRM is a low-cost conversion in comparison with CRT and other
conversion methods: the time complexity of the DRM conversion is
lower than that of the CRT conversion. In addition, the signed
binary representation produced by DRM has fewer non-zero digits
than those produced by CRT and other standard methods. As mentioned
above in Example 2, the Hamming weight of the DRM representation is
2, which is less than the Hamming weight of the CRT representation.
A lower Hamming weight reduces the time needed to compute the ECC
scalar multiplication compared with using another representation.
Figure 5 shows the difference in execution time between DRM and CRT
for both the serial and the parallel versions. For all key sizes,
in both the serial and parallel versions, computing the scalar
multiplication based on the DRM representation is faster than
computing it based on the CRT representation.
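To make the Hamming-weight argument concrete, the following Python sketch counts the non-zero digits of a signed binary representation and the group operations they imply. The two representations of 15 are illustrative only; they are not the CRT and DRM outputs of Example 2, and the function names are hypothetical. The operation count is a simple model, assuming the first non-zero digit only initialises the accumulator.

```python
def to_int(digits):
    """Value of a signed binary digit list, least significant digit first."""
    return sum(d * 2**i for i, d in enumerate(digits))


def hamming_weight(digits):
    """Number of non-zero digits."""
    return sum(1 for d in digits if d != 0)


def op_count(digits):
    """Rough group-operation count for signed binary scalar multiplication:
    one doubling per digit position after the first, and one addition or
    subtraction per non-zero digit after the first (the first non-zero
    digit only initialises the accumulator)."""
    doublings = len(digits) - 1
    additions = max(hamming_weight(digits) - 1, 0)
    return doublings, additions


binary = [1, 1, 1, 1]        # 15 in plain binary
signed = [-1, 0, 0, 0, 1]    # 15 = 16 - 1
for rep in (binary, signed):
    print(to_int(rep), hamming_weight(rep), op_count(rep))
# prints:
# 15 4 (3, 3)
# 15 2 (4, 1)
```

In the two-processor scheme of Alg. 3, Processor-1 performs a doubling for every digit regardless of its value, so the Hamming weight mainly governs the load on Processor-2.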
6 CONCLUSION
In this work, we proposed two algorithms to calculate the ECC scalar
multiplication: the first is based on the CRT representation and the
second on the DRM representation. We also proposed a parallel
algorithm that performs the calculation of each of the two proposed
algorithms using two processors. The results show speed-ups of up
to 60% in comparison with the serial version for both algorithms.
We also presented the difference in execution time between DRM and
CRT. Future work includes using three threads to perform the
calculation when the key has more non-zero bits than usual, which
makes the point additions more costly; in that case, a third thread
can help reduce the execution time.
REFERENCES
Anagreh, M., Samsudin, A., and Omar, M. A. (2014). Parallel
method for computing elliptic curve scalar multiplication based
on MOF. In Int. Arab J. Inf. Technol., 11(6).
Anagreh, M., Vainikko, E., and Laud, P. (2019). Accelerate
performance for elliptic curve scalar multiplication based on NAF
by parallel computing. In ICISSP 2019 - 5th International
Conference on Information Systems Security and Privacy. SCITEPRESS.
Asif, S. and Kong, Y. (2017). Highly parallel modular mul-
tiplier for elliptic curve cryptography in residue num-
ber system. In Circuits, Systems, and Signal Process-
ing, 26(6).
Azarderakhsh, R. and Reyhani-Masoleh, A. (2015). Parallel
and high-speed computations of elliptic curve cryp-
tography using hybrid-double multipliers. In IEEE
Transactions on Parallel and Distributed Systems,
26(6).
Balasubramaniam, P. and Kathikeyan, E. (2007). Elliptic curve
scalar multiplication algorithm using complementary recoding. In
Applied Mathematics and Computation, 1(190).
Booth, A. (1951). A signed binary multiplication technique.
In Journal of Applied Mathematics, 4.
Chang, C. C., Kuo, Y. T., and Lin, C. H. (2003). Fast algorithms
for common-multiplicand multiplication and exponentiation by
performing complements. In 17th International Conference on
Advanced Information Networking and Applications, pages 807–811.
IEEE.
Gura, N., Patel, A., Wander, A., Eberle, H., and Shantz, S. C.
(2004). Comparing elliptic curve cryptography and RSA on 8-bit
CPUs. In International Workshop on Cryptographic Hardware and
Embedded Systems, pages 119–132. Springer.
Gutub, A. (2010). Remodeling of elliptic curve cryptog-
raphy scalar multiplication architecture using parallel
jacobian coordinate system. In International Journal
of Computer Science and Security (IJCSS), 4(4).
HK, P. and Sanghi, M. (2010). Speeding up computation of
scalar multiplication in elliptic curve cryptosystem. In
International Journal on Computer Science and Engi-
neering, 4(2).
Huang, X., Shah, P. G., and D, S. (2010). Minimizing Hamming
weight based on 1’s complement of binary numbers over GF(2^m). In
2010 The 12th International Conference on Advanced Communication
Technology (ICACT), volume 2, pages 1226–1230. IEEE.
Koblitz, N. (1987). Elliptic curve cryptosystems. Mathematics of
Computation, 48(177):203–209.
Miller, V. (1986). Use of elliptic curves in cryptography. In
Conference on the Theory and Application of Cryptographic
Techniques, number 108 in LNCS, pages 417–426, Berlin,
Heidelberg. Springer.
Negre, C. and Robert, J.-M. (2015). Parallel approaches for
efficient scalar multiplication over elliptic curve. In SECRYPT:
International Conference on Security and Cryptography, pages
202–209. IEEE.
Okeya, K., Schmidt-Samoa, K., Spahn, C., and Takagi, T. (2004).
Signed binary representations revisited. In Annual International
Cryptology Conference, pages 123–139. Springer.
Phalakarn, K., Phalakarn, K., and Suppakitpaisarn, V.
(2018). Optimal representation for right-to-left par-
allel scalar and multi-scalar point multiplication. In
International Journal of Networking and Computing,
8(2).
Rivest, R., Shamir, A., and Adleman, L. (1978). A method for
obtaining digital signatures and public-key cryptosystems.
Communications of the ACM, 21(2).
Robert, J.-M. (2014). Parallelized software implementation of
elliptic curve scalar multiplication. In International Conference
on Information Security and Cryptology, pages 445–262. Springer.
Solinas, J. (2001). Low-weight binary representations for pairs of
integers. Technical Report CORR 2001-41, Center for Applied
Cryptographic Research, University of Waterloo, Canada.
ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy