Accelerating Square Root Computations over Large GF (2

)

Salah Harb and Moath Jarrah

Computer Engineering Department, Jordan University of Science and Technology,

CIT college, P.O.Box 3030, 22110, Irbid, Jordan

Keywords: Cryptosystems, Computation, Elliptic Curve Cryptography, Galois Field, Hardware Implementation, Square

Root.

Abstract: The communication networks of low-resources applications require implementing cryptographic protocols

and operations with less computational and architectural complexities. In this paper, an efficient method for

high speed calculations of square (SQR) root is proposed over Galois Fields GF (2

). The method is based on

using the results of certain pre-computations, and transforming the SQR root calculations into a system of

linear equations. The computational complexity of our proposed method for computing the SQR root in GF

) is O(m) which is significantly better than existing methods such as Tonelli-Shanks and Cipolla. Our

proposed method was implemented using different types of multipliers over several polynomial degrees.

Software and hardware implementations were developed in NTL-C++ and VHDL, respectively. Our software

experimental results show up to 38 times faster than Doliskani & Schost method. Moreover, our method is

840 times faster than Tonelli-Shanks method. In terms of hardware implementation and since Tonelli-Shanks

requires less resources than Doliskani & Schost, we compare our method with Tonelli-Shanks. The hardware

experimental results show that up to 50% less LUTs with a speedup of 18% that can be obtained compared to

Tonelli-Shanks method.

1 INTRODUCTION

The task of computing square (SQR) roots in Galois

fields GF (p

) has a practical importance to the

cryptography such as point counting, the prime-

proving algorithms and asymmetric encryption

scheme in the elliptic curves (ECs) (Menezes, 1993),

where SQR operation is required. In the basic

ELGamal encryption scheme (ElGamal, 1985), a

point P is defined to represent a message on a selected

elliptic curve E

(

x, y

)

over GF (p

), where



(

,

)

:y





+ax+b for a,b ∈ GF (p

). If M∈

GF (p

) denotes the message, then the point P has a

form of (,





(

,0

)

) in a basic cryptographic

scheme.

Finding the points on a curve in EC cryptography

(ECC) requires the SQR root operation. If ∈ GF

) is given, the success SQR root  =

±



(





++

)

indicates that this point is located

on the curve and applicable for different EC

operations. The SQR root operation is used widely in

compression and restoration points on ECC (Boneh &

Franklin, 2003) (Galbraith, et al., 2003). A point with

coordinates (, ) on the curve is compressed to the

form(, ), where ∈ {0, 1}. To restore the numeric

value () by(, ), it is necessary to solve the

quadratic equation



=

(



)

, which is similar to

compute the SQR root of P

(

)

i.e.



()

. As

mentioned earlier, SQR computations of GF (p

) is a

significant operation, especially for ECC and other

asymmetric cryptosystems (Galbraith, et al., 2003).

Computing SQR root of the binary GF (2

) finite field

is a time-consuming operation. The effectiveness of

cryptosystems that use SQR operations critically

relies on their implementations in finding SQR roots.

With the development of distributed computing in

recent years, the capabilities of computer systems that

can be used in security and critical fields have

significantly increased (Bryen, 2015) (Bitzinger &

Vlavianos, 2016). The simplest measure to improve

security of systems that use GF (2

) is to increase the

number of bits (m) in the intermediate operands,

where (m) is the operand length. Usually, the increase

in the number of bits (m) leads to dramatic slowdown

of the performance of cryptographic protection

systems (Barreto & Voloch, 2006). When (m) is

increased, the execution time of SQR operations of

GF (2

) is proportionally increased to (



). As a

Harb, S. and Jarrah, M.

Accelerating Square Root Computations Over Large GF (2

DOI: 10.5220/0006386702290236

In Proceedings of the 14th International Joint Conference on e-Business and Telecommunications (ICETE 2017) - Volume 4: SECRYPT, pages 229-236

ISBN: 978-989-758-259-2

229

result, the execution time of SQR root operations

increases significantly. Hence, reducing the

execution time of the SQR operation is critical for

practical applications (ElGamal, 1985) (Galbraith, et

al., 2003). In this paper, we propose a new method to

accelerate the SQR root computations using pre-

computed weights over many GF (2

) finite fields.

Based on the chosen GF (2

), our proposed method

computes the weights only one time and stores them

to be used in data processing. Our experimental

results show high improvements in terms of the

utilized area and the execution time.

2 ANALYSIS OF SQR ROOT

METHODS IN FINITE FIELDS

The practical importance of SQR root computations

in finite fields has resulted in intensive research.

Many methods to compute SQR root that use GF (2

)

were proposed with two prominent methods which

are: Cipolla (Cipolla, 1903) and Tonelli (Tonelli,

1891). These methods had been extended to the case

of fields GF (q

), where q is a prime, as in Tonelli-

Shanks (Shanks, 1973) and Alderman-Manders-

Miller (L.M. Aldeman & Miller, 1977) methods. In

1977, Tonelli-Shanks method was extended to the

case of extracting the root of arbitrary degrees (L.M.

Aldeman & Miller, 1977) (Z. Cao & Fan, 2011). A

specialized method for computing the cubic root,

characterized by its high speed was developed by (N.

Nishihara & Sueyoshi, 2009).

In Galois fields GF (2

), addition and

multiplication are the basic polynomial operations for

the elements {0, (2

-1)}. Addition operation

corresponds to a XOR logic and is denoted as '+'.

Multiplication operation in GF (2

) consists of two

steps: polynomial multiplication (multiplication

without carry), indicated by the symbol '⊗' and

reduction.

Reduction requires finding the remainder,

denoted as ‘rem’ or ‘mod’, through dividing the

multiplication result by the selected field irreducible

polynomial () of the corresponding degree (m) (I.

Blake & Smart, 1999) (IEEE, 2002). For each

element in GF (2

), generated by the irreducible

polynomial 

(



)

of degree (m), there is a

multiplicative cyclic group of order (n) that does not

exceed (2

-1) (Atkin & F.Morain, 1993) (D.

Hankerson & Vanstone, 2004). The cyclic group of

element i, where (2 ≤ i ≤ 2

-1), can be generated by

multiplying the result (s) of a previous element in GF

) with i until getting i as a multiplication result

(IEEE, 2002). For example, a GF (2

) is formed by

the irreducible polynomial:

() = 



+



+1, (P = 25

= 110 01

The cyclic groups are generated in table 1. In each of

the cyclic group of GF (2

), a cyclic subgroup can be

found where each element is the SQR of the previous

group. These subgroups of the field that are formed

by a polynomial() =



+



+1 are given

in table 2. Clearly, the order of each of the quadratic

subgroups does not exceed



, which is less than

or equal to (m).

Table 1: Cyclic Groups of the Field GF (2

Element Cyclic Power Groups

2, 4, 8, 9, 11, 15, 7, 14, 5, 10, 13, 3, 6,

12, 1, 2

3 3, 5, 15, 8, 1, 3

4, 9, 15, 14, 10, 3, 12, 2, 8, 11, 7, 5,

13, 6, 1, 4

5 5, 8, 3, 15, 1, 5

6, 13, 5, 7, 11, 8, 2, 12, 3, 10, 14, 15,

9, 4, 1, 6

7, 12, 15, 6, 11, 3, 9, 13, 8, 10, 4, 5, 2,

14, 1, 7

8 8, 15, 5, 3, 1, 8

9, 14, 3, 2, 11, 5, 6, 4, 15, 10, 12, 8, 7,

13, 1, 9

10 10, 11, 1, 10

11 11, 10, 1, 11

12, 6, 3, 13, 10, 5, 14, 7, 15, 11, 9, 8,

4, 2, 1, 12

13, 7, 8, 12, 10, 15, 4, 6, 5, 11, 2, 3,

14, 9, 1, 13

14, 2, 5, 4, 10, 8, 13, 9, 3, 11, 6, 15,

12, 7, 1, 14

15 15, 3, 8, 5, 1, 15

Table 2: Quadratic Cyclic Subgroups of the Field GF (2

Element Power Cyclic Quadratic Subgroups

2 2, 4, 9, 14, 2

3 3, 5, 8, 15, 3

4 4, 9, 14, 2, 4

5 5, 8, 15, 3, 5

6 6, 13, 7, 12, 6

7 7, 12, 6, 13, 7

8 8, 15, 3, 5, 8

9 9, 14, 2, 4, 9

10 10, 11, 10

11 11, 10, 11

12 12, 6, 13, 7, 12

13 13, 7, 12, 6, 13

14 14, 2, 4, 9, 14

15 15, 3, 5, 8, 15

For any GF (2

), there is at least one irreducible

polynomial, which implies having different cyclic

SECRYPT 2017 - 14th International Conference on Security and Cryptography

230

groups of two or more irreducible polynomials for the

same GF (2

). For example, GF (2

) has

another()= 



+  + 1, which has cyclic

groups and quadratic subgroups presented in table 3.

In Tonelli-Shanks method, computing

√

in GF (2

)

requires finding the quadratic cyclic subgroup of A

until the element that is preceding A is found. For

example, according to table 2, if A = 15, the preceding

element in the quadratic sub group is 8, which is the

SQR root of A = 15. Indeed: 88 mod 25 = 64 mod

25 = 15.

Table 3: Cyclic Groups of the Field GF (2

Element Cyclic Power Groups

Cyclic Quadratic

Subgroups

2, 4, 8, 3, 6, 12, 11, 5, 10,

7, 14, 15, 13, 9, 1, 2

2, 4, 3, 5, 2

3, 5, 15, 2, 6, 10, 13, 4,

12, 7, 9, 8, 11, 14, 1, 3

3, 5, 2, 4, 3

4, 3, 12, 5, 7, 15, 9, 2, 8,

6, 11, 10, 14, 13, 1, 4

4, 3, 5, 2, 4

5, 2, 10, 11, 7, 8, 14, 3,

15, 6, 13, 12, 9, 11, 1, 5

5, 2, 4, 3, 5

6 6, 7, 1, 6 6, 7, 6

7 7, 6, 1, 7 7, 6, 7

8 8, 12, 10, 15, 1, 8 8, 12, 15, 10, 8

9, 13, 15, 14, 7, 10, 5, 11,

12, 6, 3, 4, 2, 1, 9

9, 13, 14, 11, 9

10 10, 8, 15, 12, 1, 10 10, 8, 12, 15, 10

11, 9, 12, 13, 6, 15 -3 14,

8, 7, 4, 10, 2, 5, 1, 11

11, 9, 13, 14, 11

12 12, 15, 8, 10, 1, 12 12, 15, 10, 8, 12

13, 14, 10, 11, 6, 8, 2, 9,

15, 7, 5, 12, 3, 4, 1, 13

13, 14, 11, 9, 13

14, 11, 8, 9, 7, 12, 4, 13,

10, 6, 2, 15, 5, 3, 1, 14

14, 11, 9, 13, 14

15 15, 10, 12, 8, 1, 15 15, 10, 8, 12, 15

The passage through the quadratic cyclic subgroup is

equivalent to the operation of exponentiation (Z. Cao

& Fan, 2011):

=







()

(1)

Thus, the idea of calculating the SQR root in GF

) is theoretically quite simple, but its

implementation involves a significant amount of

computing resources since the calculation of equation

(1) assumes (m-1) operations of root squaring and

reduction. Root squaring follows the rules of

polynomial multiplication, which ignores the carries.

The operation of a polynomial squaring has an

important property: odd-positioned bits of the

squared polynomial of number A are zeros, and the

even-positioned bits are double bits of A, that is, if:

A=



+



+. . +



(2)

hen, A

is:

AA=



+





+. . +





The importance of this property is that the

calculation of the SQR of a polynomial does not

require any computational overheads, and is basically

a one-position shifting process for the bits in the

initial number. When assessing the computational

complexity of the SQR root operation using Tonelli-

Shanks method, it should be noticed that in practice,

the operand length of the field elements (m)

significantly exceeds the processor word length (l).

Therefore, the elements of the field are divided into t

sections, where t = m / l.

The reduction operation puts the result of a

polynomial multiplication into the limits of the field

GF (2

). The polynomial division involves

performing (m-1) cycles. For each cycle, one bit is

shifted in the (m+1)-bit-() code, and added

logically to the current remainder when the last

significant digit is one.

Shifting the (m+1)-bit-() code by one digit

requires (t+1) shift operations. Since this operation is

performed in each of the (m-1) cycles of reduction,

the total number of shift operations is: (t+1).(m-1).

Based on the fact that the reduction operation

involves addition in half of the cycles, the average

number of such operations is (m-1)/2. Implementing

this operation on a l-bit processor requires (t+1)

logical addition operations to be computed.

Therefore, a single reduction operation requires

(t+1).(m-1)/2 operations. Accordingly, the average

running time of a single reduction operation of a SQR

is: 1.5

⋅

(t+1)

⋅

(m-1)

⋅τ

, where (τ) is the execution time

of a single logical operation.

Given that SQR root requires (m-1) squaring

operations, then the average number (N

) of logical

operations that are required by the SQR root in GF

) is given by equation (3).





=1.5.

(

+1

)

.(−1)



(3)

A similar estimation of the computational

complexity of (



)for Cipolla-Lehmer method is

given in (Menezes, 1993). With the increasing

productivity of distributed computer systems that can

potentially be used in a cryptanalysis system (Feng

Wang & Morikawa, 2005), the easiest way to

improve the reliability of these systems is to increase

the word length of the operands. Increasing operands’

lengths increases the computational complexity

dramatically according to equation (3). Previous

research in calculating the SQR in GF (2

) has not

Accelerating Square Root Computations Over Large GF (2

)

231

addressed this problem in real implementation of

cryptosystems, which results in higher complexity.

The algorithm that was proposed in (Ozdemir,

2013), has showed improvements in the computation

of the SQR root in finite fields GF (p). The algorithm

is based on a probabilistic theory and gives higher

probabilities of success than the Tonelli-Shanks and

Cipolla algorithms. The algorithm uses polynomial

factorization through locating a random point P over

an elliptic curve 

(

,

)

:



=



+;  = mP,

where m is an odd integer. If  is not the identity point

of the selected curve, then there is a possibility to

obtain the SQR root

√

by computing the 2



. value.

The algorithm requires a modular multiplication and

squaring operations for the addition and doubling

calculations. The time complexity in (Ozdemir, 2013)

is 

(







)

, where u ≥ 0 is the bit operation for the

EC calculations. The algorithm requires large number

of operations that consume resources and time to

compute the root of a number. The complexity is

increased for large odd values of p.

3 SPEEDING UP

CALCULATIONS OF SQR

ROOT BY STORING MEMORY

WEIGHTS

In practical GF (2

) cryptosystems, the generated

polynomials and the used fields are considered fixed.

Keeping this in mind, an effective approach to

accelerate the operation of SQR root in GF (2

) is to

use pre-computed weights for the generated

polynomials, where these weights are computed only

one time. The pre-computed weights are stored in

memory and used whenever the SQR root operation

on GF (2

) is required. The proposed method is based

on the idea of using the results of pre-computations to

speed up SQR root computations on GF (2

). It is

demonstrated as follows:

The number A is represented as a logical sum as:

A = a







+⋯+a



, where





,



,…,



∈ {0,1}. For the logical summation,

the following is valid:

• When n is even, ( + )



=



+



true.

• By using equation (1) to calculate the SQR

root of A(B =

√

)on GF (2

), it can be

re-written as shown in equation (4).

=





 

(



)

(4)

= (



+



(

)





+



(



)





+. . + 



(



)





)

(



)

=



+



(

)





 

(



)

+..+



(



)







(



)

If m values are pre-calculated as:









(

)





() =

(

)





()





(

)







(



)

(

)





()







(



)







(



)

=

(

)

()



()

Then the calculation of SQR root in equation (4)

becomes:

=



.



+



.



+⋯+



.



(5)

Hence, equation (5) can be directly used to

calculate the SQR root of A. As it is shown in figure

1, once all () weights are generated, each bit ∈

{0,  − 1} in A is forwarded to the corresponding



. If the x-bit is 1, the weight (



) is selected and

logically added to other selected weights. Otherwise,

the weight is discarded by selecting 0.

m-1

...

Figure 1: Calculation of the square root of input A.

Example: The proposed method is illustrated by the

following example:

Suppose we have GF (2

) that is formed using the

polynomial

(



)

=



+



+1, then the pre-

computed weights are:









(

)





(





+



+1

)

=14

SECRYPT 2017 - 14th International Conference on Security and Cryptography

232





=

(

)





(





+



+1

)





=

(

)





(





+



+1

)

Assume A = 15 and the bit values of A are: 



1, 



=1, 



=1, 



=1. To calculate the SQR

root of A, we apply the logical sum operation on the

weights at the one bits of A as:

=



+



+



+



=1+14+2+5=8

Hence, the total number N



of logical operations

that are needed to compute the SQR root using pre-

calculations depends on the number of bits in the

binary code of A. If the number of the one-bits in the

code of A is equal to half of the total number of bits

m, then the value of 



is determined by the following

equation (6).





=0.5..

(6)

By comparing expression (6) with expression (3),

we can conclude that the use of the proposed method

to compute the SQR root in GF (2

) reduces the

number of operations by a considerable factor which

is almost 1/3m. Consequently, this reduction in the

number of the required operations reduces the

executions time of calculating the SQR root of an

input that uses GF (2

4 SOFTWARE/HARDWARE

REALIZATIONS AND TIME

EVALUATION

4.1 Software Implementation and Time

Analysis

To prove the correctness and effectiveness of the

proposed SQR root method, a software

implementation for the SQR root method was

developed using C++ programming language with a

special powerful Galois field plug-in called NTL

(NTL, 2016). NTL is a high-performance C++

framework that offers various data structures and

algorithms for manipulating polynomial operations

over integers and finite fields. All computations are

performed using Pentium Dual-Core processor

running on 2.6 GHz clock with 1 GB RAM.

Implementing the proposed SQR root method

involves initially constructing the irreducible

polynomial for GF (2

) and creating (m) weights to

calculate the root of A using the logic operation

(XOR) or '+'. As shown in figure 2, creating the

weight requires several steps. The first step starts by

taking the constant value 2. Then, the loop (L1)

iterates from 0 to ( − 1) to perform (m) SQR

operations on the constant value 2. This step

computes the base weight (



). The second loop (L2)

iterates from 0 to ( − 2) to perform the

multiplication operation on the base weight (



). At

In each iteration of (L2), the base weight is multiplied

with the previous (



), starting from the base weight

(



For example, using a polynomial

(



)

=



+





+1, m=4, we compute:





Using L1 loop:





=14 //

(

)





(





+



+1

)

=14

Using L2 loop:





=2 //

(

)





(





+



+1

)





=5 //

(

)





(





+



+1

)

Once all (m) weights are generated, the third loop

(L3) is executed to calculate the SQR root of A using

a logic operation (XOR). The XOR operation is

executed according to the number of ones in the

binary code of A. This processing step is similar to

applying an on/off process on the weights.

Constant (2) Construct P(x)

Square Operation

(0)



(m-1)

Base Weight

(i-1)

(i)

(0)



(m-2)

Multiplication Operation

(i)

(i-1)

(i-2)

Figure 2: Data flow of generating (m) weights.

Table 4 presents an approximate time that is needed

to construct (m) weights using the proposed method

over different GF (2

), where (m) is from 16 to 768

bits. The irreducible polynomials for (m) are

generated automatically using a built-in function in

NTL plugin called BuildIrred. As expected, when the

value of (m) increases, the time to generate (m) of

weights increases as well. The execution time to

calculate the SQR root for any m-bit of input A is

Accelerating Square Root Computations Over Large GF (2

)

233

almost negligible since the calculations involve only

the logic operation (XOR).

Table 4: Required Time to Generate (M) Weights (NTL).

M Time(sec)

16 0.0017

64 0.0024

128 0.0065

160 0.0073

233 0.0257

512 0.0434

768 0.6391

Figure 3 presents an average execution time

comparison between our SQR root NTL

implementation for elements over GF (2

), and our

NTL implementation of the famous Tonelli-Shanks

method, where (m) is from 500 to 10000. In each

field, one hundred of random tests were performed for

our method, while ten of random tests were

performed for the Tonelli-Shanks method. As shown

in the figure, our proposed method is much faster than

the Tonelli-Shanks method for all random tests. The

bottleneck of the Tonelli-Shanks method is in the use

of the exponentiation, which takes (m) of squaring

operations to form the cyclic quadratic group for an

element. The method selects the previous element as

the SQR root result.

Figure 3: Average execution time for our proposed method

vs. Tonelli-Shanks algorithm (Software implementation).

On the other hand, our proposed method requires

logic operations (XOR) with one-time pre-

computation of weights. It takes XOR operations

equal to the number of ones at the element or m/2

XOR operations on average.

Figure 4 shows a comparison between our

proposed method over GF (2

) and the method of

(Doliskani & Schost, 2014). Both were implemented

in NTL framework. As shown in the figure, our

method outperforms (Doliskani & Schost, 2014)

method in terms of the average execution time. The

average execution time in (Doliskani & Schost, 2014)

is (

(



)

.+



.) when p = 2. This

concludes that our proposed method, which is

O(m/2), does not have bottlenecks in computing the

SQR root for any input over GF (2

) due to the logic

operation (XOR). Our method does not need to use

the costly field multiplications, inversions, and

squaring operation.

Figure 4: Average execution time for our proposed method

vs. (Doliskani & Schost, 2014) method (Software

implementation).

4.2 Hardware Implementation,

Resource, and Time Analysis

Basically, our proposed implementation concentrates

on achieving efficient and high-speed SQR root

calculation in Galois fields to improve throughput,

speed, area, and/or power consumption. We have

developed hardware components for the SQR (L1)

and Multiplication (L2) loops that were investigated

in subsection 4.1. These implementations were

developed using VHDL language. The hardware

platform that was used is Xilinx Virtex-5 FPGA

family.

Our implementation is based on equation (7). Two

phases to perform a field multiplication in GF (2

)

were applied in order to compute the polynomial

multiplication U(x). A reduction phase is then used on

the selected irreducible polynomial.

Figure 5 shows the average number of LUTs/F.Fs

0,0001

0,001

0,01

0,1

100

500 2500 4500 6500 8500

Time (sec)

Degree (m)

Tonelli-Shanks

Proposed method

0,0001

0,001

0,01

0,1

100 300 500 700 900

Time (sec)

Degree (m)

(Doliskani & Schost, 2014)

Proposed method

SECRYPT 2017 - 14th International Conference on Security and Cryptography

234

resources and the gained frequencies that are needed

to generate one-time weights for each degree (m)

from 16 to 233 bits. A loop, Karatsuba (Che Wun

Chiou & Lin, 2015) and interleaved multipliers

(Karatsuba & Ofman, 1963) (Rodriguez-Henriquez &

Koc, 2003) (D. Narh Amanor & Schimmler, 2005)

(Gathen & Shokrollahi, 2005) were used to generate

and store weights in order to be used on the received

input data. The process of generating the weights is

performed once. This results in achieving higher

frequencies and less logic elements. Figure 5

illustrates the results.



(



)

=

(



)

 

(



)

, ℎ



(



)



(). ()

(7)

Figure 5: Average number of the required resources using

loop, Karatsuba and interleaved multipliers.

To prove the effectiveness of our SQR root method,

we have implemented it using Xilinx Virtex-5 FPGA

device with values of (m) that range from 16 to 233

bits. Our evaluation involves the efficient Tonelli-

Shanks algorithm since it shows less average

computing time and resources requirements than the

Doliskani & Schost method. Figure 6 and figure 7

compare our method to the Tonelli-Shanks method

over GF (2

). As shown in figure 6, our method

requires less LUTs than the Tonelli-Shanks method.

Also, it achieves a higher frequency because of

having only one type of operations (XOR) as shown

in figure 7. The proposed method does not require the

exponentiation operations as in Tonelli-Shanks

method. In Tonelli-Shanks method, the loop

multipliers are used to perform exponentiation

operations. In the

proposed SQR root method, the

weights at a specific selected (m) and an irreducible

polynomial P(x) are generated one time only. In case

of changing the field, a variable can be inserted to

indicate whether to regenerate the weights or not.

Figure 6: Number of LUTs needed in our method vs.

Tonelli-Shanks algorithm.

Figure 7: Achieved frequencies (MHz) in our method vs.

Tonelli-Shanks algorithm.

5 CONCLUSIONS

An efficient and high speed SQR root method over

Galois fields GF (2

) is proposed in this paper. It is

based on using pre-calculated weights. The weights

are calculated one time only and then stored in a

storage memory. Software and hardware

implementations were provided in the paper to verify

the proposed method. The computational complexity

of the proposed method is O(m) which is significantly

less than existing methods that require a time

complexity of O(m

The software implementation of the proposed

method achieves less execution time than both of

520,8

492,4

483,55

483

472,75

100

200

300

400

500

600

16 64 128 160 233

Degree (m)

LUTs F.Fs Frequency (MHz)

219

431

539

689

108

195

239

337

100

200

300

400

500

600

700

800

16 64 128 160 233

Number of LUTs

Degree (m)

Tonelli-Shanks Proposed method

290,639

257,839

227,844

216,38

196,647

365,43

283,476

245,621

245,317

236,583

100

150

200

250

300

350

400

16 64 128 160 233

Frequency (MHz)

Degree (m)

Tonelli-Shanks Proposed method

Accelerating Square Root Computations Over Large GF (2

)

235

Tonelli-Shanks and Doliskani & Schost methods

using several random tests. The hardware

implementation uses different types of multipliers

including loop, Karatsuba, and interleaved

multipliers. The experimental results show that our

method outperforms the Tonelli-Shanks method over

GF (2

) in terms of the required resources and

frequencies using several polynomial degrees.

ACKNOWLEDGEMENTS

This work is supported by the Deanship of Research

at Jordan University of Science and Technology grant

number 20160227/89-2016.

REFERENCES

Atkin, A. & F.Morain, 1993. Elliptic curves and primality

proving. Math. Comput, 61(203), pp. 29-68.

Barreto, P. S. L. M. & Voloch, J. F., 2006. Efficient

Computation of Root in Finite Fields. Designs, Codes

and Cryptography, 39(2), pp. 275-280.

Bitzinger, R. & Vlavianos, H., 2016. Emerging Critical

Technologies and Security in the Asia-Pacific. 1st ed.

s.l.:Palgrave Macmillan.

Boneh, D. & Franklin, M., 2003. Identity-Based Encryption

from the Weil Pairing. SIAM J. of Computing, 32(3),

pp. 586-615.

Bryen, S. D., 2015. Technology Security and National

Power: Winners and Losers. 1st ed. s.l.:Transaction

Publishers.

Che Wun Chiou, C.-Y. L. J.-M. L. Y.-C. Y. H. W. C. & Lin,

L.-C., 2015. Digit-Serial Systolic Karatsuba Multiplier

for Special Classes over GF(2m). Journal of

Computers, 26(1), pp. 40-57.

Cipolla, M., 1903. Un metodo per la risolutione della

congruenza di secondo grado. Rendiconto

dell’Accademia Scienze Fisiche e Matematiche, 9(3),

pp. 154-163.

D. Hankerson, A. M. & Vanstone, S., 2004. Guide to

elliptic curve cryptography. In: New York: Springer-

Verlag.

D. Narh Amanor, C. P. J. P. V. B. & Schimmler, M., 2005.

Efficient hardware architectures for modular

multiplication on FPGAs. s.l., International Conference

on Field Programmable Logic and Applications, pp.

539-542.

Doliskani, J. & Schost, É., 2014. Taking roots over high

extensions of finite fields. Mathematics of

Computation, Volume 83, pp. 435-446.

ElGamal, T., 1985. A public key cryptosystem and a

signature scheme based on discrete logarithms. IEEE

Trans. Inform. Theory 31, Issue 4, pp. 469-472.

Feng Wang, Y. N. & Morikawa, Y., 2005. A high – speed

square root computation in finite fields with application

to elliptic curve cryptography. Mem Fac Eng Okayama

Univ, Volume 39, pp. 82-92.

Galbraith, S., Paulus, S. & Smart, T., 2003. Arithmetic on

superelliptic curves. Mathematics of Computation,

32(237), pp. 393-405.

Gathen, J. v. z. & Shokrollahi, J., 2005. Efficient FPGA-

based Karatsuba multipliers for polynomials over F2.

s.l., Proc. 12th Workshop on Selected Areas in

Cryptography (SAC 2005),, pp. 359-369.

I. Blake, G. S. & Smart, N., 1999. Elliptic curves in

cryptography, Cambridge: Cambridge University

Press.

IEEE, 2002. Standard specifications for public-key

cryptography. [Online]

Available at: http://grouper.ieee.org/groups/1363/

[Accessed July 2016].

Karatsuba, A. & Ofman, Y., 1963. Multiplication of

Multidigit Numbers on Automata. Soviet Physics-

Doklady, 7(7), pp. 595-596.

L.M. Aldeman, K. M. & Miller, G., 1977.

On taking root in

finite fields. Providence, RI, Proc. 18-th IEEE

Symposium on Foundations of Computer Science.

Lehmer, D., 1969. Computer technology applied to the

theory of numbers. Number Theory, Math. Assoc.

Amer, p. 117–151.

Menezes, A. J., 1993. Elliptic Curve Public Key

Cryptosystems. Volume 234, pp. 14-128.

N. Nishihara, R. H. & Sueyoshi, Y., 2009. А remark on the

computation of cube root in finite fields. [Online]

Available at: http://eprint.iacr.org/2009/457.pdf

[Accessed June 2016].

NTL, 2016. NTL: A Library for doing Number Theory.

[Online]

Available at: http://www.shoup.net/ntl/

[Accessed July].

Ozdemir, E., 2013. Computing Square Roots in Finite

Fields. TRANSACTIONS ON INFORMATION

THEORY, 59(9), pp. 5613-5615.

Rodriguez-Henriquez, F. & Koc, K., 2003. On Fully

Parallel Karatsuba Multipliers for GF(2m). s.l., Proc.

Int’l Conf. Computer Science and Technology (CST

2003), p. 405–410.

Shanks, D., 1973. Five number-theoretic algorithms.

Winnipeg, Man, Congressus Numerantium.

Tonelli, A., 1891. Bemerkung über die Auflösung

quadratischer Congruenzen. Göttinger Nachrichten, pp.

344-346.

Z. Cao, Q. S. & Fan, X., 2011. Adleman-Manders-Miller

root extraction. [Online] Available at:

http://arxiv.org/abs/1111.4877 [Accessed June 2016].

SECRYPT 2017 - 14th International Conference on Security and Cryptography

236