A METHOD FOR FLEXIBLE REDUCTION OVER BINARY FIELDS USING A FIELD MULTIPLIER

Saptarsi Das, Keshavan Varadarajan, Ganesh Garga, Rajdeep Mondal, Ranjani Narayan and S. K. Nandy

CAD Lab, Indian Institute of Science, Bangalore, India

Morphing Machines Pvt. Ltd., Bangalore, India

Keywords:

Elliptic Curve Cryptography, Binary Fields, Flexible Reduction, Polynomial Multiplication.

Abstract:

Flexibility in the implementation of the underlying field algebra kernels often dictates the life-span of an Elliptic Curve Cryptography solution. The systems/methods designed to realize binary field arithmetic operations can be tuned either for performance or for flexibility. Usually, flexibility of these solutions adversely affects their performance. For solutions to the reduction operation this adverse effect is particularly prominent. Therefore it is a non-trivial task to design a flexible reduction method/system without compromising performance. In this paper we present a method for flexible reduction. The proposed reduction technique is based on the well-known repeated multiplication technique and Barrett reduction. This technique is particularly appealing in the context of coarse-grain programmable architectures where performance of any kernel is heavily influenced by granularity of operations. In this context we propose a design of a polynomial multiplier based on the well-known Interleaved Galois Field multiplier to accelerate the underlying multi-word polynomial multiplications. We show that this modified IGF multiplier offers a significant improvement in throughput over a purely software realization or a hybrid software-hardware implementation using a conventional polynomial multiplier.

1 INTRODUCTION

Proliferation of various kinds of threats has led to an increased interest in cryptographic solutions for communication equipment. Thus strong cryptography has emerged as an indispensable part of different communication protocols. One of the strongest deterrents of such threats is the class of Elliptic Curve Cryptography (ECC) algorithms. Due to the ever increasing threat level, the key-length applied to these algorithms keeps increasing with time. In order to cope with this growing need for stronger security, the ideal approach would be to design "future-proof" solutions. Quantitatively, the life-span of such a solution can be evaluated by measuring its flexibility to support various key lengths. The ECC algorithms are designed based on algebraic properties of finite fields. The nature of arithmetic involved in these algorithms makes it difficult to build arbitrarily flexible solutions without compromising on performance. The two fundamental operations involved in finite field arithmetic are addition and multiplication. Binary fields (finite fields of the form $GF(2^m)$) are especially popular due to the ease of implementation of addition and subtraction (which are equivalent to one another) over them. However, multiplication is a relatively expensive operation. Unlike addition or subtraction, multiplication of two polynomials from a finite field may produce a polynomial whose degree exceeds the order of the finite field. In order to translate such a result to an equivalent canonical form within the order of the finite field, a reduction operation is performed. Flexibility in polynomial multiplication can be easily achieved. However, supporting flexible reduction efficiently over arbitrarily large binary fields and for any irreducible polynomial requires special attention.

In this paper we investigate the case of flexible reduction and analyze different possible solutions. In section 2 we discuss the nature of the reduction operation and show that a software-hardware hybrid solution is best suited for flexible reduction over any binary field using any irreducible polynomial. We identify the possibility of using a hardware assist in the form of a field multiplier for improving the performance of such a hybrid technique. In this context we present the design of a Modified Interleaved Galois Field (MIGF) multiplier as an accelerator for the well-known Repeated Multiplication Method of reduction. In section 3 we present the improvement in performance achieved through the use of the MIGF multiplier. We compute the increase in hardware complexity of the said multiplier, which is offset by the improvement in performance of the reduction operation. In section 4 we present the synthesis results of a 32-bit MIGF multiplier and evaluate the improvement in performance of the reduction operation over five NIST recommended irreducible polynomials. Finally we conclude the paper with a short summary.

Das S., Varadarajan K., Garga G., Mondal R., Narayan R. and Nandy S.
A METHOD FOR FLEXIBLE REDUCTION OVER BINARY FIELDS USING A FIELD MULTIPLIER.
DOI: 10.5220/0003447500500058
In Proceedings of the International Conference on Security and Cryptography (SECRYPT-2011), pages 50-58
ISBN: 978-989-8425-71-3
© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

2 REDUCTION OVER BINARY FIELDS: THE BASIC OPERATIONS INVOLVED AND THEIR REALIZATION

The reduction operation is a modulo operation of a polynomial with an irreducible polynomial that generates the finite field under consideration. Section 2.1 presents a brief mathematical background of the reduction operation, various ways of implementing it and the associated implications. In section 2.2 we compare two algorithms for the reduction operation and identify polynomial multiplications as the core computations in them. In section 2.3 we analyze the multiplication operations involved in reduction. In section 2.4 we present the design of an MIGF multiplier that can be used for efficient implementation of the aforementioned polynomial multiplications.

2.1 Mathematical Background of Reduction Operation

Elements of a binary field are usually represented as polynomials over the base field $GF(2)$, i.e. the degree of the polynomials is determined by the order of the field and the coefficients belong to $GF(2)$. Multiplication of such elements is governed by the addition and multiplication rules over $GF(2)$. For instance, let us consider two elements $A(x)$ and $B(x)$ belonging to the binary field $GF(2^m)$. These polynomials can be represented as a string of $m$ symbols, where each symbol is 0 or 1. Therefore they are equivalent to two $m$-bit long binary strings. Equation 1 shows the two polynomials and their product $C(x)$.

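$$
\begin{aligned}
A(x) &= \sum_{i=0}^{m-1} a_i x^i \\
B(x) &= \sum_{i=0}^{m-1} b_i x^i \\
C(x) &= A(x) \times B(x) \\
     &= \sum_{k=0}^{2m-2} c_k x^k, \quad \text{where } c_k = \sum_{i+j=k} a_i b_j
\end{aligned}
\tag{1}
$$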

As is apparent from equation 1, the result $C(x)$ is almost twice as long as the input polynomials. $C(x)$ has a unique equivalent canonical representation among the set of polynomials of degree at most $m-1$. Though mathematically both representations are equivalent, efficient utilization of computation resources necessitates conversion from the $(2m-1)$-bit representation to the $m$-bit representation. This conversion, often referred to as the reduction operation, is based on an irreducible polynomial that generates the binary field of interest. The reduction operation is based on the fact that a polynomial $C(x)$ belonging to a finite field is equivalent to the polynomial modulo an irreducible polynomial $P(x)$ that generates the finite field.

C(x) ≡ C(x) mod P(x) (2)

From equation 2 it is clear that the reduced polynomial can be computed by the traditional long division technique for polynomials. But this method is iterative in nature and requires up to $m-1$ iterations.
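The product in equation 1 and the long-division reduction of equation 2 can be illustrated with a small sketch. It assumes polynomials over $GF(2)$ are stored as Python integers (bit $i$ holds the coefficient of $x^i$); the function names `clmul` and `poly_mod` and the toy field $GF(2^4)$ with $P(x) = x^4 + x + 1$ are illustrative choices, not taken from the paper.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials stored as bit strings."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition over GF(2) is XOR
        a <<= 1              # multiply a(x) by x
        b >>= 1
    return result


def poly_mod(c: int, p: int) -> int:
    """Remainder of c(x) divided by p(x), computed by bit-serial long division."""
    deg_p = p.bit_length() - 1
    while c.bit_length() - 1 >= deg_p:
        # cancel the leading term of c(x) with a shifted copy of p(x)
        c ^= p << (c.bit_length() - 1 - deg_p)
    return c


# Toy example (assumption): GF(2^4) generated by P(x) = x^4 + x + 1
P = 0b10011
A = 0b1011          # x^3 + x + 1
B = 0b0110          # x^2 + x
C = clmul(A, B)     # unreduced product, up to 2m - 1 bits
R = poly_mod(C, P)  # canonical m-bit representative
```

The loop in `poly_mod` cancels one leading term per pass, which is why long division needs up to $m-1$ iterations for a $(2m-1)$-bit product.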

At this point let us digress a little and consider the aspect of flexibility regarding the reduction operation over finite fields. There are two major factors that govern flexibility of a reduction method: the order of the finite field and the irreducible polynomial that generates the finite field. A flexible reduction method/system should be capable of operating over finite fields of arbitrarily large order. Such a solution should also be versatile enough to handle all possible irreducible polynomials for any given field order. A purely hardware approach (Peter et al., 2007; Saqib et al., 2004) cannot support arbitrarily flexible reduction, since a hardware solution cannot be used for finite fields beyond a certain range. Moreover, supporting all possible irreducible polynomials even up to a specified field order will immensely increase the complexity of the hardware. A purely software implementation is capable of delivering the desired flexibility, but poor performance of such an implementation may make it highly inefficient over very large fields. In order to cope with this, it is necessary to develop hybrid solutions. In a hybrid solution the data-path of the core computations is realized as fast hardware kernels and the control-path to invoke and cascade the hardware kernels is realized using a thin layer of software. Such coexistence of hardware and software necessitates some kind of protocol to govern the communication between the two domains. One of the most important aspects of such a protocol is the data-granularity of the hardware kernels. Data-granularity determines the amount of data that can be processed by the individual hardware kernels at any time. In architecture terminology, this granularity translates to word-length. Transport latency of data and metadata in such hybrid systems is strongly dependent on data-granularity. In order to minimize (or even hide) the transport latency, it is preferable to deploy coarse-grain hardware kernels in hybrid systems. However, it should be noted that higher granularity implies increased complexity of the hardware kernels. Therefore it is essential to find a balance between hardware complexity and data-granularity to design optimized hybrid systems.

With this background let us get back to the case of the reduction operation. As mentioned before, reduction involves a series of basic arithmetic and logical operations. The order of the finite field influences the implementation of these basic operations on a coarse-grain hybrid system. As mentioned before, an element of $GF(2^m)$ can be represented as an $m$-bit wide binary string. Let the word length of a certain coarse-grain system be $w$. If $m \le w$, then the number can be represented within a single word. In such a situation, it is feasible to develop an $m \times m$ multiplier, and the reduction operation can be integrated with the multiplier itself. Such a multiplier requires three inputs to operate: two numbers to be multiplied and an irreducible polynomial for reduction of the product. Equation 3 describes multiplication of $A(x)$ and $B(x)$ belonging to $GF(2^m)$, which is generated by the irreducible polynomial $P(x)$.

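$$
\begin{aligned}
C(x) &= (A(x) \times B(x)) \bmod P(x) \\
     &= \Big(A(x) \times \sum_{i=0}^{m-1} b_i x^i\Big) \bmod P(x) \\
     &= \sum_{i=0}^{m-1} b_i \big(A(x)\, x^i \bmod P(x)\big)
\end{aligned}
\tag{3}
$$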

Equation 3 describes the operation of a traditional shift-and-add multiplier. Note that the shifted multiplicands of the form $A(x)x^i$ are reduced at each stage. So the $m-1$ iterations of the modulo operation are embedded in each stage of the multiplier.
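A bit-serial software model can make the interleaving of equation 3 concrete. The sketch below is only an illustration under stated assumptions: polynomials are Python integers, `gf2m_mul` is a hypothetical name, and `p` is passed with its leading $x^m$ term included.

```python
def gf2m_mul(a: int, b: int, p: int, m: int) -> int:
    """Shift-and-add multiplication in GF(2^m) with reduction at every stage.

    a, b : field elements of degree < m (bit i = coefficient of x^i)
    p    : irreducible polynomial of degree m, leading term included
    """
    acc = 0
    for i in range(m):
        if (b >> i) & 1:
            acc ^= a          # add b_i * (A(x) x^i mod P(x))
        a <<= 1               # form A(x) x^(i+1)
        if (a >> m) & 1:
            a ^= p            # one long-division step keeps deg(a) < m
    return acc


# Toy field (assumption): GF(2^4) with P(x) = x^4 + x + 1
product = gf2m_mul(0b1011, 0b0110, 0b10011, 4)
```

Because the multiplicand is reduced after every shift, the accumulator never grows beyond $m$ bits, which is exactly why this approach stops being applicable once $m$ exceeds the word length of the datapath.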

On the other hand, if $m > w$, the element of the finite field can be represented using $\lceil m/w \rceil$ words. Therefore the direct multiplication-and-reduction approach cannot be applied. Under such circumstances, it becomes imperative to employ a software algorithm to break the $m$-bit operations into $w$-bit operations. Multiplication of two $m$-bit polynomials produces a $(2m-1)$-bit result which needs to be reduced separately. Algorithms such as the Karatsuba-Ofman algorithm (Karatsuba and Ofman, 1963) can be applied iteratively to perform the aforementioned multiplications. The $(2m-1)$-bit result can be reduced by the Repeated Multiplication Reduction (RMR) method (Eberle et al., 2003; Satoh and Takano, 2003). The RMR method and the Barrett Reduction method (Barrett, 1987) are the most suitable techniques for flexible reduction. A brief description of the RMR method is reproduced here from (Eberle et al., 2003). Let $C_0(x)$ be the product of two polynomials of degree less than $m$. The degree of $C_0(x)$ is less than $2m-1$ and it can be split into two parts as shown in equation 4.

$$
C_0(x) = C_{h,0}(x)\,x^m + C_{l,0}(x) \tag{4}
$$

The RMR method is an iterative technique and the subsequent polynomials are computed by equation 5.

$$
\begin{aligned}
C_{j+1}(x) &= C_{h,j}(x)\,(P(x) - x^m) + C_{l,j}(x) \\
&\text{until } C_{h,j+1}(x) = 0 \Leftrightarrow \deg(C_{j+1}(x)) \le m-1
\end{aligned}
\tag{5}
$$

The RMR method requires $m$ iterations and each iteration involves multiplication of $C_{h,j}(x)$ with $P(x) - x^m$. This polynomial multiplication can be realized as a set of $m$ left shift operations. However, the most commonly used irreducible polynomials are usually trinomials and pentanomials. This implies that each of the multiplications involves no more than five left shift operations. Moreover, $\deg(P(x) - x^m) < m/2$. For such classes of polynomials, the RMR method converges after only two iterations (Peter et al., 2007). In (Knezevic et al., 2008) the authors have presented an adaptation of the famous Barrett Reduction method for binary fields. In section 2.2 we analyze the RMR and Barrett reduction methods and establish an equivalence between the two.
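The iteration of equations 4 and 5 can be sketched in software as follows. This is a minimal model, assuming polynomials are stored as Python integers; `clmul` and `rmr_reduce` are illustrative names, and the returned iteration count is added only to observe the convergence behaviour discussed above.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial product of two bit strings."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def rmr_reduce(c: int, p: int, m: int):
    """Repeated Multiplication Reduction of c(x) modulo p(x), deg(p) = m.

    Returns (reduced polynomial, number of iterations performed).
    """
    p_low = p ^ (1 << m)          # P(x) - x^m (subtraction is XOR over GF(2))
    mask = (1 << m) - 1
    iterations = 0
    while c >> m:                 # C_h is non-zero
        c_h, c_l = c >> m, c & mask
        c = clmul(c_h, p_low) ^ c_l
        iterations += 1
    return c, iterations


# NIST B-163 / K-163 field polynomial: x^163 + x^7 + x^6 + x^3 + 1
M = 163
P163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
a = (1 << 162) | 0x1234567
b = (1 << 161) | 0xABCDEF
reduced, n_iter = rmr_reduce(clmul(a, b), P163, M)
```

Since $\deg(P(x) - x^{163}) = 7 < 163/2$, the loop is expected to terminate after at most two passes for any product of two field elements.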

2.2 Barrett Reduction and the RMR Method

We reproduce the adaptation of Barrett reduction from (Knezevic et al., 2008) to compare with the RMR method for irreducible polynomials with the following property: $\deg(P(x) - x^m) < m/2$. Let us consider the RMR method first. Let $C_0(x)$ be the product of two polynomials that needs to be reduced. Let $P(x) = x^m + x^k + \cdots + x + 1$ be the irreducible polynomial. Note that, for the RMR method to converge within two iterations, $k$ should be less than $m/2$. Using equation 5 we reduce the polynomial $C_0(x)$ as shown in equation 6.

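$$
\begin{aligned}
C_0(x) &= C_{h,0}(x)\,x^m + C_{l,0}(x) \\
C_1(x) &= C_{h,0}(x)\,(P(x) - x^m) + C_{l,0}(x) \\
       &= C_{h,1}(x)\,x^m + C_{l,1}(x) \\
\text{where } C_{h,1}(x) &= C_{h,0}(x)\,(P(x) - x^m) \;\mathrm{div}\; x^m \\
\text{and } C_{l,1}(x) &= C_{h,0}(x)\,(P(x) - x^m) \bmod x^m + C_{l,0}(x) \\
C_2(x) &= C_{h,1}(x)\,(P(x) - x^m) + C_{l,1}(x)
\end{aligned}
\tag{6}
$$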

Clearly, $\deg(C_{h,1}(x)) \le k$ and therefore $\deg(C_2(x)) < m$. Now, let us consider the Barrett Reduction method for the same irreducible polynomial. Barrett Reduction involves computation of three quotients $Q_1(x)$, $Q_2(x)$ and $Q_3(x)$ along with two remainders $R_1(x)$ and $R_2(x)$ as shown in equation 7. The final result is given by the remainder polynomial $R(x)$.

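$$
\begin{aligned}
Q_1(x) &= C_0(x) \;\mathrm{div}\; x^m = C_{h,0}(x) \\
Q_2(x) &= Q_1(x)\,P(x) \\
Q_3(x) &= Q_2(x) \;\mathrm{div}\; x^m \\
       &= C_{h,0}(x)\,(x^m + x^k + \cdots + x + 1) \;\mathrm{div}\; x^m \\
       &= C_{h,0}(x) + C_{h,1}(x) \\
R_1(x) &= C_0(x) \bmod x^m = C_{l,0}(x) \\
R_2(x) &= Q_3(x)\,P(x) \bmod x^m \\
       &= Q_3(x)\,(x^k + \cdots + x + 1) \bmod x^m \\
       &= C_{h,0}(x)\,(x^k + \cdots + x + 1) \bmod x^m \\
       &\quad + C_{h,1}(x)\,(x^k + \cdots + x + 1) \bmod x^m \\
R(x)   &= R_1(x) + R_2(x) \\
       &= C_{h,1}(x)\,(P(x) - x^m) + C_{l,1}(x)
\end{aligned}
\tag{7}
$$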

From equations 6 and 7 it is evident that both methods are equivalent and both of them require multiplication of $m$-bit polynomials. Note that the modulo and division operations in the two methods translate to partitioning of the polynomials into lower and higher halves and therefore do not require any arithmetic operation. In section 2.3 we present a method for performing the aforementioned multiplications in order to achieve arbitrary flexibility in reduction.
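The equivalence claimed between equations 6 and 7 can be checked numerically. The sketch below is an illustration only: it follows the quotient/remainder steps of equation 7 literally and compares the result with two unrolled RMR iterations from equation 6. Polynomials are Python integers, and the names `barrett_reduce` and `rmr_two_iterations` are hypothetical.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2)) polynomial product of two bit strings."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def barrett_reduce(c: int, p: int, m: int) -> int:
    """Barrett reduction as in equation 7; assumes deg(P(x) - x^m) < m/2."""
    mask = (1 << m) - 1
    q1 = c >> m                   # Q1 = C0 div x^m
    q2 = clmul(q1, p)             # Q2 = Q1 * P
    q3 = q2 >> m                  # Q3 = Q2 div x^m
    r1 = c & mask                 # R1 = C0 mod x^m
    r2 = clmul(q3, p) & mask      # R2 = Q3 * P mod x^m
    return r1 ^ r2                # R  = R1 + R2


def rmr_two_iterations(c: int, p: int, m: int) -> int:
    """Two unrolled RMR iterations as in equation 6."""
    p_low = p ^ (1 << m)          # P(x) - x^m
    mask = (1 << m) - 1
    c1 = clmul(c >> m, p_low) ^ (c & mask)
    return clmul(c1 >> m, p_low) ^ (c1 & mask)


# NIST field polynomial x^233 + x^74 + 1
M = 233
P233 = (1 << 233) | (1 << 74) | 1
a = (1 << 232) | 0xDEADBEEF
b = (1 << 231) | 0x0BADF00D
c0 = clmul(a, b)
same = barrett_reduce(c0, P233, M) == rmr_two_iterations(c0, P233, M)
```

Note that `div` and `mod` by $x^m$ appear only as a right shift and a mask, which mirrors the remark that they amount to partitioning the polynomial rather than arithmetic.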

2.3 Multiplication Operations in Reduction

From equations 6 and 7 we observe that multiplications of the form $C(x)(x^k + \cdots + x + 1)$ form the core of the computations. Therefore it is necessary to accelerate these multiplications in order to perform fast reduction. It should also be noted that the only other operations involved in reduction are additions over $GF(2^m)$. Since there is no carry involved in addition, addition of two $m$-bit polynomials which span more than one word in a $w$-bit architecture can be realized as $\lceil m/w \rceil$ $w$-bit XOR operations. Multiplication on the other hand requires multi-word shift and accumulation of results. Consider the two polynomials $C(x)$ and $P'(x)$ of degree $m$ and $k$ respectively. These polynomials can be represented in a $w$-bit architecture as a collection of $m_c$ and $m_p$ $w$-bit words respectively. Equation 8 shows the representation.

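$$
\begin{aligned}
C(x)  &= \sum_{i=0}^{m_c-1} C_i(x)\,x^{iw}, \quad \text{where } m_c = \lceil m/w \rceil \\
P'(x) &= \sum_{j=0}^{m_p-1} P'_j(x)\,x^{jw}, \quad \text{where } m_p \text{ is the number of } w\text{-bit words spanned by } P'(x)
\end{aligned}
\tag{8}
$$

$C_i(x)x^{iw}$ and $P'_j(x)x^{jw}$ denote the $i$-th and $j$-th words of the polynomials $C(x)$ and $P'(x)$ respectively. The product of these two polynomials can be computed as follows:

$$
\begin{aligned}
C'(x) &= C(x)\,P'(x) \\
      &= \sum_{j=0}^{m_p-1} C(x)\,P'_j(x)\,x^{jw} \\
      &= \sum_{j=0}^{m_p-1} \Big( \sum_{i=0}^{m_c-1} C_i(x)\,x^{iw}\,P'_j(x)\,x^{jw} \Big)
\end{aligned}
\tag{9}
$$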

A closer look at equation 9 reveals that computation of $C(x)P'_j(x)$ involves computations of the form $C(x)x^r$. Each of the individual words like $C'_{i,j}(x)$ (refer to figure 1) in the product of the entire polynomial $C(x)$ and $x^r$ can be computed as follows:

$$
C'_{i,j} = (C_i \ll r \;|\; C_{i-1} \gg (w-r)) \tag{10}
$$

The individual words like $C'_{i,j}(x)$ in the product of $C(x)$ and $P'_j(x)$ can be expressed as given by equation 11.

$$
C'_{i,j} = \bigoplus_{r=0}^{w-1} (C_i \ll r \;|\; C_{i-1} \gg (w-r))\,p'_{j,r} \tag{11}
$$

Note that $p'_{j,r}$ denotes the $r$-th term in the $j$-th word of the polynomial $P'(x)$ in equation 11. The operations of equation 11 can be repeated for each of the words in $P'(x)$ to compute the final result. Note that the product of $C(x)$ and each of the words in $P'(x)$ is $m_c + 1$ words wide. Henceforward we will refer to products of $C(x)$ with the individual words of $P'(x)$ as "partial products". It should be noted that these $m_c + 1$ word wide partial products need to be aligned to proper word boundaries before they can be added together to produce the final result. Figure 1 shows how the partial products are aligned.
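A purely software realization of equations 10 and 11 can be sketched as below. This is an illustrative model, not the authors' code: it assumes $w = 32$, little-endian word lists (index 0 is the least significant word), and the hypothetical helper names `partial_product` and `words_to_int`.

```python
W = 32                      # word length w (assumption for this sketch)
MASK = (1 << W) - 1


def partial_product(c_words, p_word):
    """Product of a multi-word C(x) with one word P'_j(x) of P'(x).

    c_words : list of W-bit words of C(x), least significant word first
    p_word  : one W-bit word of P'(x); bit r is the coefficient p'_{j,r}
    Returns a list of len(c_words) + 1 words (the partial product).
    """
    m_c = len(c_words)
    out = []
    for i in range(m_c + 1):
        cur = c_words[i] if i < m_c else 0        # C_i
        prev = c_words[i - 1] if i > 0 else 0     # C_{i-1}
        acc = 0
        for r in range(W):
            if (p_word >> r) & 1:
                # equation 10: two-word shift by r, concatenated with OR
                acc ^= ((cur << r) & MASK) | (prev >> (W - r))
        out.append(acc)                           # equation 11: XOR-accumulate
    return out


def words_to_int(words):
    """Pack little-endian W-bit words into one integer (for checking only)."""
    return sum(word << (W * i) for i, word in enumerate(words))


c_words = [0x89ABCDEF, 0x01234567, 0xFFFFFFFF]
pp = partial_product(c_words, 0xC9)   # 0xC9 encodes x^7 + x^6 + x^3 + 1
```

Each set bit of `p_word` costs two shifts and one concatenation per output word, which is the per-term cost that the operation counts in section 3.1 build on.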

2.4 A Modified Interleaved Galois Field Multiplier as a Hardware Assist for Reduction

The discussion in section 2.3 makes it clear that a reduction method is only as fast as the underlying multiplication operations. Therefore it is obvious that polynomial multiplication kernels are the candidates for acceleration in a crypto-system. The simplest way of accelerating a $w \times w$ polynomial multiplication is to introduce a $w$-bit polynomial multiplier that produces $2w$-bit results. Therefore each word in the input polynomial $C(x)$ produces a pair of words and these pairs need to be added (i.e. XORed) with proper alignment to compute a partial product.

Figure 1: Arrangement of Partial Products. (The figure shows the partial products $C(x)P'_j(x)x^{jw}$, each consisting of the words $C'_{0,j}$ to $C'_{m_c,j}$, offset by $j$ words relative to $C(x)$.)

Figure 2: One stage of the Modified IGF Multiplier. (Labels recoverable from the figure: Multiplicand after (r−1) stages, Lower Word of Multiplicand, Multiplier[r], Irreducible Polynomial, Mode Select, Shift Operation, Reduction Operation, Accumulation Operation, Extra Logic Introduced, Accumulated result after (r−1) stages, Multiplicand after r stages, Accumulated result after r stages.)

In this section we propose a technique for combining the addition operations with the polynomial multiplications. Instead of considering one word of the polynomial $C(x)$ we focus on one word of the partial product (i.e. $C'_{i,j}(x)$). It is evident from equation 11 that to produce $C'_{i,j}(x)$ two words from the polynomial $C(x)$ and one word from $P'(x)$ are necessary. Thus the intended operation can be described as a $2w \times w$ polynomial multiplication that produces a $w$-bit result. In this section we show that an Interleaved Galois Field (IGF) Multiplier (Hinkelmann et al., 2009) can be modified to support this type of multiplication. In a shift-and-add IGF multiplier, the multiplicand operand is successively left shifted and the multiplier operand is used to selectively accumulate the results of the left shift operations. The IGF multiplier always produces a reduced result. Reduction over large fields, however, requires support for multiplication of polynomials where the result is kept unreduced. This can be achieved by setting the irreducible polynomial to all zeros, which is done by masking the irreducible polynomial input to each stage of the multiplier with a one-bit control signal (the Mode Select signal in figure 2). In order to emulate the operations described in equation 11, the MIGF multiplier inserts the $(w-r)$-th bit from the second multiplicand operand into the LSB of the first multiplicand at the $r$-th stage of the multiplier. This is enabled by introducing a single AND gate that drives the LSB of the shifted polynomial. As can be seen from figure 2, we use the inverted control signal to mask the $(w-r)$-th bit from the second multiplicand operand. This added hardware (shown inside the shaded rectangle in figure 2) enables the multiplier to perform two-word shift operations successively, which in turn alleviates the need for adding the individual products of the multiplier to form the partial product. In section 3 we discuss the reduction in instruction-count of the reduction operation using this MIGF multiplier and the flexibility that this technique offers.
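The behaviour described above can be approximated by a small stage-by-stage software model. This is only a behavioural sketch under stated assumptions, not the authors' RTL: `migf_multiply`, its parameters and the 0-based stage index are hypothetical, `poly` holds the irreducible polynomial without its leading $x^w$ term, and `reduce_mode` plays the role of the Mode Select signal.

```python
def migf_multiply(hi, lo, mult, w=32, poly=0, reduce_mode=False):
    """Behavioural model of one w-stage MIGF multiplication.

    hi, lo      : first and second multiplicand words (C_i and C_{i-1})
    mult        : multiplier word (one word of P'(x))
    poly        : irreducible polynomial without its x^w term (field mode only)
    reduce_mode : True  -> conventional IGF field multiplication (lo is masked)
                  False -> unreduced 2w x w multiply producing one w-bit word
    """
    mask = (1 << w) - 1
    shifted = hi & mask
    acc = 0
    for r in range(w):
        if (mult >> r) & 1:
            acc ^= shifted                      # accumulation stage
        msb = shifted >> (w - 1)
        # extra AND gate: feed the next bit of the lower word into the LSB,
        # enabled only when the multiplier is in unreduced mode
        insert = 0 if reduce_mode else (lo >> (w - 1 - r)) & 1
        shifted = ((shifted << 1) & mask) | insert
        if reduce_mode and msb:
            shifted ^= poly & mask              # reduction stage (masked otherwise)
    return acc


# Unreduced mode: one word of a partial product, as in equation 11
word = migf_multiply(0x01234567, 0x89ABCDEF, 0xC9)

# Field mode on a small example: GF(2^8) with x^8 + x^4 + x^3 + x + 1
field_product = migf_multiply(0x57, 0, 0x83, w=8, poly=0x1B, reduce_mode=True)
```

In unreduced mode the model returns the word that a conventional $w \times w$ multiplier would only yield after XORing the upper half of one $2w$-bit product with the lower half of the next, which is the addition the extra logic is meant to absorb.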

3 PERFORMANCE AND FLEXIBILITY OF THE PROPOSED REDUCTION TECHNIQUE

In this section we analyze the reduction method described in section 2.3 to evaluate the improvement in performance of the method with an MIGF multiplier as a hardware assist. We also show that this technique is arbitrarily flexible in terms of field order and choice of irreducible polynomial.

3.1 Performance Improvement due to use of Modified IGF Multiplier

In this section we analyze the benefits of using the MIGF multiplier for multiplication of $C(x)$ with $P'(x)$. We proceed by first considering the case of multiplication of the entire $C(x)$ polynomial with just one word of $P'(x)$ and continue the analysis to complete multiplication of $C(x)$ with the entire $P'(x)$.

3.1.1 Multiplying $C(x)$ with One Word of $P'(x)$

From equation 10 it is evident that each $m_c$-word shift operation corresponding to a single term in the polynomial $P'(x)$ translates to $2m_c$ shift operations and $m_c - 1$ logical concatenations. Note that the concatenations can be conveniently expressed as either logical OR operations or logical XOR operations. Therefore the total number of logical operations necessary to produce the result of the multi-word multiplication as described in equation 11 is determined by the number of terms other than 1 present in the word $P'_j(x)$ of the irreducible polynomial. Assuming there are $p$ such terms present in $P'_j(x)$, the number of logical shifts is $2m_c p$ and the number of concatenation operations is $(m_c - 1)p$. Note that these operations are required to produce the shifted polynomials of the form $C(x)x^r$ for different values of $r < w$. In order to accumulate these shifted polynomials of the form $C(x)x^r$ it is necessary to perform at most $p' - 1$ logical XOR operations for each word of the shifted polynomials, where $p'$ is the total number of terms present in $P'_j(x)$ (including a 1 if any). Since the shifted polynomials span $(m_c + 1)$ words, the total number of XORs necessary to accumulate the shifted polynomials is $(m_c + 1)(p' - 1)$. Thus the total number of basic logic operations necessary to compute the partial product from the shifted polynomials $C(x)x^r$ is $2m_c p$ shift operations and $(m_c - 1)p + (m_c + 1)(p' - 1)$ logical XORs. Using a polynomial multiplier to produce the partial product requires $m_c$ multiplications and $m_c - 1$ XOR operations. Using the MIGF multiplier for this operation requires $m_c + 1$ two-word multiplications, i.e. one extra multiplication in place of $m_c - 1$ XOR operations. Thus we have reduced the total number of arithmetic and logical operations by approximately $4p$ times, when compared to a purely software realization.

3.1.2 Multiplying $C(x)$ with Entire $P'(x)$

So far we have considered multiplication of the polynomial $C(x)$ with one word of the irreducible polynomial $P'(x)$. With the analysis of the previous paragraph as the basis, let us compute the number of operations involved in realizing the entire multiplication operation described in equation 9. The number of shift operations involved is determined by the number of terms with distinct indices present in the polynomial $P'(x)$. In a $w$-bit architecture $C(x)x^{tw+r}$ is computed by simply appending $t$ words filled with zeros to the right of $C(x)x^r$. Therefore the terms of $P'(x)$ with indices $tw + r$ are equivalent to one another. Assuming that there are $p$ terms with distinct indices present in the irreducible polynomial, the total number of shift operations necessary is given by $2m_c p$. The number of concatenation operations is $(m_c - 1)p$. However, the total number of XOR operations required for accumulation of these shifted polynomials is determined by the number of terms (with distinct and equivalent indices) present in $P'(x)$. In order to calculate the total number of XOR operations for accumulation it is necessary to examine the candidates for accumulation. Let us denote the number of terms present in $P'_j(x)$ by $p_j$. The accumulation of the shifted polynomials of the form $C(x)x^r$ produced by these $p_j$ terms requires $(p_j - 1)(m_c + 1)$ XOR operations. It should be noted that if $p_j = 0$ for any particular word $P'_j(x)$, no XOR operation is necessary. For simplicity let us assume $p_j > 0$ for all $j$. The total number of basic arithmetic-logic instructions to produce all the polynomials of the form $C(x)P'_j(x)$ is shown in equation 12.

$$
\begin{aligned}
\#\mathrm{SHIFT} &= 2 m_c\, p \\
\#\mathrm{XOR}   &= (m_c - 1)\,p + \sum_{j=0}^{m_p-1} (m_c + 1)(p_j - 1)
\end{aligned}
\tag{12}
$$

It should be noted that the number of XOR operations required to add results produced by different words of the polynomial $P'(x)$ remains unaltered irrespective of whether the MIGF multiplier is used or not. Thus we have intentionally not considered such XOR operations in counting the total number of XOR operations.

Using the MIGF multiplier reduces the number of operations required to perform the same set of operations. Note that an intelligent sequencing of multiplication operations is necessary to minimize the number of multiplications. Sequencing of multiplication operations can be done by examining the irreducible polynomial. A set of $m_c + 1$ multiplications is necessary to produce a term like $C(x)P'_j(x)$. However, it should be noted that this set of multiplications needs to be performed only for words of the polynomial $P'(x)$ with at least one term present. Therefore the maximum number of such multiplications necessary is $(m_c + 1)m_p$. Clearly $(m_c + 1)m_p < 2m_c p + (m_c - 1)p + \sum_{j=0}^{m_p-1} (m_c + 1)(p_j - 1)$. Let us take this comparison a little further by making a set of assumptions. Let us assume that on average $p'$ terms are present in each of the words that constitute the irreducible polynomial. In that case the total number of basic arithmetic-logic operations involved can be simplified to $2m_c p + (m_c + 1)(p' - 1)m_p$. Using a conventional $w \times w$ polynomial multiplier will require $m_c m_p$ multiplications and $(m_c - 1)m_p$ additional XOR operations. Using the MIGF multiplier brings down the total number of operations to $(m_c + 1)m_p$. Assuming $m_c$ is large enough so that $m_c \approx m_c + 1$, the reduction in operation count is $2p/m_p + (p' - 1)$ times when compared to a purely software realization and $2\times$ compared to a hybrid realization using a conventional polynomial multiplier.
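The two ratios quoted above follow directly from the operation counts in this section. As a worked check, writing $m_c$ and $m_p$ for the word counts of $C(x)$ and $P'(x)$ (subscript names assumed here for readability) and using $m_c \approx m_c + 1$:

```latex
\begin{align*}
\frac{\text{software}}{\text{MIGF}}
  &= \frac{2 m_c p + (m_c + 1)(p' - 1)\, m_p}{(m_c + 1)\, m_p}
   = \frac{2 m_c}{m_c + 1}\cdot\frac{p}{m_p} + (p' - 1)
   \approx \frac{2p}{m_p} + (p' - 1), \\[4pt]
\frac{\text{conventional multiplier}}{\text{MIGF}}
  &= \frac{m_c m_p + (m_c - 1)\, m_p}{(m_c + 1)\, m_p}
   = \frac{2 m_c - 1}{m_c + 1}
   \approx 2 .
\end{align*}
```

Both approximations improve as $m_c$ grows, i.e. for larger field orders relative to the word length.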

3.2 Flexibility of the Reduction Method

In this section we analyze the flexibility of the reduction method. As discussed in section 2.3, the reduction operation is realized as a series of polynomial multiplications. The number of individual multiplication operations is determined by two factors: the order of the finite field in consideration ($2^m$) and the word-length of the architecture ($w$). The RMR method of reduction is flexible by nature, since it does not impose any restriction on the order of the finite field or the nature of the irreducible polynomial $P(x)$. The Barrett reduction method, which was shown to be a special case of the RMR method, imposes the restriction $\deg(P(x) - x^m) < m/2$ on the irreducible polynomial $P(x)$ in order to improve performance. The elliptic curves suggested by NIST follow this restriction and therefore only two iterations of multiplications are sufficient for the result to converge. We evaluated the decrease in number of instructions brought about by usage of an MIGF multiplier as a hardware assist for multi-word multiplication in section 3.1. This analysis is completely general in nature and we have not made any assumption regarding the nature of the irreducible polynomial. Therefore the speed-up we computed applies in general to reduction with any arbitrary irreducible polynomial.

3.3 Hardware Complexity of the Modified IGF Multiplier

As shown in figure 2, we introduced a set of two-input AND gates in each stage of the MIGF multiplier to enable two-word shift operations. In a $w$-bit instance of the multiplier, two sets of $w$ two-input AND gates are introduced. The first set of $w$ two-input AND gates is used for masking the irreducible polynomial input to the multiplier to zero. The second set of $w$ two-input AND gates is used for enabling the two-word shift operation. This increase in hardware complexity is compensated by the significant reduction in the number of operations brought about by using this multiplier as a hardware assist for reduction.

Table 1: Synthesis Results of an IGF Multiplier and an MIGF Multiplier.

Type of Multiplier    Area in µm²    Max. Operating Freq. in MHz
IGF Multiplier        42228          270
MIGF Multiplier       42255          256

Table 2: NIST recommended Irreducible Polynomials.

Size    Recommended Irreducible Polynomial
163     $x^{163} + x^7 + x^6 + x^3 + 1$
233     $x^{233} + x^{74} + 1$
283     $x^{283} + x^{12} + x^7 + x^5 + 1$
409     $x^{409} + x^{87} + 1$
571     $x^{571} + x^{10} + x^5 + x^2 + 1$

4 RESULTS

In this section we present the synthesis results of a 32-bit MIGF multiplier and evaluate the improvement in performance of reduction operations over the NIST curves using the MIGF multiplier.

4.1 Synthesis Results of a 32-bit Modified IGF Multiplier

We implemented a 32-bit instance of an MIGF multiplier in Verilog HDL and synthesized it with the Faraday Tech 90nm standard performance library, using Synopsys Design Vision. We compared the increase in area and drop in maximum operating frequency (due to the addition of 2×32 extra two-input AND gates) with an IGF multiplier synthesized using the same parameters. The comparison is presented in table 1. Since these are not post-layout results, they are not exact, but they indicate that the increase in hardware complexity of the IGF multiplier for enabling $2w \times w$ multiplication is marginal.

4.2 Performance Improvement of Reduction over NIST Curves

Table 2 lists the NIST recommended irreducible polynomials over binary fields of different orders. We evaluate the improvement in performance of the reduction operation over these fields using a 32-bit MIGF multiplier. Note that each of the polynomials adheres to the restriction $\deg(P(x) - x^m) < m/2$. Therefore only two iterations of multiplications are sufficient for completion of the reduction operation.

Let us consider the first polynomial in the list, $x^{163} + x^7 + x^6 + x^3 + 1$. Since $\deg(P(x) - x^{163}) < 31$, all the terms in $P'(x)$ fit within one 32-bit word. Below we evaluate the number of operations involved in the two iterations of reduction using the aforementioned polynomial.

• Iteration One:

′

(x) = x

+ x

+ 1

Number of words in C

h,0

(x) is m



163



= 6

Number of words in P

′

(x) is m





= 1

Number of terms present in P

′

(x) other than 1 is

p = 3

Total number of terms present in P

′

(x) is p

′

= 4

#SHIFT = 2m

p = 36

#XOR = (m

− 1)p+ (m

+ 1)(p

′

− 1) = 36

#Multiplications using a conventional Polyno-

mial Multiplier is m

= 6

#XOR using a conventional Polynomial Multi-

plier is (m

− 1) = 5

#Multiplications using an MIGF Multiplier is

given by m

+ 1 = 7

• Iteration Two:

Number of words in C

h,1

(x) is m





= 1

Number of words in P

′

(x) is m





= 1

#SHIFT = 2m

p = 6

#XOR = (m

− 1)p+ (m

+ 1)(p

′

− 1) = 6

#Multiplications using a conventional Polyno-

mial Multiplier is m

= 1

#XOR using a conventional Polynomial Multi-

plier is (m

− 1) = 0

#Multiplications using an MIGF Multiplier is

given by m

+ 1 = 2
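The counts in this worked example follow mechanically from the number of words in the high part of the product and the number of terms in P'(x). The sketch below evaluates the same expressions for the case where P'(x) fits in one word. The function and argument names are illustrative, and the formulas are transcribed from the example rather than derived independently.

```python
from math import ceil


def single_word_counts(words_in_ch, terms_other_than_one):
    """Operation counts for one reduction iteration when P'(x) fits in one word."""
    m = words_in_ch
    p = terms_other_than_one
    p_total = p + 1                      # all terms of P'(x), including the 1
    return {
        "sw_shift": 2 * m * p,
        "sw_xor": (m - 1) * p + (m + 1) * (p_total - 1),
        "poly_mult": m,
        "poly_xor": m - 1,
        "migf_mult": m + 1,
    }


# x^163 + x^7 + x^6 + x^3 + 1 on a 32-bit datapath.
iteration_one = single_word_counts(ceil(163 / 32), 3)
iteration_two = single_word_counts(1, 3)
print(iteration_one)
print(iteration_two)
```

The same helper reproduces the iteration-one software counts for the 283-bit and 571-bit fields, whose P'(x) also has three terms other than 1.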

Now let us consider the second polynomial, $x^{233} + x^{74} + 1$, from Table 2. Clearly $P(x) - x^{233}$ spans multiple words in a 32-bit environment. Below we evaluate the number of operations involved in the two iterations of reduction using the aforementioned polynomial.

• Iteration One:
$P'(x) = x^{74} + 1$
Number of words in $C_{h,0}(x)$ is $m_c = \lceil 233/32 \rceil = 8$
Number of words in $P'(x)$ with at least one non-zero term is $m_p = 2$
Number of distinct terms present in $P'(x)$ other than 1 is $p = 1$
Total number of terms present in individual words of $P'(x)$ is $p'_i = 1$ for each of the two non-zero words
#SHIFT = $2 m_c p = 16$
#XOR = $(m_c - 1)p + \sum_i (m_c + 1)(p'_i - 1) = 14$
#Multiplications using a conventional Polynomial Multiplier is $m_c m_p = 16$
#XOR using a conventional Polynomial Multiplier is $(m_c - 1) m_p = 14$
#Multiplications using an MIGF Multiplier is given by $(m_c + 1) m_p = 18$

• Iteration Two:
Number of words in $C_{h,1}(x)$ is $m_c = 3$
Number of words in $P'(x)$ with at least one non-zero term is $m_p = 2$
#SHIFT = $2 m_c p = 6$
#XOR = $(m_c - 1)p + \sum_i (m_c + 1)(p'_i - 1) = 2$
#Multiplications using a conventional Polynomial Multiplier is $m_c m_p = 6$
#XOR using a conventional Polynomial Multiplier is $(m_c - 1) m_p = 4$
#Multiplications using an MIGF Multiplier is given by $(m_c + 1) m_p = 8$

Table 3: Number of operations involved in reduction over various NIST curves using three different techniques.

Size    Technique    Operation    Iteration I    Iteration II
163     SW           #SHIFT       36             6
        SW           #XOR         36             6
        POLY         #MULT        6              1
        POLY         #XOR         5              0
        MIGF         #MULT        7              2
233     SW           #SHIFT       16             6
        SW           #XOR         6              2
        POLY         #MULT        16             6
        POLY         #XOR         14             4
        MIGF         #MULT        18             8
283     SW           #SHIFT       54             6
        SW           #XOR         54             6
        POLY         #MULT        9              1
        POLY         #XOR         8              0
        MIGF         #MULT        10             2
409     SW           #SHIFT       26             6
        SW           #XOR         12             2
        POLY         #MULT        26             6
        POLY         #XOR         24             4
        MIGF         #MULT        28             8
571     SW           #SHIFT       108            6
        SW           #XOR         108            6
        POLY         #MULT        18             1
        POLY         #XOR         17             0
        MIGF         #MULT        19             2
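When P'(x) spans several words, as it does for $x^{233} + x^{74} + 1$, the multiplier-based counts scale with the number of non-zero words of P'(x). The sketch below assumes that reading of the expressions in the worked example, that is, the two word counts are multiplied together. The helper name and its arguments are illustrative only.

```python
def multiplier_counts(words_in_ch, nonzero_words_in_p):
    """Hardware-assisted operation counts for one iteration (multi-word P'(x))."""
    m_c, m_p = words_in_ch, nonzero_words_in_p
    return {
        "poly_mult": m_c * m_p,          # conventional polynomial multiplier
        "poly_xor": (m_c - 1) * m_p,     # XORs that combine its partial products
        "migf_mult": (m_c + 1) * m_p,    # MIGF multiplier, no separate XORs
    }


# x^233 + x^74 + 1: the high part has 8 words, then 3; P'(x) has 2 non-zero words.
print(multiplier_counts(8, 2))
print(multiplier_counts(3, 2))
```

Calling the helper with 13 words and 2 non-zero words gives the iteration-one POLY and MIGF entries listed for the 409-bit field.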

Similarly, we evaluated the number of operations in each iteration of the reduction operation over the remaining NIST curves. The operation counts are listed in Table 3. The fields SW, POLY and MIGF refer to implementations of reduction using a pure software algorithm, a conventional polynomial multiplier and an MIGF multiplier respectively. From Table 3 it is evident that the total number of operations is the least when an MIGF multiplier is used as the hardware accelerator for reduction. For the 163-bit field, for example, the two iterations together take 84 operations in software, 12 with a conventional polynomial multiplier and 9 with an MIGF multiplier.
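The comparison can be made explicit by summing the entries of Table 3 per technique. The sketch below copies the per-iteration counts from the table and assumes that every operation (shift, XOR or multiplication) is weighted as one unit, which is a simplification made here for illustration. The dictionary layout and names are likewise illustrative.

```python
# Per-operation counts copied from Table 3 as (iteration I, iteration II) pairs.
TABLE_3 = {
    163: {"SW": [(36, 6), (36, 6)], "POLY": [(6, 1), (5, 0)], "MIGF": [(7, 2)]},
    233: {"SW": [(16, 6), (6, 2)], "POLY": [(16, 6), (14, 4)], "MIGF": [(18, 8)]},
    283: {"SW": [(54, 6), (54, 6)], "POLY": [(9, 1), (8, 0)], "MIGF": [(10, 2)]},
    409: {"SW": [(26, 6), (12, 2)], "POLY": [(26, 6), (24, 4)], "MIGF": [(28, 8)]},
    571: {"SW": [(108, 6), (108, 6)], "POLY": [(18, 1), (17, 0)], "MIGF": [(19, 2)]},
}


def total_operations(rows):
    """Sum both iterations over all operation types of one technique."""
    return sum(first + second for first, second in rows)


totals = {
    size: {tech: total_operations(rows) for tech, rows in techniques.items()}
    for size, techniques in TABLE_3.items()
}

for size, per_tech in totals.items():
    fewest = min(per_tech, key=per_tech.get)
    print(size, per_tech, "fewest:", fewest)
```

Under this unit-cost assumption the MIGF column has the smallest total for every field size, while for the 233-bit and 409-bit fields the conventional multiplier totals exceed the software totals.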

We present the instruction count of reduction over the various NIST curves using the three different techniques in Figure 3. It is evident from Figure 3 that the advantage of using hardware assists for reduction is prominent when there are a large number of terms present in the irreducible polynomial. This is attributed to the fact that hardware assists in the form of multipliers combine a number of shift and XOR operations into a single multiplication. If fewer terms are present in $P'(x)$, the effect of this combination is less prominent. In fact, as in the case of the two polynomials $x^{409} + x^{87} + 1$ and $x^{233} + x^{74} + 1$, the hybrid technique using a polynomial multiplier may perform worse than a simple software realization for certain irreducible polynomials. However, the absence of XOR operations in forming the partial products makes the proposed technique (using an MIGF multiplier) perform better than both the other techniques for all the NIST polynomials.

Figure 3: Instruction count of the reduction operation in three different implementations (software implementation, hybrid implementation using a conventional polynomial multiplier, hybrid implementation using the modified IGF multiplier). Horizontal axis: field size (163, 233, 283, 409, 571); vertical axis: instruction count.

5 CONCLUSIONS

In the context of efficient realization of Elliptic Curve Cryptography algorithms, we recognized the importance of an efficient and flexible solution for reduction operations over binary fields. In this paper we presented a method for flexible reduction. The method is especially suitable for coarse-grained platforms, where the granularity of data and operations plays a major role in the computations. We identified that the efficiency of the underlying polynomial multiplication operations determines the speed of reduction algorithms such as the Repeated Multiplication Reduction method or the Barrett Reduction method. In this context we proposed a design of a polynomial multiplier based on the well-known Interleaved Galois Field (IGF) multiplier. This MIGF multiplier is shown to achieve a significant improvement in throughput over a purely software realization or a hybrid implementation using a conventional polynomial multiplier.

REFERENCES

Barrett, P. (1987). Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In Odlyzko, A., editor, Advances in Cryptology - CRYPTO '86, volume 263 of Lecture Notes in Computer Science, pages 311–323. Springer Berlin / Heidelberg. 10.1007/3-540-47721-7_24.

Eberle, H., Gura, N., Shantz, S. C., and Gupta, V. (2003). A cryptographic processor for arbitrary elliptic curves over GF(2^m). Technical report, Mountain View, CA, USA.

Hinkelmann, H., Zipf, P., Li, J., Liu, G., and Glesner, M. (2009). On the design of reconfigurable multipliers for integer and Galois field multiplication. Microprocessors and Microsystems - Embedded Hardware Design, 33(1):2–12.

Karatsuba, A. and Ofman, Y. (1963). Multiplication of multidigit numbers on automata. Soviet Physics - Doklady, 7(7):595–596.

Knezevic, M., Sakiyama, K., Fan, J., and Verbauwhede, I. (2008). Modular reduction in GF(2^n) without pre-computational phase. In von zur Gathen, J., Imaña, J. L., and Koç, Ç. K., editors, WAIFI, volume 5130 of Lecture Notes in Computer Science, pages 77–87. Springer.

Peter, S., Langendörfer, P., and Piotrowski, K. (2007). Flexible hardware reduction for elliptic curve cryptography in GF(2^m). In Lauwereins, R. and Madsen, J., editors, DATE, pages 1259–1264. ACM.

Saqib, N. A., Rodríguez-Henríquez, F., and Díaz-Pérez, A. (2004). A parallel architecture for fast computation of elliptic curve scalar multiplication over GF(2^m). Parallel and Distributed Processing Symposium, International, 4:144a.

Satoh, A. and Takano, K. (2003). A scalable dual-field elliptic curve cryptographic processor. IEEE Transactions on Computers, 52:449–460.
