EMPLOYING MULTI-CORE PROCESSOR ARCHITECTURES
TO ACCELERATE JAVA CRYPTOGRAPHY EXTENSIONS
Mario Ivkovic and Thomas Zefferer
Secure Information Technology Center - Austria, Inffeldgasse 16a, Graz, Austria
Keywords:
Java, Cryptography, JCE, Parallelization.
Abstract:
For many years, the increase of clock frequencies has been the preferred approach to raise computational
power. Due to physical limitations and cost-effectiveness reasons, hardware vendors were forced to change
their strategy. Instead of increasing clock frequencies, processors are nowadays supplied with a growing
number of independent cores to increase the overall computational power. This major paradigm shift needs
to be considered in software design processes as well. Software needs to be parallelized to exploit the full
computing power provided by multi-core architectures.
Due to their intrinsic computational complexity, cryptographic algorithms require efficient implementations.
On multi-core architectures this comprises the need for parallelism and concurrent execution. To meet this
challenge, we have enhanced an existing Java™-based cryptographic library by parallelizing a subset of
its algorithms. Our measurements have shown speed-ups between 1.35 and 1.78 resulting from the applied
modifications. In this paper we show that regardless of their complexity, several cryptographic algorithms
can be parallelized to a certain extent with reasonable effort. The applied parallelization of the Java™-based
cryptographic library has significantly enhanced its performance on multi-core architectures and has therefore
made a valuable contribution to its sustainability.
1 INTRODUCTION
Increasing the clock frequency of processors has been
the common approach of processor manufacturers to
raise the performance of their products for many
years. This way, processors with operating clock
frequencies of up to several GHz have made their
way to the consumer market. A few years ago, this
evolution finally tapered off when chip manufacturers
figured out that a further increase of clock frequency
is no longer cost-effectively achievable due to several
physical limitations. In order to still
guarantee a continuous increase of computing power
for newly developed processors, vendors were forced
to modify their strategy. Instead of increasing the
maximum clock frequency, hardware manufacturers
have started to supply processors with multiple in-
dependent cores. Nowadays, modern processors are
equipped with four, eight, or even more cores, which
provide an increased computational power by pro-
cessing instructions in parallel.
This fundamental change of the design approach
has had a significant impact on software development
processes too. On single-core architectures, the performance
of a program is directly correlated to the speed
of the processor on which the software runs.
Increasing the clock frequency of the used processor
immediately leads to a speed-up of the particular software
as well. Unfortunately, this is not true for multi-core
processor architectures. Although a processor's
computing power is theoretically doubled when it is
equipped with a second core, most existing
software has originally been developed to run on
single-core architectures. Hence, even though additional
computing power is provided by supplementary
processor cores, it cannot be exploited by software
that has originally been optimized to run on a single
core.
This problem has been described in an article by
Herb Sutter (Sutter, 2005). He concludes that software
that wants to make use of the full computing
power provided by multi-core processors needs to be
adapted accordingly. Only if the software assigns independent
computations to different processor cores
can these computations be executed concurrently and
the entire computing power provided by multi-core
processors be employed. Unfortunately, writing
efficient and correct parallel programs and parallelizing
existing sequential programs are non-trivial tasks
that are still subject to ongoing research. Especially
the automatic parallelization of existing programs has
been the topic of numerous publications. Tools for the
automated parallelization of sequential source code
are, for instance, introduced in (Dig et al., 2009) and
(Bridges et al., 2008). Although some of the suggested
techniques appear to be promising, an ultimate
solution to this problem has not been found so far.
Parallelization of existing sequential programs is
especially important for software that performs com-
putationally intensive operations like scientific com-
putations and simulations. Another field of applica-
tion where complex computations have to be carried
out frequently is cryptography. Cryptographic algo-
rithms typically include some kind of secret key in
the processing of any given input data. The security
of cryptographic algorithms is usually proportional to
the size of the used key and relies on the fact that try-
ing out all possible key values is computationally in-
feasible within a reasonable period of time. As at-
tacks are becoming more effective - due to the in-
crease of available computational power and also be-
cause of parallelized and distributed approaches - key
sizes have to be increased too in order to preserve the
same level of security.
In general, cryptographic computations become
more time-consuming when key sizes are increased.
Therefore, it is crucial that existing cryptographic
libraries are adapted to utilize the entire comput-
ing power provided by modern multi-core proces-
sors. Only an appropriate parallelization of these li-
braries guarantees that they retain their level of per-
formance with increasing key sizes and remain usable
and future-proof.
Unfortunately, parallelization of cryptographic al-
gorithms is not a trivial task. For instance, consider
the design of the block cipher AES (Daemen and Rij-
men, 2002): a block of plain data is encrypted by ap-
plying the same set of operations for a specified num-
ber of times. The first iteration takes the plain data
as input, while subsequent rounds take the result of
the preceding round as input. Due to these data de-
pendencies between subsequent iterations, a parallel
execution of different rounds is infeasible.
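To make this dependency explicit, the following simplified sketch shows the typical structure of an iterated block cipher; roundFunction is only a stand-in and does not implement the actual AES round transformation.
class IteratedBlockCipherSketch {
    // Schematic structure of an iterated block cipher such as AES: each
    // round consumes the output of the previous round, so the rounds of a
    // single block cannot run in parallel.
    static byte[] encryptBlock(byte[] plaintextBlock, byte[][] roundKeys) {
        byte[] state = plaintextBlock;
        for (int round = 0; round < roundKeys.length; round++) {
            state = roundFunction(state, roundKeys[round]); // data dependency
        }
        return state;
    }

    // Stand-in only; the real AES round consists of SubBytes, ShiftRows,
    // MixColumns and AddRoundKey.
    static byte[] roundFunction(byte[] state, byte[] roundKey) {
        byte[] out = new byte[state.length];
        for (int i = 0; i < state.length; i++) {
            out[i] = (byte) (state[i] ^ roundKey[i % roundKey.length]);
        }
        return out;
    }
}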
Aware of the possible difficulties of parallelizing
cryptographic algorithms, we evaluated in this work
whether existing cryptographic libraries can be optimized
for use on multi-core processors. We focused on the
programming language Java™, mainly for two reasons.
First, a special API for the development of
concurrent programs (introduced by Doug Lea (Lea,
2005)) has been available since version 1.5 of the Java™
Development Kit (JDK). Second, we already
had an existing Java™ cryptography library at
hand, which was perfectly suitable for our investigations.
To evaluate the possible performance boost of
cryptographic libraries on multi-core systems, we
have modified the existing Java™ cryptography library.
Section 2 introduces this library in more de-
tail and shows how selected cryptographic algorithms
of the library have been improved to exploit the com-
puting power of multi-core architectures. In order to
compare the performance of the modified library with
the unmodified original version, we have conducted
several measurements on different architectures. The
results of these measurements and a summary of the
most important facts and findings are provided in Sec-
tion 3. Finally, Section 4 concludes this paper and
identifies further conceivable improvements to speed-
up cryptographic operations on multi-core processor
architectures.
2 JCE MODIFICATIONS
In this work we evaluate whether existing cryptographic
Java™ libraries can be improved in terms
of performance by applying parallelism. Therefore,
this section first gives a short introduction to the Java™
Cryptography Extension (JCE) technology. Furthermore,
it provides a brief description of
different parallelization methods in Java™ and shows
how these methods have been applied to enhance
the performance of three selected cryptographic algorithms.
2.1 Java Cryptography Extensions
Java™ Cryptography Extension (JCE) is a framework
for cryptographic operations like data encryption
and decryption, key generation and key agreement,
message authentication codes (MAC), and
sealed objects. Regarding data encryption and decryption,
symmetric as well as asymmetric stream
and block ciphers are supported. Since version 1.4
of Java™, the JCE is integrated into the SDK and no
longer an optional package.
The JCE uses a so-called provider architecture,
which guarantees implementation and, where possi-
ble, algorithm independence. Any signed provider
can be registered in the framework, which ensures that
the provided algorithms and implementations can be
used seamlessly. Furthermore, a provider from Sun
called SunJCE is supplied with the JDK by default.
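As a brief illustration of how this provider model is used from application code, the following snippet requests an AES cipher from the default SunJCE provider; it is a generic JCA/JCE usage example and not specific to the library investigated in this work.
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class JceProviderDemo {
    public static void main(String[] args) throws Exception {
        // Request an algorithm implementation from a specific provider
        // (here the SunJCE provider shipped with the JDK).
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding", "SunJCE");
        // Generate a matching secret key through the same framework.
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] ciphertext = cipher.doFinal("sample plaintext".getBytes("UTF-8"));
        System.out.println(cipher.getProvider() + ": " + ciphertext.length + " bytes");
        // Additional providers (such as the IAIK JCE) are registered via
        // java.security.Security.addProvider(...) and selected by name.
    }
}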
For our investigations, we have analyzed the JCE
provider IAIK (http://jce.iaik.tugraz.at/) and manually
parallelized a subset of its supported algorithms.
2.2 Parallelization in Java
Java™ has been providing built-in features for parallelization
from the very beginning. These low-level
APIs are very useful for simple parallelization tasks.
Since version 5.0 of the Java™ platform, a high-level
concurrency API is available for more advanced concurrency
tasks. Most of the functionalities are available
in the java.util.concurrent packages. Data
structures for concurrent programming have also been
added to the collections framework.
We have implemented and compared two differ-
ent methods of parallelization in our work. The
first approach was the use of Executors from the
java.util.concurrent packages. Executors are
objects that decouple thread creation and management
from the tasks being executed. An ExecutorService
is an extended Executor that additionally supports
Callable objects, which can return a value after parallel execution.
The following listing shows an example usage of the
Executors framework.
// Two tasks are submitted to a fixed pool of two threads and run
// concurrently; get() blocks until the corresponding task has finished.
final ExecutorService ex =
    Executors.newFixedThreadPool(2);
ParallelExp exp1 = new ParallelExp(...);
ParallelExp exp2 = new ParallelExp(...);
Future<?> future1 = ex.submit(exp1);
Future<?> future2 = ex.submit(exp2);
try {
    future1.get();
    future2.get();
} catch (InterruptedException e) {
    ...
} catch (ExecutionException e) {
    ...
}
ex.shutdown();
// The results are then fetched from the task objects themselves.
result1 = exp1.getResult();
result2 = exp2.getResult();
Functionalities provided in the java.util.concurrent
packages are mainly suitable for coarse-grained parallelization.
For fine-grained parallelization a new
API, the ForkJoinTask framework (JSR 166; see
http://jcp.org/en/jsr/detail?id=166 and
http://gee.oswego.edu/dl/concurrency-interest/), has been developed
and will be included in Java™ version 7. The
ForkJoinTask framework is well suited for the parallelization
of recursive divide-and-conquer algorithms.
A given complex problem is divided into two or more
subtasks that are then solved in parallel. These subtasks
are in turn divided into parallel subtasks, and so
on. This is repeated until each task is small enough to
be solved directly. The following listing shows how
this can be achieved in Java.
// Recursively splits the sort into subtasks until the range is small
// enough (THRESHOLD), then sorts sequentially and merges the halves.
// THRESHOLD, sequentiallySort() and merge() are assumed to be defined
// elsewhere.
class SortTask extends RecursiveAction {
    final long[] a;
    final int lo;
    final int hi;

    SortTask(long[] a, int lo, int hi) {
        this.a = a;
        this.lo = lo;
        this.hi = hi;
    }

    protected void compute() {
        if (hi - lo < THRESHOLD)
            sequentiallySort(a, lo, hi);
        else {
            int mid = (lo + hi) >>> 1;
            invokeAll(new SortTask(a, lo, mid),
                      new SortTask(a, mid, hi));
            merge(a, lo, hi);
        }
    }
}
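Such a task is typically started from a ForkJoinPool instance, for example (assuming the java.util.concurrent package layout planned for Java™ 7) via new ForkJoinPool().invoke(new SortTask(array, 0, array.length));.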
2.3 Applied JCE Parallelizations
The objective of our work was to apply the two men-
tioned parallelization methods to selected algorithms
of the existing IAIK JCE implementation. This JCE
had not been originally designed with paralleliza-
tion in mind. Therefore, our first task was to deter-
mine those sections in the sequential code where par-
allelization is possible and where it actually makes
sense.
In general, the parallelization of sequential code
is no trivial task (Peierls et al., 2005). Researchers
try to solve this issue with automatic parallelization
tools and refactoring engines (e.g., Dig et al.,
2009; Rugina and Rinard, 1999; Freisleben and Kielmann,
1995). Such tools are especially useful for
large applications with many lines of code, where
manual refactoring becomes tedious and error-prone.
In the case of cryptographic libraries, these tools are
often less effective. Cryptographic algorithms are
usually designed such that each calculation step de-
pends on the result of the previous step.
Furthermore, cryptographic algorithms often contain
numerous simple operations, like shift, add, or
xor. Although these operations could in principle be
parallelized easily, parallelizing such simple operations
can even be counter-productive, because the parallelization
overhead outweighs the performance gain.
Having these issues in mind, we have examined
the possible performance gain of cryptographic algo-
rithms through parallelization. Therefore, we have se-
lected the commonly used algorithms ’RSA key-pair
generation’, ’RSA cipher’, and ’ECDSA signature
verification’ for manual parallelization. For all in-
vestigated algorithms, both parallelization techniques
being described in Section 2.2 have been applied. In
the following subsections we explain how the three
selected cryptographic algorithms have been paral-
lelized.
2.3.1 RSA Key-pair Generation
The first cryptographic operation we have improved
in the course of this work was the RSA key-pair gen-
eration. The investigated JCE implements the key
generation algorithm that was published in (Silver-
man, 1997). According to this algorithm, the inves-
tigated JCE executes the following basic steps to gen-
erate all data required for building a CRT (Chinese
Remainder Theorem) compliant RSA key-pair.
1. Compute strong prime p
2. Compute strong prime q
3. Ensure that p is greater than q
4. p1 = p − 1
5. q1 = q − 1
6. φ = p1 · q1
7. Choose an appropriate public exponent pubExp
8. modulus = p · q
9. privExp = pubExp^(-1) mod φ
10. dP = privExp mod p1
11. dQ = privExp mod q1
12. coef = q^(-1) mod p
After completion of these computation steps, all
data required to build an RSA key-pair are available.
A breakdown of the sketched algorithm reveals that
several major computation steps are independent and
hence can be scheduled in parallel. In more specific
terms, this applies to Step 1 and Step 2, Step 4 and
Step 5, as well as to Step 10 and Step 11. Obviously,
the two independent steps 4 and 5 consist of trivial
computations only. Hence, it can be expected that a
parallelization of these two steps would not increase
the algorithm’s performance significantly due to the
inherent parallelization overhead.
In order to parallelize the given RSA key-pair gen-
eration algorithm, we have therefore put the focus on
the computation of the two strong primes p and q,
and on the derivation of the values dP and dQ. We
have re-implemented the given algorithm by apply-
ing the two parallelization methods that have been in-
troduced in Section 2.2. This way, the performance
of the JCE’s RSA key-pair generation algorithm has
been increased significantly. More detailed informa-
tion about the achieved performance enhancements
are provided in Section 3 of this paper.
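To illustrate the first of these modifications, the following sketch shows how the generation of p and q (Steps 1 and 2) can be scheduled on two threads with an ExecutorService. It is a minimal illustration of the approach rather than the library's actual code; BigInteger.probablePrime merely stands in for the JCE's internal strong-prime routine.
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPrimeGeneration {
    public static BigInteger[] generatePrimes(final int bitLength) throws Exception {
        final SecureRandom rnd = new SecureRandom();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<BigInteger> primeTask = new Callable<BigInteger>() {
                public BigInteger call() {
                    // Stand-in for the library's strong-prime generation (Steps 1 and 2).
                    return BigInteger.probablePrime(bitLength, rnd);
                }
            };
            Future<BigInteger> fp = pool.submit(primeTask);
            Future<BigInteger> fq = pool.submit(primeTask);
            BigInteger p = fp.get();
            BigInteger q = fq.get();
            if (p.compareTo(q) < 0) {          // Step 3: ensure p > q
                BigInteger tmp = p; p = q; q = tmp;
            }
            return new BigInteger[] { p, q };
        } finally {
            pool.shutdown();
        }
    }
}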
2.3.2 RSA Cipher
After successfully enhancing the performance of the
RSA key-pair generation algorithm we have analyzed
the RSA cipher algorithm. If possible, the RSA im-
plementation of the investigated JCE uses the Chinese
Remainder Theorem (CRT) to speed up the execution
of RSA encryption and decryption operations. The
following computation steps are executed by the JCE
to encrypt a given plain text message m with a given
private key using the Chinese Remainder Theorem.
1. c11 = m mod p
2. c1 = c11^dP mod p
3. c21 = m mod q
4. c2 = c21^dQ mod q
5. c3 = (c1 − c2) · coef
6. c4 = c3 mod p
7. c = (c4 · q) + c2
After completion of these computation steps, the
obtained result c represents the input data m being
RSA encrypted with the given private RSA key. A
breakdown of the sketched computation steps reveals
that Step 1 and Step 2 can be processed in parallel
with Step 3 and Step 4. Again, we have modified the
existing JCE in order to take advantage of the identified
potential for parallelization. Detailed informa-
tion about the performance improvements that have
resulted from modifications of the RSA cipher algo-
rithm are provided in Section 3.
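As an illustration of this modification, the following sketch performs the two half-exponentiations (Steps 1 and 2, Steps 3 and 4) as concurrent tasks and recombines them as in Steps 5 to 7. It is a simplified sketch of the approach, not the actual library code; all CRT parameters are assumed to be given as BigIntegers.
import java.math.BigInteger;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRsaCrt {
    public static BigInteger crtExp(final BigInteger m, final BigInteger p,
            final BigInteger q, final BigInteger dP, final BigInteger dQ,
            final BigInteger coef) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Steps 1 and 2: c1 = (m mod p)^dP mod p
            Future<BigInteger> f1 = pool.submit(new Callable<BigInteger>() {
                public BigInteger call() { return m.mod(p).modPow(dP, p); }
            });
            // Steps 3 and 4: c2 = (m mod q)^dQ mod q
            Future<BigInteger> f2 = pool.submit(new Callable<BigInteger>() {
                public BigInteger call() { return m.mod(q).modPow(dQ, q); }
            });
            BigInteger c1 = f1.get();
            BigInteger c2 = f2.get();
            // Steps 5 to 7: recombine the two halves
            BigInteger c4 = c1.subtract(c2).multiply(coef).mod(p);
            return c4.multiply(q).add(c2);
        } finally {
            pool.shutdown();
        }
    }
}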
2.3.3 ECDSA Signature Verification
ECDSA signature verification was the third JCE algo-
rithm that has been investigated in the course of this
work. Based on elliptic curve cryptography, ECDSA
allows for much smaller key sizes compared to the
conventional DSA algorithm and is therefore enjoy-
ing increased popularity. For our investigations we
have put the focus on the ECDSA signature verifica-
tion of a message m. To verify a given ECDSA signa-
ture consisting of the pair (r, s) with the given public
key Q
A
, the investigated JCE executes the following
computation steps.
1. Check that both r and s are integers in the interval [1, n − 1], where n is the order of the curve's base point G
2. e = HASH(m)
3. c = s^(-1) mod n
4. u1 = (e · c) mod n
5. u2 = (r · c) mod n
6. Compute the point (x1, y1) = u1 · G + u2 · QA
7. If r = x1 mod n, the given ECDSA signature is valid
It is apparent that, for instance, Step 4 and Step
5 could be executed in parallel as these computation
steps are completely independent. However, the
mathematical operations executed in these steps
are not very complex. Hence, parallelization of these
steps would not increase the algorithm's performance
significantly. The computationally most intensive operation
is actually executed in Step 6. Hence, we have
split this computation step into two independent computations
r1 = u1 · G and r2 = u2 · QA. The results of
these computations are subsequently added in order
to retrieve the final result (x1, y1) = r1 + r2. Since the
computations of the intermediate results r1 and r2 are
independent, they can again be executed in parallel.
Due to the parallelization of these computation
steps, the overall performance of the JCE’s ECDSA
signature-verification algorithm could be improved
significantly. Detailed information about the gained
speed-up is provided in Section 3 of this paper.
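The following sketch outlines this split. The ECPoint interface and its multiply and add operations are hypothetical placeholders for the corresponding types of the underlying elliptic-curve library (the JDK's java.security.spec.ECPoint provides no point arithmetic), so only the task structure is meant to be taken literally.
import java.math.BigInteger;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPointMultiplication {
    // Hypothetical point type; the real library types will differ.
    interface ECPoint {
        ECPoint multiply(BigInteger k);
        ECPoint add(ECPoint other);
    }

    static ECPoint sumOfProducts(final BigInteger u1, final ECPoint g,
            final BigInteger u2, final ECPoint qA) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<ECPoint> r1 = pool.submit(new Callable<ECPoint>() {
                public ECPoint call() { return g.multiply(u1); }   // r1 = u1 * G
            });
            Future<ECPoint> r2 = pool.submit(new Callable<ECPoint>() {
                public ECPoint call() { return qA.multiply(u2); }  // r2 = u2 * QA
            });
            return r1.get().add(r2.get());  // (x1, y1) = r1 + r2
        } finally {
            pool.shutdown();
        }
    }
}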
2.3.4 Scalability Considerations
The overall goal of parallelization is to divide a given
computational problem into several subtasks and execute
these tasks concurrently on different cores. As
the provided Java™ APIs do not impose any limitations
regarding the number of available cores, the achievable
speed-up theoretically grows linearly with the
number of cores. However, in practice the
achievable speed-up depends on the parallelized
Java™ source code.
In all investigated algorithms, only two steps were
executable in parallel. Hence, the applied JCE en-
hancements are especially suitable for processor ar-
chitectures with two cores. Nevertheless, further po-
tential for parallelization could probably be found on
other levels of abstraction. However, as scalability
was not the main objective of our activities, further
optimizations of the JCE in terms of scalability are
regarded as future work.
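As a general Java™ idiom (not used in the implementation described above), the pool size could be derived from the number of cores reported by the JVM should the parallelization be generalized to more than two concurrent subtasks:
// Size the worker pool according to the cores reported by the JVM.
int cores = Runtime.getRuntime().availableProcessors();
ExecutorService pool = Executors.newFixedThreadPool(cores);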
3 PERFORMANCE ANALYSIS
The basic objective of this work was to evaluate
whether the performance of the investigated JCE can
be improved by employing multi-core architectures.
Therefore, implementations of three different crypto-
graphic algorithms have been manually revised and
parallelized. Details about the applied modifications
of the investigated JCE have been provided in Section
2. In a subsequent step, several tests have been con-
ducted in order to measure the effective speed-up that
has been gained from the applied modifications. The
measurement framework and the different measure-
ment environments that have been used for these tests
are introduced in this section. Furthermore, this sec-
tion illustrates the obtained results of the performance
analysis process and discusses basic findings.
3.1 Measurement Framework
The aim of the performed measurement series was
to measure the efficiency of the applied JCE paral-
lelization and the achievable computational speed-
up. To guarantee meaningful measurement results,
a common measurement framework has been devel-
oped. This framework has then been used to evaluate
improvements of different cryptographic algorithms
and to appropriately format the collected measure-
ment data.
Figure 1: Measurement Setup.
Fig. 1 shows the general measurement setup on
which all measurement runs have been based. The
core element of the setup is the developed measure-
ment framework, which provides a user interface,
through which measurement runs can be manually
controlled. The measurement framework itself has
access to three different instances of the investigated
JCE. The instance ’IAIK JCE Sequential’ represents
an unmodified default release of the JCE library and
acts as reference module. Measurements on modified
instances of the JCE are compared to measurements
on this reference implementation in order to evaluate
the efficiency of different modifications. The two JCE
instances ’IAIK JCE Parallel (Impl. 1)’ and ’IAIK
JCE Parallel (Impl. 2)’ comprise different versions of
parallelized cryptographic algorithms. In order to al-
low meaningful comparisons between the three avail-
able implementations, each modified cryptographic
algorithm has been tested on all three JCE instances
subsequently.
3.2 Measurement Systems
The modified JCE implementations and the devel-
oped measurement framework have been deployed
and tested on different measurement environments in
order to minimize the influence of environment- and
system-specific effects. Therefore, all measurements
have been performed on two different machines
equipped with different central processing units
(CPUs) and different operating systems.
The first machine (System A) was equipped with
an Intel Mobile Core 2 Duo P8600 CPU (code name
’Penryn’) running at a clock frequency of 2.4 GHz.
Details about this CPU are provided in Table 1. Fur-
thermore, this machine was equipped with 3 GB of
random access memory (RAM). The installed operat-
ing system was Microsoft Windows XP (32bit) with
Service Pack 3. Due to the installed 32bit operating
system, all tests on this machine have been performed
with the 32bit version of the Sun Java™ Runtime Environment
(JRE) 7 only.
Table 1: System A - CPU characteristics.
Name Intel Mobile Core 2 Duo P8600
Package Socket P (478)
Clock frequency 2.40 GHz
Cores 2
Threads 2
The second machine (System B) was equipped with
an Intel Pentium D 930 CPU (code name 'Presler')
running at a clock frequency of 3 GHz. Further details
about this CPU are provided in Table 2. On System B,
2 GB of RAM were available. The system was run-
ning with the operating system Microsoft Windows 7
Enterprise (64bit). All measurements on this system
have been performed using the 32bit as well as the 64bit
version of the Sun Java™ Runtime Environment (JRE)
7.
Table 2: System B - CPU characteristics.
Name Intel Pentium D 930
Package Socket 775 LGA
Clock frequency 3.00 GHz
Cores 2
Threads 2
Hence, in total three measurement environments
(ME) were available. The first environment (ME1)
was System A and the 32bit version of Sun JRE 7.
The other two measurement environments were Sys-
tem B with the 32bit Sun JRE (ME2) and System B
with the 64bit Sun JRE (ME3), respectively.
3.3 Results
In order to evaluate the efficiency of the applied JCE
modifications, the three parallelized cryptographic al-
gorithms have been tested on all three available mea-
surement environments.
In the first measurement run, RSA key-pair gen-
eration operations have been performed on all avail-
able environments. Fig. 2 shows the result of this
measurement run. On all three measurement environ-
ments, usage of the parallelized JCE implementations
has led to a significant speed-up. At the same time,
it has turned out that the two alternative parallel implementations
basically led to similar results. De-
pending on the particular measurement environment,
speed-ups between 1.35 and 1.41 have been reached
by using parallelized JCE implementations. Table 3
summarizes the achieved speed-up for RSA key-pair
generation operations in relation to the original se-
quential JCE implementation.
Figure 2: RSA Key-pair Generation (1024 bit) on Different
Measurement Environments.
Table 3: RSA Key-pair Generation - Speed-up.
       JCE P I   JCE P II
ME 1   1.39      1.35
ME 2   1.41      1.38
ME 3   1.40      1.37

In a second measurement run, the parallelized
RSA cipher algorithm has been evaluated. Therefore,
RSA cipher operations have been performed on all
three available JCE instances. Fig. 3 illustrates the
results of this measurement run. Again, usage of the
two parallelized JCE implementations has led to a sig-
nificant computational speed-up. Similar to the RSA
key-pair generation measurement run, there is no ob-
vious difference in the performance between the two
alternative parallel implementations.
Figure 3: RSA Encryption on Different Measurement Envi-
ronments.
Table 4 summarizes the observed speed-up that
has been gained due to the usage of the two paral-
lelized JCE instances. Depending on the particular
measurement system, the time consumption for RSA
encryption operations could be reduced by up to 43%.
Finally, the third measurement run has evaluated
the efficiency of the parallelized ECDSA algorithm.
Therefore, the time consumption of ECDSA signature
verification operations has been measured. Again,
measurements have been carried out for all three
available JCE implementations.
Fig. 4 shows the results of this measurement run.
Also for the ECDSA algorithm, the applied modifica-
tions have caused a significant computational speed-
up. While there is an obvious improvement compared
to the sequential JCE implementation, the two paral-
lel JCE instances basically led to similar results. The
achieved speed-up for ECDSA signature verification
operations on different measurement environments is
summarized in Table 5.
Table 4: RSA Encryption - Speed-up.
       JCE P I   JCE P II
ME 1   1.65      1.65
ME 2   1.74      1.74
ME 3   1.49      1.62
Figure 4: ECDSA Signature Verification on Different Mea-
surement Environments.
In general, all conducted measurement runs have
proven that parallelizing JCE implementations can
significantly reduce the processing time of crypto-
graphic algorithms. Depending on the investigated
algorithm and the used measurement environment,
parallel implementations have reduced the time con-
sumption of certain cryptographic algorithms by up to
43.93%.
This observation holds for systems with 32bit
as well as 64bit Java™ Runtime Environments.
Although systems with 64bit JREs have generally
shown a better performance in terms of execution
time, computations on parallelized JCE instances
have always been faster than computations on
the unmodified sequential reference JCE implementation.
Hence, the taken measurements have shown that
on all tested systems the parallelized JCE instances perform
better than their unmodified sequential counterparts.
Table 5: ECDSA Signature Verification - Speed-up.
       JCE P I   JCE P II
ME 1   1.67      1.67
ME 2   1.73      1.72
ME 3   1.76      1.78
4 CONCLUSIONS
With the emergence of multi-core processor architec-
tures, the demand for parallel software has increased.
Since programmers are used to writing sequential software
for single-core architectures, the development of
parallel software is usually challenging.
Due to the computational complexity of cryp-
tographic algorithms, the parallelization of crypto-
graphic implementations could significantly increase
their performance. In this work we have shown
that even minor manual adaptations of an existing
sequential Java™ cryptography library can significantly
reduce the computing time of several cryptographic
algorithms when they are executed on multi-core
architectures. We have described how algorithms
of an existing cryptographic library can be improved
by applying parallelism and have presented the results of
measurements conducted with the unmodified JCE as
well as with two different manually parallelized JCE
instances.
The obtained results show that parallelizing cryptographic
Java™ libraries definitely makes sense.
Although only minor manual adaptations have been
applied in this work, speed-up factors of up to 1.78
could be reached. For future work it is planned to op-
timize the applied parallelization of the cryptographic
library. This could be achieved by either applying a
more sophisticated manual parallelization or by using
tools that try to automatically parallelize existing se-
quential source code.
Another potential source of further performance gains
is the re-implementation of certain cryptographic algorithms.
In many cases, cryptographic algorithms
can be implemented in different ways. By choosing
an implementation that allows a high degree of parallelization,
the achievable speed-up on multi-core architectures
could probably be increased further. This approach
has not been followed in the course of this
work but is regarded as a topic for future work.
REFERENCES
Bridges, M. J., Vachharajani, N., Zhang, Y., Jablin, T., and
August, D. I. (2008). Revisiting the sequential pro-
gramming model for the multicore era. Micro, IEEE,
28(1):12–20.
Daemen, J. and Rijmen, V. (2002). The Design of Rijndael.
Springer-Verlag New York, Inc., Secaucus, NJ, USA.
Dig, D., Marrero, J., and Ernst, M. D. (2009). Refactoring
sequential Java code for concurrency via concurrent li-
braries. In ICSE ’09: Proceedings of the 2009 IEEE
31st International Conference on Software Engineer-
ing, pages 397–407, Washington, DC, USA. IEEE
Computer Society.
Freisleben, B. and Kielmann, T. (1995). Automated
transformation of sequential divide-and-conquer algo-
rithms into parallel programs. Computers and Artifi-
cial Intelligence, 14:579–596.
Lea, D. (2005). The java.util.concurrent synchronizer
framework. Sci. Comput. Program., 58(3):293–309.
Peierls, T., Goetz, B., Bloch, J., Bowbeer, J., Lea, D., and
Holmes, D. (2005). Java Concurrency in Practice.
Addison-Wesley Professional.
Rugina, R. and Rinard, M. (1999). Automatic paralleliza-
tion of divide and conquer algorithms. In PPoPP ’99:
Proceedings of the seventh ACM SIGPLAN sympo-
sium on Principles and practice of parallel program-
ming, pages 72–83, New York, NY, USA. ACM.
Silverman, R. D. (1997). Fast generation of random, strong
RSA primes. CryptoBytes, 3(1):9–13.
Sutter, H. (2005). The free lunch is over: A fun-
damental turn toward concurrency in software.
http://www.gotw.ca/publications/concurrency-
ddj.htm.