Towards an Overhead Estimation Model for Multithreaded Parallel

Programs

Virginia Niculescu

, Camelia S¸erban

and Andreea Vescan

Babes¸ -Bolyai University, Faculty of Mathematics and Computer Science, Computer Science Department,

Cluj-Napoca, Romania

Keywords:

Parallel Programming, Metrics, Overhead, Multithreading, Synchronization, Estimation, Validation.

Abstract:

The main purpose of using parallel computation is to reduce the execution time. To reach this goal, reducing

the overhead time induced by the additional operations that parallelism implicitly imposes, becomes a ne-

cessity. In this respect, the paper proposes a new model that evaluates the overhead introduced into parallel

multithreaded programs that follows SPMD (Single Program Multiple Data) model. The model is based on

a metric that is evaluated using the source code analysis. Java programs were considered for this proposal,

but the metric could be easily adapted for any multithreading supporting imperative language. The metric

is deﬁned as a combination of several atomic metrics considering various synchronisation mechanisms. A

theoretical validation of this metric is presented, together with an empirical evaluation of several use cases.

Additionally, we propose an AI based strategy to reﬁne the evaluation of the metric by obtaining accurate

approximation for the weights that are used in combining the considered atomic metrics.

1 INTRODUCTION

A general expectation from the parallel program-

ming is that by doubling the hardware resources,

one can reasonably expect a program to run twice

as fast. However, in typical parallel programs, this

is rarely the case, due to many overheads associated

with parallelism. An accurate quantiﬁcation of these

overheads is critical for understanding and improv-

ing the parallel programs performance. The parallel

computation is also limited by overhead costs such

as: threads/processes start, management and termina-

tion costs, synchronization problems – task coordi-

nation, cost of communication among multiple tasks

(threads/processes), the overhead of some software li-

braries, parallel compilers or interpreters and support-

ive OS.

SPMD (Single Program, Multiple Data) program-

ming model (Grama et al., 2003) is characterized by

the fact that all the processes/threads execute a copy

of the same program on different data. SPMD is

widely used in parallel programming, due to the ease

of designing a program that consists of a single code

https://orcid.org/0000-0002-9981-0139

https://orcid.org/0000-0002-5741-2597

https://orcid.org/0000-0002-9049-5726

running on all processing elements. The differenti-

ation between the execution on each processing ele-

ment (thread/process) can be done based on the ID of

each process/thread.

The overhead time (T

) is estimated, in general, as

being the difference between the product of the paral-

lel time with the number of processing elements, and

the sequential time:

= p ∗ T

− T

(1)

This is a very general evaluation and it is difﬁcult to be

used in the design stage of the development. For a cer-

tain class of parallel programs – as SPMD on shared

memory platforms – written in a particular language

(as Java) – where we know the speciﬁc synchroniza-

tion mechanisms that could be used, we may estimate

more accurately the overhead time. We propose in

this paper such a model for SPMD multithreaded Java

programs, based on a synchronization overhead met-

ric. Java programs were considered for this model,

but it could be easily adapted for any multithreading

supporting imperative language.

Thus, the contribution of the paper is three-fold:

(1) a new model for overhead evaluation, based on

a metric deﬁned as an aggregation of several atomic

metrics related to various synchronization mecha-

nisms; (2) the theoretic validation of the proposed

502

Niculescu, V., ¸Serban, C. and Vescan, A.

Towards an Overhead Estimation Model for Multithreaded Parallel Programs.

DOI: 10.5220/0011083400003176

In Proceedings of the 17th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2022), pages 502-509

ISBN: 978-989-758-568-5; ISSN: 2184-4895

metric using Weyuker’s metric properties (Misra and

Akman, 2008), (EJ, 1988), and (3) an empirical eval-

uation of the weights used in the metric deﬁnition,

based on concrete experiments. In addition, an AI

strategy to reﬁne the metric by obtaining a more ac-

curate approximation for its weights is proposed.

Regarding the usability of the proposed metric we

could mention the possibility to estimate the overhead

time at the design stage, compare different design so-

lutions for a given problem, or estimate the need for

refactorization.

The paper is structured as follows. Section 2 dis-

cusses related work regarding theoretical validation

of software metrics and tools for performance assess-

ment in concurrent programs. Section 3 describes the

main synchronization mechanisms in Java, while the

deﬁnition of the new metric is given in Section 4, to-

gether with its theoretical and empirical analysis. The

conclusions and further work are emphasized in the

last section.

2 RELATED WORK

This section describes existing approaches regarding

theoretical and empirical validation of software met-

rics and existing tools for multithreaded programs

performance evaluation.

Evaluation of Software Metrics - Various Criteria.

Measures and way of measuring in Software Engi-

neering (SE) domain are different from that of other

engineering domains. In comparison with measuring

the length of an engineering product, measuring the

size of a software, such as program length, is not so

easy and there are few formal approaches that rigor-

ously apply software measures. Formal approaches

in SE are usually based on different types of metrics

that are used in order to quantify those aspects that

are considered important for the assessment. Many

researchers (EJ, 1988), (H, 1992), (B and N, 1995),

(Briand LC and S, 1995) contributed to lay a foun-

dation for measuring software, both for metric deﬁni-

tion and metrics validation. There are basically two

categories of techniques for metrics’ validation: the

empirical validation which conﬁrms the metrics ac-

tual applicability, and the analytical evaluation de-

ﬁned based on deﬁnite measurement theories. Prior

to the empirical validation (i.e., applying to industry),

every metric has to be analytically evaluated to con-

ﬁrm that it has scientiﬁc pedestal, and it was deﬁned

based on deﬁnite measurement theories.

Tools for Performance Evaluation. Software perfor-

mance is in many cases increased by implying parallel

and concurrent computation. In order to make effec-

tive the development of such programs, the develop-

ers have come to expect tools that are able to augment

the design and execution infrastructure with different

capabilities such as design assistance, performance

evaluation, debugging or execution control. There is

quite a large set of tools like these, and further on we

will mention just a few that were proposed for mul-

tithreaded programming, and which are related to the

overhead time.

Tmon, a tool for monitoring, analyzing and tuning

the performance of multithreaded programs was pro-

posed (Ji et al., 1998). The tool relies on two mea-

sures performed at the run-time of the program; it

uses thread waiting time and constructs thread wait-

ing graphs. A static analysis tool called Iceberg (Shah

and Guyer, 2016) was proposed in order to identify

performance bugs in concurrent Java programs. They

developed an analysis called latency variability anal-

ysis, a ﬂow-sensitive analysis trying to estimate the

range of possible latencies through a block of code.

Iceberg tool (Shah and Guyer, 2018) was improved in

order to perform a dynamic analysis of Java programs.

A tool that automatically detects the inefﬁciency in-

tervals representing time periods when a concurrent

application is not using all its capabilities of the par-

allel system was proposed in (Espinosa et al., 1998).

Another lock proﬁler designed in the context of a new

metric, critical section pressure, is proposed in paper

(David et al., 2014). There are several applications

serving as testing environment for this tool, the re-

sults showing that it can detect phases where a lock

hinders the threads process.

In relation to the existing approaches our proposed

metric can be used early in the development lifecycle,

when the system design is known, to predict the over-

head added by the parallelism. In this way, possible

errors and crash that could appear during the life of

the program can be prevented. As far as we know

there are no references in literature regarding such a

metric.

3 SYNCHRONIZATION

MECHANISMS

In order to evaluate the overhead brought through syn-

chronization we will consider the following synchro-

nization mechanisms:

• critical sections

– using Locks

– using synchronized methods and blocks

• conditional waiting

– using wait/notify calls

Towards an Overhead Estimation Model for Multithreaded Parallel Programs

503

– using Condition variables and the correspond-

ing await/signal calls

• barriers - using CyclicBarrier

• ’rendez-vous’ mechanisms using Exchanger

A lock is a synchronization mechanism for enforcing

the access limitation to a resource in an execution en-

vironment where there are many threads of execution.

A lock is designed to enforce a mutual exclusion con-

currency control policy (Garg, 2004; Raynal, 2013).

In Java, Lock class provides implementation for this

mechanism (G

oetz et al., 2006).

Synchronized methods and blocks in Java are im-

plemented based on the association of each object to

a monitor (G

oetz et al., 2006), and they are mecha-

nisms for deﬁning critical sections. A monitor encap-

sulates: shared data structures, procedures that oper-

ate on the shared data structures, and synchronization

between concurrent procedure invocations. In case of

Java we may consider that the encapsulated data are

the object attributes, and by deﬁning synchronized

methods we may deﬁne the procedures (methods) that

should be called based on mutual exclusion (G

oetz

et al., 2006; Garg, 2004; Raynal, 2013). Monitors

could also include conditional synchronization; Java

provides only one condition queue per monitor using

the corresponding wait/notify methods.

Synchronization conditions (also known as condi-

tion queues or condition variables) provide a means

for one thread to suspend execution (to ”wait”) until

notiﬁed by another thread when some state condition

arrives to be true. The correspondent Java implemen-

tation is the class Condition, and its instances should

always be associated with a lock (G

oetz et al., 2006).

A barrier is a synchronization method that forces

a group of threads or processes to stop at the

point of the barrier, and not proceed until all

other threads/processes reach this barrier. A Java

CyclicBarrier object is an object that implements

a barrier, which can be used (reused) cyclically - in

multiple points of the program execution. The method

await is used to specify the synchronization points

oetz et al., 2006; Garg, 2004; Raynal, 2013).

The Java Exchanger(G

oetz et al., 2006) class im-

plements the “rendez-vous” concept, which speciﬁes

a two-way synchronization and communication be-

tween two threads or processes: if one thread arrives

to the rendez-vous point it waits for the other to ar-

rive and then they exchange information (Garg, 2004;

Raynal, 2013).

There are also many other synchronization mech-

anisms and an example worth to be mentioned is the

semaphore. A semaphore (Garg, 2004; Raynal, 2013)

is a synchronization construct that is typically used

to coordinate the access of multiple threads/processes

to resources. It also has a Java implementation –

class Semaphore (G

oetz et al., 2006). We don’t con-

sider them in this paper because usually they are not

used in SPMD type of programs. More details about

other synchronization mechanisms could be found in

oetz et al., 2006; Garg, 2004; Raynal, 2013).

As noticed, critical sections could be obtained

through multiple mechanisms: - e.g. locks, synchro-

nized blocks.

Do their choice inﬂuence the program performance?

Theoretically, we may assume that the answer is af-

ﬁrmative, and also from some simple experiments

we noticed that there are some differences. In gen-

eral, solving a problem could have different solutions

based on different synchronisation mechanisms.

4 OVERHEAD METRIC

We propose a model to approximate the overhead of

SPMD Java programs by using a metric deﬁned based

on the different characteristics of the synchronization

mechanisms.

4.1 Java SPMD Programs

In order to evaluate the overhead brought by the

synchronization we will consider the synchronization

mechanisms mentioned in the previous section.

We consider SPMD programs formed of three

parts: threads creation, starting all the threads using

the same code/program (in Java the function run()),

and joining all threads.

Combining two such programs P1 and P2 into a

new program P3 = P1+P2 (here we consider + the

combining operator) means that the P3 program is ob-

tained by sequentially combing the two running func-

tions run

=run

; run

In addition, we also consider that into a program

we don’t have adjacent critical sections – if they are

then they will be implicitly merged into one.

4.2 Metric Deﬁnition

The proposed overhead metric is an aggregated met-

ric, which is based on atomic metrics that corresponds

to the synchronization mechanisms described in Sec-

tion 4.1.

The overhead time T

can be estimated based on a

synchronization metric O by using the following def-

initions:

ENASE 2022 - 17th International Conference on Evaluation of Novel Approaches to Software Engineering

504

O : P → R, T

: P × S → R,

P is the set of all SPMD Java programs and

S is the set of all execution systems

(P,S) = t

∗



th manag

+ O(P)



,P ∈ P, S ∈ S

(2)

where

• P = (P

,...,P

) , P

: 0 ≤ i < p represents the

ﬁnite set of threads deﬁned inside a SPMD pro-

gram;

• t

represents a time that depends on the per-

formance of the execution system, the number

of processors of the system (w

proc

), and on the

threads per cores ratio (w

load

);

• w

th manag

= the weight for managing the threads.

We have

O(P) =

bar

∗ p ∗ (#bar) ∗ f (w

load

,dis) +

wait

∗ ( #wait) / f (w

load

,dis) +

cond

∗ ( #await) / f (w

load

,dis) +

∗ ( #ex) / f (w

load

,dis) +

sync

∗ p ∗



∑

#sync

i=1



/ f (w

load

,dis) +

lock

∗ p ∗



∑

#lock

i=1



/ f (w

load

,dis) +

(3)

where:

• #bar and w

bar

– the number of synchroniza-

tion barriers based on the calls of await through

CyclicBarriers, and the corresponding weight;

• #wait and w

wait

– the total number of calls of wait

method of Object, and the corresponding weight;

• #await and w

cond

– the total number of await op-

erations called through a Condition variable, and

the corresponding weight;

• #ex and w

– the total number of exchange op-

erations executed through an Exchanger, and the

corresponding weight;

• #sync and w

sync

– the number of critical sections

speciﬁed with synchronized blocks or methods,

and the corresponding weight;

• #lock and w

lock

– the number of critical sec-

tions speciﬁed with Lock objects and lock(), and

unlock() methods, and the corresponding weight;

• cs

the size of a critical section (#i) – this is ex-

pressed in number of statements;

• dis is a measure of the dissimilarity between the

threads estimated as:

dis =

#conditional statements

#total statements

• f (w

load

,dis) = σ ∗w

load

∗(1+dis) is a calibration

function; a barrier is delayed when the w

load

and

dis are increasing, but all the other synchroniza-

tion mechanisms are favored in this case.

In general, a high level of loading (w

load

) increases

the execution time; for example for the barrier execu-

tion if the loading is low the probability that threads

arrive in the same time at the barrier point is very high,

and vice-versa if the loading is high is very probable

the threads arrive at the barrier point at different times.

Also, if the dissimilarity between the threads is higher

then the probability of threads arriving at the barrier

at different moments in time is also higher. This is

why in the overhead metric formula the time due to

barriers should be multiplied by the calibration func-

tion.

For the critical sections we have the opposite situ-

ation: it is desirable that the threads arrive to critical

sections at different times, and high level of loading

determine this implicitly.

When we use Condition and wait-notify for con-

ditional synchronization, we need to use them inside

the critical sections; the overhead due to these critical

sections is included into the corresponding weights

wait

and w

cond

4.3 Theoretical Validation of the Metric

Among numerous metric validation criteria that exist

in the literature, as we have mentioned them in the

Section 2, Weyuker’s properties are most extensively

used for evaluating software metrics; they are guiding

tools for identifying good and complete metrics (EJ,

1988).

Property 1: Non-coarseness. (∃) (P1), (P2) two

distinct programs from P such that O(P1) 6= O(P2).

This is trivially fulﬁlled by O metric due to the

fact that is not deﬁned as a constant map.

Property 2: Granularity. It states that there will be a

ﬁnite number of cases for which the metric value will

be the same.

It is considered that this property is met by any

metric measured at the program level, because the

universe deals with at most a ﬁnite set of programs

and so the set of those programs having the same

value for the proposed metric is also ﬁnite (Chi-

damber and Kemerer, 1994).

Property 3: Non-Uniqueness (Notion of Equiva-

lence). (∃) (P1), (P2) two distinct programs from

P such that O(P1) = O(P2).

We may consider two programs that each update

a variable (based on some formulas) inside a criti-

cal section; for the ﬁrst program we may use with

synchronized and for the second Lock. The values for

sync

and w

lock

could be different but by chosen the

sizes m and n of the two critical sections correspond-

ingly, we arrive to the same value for the metric. It is

possible to ﬁnd n and m such that

Towards an Overhead Estimation Model for Multithreaded Parallel Programs

505

m ∗ w

sync

= n ∗ w

lock



sync

lock



Property 4: Design Details are Important. This

property states that, in determining the metric for an

artifact, its design details also matters. When we con-

sider the designs of two programs P1 and P2, which

are the same in functionality, does not imply that

O(P1) = O(P2).

In our case, we have different means to

achieve similar things; for example we have

synchronized and locks for critical sections, and

Object:wait-notify and Condition:await-signal

for conditional synchronization.

Property 5: Monotonicity.

It states that a component of a program is always

simpler than the whole program. For all (P1), (P2) ∈

P either O(P1) ≤ O(P1+P2) or O(P2) ≤ O(P1+P2)

should hold.

Our metric is deﬁned as a summation of atomic

metrics and based on the combing operation we have

O(P1 + P2) is bigger or equal than O(P1) or O(P2).

Property 6: Non-equivalence of Interaction. If P1,

P2 and P3 are three programs having the property that

O(P1) = O(P2) does not imply that O(P1 + P3) =

O(P2 + P3). This suggests that the interaction be-

tween P1 and P3 may differ from that of P2 and P3.

If the P1 program ends with a critical section, and

P2 doesn’t end with a critical section, while P3 starts

with a critical section, then for P1+P3 the two corre-

sponding critical sections will be implicitly merged,

and this reduces the overhead. Then O(P1 + P3)¡

O(P2 + P3).

Property 7: Permutation. It states that permuta-

tion of elements within the program being measured

may change the metric value. Being given a program

P1 that is transformed into a program P2 by permut-

ing the order of the statements such that the provided

functionalities are preserved, the property states that

O(P1) 6= O(P2).

Considering a parallel program P1 that deﬁnes a

critical section (deﬁned through synchronized) that

contains n statements, but not all the n statements

need mutual exclusion, we can transformed it to ob-

tain P2 by moving one of these statements out of the

critical section; this way the functionality is preserved

and the overhead metric is reduced because the size of

the critical section is reduced. For the initial variant

we have O(P1) = w

sync

∗n and for the second we have

O(P2) = w

sync

∗ (n − 1).

Property 8: Renaming Property. When the name of

the measured artifact changes, the metric should not

change. That is, if program P2 is obtained by renam-

ing program P1 then O(P1) = O(P2).

As the proposed metric is measured at the pro-

gram level and, it does not depend on the name of

the program nor on the name of its classes, methods

and instance variables (through which the synchro-

nization mechanisms are implemented) it also satis-

ﬁes this property.

Property 9: Interaction Increases Complexity. It

states that when two programs are combined, inter-

action between them can increase the metric value.

When two programs P1 and P2 are considered

O(P1) + O(P2) ≤ O(P1 + P2).

In general for the proposed metric we have that

O(P1) + O(P1) = O(P1 + P2) (based on the combin-

ing operator – section 4.1, and the metric deﬁnition

based on summation).

Hence, the proposed metric is effective and can

be utilized for the purpose of overhead estimation in

SPMD programs.

4.4 Empirical Weights Evaluation

Empirical validation is extremely signiﬁcant for the

accomplishment of any software measurement project

(Basili et al., 1996), (Fenton and Pﬂeeger, 1997).

Metrics proposal do not have much value without

demonstrating its practical utilization. Thus, study-

ing the applicability and usefulness of the proposed

metric which has been already validated theoretically

is very important.

A possible approach that can be followed for eval-

uation is provided in the book of Yin (Yin, 2008),

which emphasizes when it is possible to rely on case

studies method and how to do the research design

with this approach.

In order to verify and emphasize how the met-

ric could be used we have considered a use-case that

could be solved using several variants, each of these

using different synchronization mechanisms.

The considered problem is so called reduce oper-

ation: for a given list of elements and an associative

operator deﬁned on the type of the elements we have:

reduce(⊕)[e

,.. .,e

n−1

] = e

⊕ e

⊕ ··· ⊕ e

n−1

If the elements are real numbers and the operator

is the addition operator, reduce operation computes

the sum of all given real numbers. The sequential

computation requires n operations when applied on

a list of n numbers.

An efﬁcient parallel computation is based on a

”tree-like” computation that is derived from the re-

cursive deﬁnition of the reduce function:

reduce(⊕)[e] = e

reduce(⊕)[p|q] = reduce(⊕)[p] ⊕ reduce(⊕)[q]

where | is the concatenation operator. an example is

illustrated in the Figure 1

source: https://docs.nvidia.com/cuda/samples/6 Advanced/reduction/doc/reduction.pdf

ENASE 2022 - 17th International Conference on Evaluation of Novel Approaches to Software Engineering

506

Figure 1: The computation of the sum by using tree-like

computation (Mark Harris, 2022).

As it can be seen in the Figure 1 the program cre-

ates a number of threads equal to the number of ele-

ments – n, and the computation is done in k = log

steps. At each step the threads with an ID divis-

ible with the stride ∗ 2 will add to its correspond-

ing element E[ID] the value on the position equal to

ID − stride. In order to obtain the correct answer

is mandatory that this operation to be executed only

after the value on the position ID − stride was al-

ready updated in the previous step. The thread ex-

ecution is done theoretically completely in parallel

but the execution of operations executed by differ-

ent threads is non deterministic (i.e. implicitly not all

the threads execute the operations on one level in the

same time). In order to impose this synchronization

different mechanisms could be used and lead to 4 vari-

ants: with a barrier after each level,or with synchro-

nization between pairs that need to exchange informa-

tion using wait-notify, Condition, or Exchange.

If we consider also multithreading variants de-

rived from the sequential algorithm, the numbers are

split between the threads and each updates of the sum

variable with its own value inside a critical section.

Obviously, it is not an efﬁcient approach since the

computations is still done sequentially due to the crit-

ical section. These variants are considered here just

as means to evaluate the corresponding weights w

sync

and w

lock

Metric Evaluation for Reduce Variants

For each variant presented in the previous section we

evaluate the O metric, and then we will compare the

results with the concrete execution times obtained for

the variants execution on two different systems. The

goal is to obtain an estimation of the weights deﬁned

in the metric.

V1 Barrier

– a barrier is used after each level in the tree execu-

tion;

#bar = log

n = k

DIS = k/(3k) = 0.33

V2 synchronized + wait

At each level there are pairs of threads (thread ID

should use the value of thread (ID−stride) only after

this one ﬁnished previous update) that needs to syn-

chronize one to another through wait-notify mecha-

nism. All these pairs are disjunctive, and at each step

the number of pairs that need to synchronize in pairs

decreases by 2.

#wait =

∑

log

i=1

n/2

= n

DIS = 2k/(5k) = 0.4

V3 locks + await from Condition

At each level there are pairs of threads that needs to

synchronize one to another through conditional vari-

ables. (Similar to variant V2.)

#cond =

∑

log

i=1

n/2

= n

DIS = 2k/5k = 0.4

V4 Exchanger

At each level there are pairs of threads that needs to

synchronize one to another through an Exchanger. All

these pairs are disjunctive.

#ex =

∑

log

i=1

n/2

= n

DIS = 2k/(5k) = 0.4

V5 Critical section with synchronized

There is one critical section of size 1.

#sync = 1; cs

= 1

DIS = 0

V6 Critical section with Lock

There is one critical section of size 1.

#lock = 1; cs

= 1

DIS = 0

Depending on the value of the weights we may es-

timate which variant could be better on different sys-

tems. For all the variants, the source code of the run()

function was analysed.

The evaluation of the weights is not an easy task,

and we propose, as a ﬁrst step, an empirical strategy

based on the execution time for all previously speci-

ﬁed variants, on different arrays’ size and on different

machines.

4.5 Experiments and Weights

Evaluation

Our replication strategy considers experiments of the

variants of reduction operation implementation on

different computation machines – (C1 and C2). The

two machines systems have the following character-

istics: C1 – a computer with 1 processor Intel(R)

Towards an Overhead Estimation Model for Multithreaded Parallel Programs

507

Figure 2: The execution time for n=32 on both C1 and C2

machines.

Core(TM) i7-7500U CPU @ 2.50GHz, 4 Core(s) with

hyperthreading, running Java 8 (macOS); C2 – an

x3750 M4 machine, with 4 processors Intel Xeon E5-

4610 v2 @ 2.30GHz CPUs, 8 cores per CPU with

hyperthreading, running Java 8 (CentOS 7). Due to

stochastic nature of the programs’ execution, they

must be repeated several times in order to mitigate

against the effect of random variation. We have used

in our evaluation 30 executions and we took their av-

erage.

The results are shown in Figure 2 and Figure 3

where time is expressed in microseconds. It can be

noticed that the barrier variants for the both cases are

much higher, and this is why the variations on the

1024 threads cases are not quite visible in the ﬁgure.

For n = 1024 the execution time for the barrier ver-

sion is much higher, but the other variants preserve

almost the same shape as for n = 32 case.

The strategy that we followed in order to approxi-

mate the weights was based on computing the differ-

ences between the execution times, and relies on the

observed fact that the execution time of the ﬁrst vari-

ant (V1- Barrier) it is always the highest.

So, the steps of our evaluation strategy were:

1. Measure the execution times for the six describe

variants on the two machines C1 and C2 for two

cases: U1: n = 32; U2: n = 1024.

2. Compute for each case Ci-Uj (1 ≤ i, j ≤ 2) the

following differences:

• T (V 1)−T (V 2);

• T (V 1)−T (V 3);

• T (V 1)−T (V 4);

• T (V 1)−T (V 5);

• T (V 1)−T (V 6)

In this way, we eliminate the thread management

overhead time.

3. Each variant (V1-V6) depends on only one type of

weight and we can start from a ﬁx value of one of

these. We started by considering w

bar

= 1 +

load

1000

Figure 3: The execution time for n=1024 on both C1 and

C2 machines.

4. Compute the values of the others weights by ap-

plying the formulas determined at step 3 based on

the metric for each variant.

In order to estimate the ﬁnal value for weights we also

used values for t

that have been approximated based

on the systems characteristics. Also, the approxima-

tion of the function f was done based on observation

(σ = 100); more different values have to be tried for

further evaluations.

In Table 1 we added the results of the evaluation

of the weights following the described strategy. The

values show that the differences are not very big and

so they are promising as a ﬁrst step for the estimation.

Table 1: The estimated values for the weights based on the

empirical evaluation.

Variants

32 1024

weights C1 C2 C1 C2 mean

bar

1.004 1.008 1.128 1.03 1.04

wait

0.93 1.59 1.03 1.71 1.31

cond

2.29 2.56 1.81 1.94 2.15

1.80 2.38 2.06 2.15 2.09

sync

0.97 1.38 1.43 1.76 1.38

lock

2.02 1.53 1.50 1.77 1.70

In order to arrive to a validated estimation, we pro-

pose, as a further work, to apply a more complex eval-

uation strategy that is based on artiﬁcial intelligence

methods. The estimated values could be the starting

point in the evaluation.

AI based strategies for weights evaluations. Such a

strategy could be formed of the followings steps: (1)

consider a set of already implemented parallel pro-

grams having different complexities; (2) evaluate the

metric for each program; (3) measure the execution

times of each program with different numbers of ex-

ecution threads; (4) use a Machine Learning-based

method (linear regression, neural network) or a Ge-

netic Algorithm-based approach in order to tune the

ENASE 2022 - 17th International Conference on Evaluation of Novel Approaches to Software Engineering

508

values for the weights; (5) validate the results on new

programs

The current paper offers preliminaries measure-

ments and estimation for the overhead metric, which

was theoretically validated.

5 CONCLUSIONS

An important desiderata in writing parallel programs

is to reduce the overhead time which in some cases

may cover the advantages of doing some computa-

tion in parallel. We have proposed a metric that esti-

mates the overhead time for parallel Java SPMD mul-

tithreaded programs and we theoretically validated it;

here Java programming language was considered but

the metric can be easily adapted for other languages,

since the synchronization mechanisms are very simi-

lar.

• estimate the overhead time at the development

stage;

• compare different design solutions for a given

problem, and choose the most optimal from the

overhead point of view;

• estimate the need for refactorization of a given

program by evaluating the improvements that

could be achieved by changing the design method

or only by changing the synchronization mecha-

nisms.

We have shown that the metric deﬁnition corresponds

to a proper deﬁnition of a software metric, and we

provided a theoretical validation using Weyuker’s

properties. A ﬁrst evaluation of the weights associ-

ated to the aggregated atomic metrics was done based

on different implementation variants of the reduce op-

eration; the results proved to be promising. Still, for a

proper empirical validation this should be improved,

and so, we proposed an AI based strategy that could

be followed as a next step for attaining this goal.

REFERENCES

B, K. and N, F. (1995). Towards a framework for software

measurement validation. IEEE Transactions on soft-

ware Engineering, 21(12):929–943.

Basili, V., Briand, L., and Melo., W. (1996). A validation

of object-oriented design metrics as quality indicators.

20(10):751–761.

Briand LC, E. K. and S, M. (1995). On the application of

measurement theory in software engineering. Techni-

cal report, ISER Technical Report.

Chidamber, S. and Kemerer, C. (1994). A Metric Suite for

Object- Oriented Design. IEEE Transactions on Soft-

ware Engineering, 20(6):476–493.

David, F., Thomas, G., Lawall, J., and Muller, G. (2014).

Continuously measuring critical section pressure with

the free-lunch proﬁler. SIGPLAN Not., 49(10):291–

307.

EJ, W. (1988). Evaluating software complexity mea-

sure. IEEE Transactions on software Engineering,

14(9):1357–1365.

Espinosa, A., Margalef, T., and Luque, E. (1998). Auto-

matic performance evaluation of parallel programs. In

Euromicro Workshop on Parallel and Distributed Pro-

cessing, pages 43—-49.

Fenton, N. and Pﬂeeger, S. (1997). Software Metrics:

A Rigorous and Practical Approach. International

Thomson Computer Press, London, UK, second edi-

tion.

Garg, V. K. (2004). Concurrent and Distributed Computing

in Java. John Wiley fsSons, Inc., USA.

oetz, B., Peierls, T., Bloch, J., Bowbeer, J., Holmes, D.,

and Lea, D. (2006). Java Concurrency In Practice.

Addison Wesley Professional.

Grama, A., Gupta, A., Karypis, G., and Kumar, V. (2003).

Introduction to Parallel Computing, Second Edition.

Addison-Wesley.

H, Z. (1992). On weyuker’s axioms for software complexity

measures. Software Quality Journal, 1(4):225–260.

Ji, M., Felten, E. W., and Li, K. (1998). Performance mea-

surements for multithreaded programs. In ACM SIG-

METRICS Joint IC on Measurement and Modeling of

Computer Systems, pages 161–170. ACM.

Mark Harris. Optimizing Parallel Reduction in CUDA.

https://docs.nvidia.com/cuda/. Online; accessed 25

January 2022.

Misra, S. and Akman, I. (2008). Applicability of weyuker’s

properties on oo metrics: Some misunderstandings.

Computer Science and Information Systems, 5(1):17–

23.

Raynal, M. (2013). Concurrent Programming: Algorithms,

Principles, and Foundations. Springer-Verlag Berlin

Heidelberg.

Shah, M. D. and Guyer, S. Z. (2016). Iceberg: A tool for

static analysis of java critical sections. In ACM SIG-

PLAN International Workshop on State Of the Art in

Program Analysis, pages 7–12. ACM.

Shah, M. D. and Guyer, S. Z. (2018). Iceberg: Dy-

namic analysis of java synchronized methods for in-

vestigating runtime performance variability. In IS-

STA/ECOOP Workshops, pages 119–124. ACM.

Yin, R. K. (2008). Case Study Research: Design and Meth-

ods (Applied Social Research Methods). Sage Publi-

cations, fourth edition. edition.

Towards an Overhead Estimation Model for Multithreaded Parallel Programs

509