raceability and Factorization in Class Diagrams:

an Experimentation of their Correlation

Cam-Ngan Tran

and Michel Dao

France T

ecom R&D DTL/TAL

2 avenue Pierre Marzin

22307 Lannion Cedex – France

France T

ecom R&D DTL/TAL

38-40, rue du g

eral Leclerc

92794 Issy Moulineaux Cedex 9 – France

Abstract. In this article, we present a study of the correlation between factoriza-

tion and the quality criterion of traceability. Our work is based on a set of new

factorization metrics and a speciﬁc deﬁnition of traceability. The results of our

experiment show a good correlation between the increase of the factorization of

a UML class diagram and its traceability.

1 Introduction

A quality model deﬁnes a methodology and a set of measures in order to evaluate the

quality of a software system (source code, analysis or conception models, etc.) [1, 2].

MACAO project

has speciﬁed its own quality model, based on existing ones [3, 4]. It

uses, besides “classical” object-oriented metrics [5,6], a body of speciﬁc metrics ded-

icated to the measurement of factorization

in UML class diagrams [7]. In our quality

model, the results of the computation of the metrics on a given UML class diagram are

combined in order to obtain an evaluation of several quality criteria such as encapsula-

tion, consistency, testability, etc.

Assessing that a quality model actually measures what it is supposed to measure

is an essential task in order to ensure conﬁdence in its results. This article presents a

study of a limited aspect of the validation of MACAO project quality model, namely

the correlation between the degree of factorization of a UML class diagram and the

evaluation of the traceability criterion.

This article is organized as follows:

In section 2 we brieﬂy present quality model validation and explain the validation

we have carried out, which is based on a set of factorization metrics presented in section

CAO is a common project of France Telecom R&D, SOFTEAM and LIRMM (CNRS and

University Montpellier 2.) supported by the french national network on software technologies

RNTL

We deﬁne factorization as the amount of features (attributes and methods) that are shared

through inheritance. The more features are shared, the better the factorization.

Tran C. and Dao M. (2004).

Traceability and Factorization in Class Diagrams: an Experimentation of their Correlation.

In Proceedings of the 1st International Workshop on Software Audits and Metrics, pages 38-48

DOI: 10.5220/0002673500380048

 SciTePress

3 and on a speciﬁcation of traceability speciﬁed in section 4. We then analyze the results

of our study in section 5 before drawing some conclusions.

2 Quality model validation

Validation of quality models may be roughly divided into two types of activities [8,9]:

–

the validation that the metrics used in the model actually measure what they claim

to measure. This activity is known as “internal validation”;

– the determination of correlations between an external quality criterion (such as

traceability) and an internal measurement. This activity is known as “external vali-

dation”.

We have chosen to study the set of metrics speciﬁc to factorization [7] deﬁned in

MACAO project and we have made a limited internal validation of thoses metrics [10].

Our internal validation was focused on 11 properties extracted from litterature: non-

negativity, null value, simplicity, non-coarseness, non-uniqueness, minimum factori-

sation, maximum factorisation, monotonicity, non-equivalent of interaction, tracking

property, importance of implementation and one property speciﬁc to factorization: re-

dundancy. This internal validation is studied and discussed in [10].

Regarding external validation, which is the subject of this article, we have carried

out a study of the correlation between factorization measured by our set of metrics and

the traceability criterion.

Our aim during this study was to (in)validate the following assumption:

The increase of factorization of a UML class diagram leads to an increase

of the traceability criterion

The problem regarding the evaluation of such a correlation is that one must mea-

sure traceability in a way that is not directly based on the factorization. We propose

a deﬁnition and a means to compute traceability in section 4 that we believe to be a

good indicator of tracebility’s estimation and to be independent (in its deﬁnition) of

factorization.

3 Factorization measurement

Our set of factorization metrics is based on our previous work on class hierarchy restruc-

turation [11]. We have designed and implemented several algorithms that reconstruct

class hierarchies in order to maximize factorization. Those algorithms use a structure

known as Galois sub-hierarchy which is a restriction of Galois lattice. The Galois sub-

hierarchy provides an organization of classes in a hierarchy where features are max-

imally factorized (they are introduced by only one class and inherited by the others),

thus improving generalization and the number of classes introduced is minimal.

The metrics we have deﬁned measure the “distance” between the existing class

hierarchy and the “ideally factorized” hierarchy that results from the application of one

of our algorithms. Metrics are deﬁned at different levels: feature (attribute and method),

generic feature (set of semantically close features such as all methods with the same

name), class and hierarchy.

As our algorithms reorganize a class hierarchy so that factorization metrics are op-

timal, we used reorganized hierarchies as a benchmark in our study.

4 Traceability

Traceability is a key issue to guarantee consistency among software artifacts of the de-

velopment cycle phases. Together with changeability, modiﬁability and stability, trace-

ability is a quality criterion closely tied to the software maintenance. The improvement

of those quality criteria will reduce maintenance cost. Their assessment relies, to a large

extent, on software evolution and change impact analysis and is the subject of numerous

researches.

Changeability, as “the capacity of the software product to enable a speciﬁed mod-

iﬁcation to be implemented”, was studied by [12]. The author showed the existence

of a correlation between coupling and changeability and the absence of a signiﬁcant

correlation between changeability and cohesion.

Modiﬁability, which is similar to changeability is deﬁned by [13] as “the facility

with which a system can be adapted to the changes in the functional speciﬁcation, the

environment or the expression of the requirements”. This work is in connection with

the architecture of the system. It argues that modiﬁability has to be taken into account

in earlier phases of the systems development life cycle.

Stability is also a criterion on which maintainability depends. It is deﬁned as “the

capacity of the software product to avoid the unexpected effects of the modiﬁcations

of the software”. Studies on stability are mainly based on the comparison of various

versions of the software [14]. These studies allow to determine the most unstable com-

ponents.

4.1 Traceability concept

The traceability quality criterion deals with the software modiﬁcations which consist in

updating and/or expanding a system during the implementation or maintenance phases.

It is classiﬁed in two categories [15]:

–

internal traceability shows the dependency between objects within a development

phase. It is based on tangible links of the same abstraction level. These tangible

links make the internal traceability easier to compute than the external traceability.

In software design, the internal traceability can be divided into three main types

[16]: object to object, association to association and use case to object;

–

external traceability concerns relations between objects in different phases of de-

velopment cycle. For example, it could be the relationship between speciﬁcations

and testing. External traceability is ensured by cognitive links. These links, by

which objects are connected to others objecs at different abstraction levels, are

determined by an expert who is familiar with the system. Thus, the external trace-

ability links must be manually created, based on the expertise and the intuition of

the designers [17].

In this study, we consider traceability as the facility to ﬁnd elements impacted by a

modiﬁcation. A number of research studies deals with code traceability [17–19] and

there are various ongoing works on external design traceability. However, just few

works still focus on internal design traceability.

Our work addresses internal traceability by examining links between elements with-

in a class hierarchy (features, links and classes). In this way, we deal only with the

modiﬁcation of class diagrams.

The traceability issue regarding class diagrams is to discover which classes, or at

the lower level, which features will be affected when a modiﬁcation occurs. On the one

hand, traceability does indicate the dependency level of an element in a class diagram.

On the other hand, the dependency level of the element reveals a certain facility of trac-

ing in the class diagram. Therefore, internal traceability is closely connected to com-

plexity and comprehensibility and other quality criteria as stated above. For example,

if a diagram is too complex or too difﬁcult to understand and to analyze, modiﬁcations

will be more difﬁcult to trace. In other words, traceability provide designers with an

indicator predicting the consequences of a modiﬁcation within a class diagram.

4.2 Deﬁnitions

p : string

x : real

p : string

q : string

AS1

r : real

CO2

p : string

r : real

0..1

p : string

q : char

interface

p : string

p : string q : char

0..1

AS2

s : real

CO1

s : real

x : real

p : string

q : string

p : string

Fig.1. Trace and impacted elements of a modiﬁcation.

Before presenting the calculation method of the traceability concept, we present

hereunder some terms and deﬁnitions used in this article.

Modiﬁed Element: is a class or a feature where a modiﬁcation occurs. A modiﬁca-

tion could be :

1. an internal modiﬁcation such as a modiﬁcation of an element within a class (for

example: feature redeﬁnition, feature addition/deletion, etc.);

2. an external modiﬁcation such as a modiﬁcation of a relation between classes (for

example : class redeﬁnition/addition/deletion or links addition/deletion etc.).

Impacted Element: is a class or a feature that must be adapted after implementing a

modiﬁcation.

In ﬁgure 1, a modiﬁcation of feature q in class C demands adaptations of features x

and p in class D, p and q in class E and s in class AS2. So, features x, p, q, s are deﬁned

as impacted elements by a modiﬁcation of feature q.

If x/A denotes

the feature x of the class A¨ and SET

ELEM

IMP(X) is deﬁned as

the set of all impacted elements of X, then SET

ELEM

IMP(q/C) will be {x/D;p/D;

p/E;q/E;s/AS2}.

Dependency: is a link between a “modiﬁed element” and an “impacted element”.

We say that there exists a dependency between them.

4.3 Impact analysis

(a) (b) (c)

m()

m()m()

m()

p()

q()

p()

q()

p()

q()

Fig.2. Dependency in an inheritance relationship.

Firstly, our analysis is based on the dependency analysis during which we list down

all the possible cases which are called dependency rules. 15 dependency rules were

deﬁned in our project.

Hereunder is an example of the dependency rule ”features of a child class depend

potentially on all the features of its super class” illustrated in ﬁgure 2. In a), feature

m of child class B depends on feature m of super class A. In b) and c) feature q of B

depends on features p and q of A. There exists no explicit dependency of a feature of A

on any feature of B.

Secondly, we used modiﬁcation scenarios. Three groups of modiﬁcation scenarios

were delimited:

–

manipulation of features (addition, deletion, redeﬁnition);

– manipulation of links (association link, inheritance link, etc.);

–

manipulation of classes (addition, deletion, redeﬁnition). The manipulation of fea-

tures and links may be a sub-group of classes manipulation.

The two last groups of scenarios are based on the ﬁrst one. According to the depen-

dency rules, we found that:

– features of the same class are impacted;

–

features of the children classes are impacted;

– features of the associated classes (direct and indirect) are impacted. (Based on the

dependency rules deﬁned for the association links, and taking into account the nav-

igability, the composition, aggregation, etc.);

–

features of the dependent classes are impacted.

For example, in ﬁgure 3, the features impacted by a suppression or an addition of

the feature “nom:string” of class Utilisateur are:

–

all features of class Utilisateur

– all features of class Chauffeur

– all features of class TypeVehicule

– all features of class Vehicule

–

all features of class Agence

Altogether, 8 features are impacted by a suppression, an addition or a modiﬁcation

of feature nom:string of class Utilisateur. This impact covers all the possible scenarios

for the feature.

4.4 Computation method

Two kinds of measures are used to quantify a quality criterion: direct or indirect mea-

sures. Direct measures are the measures of an attribute which does not depend on the

measures of another attribute. Indirect measures are measures which must be calculated

by other measures.

We chose to use direct measures in our project because of the following. Several ex-

isting methods use checklists or interviews to evaluate a quality criterion. Such methods

are based on a cognitive system where information is extracted from the experiences and

expertise parameters. These methods require a signiﬁcant number of tests, samples and

human resources. The context of our project did not enable us to carry out the external

validation by such methods. Thus, we tried to ﬁnd internal measures of the class dia-

grams to estimate objectively their traceability. Moreover, computation of these internal

measures may be more easily automated.

Roue

Attribute : undefined

Attribute1 : undefined

Attribute2 : undefined

Operation()

Attribute1 : undefined

Attribute2 : undefined

Pneu

Jante

Attribute : undefined

Operation()

Utilisateur

Nom : string

Vehicule

Operation()

nomImmatriculation : string

kilometrage : real

estLibre : boolean

Operation()

est de type

Agence

nom : stringnom : string

nomImmatriculation : string

kilometrage : real

estLibre : boolean

*r7

0..1

Chauffeur

permis : char

Ressource

permis : char

Mission

reference : string

dateDepartPrevue : string

dateArriveePrevue : string

commentaire : string

reference : string

dateDepartPrevue : string

dateArriveePrevue : string

commentaire : string

1r2

est affecté à

0..1

TypeVehicule

nom : string

hauteur : real

largeur : real

nom : string

hauteur : real

largeur : real

Fig.3. Example of impact analysis in a class diagram.

The distance between the modiﬁed elements and the elements impacted was used to

calculate the traceability. The distance from element X to Y is calculated by all the ele-

ments we have to pass from X to Y (ﬁgure 4). The relationship between the traceability

and the distances is: the longer the distance, the more difﬁcult to trace and therefore the

worse the traceability.

If there exists more than one track between modiﬁed element and impacted element,

the average of distances will be computed.

To calculate the degree of traceability for a given feature, we look for a set of im-

pacted elements of this features. This set includes all the features in the impacted classes

and all the empty impacted classes (an empty class is a class without feature). The dis-

tance between this feature and all impacted elements will be added together. The trace-

ability is the inverse of the distance. If the distance is equal to 0, the traceability is equal

to 1. This means that if the set of impacted elements is null, there is no effort to trace

when a modiﬁcation is implemented to this feature.

modified element

impacted element

Fig.4. Distance trace between modiﬁed element and impacted element.

The traceability of a class will be the sum of the traceabilities of all its features plus

the reverse of the distance between this class with other empty impacted classes.

Suppose that X is an modiﬁed element and SET

ELEM

IMP(X) contains all the

impacted elements of X. We can resume our computation of the traceability of X like

following:

Find SET

ELEM

IMP(X)=Y

..Y

Initiate the variable distance DIS=0

For each i

Calculate DIS(X,Y

)

DIS=DIS+DIS(X,Y

)

EndFor

If DIS==0 then

TRA(X)=1

else

TRA(X)=1/DIS

EndIf

5 Analysis of experiment results

The traceability measurement is implemented in the UML CASE tool Objecteering

in J language. J is the Objecteering internal object-oriented language that allows to

implement new behaviors on the UML meta-model. The module created allows to ﬁnd

the impacted elements and to compute the distance and the traceability accordingly.

5.1 Test models

To analyze the correlation between traceability and factorization, we used one of the

algorithms of factorization of class hierarchies (see section 3). This algorithm trans-

forms a hierarchy into an optimally factorized hierarchy which comprises new classes

allowing to factorize features. As a result, the factorization metrics of the elements after

Classes and

features

BEFORE

Class hierarchy

restructuration

Traceability

BEFORE

Factorization metrics

BEFORE

Factorization metrics

AFTER

Classes and

features

AFTER

Traceability

AFTER

FACTORIZATION

Statics analysis

factorization metrics after the

transformation is greater than before

BEFORE

FACTORIZATION

Fig.5. Test protocol.

the transformation are necessarily better than before, so a computation of factorization

metrics is not needed.

We use a method combining the replicated and synthetic methods [20]. The compu-

tation was realized on three projects with 100 classes and 264 properties. If a signiﬁcant

change between the traceability before and after the transformation is found, we will be

able to identify a correlation between the traceability and the factorization of the ele-

ments in the class hierarchy.

5.2 Experiment results

The results of the tests summarize as follows:

For classes:

Only one class among 100 has a worse traceability after the restructuring and the re-

duction for this class is -0,0032 whereas the average increase is 0,1457. The overall

results are:

– 88% of classes have a better traceability after the transformation;

– 11% of classes have the same traceability after the transformation;

– 1% of classes have a worse traceability after the transformation.

the average of the traceability degree before the transformation is 0,1023 and the av-

erage of the traceability’s degree after the transformation is 0,4051 (4 times more).

For features

1. Only 5 features out of 264 features have a worse traceability after the transforma-

tion and the average reduction for these ﬁve features is -0,0052 where as the average

increase is 0,1077. The overall results are:

–

95% of features have a better traceability after the transformation;

–

2% of features have the same traceability after the transformation;

– 3% of features have a worse traceability after the transformation.

2. the average of the traceability before the transformation is 0,0560 and the average

of the traceability after the transformation is 0,1638 (3 times enhancement).

As a result, this experiment shows that, on our examples, it is easier to trace modiﬁ-

cations within UML class hierarchies reorganized by one of our algorithms. Therefore,

applying the algorithm of reorganization, allows to increase tracebility, thereby making

maintenance easier.

6 Conclusion

In this article, we have presented a limited experiment of the validation of the correla-

tion between the factorization of a UML class hierarchy and the traceability within this

hierarchy. We have proposed a deﬁnition of the concept of traceability that relies on the

fact that the majority of the impacts of a modiﬁcation in a UML class hierarchy may

be found using the different relations between the classes of the hierarchy (inheritance

and associations). We have implemented an algorithm to compute this traceability and

compared the results of this computation on class hiearchies before and after their re-

structuration by an algorithm that optimizes factorization. On those examples, we could

observe a deﬁnite improvement of traceability once the restructuration of the class hi-

erarchy has been carried out.

This work should be extended in two directions. Firstly, we should realize other ex-

periments in order to conﬁrm the positive results of this study. Ideally, we should be able

to have enough numerical results in order to make a statistical analysis. Secondly, there

exists many more quality criteria on which an external validation should be performed

in order to validate our entire quality model.

References

1. IEEE1061: Software Quality Metrics Methodology. IEEE Standard. 1061-1992 edn. Insti-

tute of Electrical and Electronics Engineers, Inc., New York (1998)

2. ISO9126: Information Technology - Software Product Evaluation - Software Quality Char-

acteristics and Metrics. International Organization for Standardization, Geneva, Switzerland

(1998)

Bansiya, J., Davis, C.: A Hierarchical Model for Object-Oriented Design Quality Assess-

ment. IEEE Transactions on Software Engineering 28 (2002) 4–17

4. Dromey, R.G.: A Model For Software Product Quality. IEEE Transactions On Software

Engineering 21 (1995) 146 –162

Chidamber, S.R., Kemerer, C.F.: A Metrics Suite for Object Oriented Design. IEE Transac-

tions on software engineering 20 (1994) 476–493

Lorenz, M., Kidd, J.: Object-Oriented Software Metrics, a Practical Guide. Prentice Hall

(1994)

Dao, M., Huchard, M., Leblanc, H., Libourel, T., Roume, C.: A New Approach to factor-

ization – Introducing Metrics. In: Proc. Metrics 2002 – 8th International Software Metrics

Symposium, Ottawa, Canada (2002)

Fenton, N.E.: Software Metrics: A Rigorous Approach. Chapman & Hall, London (1992)

9. Zuse, H.: Software Complexity: Measures and Methods. 1st edn. Walter de Gruyter, Berlin

(1991)

10. Rasheed, T.M., Tran, C.N., Dao, M.: Validation th

eorique des m

etriques de factorisation

(Internal Validation of Factorization Metrics). MACAO project report 5.2.1, France T

ecom

R&D (2003)

11.

Dao, M., Huchard, M., Libourel, T., Roume, C.: Evaluating and Optimizing Factorization

in Inheritance Hierarchies. In Black, A.P., Ernst, E., Grogono, P., Sakkinen, M., eds.: The

Inheritance Workshop at ECOOP 2002, University of Jyv

askyl

a (2002) 38 – 43

12.

Kabaili, H.: Changeability of Object-Oriented Software Systems: Architectural Properties

and Quality Indicators. Doctoral thesis, Universit de Montr

eal, Montr

eal, Qubec, Canada

(2002)

13. Bengtsson, P., Lassing, N., Bosch, J., van Vliet, H.: Architecture-Level Modiﬁability Anal-

ysis (ALMA). Journal of Systems and Software 69 (2003) 129 – 147

14.

Sahraoui, H.A., Grosser, D., Valtchev, P.: Predicting software stability using Case-Based

Reasoning. In: Proceedings of the 17th IEEE International Conference on Automated Soft-

ware Engienering (ASE ’02). (2002)

15. Lanubile, F., Visaggio, G.: Decision-driven maintenance. Journal of Software Maintenance:

Research and Practice 7 (1995) 91 – 115

16.

Lindvall, M., Sandahl, K.: Practical Implications of Traceability. Software Practice and

Experience 26 (1996) 1161 – 1180

17. D. Kung, J. Gao, P.H., Wen, F.: Change Impact Identiﬁcation in Object Oriented Software

Maintenance. Proceedings of the International Conference on Software Maintenance (1994)

18. G. Antoniol, G. Canfora, A.d.L.: Maintaining Traceability During Object-Oriented Software

Evolution, a Case Study. Proceedings of the IEEE International Conference on Software

Maintenance (1999)

19.

P. Clarke, B. Malloy, P.G.: Using a Taxonomy Tool to Identify Changes in OO Software

. Proceedings of the European Conference on Software Maintenance and Reengineering

(2003)

20.

Emam, K.E.: A methodology for validating software product metrics. Technical report,

National Research Coucil, Canada (2000)