Involving End-users in Domain-Specific Languages Development

Experiences from a Bioinformatics SME∗

Maria Jose Villanueva, Francisco Valverde and Oscar Pastor

PROS Research Center, Universitat Politècnica de València, 46022 Valencia, Spain

Keywords: Software Engineering, Domain-Specific Languages, Agile Development.

Abstract: Involving end-users in software development is a goal envisioned by the Software Engineering community. Because end-users hold the domain knowledge, their involvement makes it feasible to develop software applications that really fulfil their requirements. Domain-specific languages (DSLs) are widely applied to accomplish this vision. However, end-user collaboration in DSL development is also important to ensure that their needs are well understood and represented. This research work proposes a DSL development process that combines methodological guidelines for DSL development with good practices from agile methods to encourage end-user involvement. In this paper, we overview the complete process and focus on the first two stages: Decision and Analysis. To illustrate the proposal, we apply it to the development of a DSL for a bioinformatics SME that works on genetic disease diagnosis.

1 INTRODUCTION

The importance of end-user involvement in software development has been widely discussed in academia (Ko et al., 2011; Fischer et al., 2004). The role of end-user experts in any development process is essential, especially in complex domains such as bioinformatics, where most of the domain knowledge is difficult for developers to translate into a suitable implementation.

Domain-speciﬁc languages (Fowler, 2010) have

been proposed as a solution to reduce the knowledge

gap between end-users (or domain experts) and devel-

opers. According to Spinellis (Spinellis, 2001), some

of their beneﬁts are: 1) the concrete expression of do-

main knowledge; 2) the direct involvement of the do-

main expert in the software life cycle; and 3) the better

expressiveness to describe domain implementations.

According to Van Deursen et al. (Van Deursen et al.,

2000), DSLs can abstract the underlying software im-

plementation as a set of domain concepts that end-

users can easily understand. If an executable DSL is

provided, end-users can use it to write their own pro-

grams.

Current DSL development methodologies (Strembeck and Zdun, 2009; Ceh et al., 2011) take end-users into account in requirements gathering and in deployment. Usually, developers create DSLs from a requirements specification without additional end-user participation. As a consequence, the discovery of any domain misunderstanding is delayed until a first version is delivered. This substantially increases the development time of the DSL if, for instance, the language editor or the execution environment must be reimplemented accordingly.

∗ Small and Medium Enterprise

Agile Software Development (Highsmith and Cockburn, 2001) and the Agile Manifesto (Beck et al., 2001) advocate a set of good practices to improve software development. Among these practices, they encourage the involvement of end-users (or stakeholders) throughout the project and the delivery of prototypes in short cycles for early error detection. We believe that some of these practices can be beneficial in the context of DSL development.

The contributions of this research work are: 1) the

identiﬁcation of good practices from agile methods

that apply in the DSL development context; and 2) the

inclusion of these practices inside a DSL development

process to improve end-user involvement.

Our DSL development process is based on the methodological guidelines proposed by Mernik et al. (Mernik et al., 2005), which we use to define the different process stages and activities, and on the agile methods XP (Beck and Andres, 2004), Scrum (Schwaber and Beedle, 2002) and Agile Modeling (Ambler, 2002), from which we analyse different agile practices. Briefly, the process is configured to create the DSL incrementally by means of small releases that satisfy a set of requirements. End-users participate in some activities during the development of each release, so they can provide constant feedback, and also evaluate the resulting DSL release implementation, so any detected error can be fixed in the next iteration.

Villanueva M., Valverde F. and Pastor O.
Involving End-users in Domain-Specific Languages Development - Experiences from a Bioinformatics SME.
DOI: 10.5220/0004450000970108
In Proceedings of the 8th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE-2013), pages 97-108
ISBN: 978-989-8565-62-4
© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

In this paper we provide an overview of the process and focus on the first two stages: Decision and Analysis. To illustrate both stages, we describe the development of a DSL for genetic disease diagnosis, which has been elaborated in close collaboration with experts from a bioinformatics SME. Due to the complexity of this evolving domain, this real scenario highlights the need for sound methodological practices to support good communication with domain experts.

The rest of the paper is structured as follows. Section 2 reviews other approaches that investigate how to involve end-users or how to use agile practices in DSL development. Section 3 overviews the proposal: first, the methodological basis for DSL development (proposed by (Mernik et al., 2005)); second, the agile practices that apply in the DSL context; and third, the proposed agile DSL development process. Sections 4 and 5 describe in detail the Decision and Analysis stages, respectively, illustrating each stage with examples from a DSL for genetic disease diagnosis. Section 6 discusses the main findings obtained while applying the proposal. Finally, Section 7 presents the conclusions and future work.

2 RELATED WORK

Agility and end-user involvement in DSL development have been previously addressed by the Software Engineering community.

On the one hand, several authors have addressed end-user involvement within software development processes.

The authors of (Pérez et al., 2011) propose a method that encourages the analysis of the end-users’ role in the development process. Their approach takes into account good practices of end-user development, analyses user requirements and develops the DSL accordingly; as a result, the method generates a visual domain-specific language by means of a tool. It adopts end-user development good practices to obtain DSLs that better satisfy user needs, with the aim of involving end-users afterwards in the Model-Driven development process. In contrast, our approach adopts good practices from agile methods to promote end-user involvement during the DSL development itself and to ensure that their preferences are fulfilled within the DSL.

Also, the authors of (Izquierdo and Cabot, 2012) provide a collaborative infrastructure for end-user involvement in every stage of DSL development. Using their proposal, end-users and developers interact until the artefacts of each stage satisfy end-users’ requirements. This approach supports an active involvement of end-users in all the stages of the development process. In contrast, our approach promotes the use of agile practices as a way to make end-user involvement easier, but avoids their participation in the definition of artefacts outside their domain knowledge, such as conceptual models, where their participation may sometimes be useless, if not detrimental.

On the other hand, several authors have addressed the application of agile principles to software development processes.

The authors of (Grigera et al., 2012) propose bridging agile practices with the model-driven development of Web applications by using Test-Driven Development. Starting with a set of user stories, the authors propose the use of graphical user interface mock-ups to improve stakeholder involvement and requirements gathering. They also introduce a DSL, WebSpec, for specifying the interaction and navigation requirements. Following an agile iterative cycle and model-driven principles, a WebML model that represents a Web application is derived. Our approach shares some agile practices with this proposal, but it is mainly focused on DSL development instead of software generation from requirements.

Also, the work of (Visser, 2008) proposes the use of good practices from agile methods in DSL development, such as carrying out the DSL development incrementally, addressing one concern at a time. This approach addresses the design and implementation of the DSL from abstractions obtained after analysing domain technologies, instead of from a previous domain analysis. Likewise, our approach applies agile practices to DSL development and adopts an iterative cycle. However, our approach follows a deductive process (where a domain analysis is used in the design and this design is in turn used in the implementation) and promotes end-user involvement to improve their satisfaction with requirements coverage.

3 OVERVIEW OF AN AGILE DSL

DEVELOPMENT PROCESS

Mernik et al. (Mernik et al., 2005) discuss when and how to develop a DSL and provide a set of guidelines for future developers. The authors describe a development process made up of five stages (Decision, Analysis, Design, Implementation and Deployment) and several patterns that aid developers in each of them. In a more recent work (Ceh et al., 2011), they have extended the process to seven stages by adding Testing and Maintenance. This approach has been widely applied in different DSL development projects, for example (Jr. et al., 2011; Arora et al., 2009; Visser, 2008). For these reasons, we have selected this work as the basis of our proposal.

Figure 1 (taken and adapted from (Ceh et al., 2011)) depicts the stages, with the corresponding inputs and outputs, and the proposed patterns for the first four stages. First, a Decision (1) is made about whether or not to develop the DSL, taking into account domain experts’ demands and the current state of the domain. If so, an Analysis (2) is performed, using different assets that describe the domain, to obtain a set of formal artefacts that represent it. Once the domain is clearly represented, it is time to Design (3) the target DSL constructs; those designs are used to create an executable Implementation (4), which is subjected to Testing (5) in order to check its correctness. If everything is correct, a Deployment (6) is carried out so that the DSL can be used by end-users and, optionally, Maintenance (7) performs any required change.

Agile methods have become widely popular for

achieving end-user involvement and improving the

time-to-market of software products (Beck et al.,

2001). Next, we explain the practices selected from

XP, Scrum and Agile Modeling suitable for DSL de-

velopment:

• Architectural Envisioning. This practice, proposed by the Agile Modeling approach, encourages the early identification of a viable technical strategy. DSL development is often guided by domain concepts, and how the DSL is going to be executed is decided in late stages. Because of that, it is sometimes difficult to translate concepts to a working implementation, as some authors have stated (Visser, 2008). By applying this practice, developers can select and adapt domain concepts according to the expected DSL execution environment.

• User Stories. In order to manage requirements in agile methods, they are divided into user stories, which are brief descriptions given by end-users of a demand that adds value. The set of user stories provides a simplified view of the functional features to be developed. In the context of DSL development, user stories are an effective instrument to discover the language constructs and concepts to be introduced in the DSL. Hence, user stories guide the development and provide traceability between the initial requirements and the final DSL constructs. Also, user stories help end-users to establish their priorities about the requirements to be introduced into the DSL.

Figure 1: Original DSL development process (Ceh et al., 2011). [Diagram labels: stages Decision, Analysis, Design, Implementation, Testing, Deployment and Maintenance. Inputs: technical literature, existing implementations, custom surveys, expert advice, current and future requirements. Outputs: requirements, possible existing implementations, terminology, concepts, commonalities, variations, syntax, semantics, DSL. Decision patterns: Notation, AVOPT, Task automation, Product line, ... Domain analysis methods: Informal, Formal (DSSA, FODA, ODM), Extract from code. Design patterns: Language exploitation, Language invention. Implementation patterns: Interpreters, Compilers/application generators, Embedding, ...]

• Acceptance Tests or Usage Scenarios. In order

to check the fulﬁlment of requirements in agile

methods, end-users brieﬂy describe scenarios that

must be accomplished by the software to be devel-

oped. In DSL development such scenarios are in-

teresting not only for testing purposes but also for

improving the understanding about the domain.

In our approach, acceptance tests are the primary

validation source to ensure that the DSL satisﬁes

end-users’ expectations.

• Iteration Planning or Sprint. The majority of ag-

ile methods manage the development in a set of

iterations in which only a small set of features is

implemented. This practice encourages faster re-

InvolvingEnd-usersinDomain-SpecificLanguagesDevelopment-ExperiencesfromaBioinformaticsSME

leases that can be evaluated by stakeholders, or

end-users in our context. DSL development can

beneﬁt from this practice by checking if the con-

cepts required by end-users are included in incre-

mental subsets of the DSL. Also, errors can be

detected without developing the complete DSL in-

frastructure (editor, execution environment, etc.).

• Incremental Design. Together with iteration-based development, agile practices promote the incremental design of the software (models, UI, components, etc.). Many DSL development approaches design the whole abstract language from a previous domain analysis. The design of a full DSL is a time-consuming task, and it is not finished until end-users evaluate the result. We believe that if DSL developers and end-users focus on a small set of constructs or user stories, the DSL can be built more easily and is more flexible to changes.

Our work proposes an agile DSL development process that combines the aforementioned methodological guidelines with the chosen agile practices in order to provide faster, smaller releases of the DSL and to involve end-users. Figure 2 overviews the proposal. The most relevant feature of the process is the inclusion of an iterative cycle for the stages Analysis, Design, Implementation and Testing. The stages Deployment and Maintenance have been omitted for simplicity.

In the Decision stage, all decision patterns from the guidelines of Mernik et al. (Notation, AVOPT, Task automation, Product line, Data structure representation, Data structure traversal, System front-end, Interaction and GUI construction) are adopted to decide whether developing the DSL is worthwhile. End-users provide information about existing tools and about their requirements. Following the agile practice architectural envisioning, we propose to make deeper use of this information to elaborate a review (1) in which available technology, such as DSLs, software tools and frameworks, is assessed against end-users’ requirements. This review is useful to better justify the decision to develop the DSL and, eventually, to make early decisions about the architecture and the implementation platform.

After this stage, following the agile practices iteration planning and incremental design, the original sequential process, which includes Analysis, Design, Implementation and Testing, is transformed into an iterative process. The Decision stage remains outside the cycle because the decision to design the DSL is only made once.

The proposed iterative cycle gathers:

• Analysis Stage. Following the agile practices user

stories, acceptance tests and iteration planning,

this stage is extended with two steps:

– Requirements Specification (2). This step refines the preliminary requirements specification by detailing all the requirements expected for a complete DSL. To comply with agile practices, end-users’ requirements are divided into a set of user stories (2a) and acceptance tests (2b), similarly to XP and Scrum.

– Iteration Planning (3). This step completes the specified user stories with their priorities and establishes a list (or sprint backlog, using the Scrum terminology) that defines their implementation order according to end-users’ preferences. In order to optimize the iterative cycle, a set of user stories is chosen for implementation in each iteration, instead of addressing one user story per iteration. An agreed release deadline is scheduled and an acceptance test (3a) that covers all these user stories is defined.

– Domain Analysis (4). This step follows the informal pattern from the guidelines of Mernik et al. to create a set of artefacts that precisely describe the DSL domain. We choose this pattern to avoid the overhead that carrying out a complete formal method may introduce. We adopt the definition of a feature model (4a) (first introduced by (Kang et al., 1990)) to express the variabilities and commonalities of the DSL, and the definition of a conceptual model (4b) (a UML Class Diagram) to precisely describe the terminology, the concepts involved in the DSL and the relations among them. To comply with the agile practice incremental design, these artefacts are created incrementally, so they represent only the sub-domain related to the user stories selected for the current iteration (and the previous ones).

Because end-users are more interested in talking about their domain problems than in working on the software development issues that address them (Fischer et al., 2009), we have promoted their participation in steps directly related to such problems: in the Requirements Specification step they explain their requirements and the tests to be satisfied, and in the Iteration Planning step they point out their priorities and specify a scenario that gathers several user stories. In contrast, their involvement in the development of feature and class models is avoided. In our experience, end-users are not comfortable with this kind of model, and the lack of understanding could lead to mistakes (Costabile et al., 2008).

ENASE2013-8thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering

100

Figure 2: Overview of the agile DSL development process proposed. [Diagram labels: Decision (requirements, technological review, end-users); Analysis (requirements specification with user stories 1 to 3 and acceptance tests, iteration scenario with stories and tests, end-users; 4a variabilities/commonalities, 4b domain concepts); Design (semantics; syntax: abstract syntax, core syntax (textual), end-user syntax (visual); end-users); Implementation (prototype); validation by end-users.]

• Design Stage (5). Following the agile practice in-

cremental design, the syntax and semantics de-

signs represent only the user stories of the cur-

rent iteration (and the previous ones). As an op-

tional step, end-users can express their prefer-

ences about the end-user oriented (concrete) syn-

tax (5a), which normally is a visual projection of

the core (concrete) textual syntax. Additionally,

developers validate that the Abstract syntax can be

instantiated to support all acceptance tests speci-

ﬁed in the Analysis stage (5b).

• Implementation Stage (6). Following the agile

practices incremental design and architectural en-

visioning, the DSL is implemented taking into ac-

count the partial syntax and semantics from the

Design stage and the technological review per-

formed in the Decision stage. As a result, a pro-

totype (6a) DSL that complies with the set of user

stories (from this iteration and the previous ones)

is created (or evolved from previous iterations).

End-users are not involved in this stage.

• Testing Stage (7). In this stage the prototype implementation is tested against the requirements from the current iteration. First, the specified acceptance tests are checked. Then, the prototype is shown to end-users so that they can check the acceptance tests and provide their opinion (7a), accepting the functionality they expected and rejecting the functionality they do not like. Additionally, end-users can provide insights about new requirements they are interested in.

After the last stage, if there are some requirements left, a new iteration is started. Because the Testing stage captures feedback from end-users, requirements may have been updated, removed, added or reprioritized, so the next iteration may need to perform some adjustments. Concretely, all these changes are addressed in the Requirements Specification and Iteration Planning steps of the Analysis stage. Since requirements and priorities are revisited in each iteration according to end-users’ feedback, the development planning is constantly adapted to satisfy their ongoing needs.
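The Iteration Planning step described in this section can be illustrated with a minimal Python sketch. It is only an illustration under our own assumptions: the `UserStory` class, the `plan_iterations` function, the fixed number of stories per iteration and the example stories are hypothetical and are not part of the process definition.

```python
from dataclasses import dataclass


@dataclass
class UserStory:
    """A brief end-user demand; a lower number means a higher priority (assumption)."""
    title: str
    priority: int
    done: bool = False


def plan_iterations(backlog, stories_per_iteration):
    """Order pending stories by end-user priority and group them into iterations."""
    pending = sorted((s for s in backlog if not s.done), key=lambda s: s.priority)
    return [
        pending[i:i + stories_per_iteration]
        for i in range(0, len(pending), stories_per_iteration)
    ]


# Hypothetical backlog, loosely inspired by the diagnosis scenario.
backlog = [
    UserStory("Filter variations by region", priority=3),
    UserStory("Read variations from a file", priority=1),
    UserStory("Search variations in a database", priority=2),
]

iterations = plan_iterations(backlog, stories_per_iteration=2)
for number, stories in enumerate(iterations, start=1):
    print(number, [story.title for story in stories])
```

Because the backlog is re-planned from the current list of pending stories, calling `plan_iterations` again after the Testing stage would reflect any story that end-users added, removed or reprioritized.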

4 DECISION STAGE

In this section, we apply the Decision stage of our proposal to develop a DSL for customizing genetic disease diagnosis software. First, we explain a real scenario in the genetics domain where geneticists face a set of problems related to end-user software development. As a solution, we propose the development of a DSL and explain how it addresses their problems. Second, we provide a preliminary description of geneticists’ requirements, a review of technologies for the development of genetic disease diagnosis software, and the decision patterns that apply within the context of the DSL.

InvolvingEnd-usersinDomain-SpecificLanguagesDevelopment-ExperiencesfromaBioinformaticsSME

101

4.1 Genetic Disease Diagnoses

In the genetics domain, DNA structures and their

relationships with human traits are too difﬁcult and

heterogeneous to be understood by people outside

the domain. For this reason, geneticists with some

programming knowledge have been traditionally de-

veloping their own databases, analysis algorithms,

scripts, frameworks and development tool-kits over

the years.

Even so, as not every geneticist is experienced

in software development, they create scientiﬁc work-

ﬂows to reuse available software applications. Unfor-

tunately, these software applications are often created

to fulﬁl speciﬁc requirements rather than being de-

signed for reuse. Customization and data exchange, if

possible, often requires advanced programming skills

that geneticists don’t usually have.

A common workaround is that geneticists create spreadsheets to perform data operations and manually copy results into text files to exchange data among different applications. This situation strongly affects their productivity, and it has become even more critical since the recent improvements in sequencing technologies, which now produce more genetic data than experts are able to analyse (Rusk, 2009).

An example of this situation is in IMEGEN

(IMEGEN, ), a bioinformatics SME working on the

diagnosis of genetic diseases. In order to understand

its context, Figure 3 overviews its process for genetic

disease diagnosis:

1. Patient’s Variation Gathering Phase: A patient’s

DNA sample is inspected with a software tool

(chosen according to the diagnosis technique per-

formed) to obtain a report of genetic variations.

The term variation is used to express how the con-

tent of the patient’s DNA in a speciﬁc position

is different from the content of a “disease-free”

DNA reference sequence in the same position.

2. Disease Knowledge Phase: Each variation de-

tected in the previous phase is searched in differ-

ent databases and websites until success. If a vari-

ation has been previously described, additional in-

formation related to a genetic disease is retrieved.

If not, it is characterized as an unknown variation.

3. Report Generation Phase: All genetic variations and their associated information are gathered in a report according to geneticists’ preferences. To do this, they use a spreadsheet or a word-processor document template to represent all the information.
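The three phases above can be sketched as a small Python pipeline. This is a simplified illustration and not IMEGEN’s actual software: the function names, the representation of sequences as position-to-base dictionaries, the databases modelled as plain dictionaries and the toy data are all assumptions made for the example.

```python
def gather_variations(patient, reference):
    """Phase 1: keep positions where the patient differs from the reference."""
    return {pos: base for pos, base in patient.items() if reference.get(pos) != base}


def lookup_disease_knowledge(variations, databases):
    """Phase 2: search each variation in the databases until one describes it."""
    known, unknown = {}, []
    for key in variations.items():  # key = (position, observed base)
        for database in databases:
            if key in database:
                known[key] = database[key]
                break
        else:
            # No database described this variation.
            unknown.append(key)
    return known, unknown


def generate_report(known, unknown):
    """Phase 3: gather the variations and their associated information."""
    lines = [f"{pos}{base}: {info}" for (pos, base), info in sorted(known.items())]
    lines += [f"{pos}{base}: unknown variation" for pos, base in sorted(unknown)]
    return "\n".join(lines)


# Toy data, invented for illustration only.
reference = {101: "A", 102: "C", 103: "G"}
patient = {101: "A", 102: "T", 103: "T"}
databases = [{}, {(102, "T"): "described in database B"}]

variations = gather_variations(patient, reference)
known, unknown = lookup_disease_knowledge(variations, databases)
report = generate_report(known, unknown)
print(report)
```

Each function corresponds to one phase, so the points where diagnoses differ (input format, databases consulted, report contents) appear as parameters rather than as changes to the overall process.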

If the analysis outcome is only a single gene, geneticists from IMEGEN are able to perform this process manually, interacting with different tools and searching for information in different databases. However, when they must manage larger amounts of genetic data and take into account a wider array of knowledge resources (websites, databases, scientific papers, etc.), their current procedure is no longer feasible.

Figure 3: IMEGEN’s Genetic Disease Diagnosis Process. [Diagram labels: Variation Gathering (sample analysis, alignment tool, CGH array tool, results, variations detected); Disease Knowledge (GDB, known variations, unknown variations, disease variations, unharmful variations, bibliography and disease); Report Generation (variations report).]

Geneticists from IMEGEN need to create their

own software applications that support different ge-

netic disease diagnoses, but the software mechanisms

to do it must avoid programming issues and hide tech-

nological details. In this context, a DSL is a good

choice to provide such mechanisms.

Working with IMEGEN, we have realized that the genetic disease diagnoses observed share the same process, although each disease diagnosis introduces slight variations. For example, both Medullary Thyroid Cancer and Achondroplasia diagnoses require identifying genetic variations that geneticists associate with the target disease. However, the cancer is analysed with a technology that provides the patient’s sequence in SEQ format, while the technology for Achondroplasia provides the patient’s variations in VCF format. Once the variations have been obtained, for the cancer only the exons (regions) 10 and 11 of the gene RET are validated against different databases, while for Achondroplasia locating the variations “p.Gly380Arg” or “p.Gly375Cys” in the gene FGFR3 is enough to confirm the disease. All these commonalities and variabilities, which are directly related to the genetics domain, are well understood by geneticists, so they are perfect candidates to become constructs of the DSL.
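To make the distinction between commonalities and variabilities more concrete, the two diagnoses can be written as instances of one shared structure. The following Python sketch is only an illustration: the `DiagnosisSpec` class, its field names and the `confirms_disease` function are hypothetical and do not correspond to the constructs of the DSL; only the diseases, formats, genes, exons and variations come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisSpec:
    """Shared structure (commonality); each field is a point of variability."""
    disease: str
    input_format: str                  # "SEQ" or "VCF"
    gene: str                          # gene under study
    exons: tuple = ()                  # regions validated against databases
    confirming_variations: tuple = ()  # variations that confirm the disease


medullary_thyroid_cancer = DiagnosisSpec(
    disease="Medullary Thyroid Cancer",
    input_format="SEQ",
    gene="RET",
    exons=(10, 11),
)

achondroplasia = DiagnosisSpec(
    disease="Achondroplasia",
    input_format="VCF",
    gene="FGFR3",
    confirming_variations=("p.Gly380Arg", "p.Gly375Cys"),
)


def confirms_disease(spec, patient_variations):
    """Return True if any variation listed as sufficient is present."""
    return any(v in patient_variations for v in spec.confirming_variations)


print(confirms_disease(achondroplasia, {"p.Gly380Arg"}))
```

In this sketch a new diagnosis is added by creating another `DiagnosisSpec` instance, which mirrors the idea that geneticists should only state domain-level choices while the process itself stays fixed.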

In summary, geneticists can be provided with a

“programming language” for specifying genetic dis-

ease diagnoses. This new language will provide ab-

stractions at the genetic domain knowledge level, that

is, related to genetic concerns that geneticists under-

ENASE2013-8thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering

102

stand, in order to manage the underlying technologi-

cal implementation.

4.2 A DSL for Customizing Genetic

Disease Diagnosis Software

With this goal in mind, we address the deﬁnition of

a DSL to compose and customize genetic disease

diagnosis software. The DSL will support mono-

genic and multigenic disease diagnoses. At the mo-

ment, among all kinds of diseases, the DSL ad-

dresses the ones where a cause-effect relation can

be established between a genetic variation and a

disease (an example list can be downloaded from

http://www.imegen.es/cms_genetic_diagnosis.php).

A brief summary of the requirements that geneticists expect a customized tool to fulfil is:

• R1. Retrieve all variations from a patient once the sample has been analysed chemically: because there are different techniques and technologies that produce different outputs, this requirement is decomposed into three:

– R1.1 Get variations from a ﬁle that contains the

patient’s sequence data.

– R1.2 Get variations from a ﬁle that contains the

alignment between the patient’s sequence and a

reference sequence.

– R1.3 Get variations from a ﬁle that contains the

patient’s variations.

• R2. Search the patient’s variations in different genetic databases: each analysis will require a different set of genetic databases, sometimes databases that gather knowledge about different diseases and other times disease-specific databases.

• R3. Filter the list of patient’s variations for a spe-

ciﬁc region or look for a set of interesting varia-

tions: when a disease is being diagnosed not every

variation is always relevant.

• R4. Gather in a report only the essential informa-

tion for the target disease: using their expertise,

geneticists choose the meaningful variations and

the proper knowledge related with them.

Next, end-users enumerate the tools they know in their domain for “creating” their own software applications. Following our approach, we review these tools to evaluate their support for the named requirements. Our goals are to assess whether the current state of the domain technology satisfies geneticists’ expectations, to detect whether a DSL is useful in the domain, and to identify what could be reused for its implementation environment.

Table 1: Technological Support of Geneticists’ Requirements.

Requirement  Biojava    Taverna    Galaxy     Alamut
R1.1         Partially  No         No         No
R1.2         Partially  No         Yes        No
R1.3         No         No         Yes        Yes
R2           No         Partially  Partially  Yes
R3           No         Partially  Partially  No
R4           No         Yes        No         Partially

Some examples of these tools are: Biojava (ver-

sion 3.0.5) (Holland et al., 2008), Taverna (version

2.4) (Hull et al., 2006), Galaxy (online version)

(Goecks et al., 2010) and Alamut (version 2.2) (Ala-

mut, ). Table 1 summarizes the coverage of the afore-

mentioned requirements by the suggested tools.
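The coverage reported in Table 1 can also be queried programmatically. The following Python sketch simply transcribes the table into a dictionary and lists the requirements that no reviewed tool fully supports; the `SUPPORT` name and the `tools_with_full_support` function are hypothetical helpers introduced only for this illustration.

```python
TOOLS = ("Biojava", "Taverna", "Galaxy", "Alamut")

# Support levels transcribed from Table 1, in the column order of TOOLS.
ROWS = {
    "R1.1": ("Partially", "No", "No", "No"),
    "R1.2": ("Partially", "No", "Yes", "No"),
    "R1.3": ("No", "No", "Yes", "Yes"),
    "R2": ("No", "Partially", "Partially", "Yes"),
    "R3": ("No", "Partially", "Partially", "No"),
    "R4": ("No", "Yes", "No", "Partially"),
}

SUPPORT = {req: dict(zip(TOOLS, levels)) for req, levels in ROWS.items()}


def tools_with_full_support(requirement):
    """Return the tools marked 'Yes' for the given requirement."""
    return [tool for tool, level in SUPPORT[requirement].items() if level == "Yes"]


# Requirements that no reviewed tool fully supports.
not_fully_supported = [req for req in SUPPORT if not tools_with_full_support(req)]
print(not_fully_supported)

# Tools that fully support every requirement (none, according to the table).
complete_tools = [
    tool for tool in TOOLS
    if all(SUPPORT[req][tool] == "Yes" for req in SUPPORT)
]
print(complete_tools)
```

Read this way, the table shows that geneticists would need to combine several tools to cover all the requirements, since no single tool is marked "Yes" in every row.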

Biojava is a framework, based on the Java language, composed of a set of programming libraries to create Java programs for analysing genetic data. Biojava supports the reading of several file formats and the execution of analysis algorithms; however, in order to combine them to identify the patient’s variations, geneticists need Java programming skills. The rest of the requirements are not supported.

Taverna is a desktop tool for the design, editing and execution of workflows based on the integration of web services, with a special focus on the biological domain. It allows retrieving data from databases that comply with Biomart technology (Haider et al., 2009), and saving results in XML or XLS format. However, some programming knowledge is required to search for variations inside the data obtained by retrieval services and to filter them. It does not support identifying variations from the patient’s data.

Galaxy is a web environment that integrates a set of genetic services and allows running those services independently or creating workflows that combine them. When combined, these services are able to identify variations from alignments or import a list of variations, retrieve data from a set of genetic databases and filter the obtained data sets. However, deeper knowledge of Galaxy's structure, as well as programming knowledge, is required to identify variations from the patient’s sequence data and to search for the patient’s variations in genetic databases. It does not support creating customized reports.

Alamut is a desktop tool that provides a user-friendly interface for the interpretation of genetic variations. Variations can be loaded by reading a file that contains a list or by adding them manually, one by one. Once in the system, variations can be searched for in a set of predefined genetic databases. However, in order to create customized reports, users must program templates and perform post-processing

InvolvingEnd-usersinDomain-SpecificLanguagesDevelopment-ExperiencesfromaBioinformaticsSME

103

operations. It is not possible to read variations from the patient’s sequence or alignments, nor to filter variations.

This technological review reveals the need for more streamlined and easier-to-use software customization mechanisms: although almost all requirements can be satisfied, at least partially, by the analysed tools, geneticists are forced to learn a programming language and deal with technological details (XML configuration files, workflow design, database connections, etc.) to fully meet their needs. Most of them are reluctant to learn a new technology because they are not sure whether the time invested is worthwhile.
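The coverage summarized in Table 1 can be checked mechanically. The following Python sketch transcribes the table as data and reports, for each requirement, the best support level offered by any reviewed tool, as well as the tools (if any) that fully support every requirement. The variable and function names are illustrative assumptions and are not part of any reviewed tool.

```python
# Table 1 transcribed as data: requirement -> support level per tool,
# in the column order given by TOOLS.
TOOLS = ["Biojava", "Taverna", "Galaxy", "Alamut"]
SUPPORT = {
    "R1.1": ["Partially", "No", "No", "No"],
    "R1.2": ["Partially", "No", "Yes", "No"],
    "R1.3": ["No", "No", "Yes", "Yes"],
    "R2": ["No", "Partially", "Partially", "Yes"],
    "R3": ["No", "Partially", "Partially", "No"],
    "R4": ["No", "Yes", "No", "Partially"],
}
RANK = {"No": 0, "Partially": 1, "Yes": 2}


def best_support(requirement):
    """Return the highest support level any reviewed tool offers."""
    return max(SUPPORT[requirement], key=RANK.get)


def tools_fully_supporting_all():
    """Return the tools that answer "Yes" for every requirement."""
    return [
        tool
        for column, tool in enumerate(TOOLS)
        if all(levels[column] == "Yes" for levels in SUPPORT.values())
    ]


if __name__ == "__main__":
    for requirement in SUPPORT:
        print(requirement, best_support(requirement))
    print("Tools covering everything:", tools_fully_supporting_all())
```

Under this encoding, every requirement is supported at least partially by some tool, but no single tool fully supports all of them, which is consistent with the conclusion of the review.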

A DSL will hide these low-level details, and its technological implementation will integrate some functionalities of these tools under higher-level constructs related to diagnosis concerns. For example, the following can be reused from the reviewed tools: 1) file readers from Biojava; 2) genetic data retrieval mechanisms from Taverna, Galaxy or Alamut; 3) filters from Galaxy; and 4) file writers from Taverna or Alamut.

After the technological review of bioinformatics tools, we proceed with the decision. Among all the decision patterns adopted from the original methodology, we consider that the following ones justify the decision to proceed with the DSL development:

1. Task Automation. The DSL can automate a set

of activities that geneticists are performing man-

ually. For example, nowadays geneticists are re-

sponsible for the interoperability among software

applications, a difﬁcult task due to the wide array

of genetic data formats available. Thus, a set of

DSL constructs will manage the format interoper-

ability among different software artefacts.

2. Product Line. The DSL can be used to conﬁg-

ure highly similar genetic disease diagnoses that

conform to the general process. Geneticists will express the specific properties depending on the disease and the diagnosis approach.

3. Interaction. The DSL can hide technological de-

tails that are outside the scope of geneticists’

knowledge. For example, geneticists are currently responsible for selecting the most suitable DNA alignment algorithm from a wide array of choices. This choice should be made using genetic and diagnosis features instead of algorithm-related implementation details. For instance, geneticists should choose between “protein alignment” and “nucleotide alignment” instead of between the “blastp” (BLAST algorithm for proteins) and “blastn” (BLAST algorithm for nucleotides) algorithms.

4. GUI Construction. The DSL can be used to cus-

tomize how geneticists interact with data. They

will be able to describe their own interfaces by

indicating the inputs they want to provide to the

diagnosis and the reports to be generated.
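The Interaction pattern above can be illustrated with a minimal Python sketch in which geneticists select a domain-level alignment type and the mapping to an algorithm name stays hidden. The dictionary and the function `resolve_alignment` are hypothetical names introduced only for illustration; they do not describe the implementation of the DSL.

```python
# Domain-level choices offered to geneticists, mapped to the algorithm
# names mentioned in the text. All names here are hypothetical.
ALIGNMENT_ALGORITHMS = {
    "protein alignment": "blastp",
    "nucleotide alignment": "blastn",
}


def resolve_alignment(domain_choice):
    """Translate a geneticist-facing alignment choice into an algorithm name."""
    key = domain_choice.strip().lower()
    try:
        return ALIGNMENT_ALGORITHMS[key]
    except KeyError:
        options = ", ".join(sorted(ALIGNMENT_ALGORITHMS))
        raise ValueError(
            f"Unknown alignment type {domain_choice!r}; choose one of: {options}"
        ) from None


if __name__ == "__main__":
    print(resolve_alignment("Protein alignment"))
```

With this kind of indirection, supporting a new algorithm only requires extending the mapping, while the vocabulary exposed to geneticists remains expressed in domain terms.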

5 ANALYSIS STAGE

In this section, we apply the Analysis stage of our

proposal to develop a DSL for customizing genetic

disease diagnosis software. We explain the Require-

ments Speciﬁcation, Iteration Planning and Domain

Analysis steps of the ﬁrst iteration and provide a set

of examples.

5.1 Requirements Speciﬁcation

In this step, the preliminary requirements speciﬁca-

tion is extended with the elaboration of user stories

and acceptance tests.

As we have discussed in Section 3, user stories

and acceptance tests are the agile practices selected

for requirements specification. To describe user stories, agile methods encourage the use of a few sentences written by end-users, avoiding difficult technical syntax. These sentences usually describe the user's role, the action to be accomplished and the goal pursued. Some examples from our DSL are:

1. User story 1 (associated with requirement R1.1):

“I want to choose a file with SEQ format from the

computer, so that the patient’s sequence is read”.

2. User story 2 (associated with requirement R1.1):

“I want to choose a reference sequence from the

computer, so that it can be used afterwards to per-

form comparisons”.

3. User story 3 (associated with requirement R1.1):

“I want to get a reference sequence from the NCBI

database by indicating its RefSeq identiﬁer, so

that it can be used afterwards to perform compar-

isons”.

4. User story 4 (associated with requirement R1.1):

“I want to align a patient sequence against a refer-

ence sequence, so that I can see how they match”.

5. User story 5 (associated with requirement R1.3):

“I want to choose a ﬁle with VCF format from

the computer, so that the patient’s variations are

read”.

6. User story 6 (associated with requirement R2): “I

want to search the variations against the dbSNP

database, so that patient’s variations that are SNPs

can be identiﬁed”.

ENASE2013-8thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering

104

7. User story 7 (associated with requirement R3):

“I want to indicate a list of variations in

HGVS_Protein notation, so that only those vari-

ations are taken into account”.
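The user stories above share a common shape: an action introduced by "I want" and a goal introduced by "so that", with the geneticist role left implicit. As a hedged illustration, the following Python sketch splits a story into these parts. The template, the `UserStory` record and the function `parse_user_story` are assumptions made for this example and are not part of the proposed process.

```python
import re
from dataclasses import dataclass

# Hypothetical template: "I want (to) <action>, so that <goal>".
STORY_PATTERN = re.compile(
    r"^I want (?:to )?(?P<action>.+?), so that (?P<goal>.+?)\.?$"
)


@dataclass
class UserStory:
    number: int
    requirement: str
    action: str
    goal: str


def parse_user_story(number, requirement, sentence):
    """Split a user story sentence into its action and goal parts."""
    match = STORY_PATTERN.match(sentence.strip())
    if match is None:
        raise ValueError(f"User story {number} does not follow the template")
    return UserStory(number, requirement, match["action"], match["goal"])


# User story 5, as written by the end-users.
story = parse_user_story(
    5,
    "R1.3",
    "I want to choose a file with VCF format from the computer, "
    "so that the patient's variations are read",
)

if __name__ == "__main__":
    print(story)
```

Keeping the requirement identifier next to each story preserves the traceability between the preliminary requirements and the stories that refine them.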

For each user story, several acceptance tests are defined. An acceptance test is a description of usage (real inputs and expected outputs) of a user story, defined in order to check the correctness of its implementation. When a user story is delivered, it can only be considered complete after it has passed all its acceptance tests. These tests differ from unit tests because they check the functionality from the end-user’s perspective, not the expected behaviour of internal implementation details. The same guidelines as for user stories apply: use a few sentences written by end-users that describe the user role, the action they will perform within the system and the response they expect. The acceptance tests related to user story 5 are:

1. Acceptance Test 5.1: “I will choose the ﬁle

3Variations.vcf to check that the three varia-

tions {g.1234A>C, g.4567C>G, g.8910insAA}

are read”.

2. Acceptance Test 5.2: “I will choose the ﬁle

Empty.vcf and I expect to get an error that says

that the ﬁle is empty”.

3. Acceptance Test 5.3: “I will choose the file NoReference.vcf and I expect to get an error that says that the reference has not been indicated”.

4. Acceptance Test 5.4: “I will choose the file Variations.sam and I expect to get an error that says that the file has the wrong format”.
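The acceptance tests for user story 5 can be turned into executable checks. The Python sketch below is only an illustration under several assumptions: the reader `read_variations` is hypothetical, the wrong format is detected from the file extension, the reference is expected in a `##reference=` header line, and variations are rendered in the simplified g. notation used in the tests (handling only single-base substitutions and simple insertions). The file contents are invented sample data.

```python
class VariationFileError(Exception):
    """Raised when a variations file cannot be read."""


def read_variations(file_name, content):
    """Read variations from VCF text and return them in a simplified g. notation.

    Hypothetical reader written for illustration only.
    """
    if not file_name.lower().endswith(".vcf"):
        raise VariationFileError("the file has the wrong format")
    lines = [line for line in content.splitlines() if line.strip()]
    if not lines:
        raise VariationFileError("the file is empty")
    if not any(line.startswith("##reference=") for line in lines):
        raise VariationFileError("the reference has not been indicated")

    variations = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        _chrom, pos, _id, ref, alt = line.split("\t")[:5]
        if len(ref) == 1 and len(alt) == 1:
            variations.append(f"g.{pos}{ref}>{alt}")
        elif alt.startswith(ref) and len(alt) > len(ref):
            variations.append(f"g.{pos}ins{alt[len(ref):]}")
        else:
            raise VariationFileError(f"unsupported variation at position {pos}")
    return variations


# Invented sample content standing in for the file 3Variations.vcf.
THREE_VARIATIONS = (
    "##fileformat=VCFv4.1\n"
    "##reference=hypothetical_reference\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "4\t1234\t.\tA\tC\t.\t.\t.\n"
    "4\t4567\t.\tC\tG\t.\t.\t.\n"
    "4\t8910\t.\tA\tAAA\t.\t.\t.\n"
)


def fails_with(file_name, content, message):
    """Return True if reading the file fails with the expected message."""
    try:
        read_variations(file_name, content)
    except VariationFileError as error:
        return str(error) == message
    return False
```

Each acceptance test then becomes a single check, for example `fails_with("Empty.vcf", "", "the file is empty")`, which keeps the executable tests close to the sentences written by end-users.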

Most of the effort invested by end-users in this step takes place in the first iteration. In the following iterations, end-users will refine those user stories and acceptance tests that, for any reason, were incorrect or ambiguous, remove the ones that are no longer required and add new ones.

5.2 Iteration Planning

In this step, end-users prioritize user stories and choose the ones to be implemented in the current iteration. In each iteration, end-users define one or more acceptance tests that gather several user stories. These acceptance tests, which we call scenarios, differ from the previous acceptance tests because they are designed to check that the user stories work correctly as a unit. A scenario tells end-users what they will be able to accomplish with the prototypical implementation at the end of the iteration.

In our example, in the first iteration, end-users together with developers have established that the most important user stories are numbers 5 and 7 because: 1) they are the easiest and simplest to translate into a DSL; and 2) they provide a remarkable improvement over their current practices. An example of a scenario that gathers both user stories is:

Scenario 1: “To diagnose the Achondroplasia disease, I will choose the file “variations.vcf” (which I obtained from my sequencing machine in the VCF format) to see if the patient has the variations p.Gly375Cys and p.Gly380Arg”

This scenario reflects what geneticists expect to accomplish at the end of the first iteration: 1) to provide a VCF file that is saved on their computer (user story 5); 2) to see the variations once the file is selected (user story 5); 3) to provide a list of expressions in HGVS_Protein notation (user story 7); and 4) to see only the variations whose HGVS_Protein notation matches any of the expressions provided (user story 7).
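The filtering behaviour expected in Scenario 1 can be sketched in a few lines of Python. The sketch assumes that the patient's variations are already available in HGVS_Protein notation; the function `filter_variations` and the sample patient data (including p.Ala12Thr) are hypothetical.

```python
def filter_variations(patient_variations, expressions):
    """Keep only the variations whose HGVS protein notation was requested."""
    requested = set(expressions)
    return [variation for variation in patient_variations if variation in requested]


# Hypothetical patient data, already expressed in HGVS_Protein notation.
patient_variations = ["p.Gly380Arg", "p.Ala12Thr"]

# Expressions provided by the geneticist in Scenario 1.
scenario_expressions = ["p.Gly375Cys", "p.Gly380Arg"]

matches = filter_variations(patient_variations, scenario_expressions)

if __name__ == "__main__":
    print(matches)
```

For this sample patient, only p.Gly380Arg is kept, since it is the only variation that matches one of the expressions provided.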

5.3 Domain Analysis

The goal of this step is to represent unambiguously

the domain of the DSL. Our process adopts the in-

formal pattern proposed by the original methodology

and proposes the deﬁnition of a feature model and a

conceptual model (as explained in section 3).

According to the agile practice of incremental design, the domain to be represented by both models

corresponds only to the set of user stories addressed

in the current iteration. Because both models are in-

crementally created in each iteration, this step adds

new elements, such as attributes or relationships, or

reﬁnes the existing ones.

Continuing with our example, the models that represent user stories 5 and 7 are:

• The Feature Model (Figure 4) has as a root feature

Genetic Diagnosis, which represents the type of

applications that can be expressed using the DSL.

Taking into account the addressed user stories,

these applications gather two capabilities, which

are represented in the model as two branches:

– Patient Data (left branch): Represents the in-

formation of a patient analysed to diagnose a

genetic disease. In the current iteration, the

only kind of patient’s data is a set of Variations

expressed in the VCF format. This branch represents the capabilities related to user story 5.

– Variation Analysis (right branch): Represents the activities to be performed in order to identify any evidence of a disease in a patient. In the current iteration, this analysis consists of applying a Filter over the list of the patient’s Variations. The filter criterion is expressed using the HGVS_Protein notation. This branch represents the features related to user story 7.

Figure 4: Feature Model from Iteration One (nodes: Genetic Diagnosis, Patient Data, Variations, VCF, Variation Analysis, Filter, HGVS_Protein; legend: Optional, Mandatory).

Note that all features in this model are mandatory because, in this first iteration, user stories do not describe any alternative feature. Subsequent iterations are expected to provide new features that will reveal new branches, optional features, or new choices under a feature. For example, there is a user story that requires filtering the variations using the HGVS_Coding notation. To represent this user story, a new child of Filter will be defined and the mandatory property of HGVS_Protein will be replaced by a choice between the two notations.

• The Conceptual Model (Figure 5) gathers the

main concepts involved in genetic disease diag-

noses: Disease, Patient and Variation:

– The entity Patient represents a human being

that has taken a genetic test to ﬁnd out if he

or she is genetically predisposed to a certain disease. It has an attribute id that distinguishes one patient from the others. This entity represents a concept of user story 5.

– The entity Variation describes the content of the patient’s DNA at a specific position or region; it is called a variation because it is expressed with respect to a reference DNA. To perform the diagnosis, geneticists are interested in the Variations that a Patient has, represented by a composition association between both entities. The attribute HGVS_ProteinNotation holds a notation created by the genetics community to express unambiguously what is occurring in the patient at the protein level (which protein should be created and which one is being created by the patient). The entity and the relationship represent the concepts involved in user story 5, and the attribute HGVS_ProteinNotation represents a concept involved in user story 7.

Figure 5: Conceptual Model from Iteration One (entities: Patient with attribute id, Variation with attribute HGVS_ProteinNotation, Disease with attribute Name; associations: has, predisposes; multiplicities shown: 1 and *).

– The entity (Genetic) Disease is a condition related to the DNA chemicals that causes a body malfunction. The attribute name identifies the disease within the medical community. A diagnosis is made when a relationship is found between a patient’s Variation and a Disease, which predisposes the patient to the disease. This relationship is established thanks to geneticists’ knowledge about the disease. The entity, attribute and association represent the concepts involved in user story 7.
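The two models of the first iteration can be written down as simple data structures. The Python sketch below is an illustration under stated assumptions: the feature hierarchy is inferred from the description of the two branches, the multiplicities of the associations are simplified to lists, and the sample patient (identifier and the variation p.Ala12Thr) is invented. The link between p.Gly380Arg and achondroplasia follows Scenario 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """A node of the feature model; all features are mandatory in iteration one."""
    name: str
    mandatory: bool = True
    children: List["Feature"] = field(default_factory=list)

    def names(self):
        """Return this feature's name followed by its descendants, depth first."""
        found = [self.name]
        for child in self.children:
            found.extend(child.names())
        return found


# Hierarchy assumed from the description of the two branches.
feature_model = Feature("Genetic Diagnosis", children=[
    Feature("Patient Data", children=[
        Feature("Variations", children=[Feature("VCF")]),
    ]),
    Feature("Variation Analysis", children=[
        Feature("Filter", children=[Feature("HGVS_Protein")]),
    ]),
])


@dataclass
class Disease:
    name: str


@dataclass
class Variation:
    hgvs_protein_notation: str
    predisposes: List[Disease] = field(default_factory=list)


@dataclass
class Patient:
    id: str
    variations: List[Variation] = field(default_factory=list)

    def predisposed_diseases(self):
        """Return the names of the diseases linked to this patient's variations."""
        names = []
        for variation in self.variations:
            for disease in variation.predisposes:
                if disease.name not in names:
                    names.append(disease.name)
        return names


# Invented sample data for illustration.
achondroplasia = Disease("Achondroplasia")
patient = Patient("patient-001", [
    Variation("p.Gly380Arg", [achondroplasia]),
    Variation("p.Ala12Thr"),
])
```

Because both structures are plain data, later iterations can extend them incrementally, for example by adding an HGVS_Coding child under Filter or new attributes to Variation.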

6 DISCUSSION

After applying the first two stages of the process to develop a DSL for a bioinformatics SME, we have found that combining methodological guidelines for DSL development with agile practices improves end-user involvement in several ways. This feedback was gathered in meetings with two geneticists, three bioinformaticians and two developers while carrying out the Decision stage and the Analysis stage of the first iteration of the process.

First, end-users are able to understand the artefacts with which they are required to interact and to which they contribute. For example, they use their own words to specify user stories and acceptance tests. This is important because good communication between end-users and DSL developers reduces misunderstandings about end-users’ requirements and aids in the detection of errors.

Second, end-users are able to lead the process and make decisions. The ongoing process can be adapted to their immediate needs because they are responsible for deciding which user stories must be accomplished in each iteration. This encourages end-users to participate actively because they feel relevant to the DSL development.

Additionally, we have found that end-users ac-

quire a better knowledge of the current state of the

ENASE2013-8thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering

106

DSL development process. As they participate in the creation of the iteration planning, they are aware of which user stories have already been accomplished, which are being addressed and when the remaining ones are planned to be finished. This improves the relationship between end-users and developers: end-users know exactly what to expect from developers and do not build false expectations that lead to future disappointment.

We should remark that these observations have not been thoroughly validated, as we have detected them during the first iteration planning. In future work, we plan to carry out more iterations to support these findings and also to assess the complete process, for example by asking end-users for their opinions regarding the difficulties found, the time they invested and the support for reacting to their changing needs.

In summary, the benefits observed up to now show promising results for the use of good practices from agile methods in DSL development to involve end-users.

7 CONCLUSIONS

This research work proposes an agile DSL develop-

ment process that adopts a set of practices from agile

methods, with the main goal of improving end-user involvement in DSL development. The agile practices

included are: 1) an iterative cycle, which addresses

only a set of end-users’ requests each iteration; 2) a

requirements speciﬁcation activity based on the spec-

iﬁcation of user stories and acceptance tests, which

ensures that end-users communicate their ongoing re-

quirements in each iteration; and 3) an iteration plan-

ning step based on priorities and scenarios, which en-

sures that end-users express their ongoing priorities

and expectations. Thanks to these agile practices, better involvement of end-users in DSL development is expected, as the preliminary results suggest.

We must remark that end-user involvement represents an overload for some of them: they do not appreciate the direct benefits or they are only interested in the finished DSL. This fact highlights the necessity of an agile DSL development process, which is our main goal.

As future work, we will detail the stages not included in this paper: Design, Implementation and Testing. Our goal is to clearly define each stage, selecting the agile practices that improve end-user involvement. Simultaneously, we will continue with the

development of the DSL for genetic disease diagnosis in order to observe the practical benefits. Additionally, geneticists from the SME will test the final DSL with different disease diagnoses in order to assess the completeness of the DSL developed.

ACKNOWLEDGEMENTS

The authors would like to thank IMEGEN for all

these years of collaboration, providing both useful

genetic knowledge and a real environment for research, and also GEMBiosoft, especially Dr. Ana M. Levin, for its support in the development of this paper. This work has been developed with the

support of MICINN under the FPU grant AP2009-

1895, the project PROS-Req (TIN2010-19130-C02-

02), and co-ﬁnanced with ERDF.

REFERENCES

Alamut. Interactive biosoftware.

Ambler, S. (2002). Agile modeling: Effective practices.

John Wiley and Sons.

Arora, R., Mernik, M., Bangalore, P., Roychoudhury, S.,

and Mukkai, S. (2009). A domain-speciﬁc language

for application-level checkpointing. In Proceedings of

the 5th International Conference on Distributed Com-

puting and Internet Technology, ICDCIT ’08, pages

26–38, Berlin, Heidelberg. Springer-Verlag.

Beck, K. and Andres, C. (2004). Extreme programming

explained: embrace change. Addison-Wesley Profes-

sional.

Beck, K., Beedle, M., van Bennekum, A., Cockburn, A.,

Cunningham, W., Fowler, M., Grenning, J., High-

smith, J., Hunt, A., Jeffries, R., et al. (2001). The agile

manifesto. http://agilemanifesto.org/principles.html

Accessed 2013, 7(08).

Čeh, I., Črepinšek, M., Kosar, T., and Mernik, M. (2011).

Ontology driven development of domain-speciﬁc lan-

guages. Computer Science and Information Systems,

8(2):317–342.

Costabile, M., Mussio, P., Parasiliti Provenza, L., and Pic-

cinno, A. (2008). End users as unwitting software

developers. In Proceedings of the 4th international

workshop on End-user software engineering, pages 6–

10. ACM.

Fischer, G., Giaccardi, E., Ye, Y., Sutcliffe, A., and

Mehandjiev, N. (2004). Meta-design: a manifesto for

end-user development. Communications of the ACM,

47(9):33–37.

Fischer, G., Nakakoji, K., and Ye, Y. (2009). Metadesign:

Guidelines for supporting domain experts in software

development. Software, IEEE, 26(5):37–44.

Fowler, M. (2010). Domain-speciﬁc languages. Addison-

Wesley Professional.

Goecks, J., Nekrutenko, A., Taylor, J., and Team, T. (2010).

Galaxy: a comprehensive approach for supporting ac-

cessible, reproducible, and transparent computational

research in the life sciences. Genome Biol, 11(8):R86.

InvolvingEnd-usersinDomain-SpecificLanguagesDevelopment-ExperiencesfromaBioinformaticsSME

107

Grigera, J., Rivero, J., Robles Luna, E., Giacosa, F., and

Rossi, G. (2012). From requirements to web applica-

tions in an agile model-driven approach. Web Engi-

neering, pages 200–214.

Haider, S., Ballester, B., Smedley, D., Zhang, J., Rice, P.,

and Kasprzyk, A. (2009). Biomart central portal: uni-

ﬁed access to biological data. Nucleic acids research,

37(suppl 2):W23–W27.

Highsmith, J. and Cockburn, A. (2001). Agile software de-

velopment: The business of innovation. Computer,

34(9):120–127.

Holland, R., Down, T., Pocock, M., Prlić, A., Huen, D.,

James, K., Foisy, S., Dräger, A., Yates, A., Heuer, M.,

et al. (2008). Biojava: an open-source framework for

bioinformatics. Bioinformatics, 24(18):2096–2097.

Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock,

M., Li, P., and Oinn, T. (2006). Taverna: a tool for

building and running workﬂows of services. Nucleic

acids research, 34(suppl 2):W729–W732.

IMEGEN. Instituto de medicina genomica.

Izquierdo, J. and Cabot, J. (2012). Community-driven lan-

guage development. MiSE’12.

Fister Jr., I., Fister, I., Mernik, M., and Brest, J. (2011). Design and implementation of domain-specific language

easytime. Computer Languages, Systems and Struc-

tures, 37(4):151–167.

Kang, K., Cohen, S., Hess, J., Novak, W., and Peter-

son, A. S. (1990). Feature-Oriented Domain Analysis

(FODA) Feasibility Study. Technical report.

Ko, A. J., Abraham, R., Beckwith, L., Blackwell, A., Bur-

nett, M., Erwig, M., Scafﬁdi, C., Lawrance, J., Lieber-

man, H., Myers, B., Rosson, M. B., Rothermel, G.,

Shaw, M., and Wiedenbeck, S. (2011). The state of

the art in end-user software engineering. ACM Com-

puting Surveys (CSUR), 43(3):21:1–21:44.

Mernik, M., Heering, J., and Sloane, A. M. (2005). When

and how to develop domain-speciﬁc languages. ACM

Computing Surveys (CSUR), 37(4):316–344.

Pérez, F., Valderas, P., and Fons, J. (2011). Towards the

involvement of end-users within model-driven devel-

opment. End-User Development, pages 258–263.

Rusk, N. (2009). Focus on next-generation sequencing data

analysis. Nature Methods, 6(11s):S1.

Schwaber, K. and Beedle, M. (2002). Agile software development with Scrum, volume 18. Prentice Hall PTR, Upper Saddle River, NJ.

Spinellis, D. (2001). Notable design patterns for domain-

speciﬁc languages. Journal of Systems and Software,

56(1):91–99.

Strembeck, M. and Zdun, U. (2009). An approach

for the systematic development of domain-speciﬁc

languages. Software: Practice and Experience,

39(15):1253–1292.

Van Deursen, A., Klint, P., and Visser, J. (2000). Domain-

speciﬁc languages: An annotated bibliography. ACM

Sigplan Notices, 35(6):26–36.

Visser, E. (2008). Webdsl: A case study in domain-speciﬁc

language engineering. Generative and Transforma-

tional Techniques in Software Engineering II, pages

291–373.

ENASE2013-8thInternationalConferenceonEvaluationofNovelSoftwareApproachestoSoftwareEngineering

108