Genero, Poels, & Piattini, 2010). In a Delphi study by
Erickson and Siau (2007) identifying the kernel of
“essential” UML, class diagrams, use cases, sequence
diagrams, and state charts were found to have the
highest usability rankings among practitioners and
educators from industry and academia. Furthermore,
these are also among the most used diagrams in
educational material such as books, tools, courses,
and tutorials, with percentages of 100% (class
diagram, use case, sequence) and over 96% (state
chart), while also being among the top diagramming
techniques that support conceptual modelling.
3.2 Text Mining Techniques Applied for Requirements Engineering
Requirements analysis (RA) plays a key role in the
development of information systems. It is a
challenging task that requires complex knowledge
and skills, and the quality of a requirements analysis
affects the quality of the information systems to be
engineered. Nowadays, models are widely used to
construct an organization’s information system, and
thus also serve as the first artefact whose quality
can be formally tested. UML uses a graphical notation
that can represent both the structure and the business
logic of the system to be engineered. Usually,
requirements documents and domain descriptions are
provided in natural language. Processing these
documents with text mining applications (TMA) can
help extract information that is useful in a learning
context (e.g., by suggesting candidate model
constructs) to assist novices, or when experts process
large requirements documents (e.g., by identifying
potential constructs that a human modeler can then
confirm or discard). For instance, TMA can help
transform an unstructured data set into a structured
format or medium from which business model
constructs can be generated. Ultimately, a formal
diagram of the system can then be drawn
automatically from the business requirements.
Earlier research has proposed various methods for
drawing UML diagrams using text mining
approaches. We briefly discuss those we consider to
be of particular interest.
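As a minimal illustration of how a TMA might hint candidate model constructs for a modeler to confirm or discard, the Python sketch below uses a simple frequency heuristic (recurring nouns become candidate classes). The mini-lexicon `NOUNS`, the crude plural handling, and the functions `lemma`, `suggest_classes`, and `review` are hypothetical; a real tool would rely on a proper part-of-speech tagger rather than a hand-written word list.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for a real part-of-speech tagger.
NOUNS = {"customer", "order", "product", "invoice", "system"}


def lemma(token: str) -> str:
    """Crude plural stripping: 'orders' -> 'order' if the singular is a known noun."""
    if token.endswith("s") and token[:-1] in NOUNS:
        return token[:-1]
    return token


def suggest_classes(text: str, min_count: int = 2) -> list[str]:
    """Hint candidate classes: known nouns occurring at least min_count times."""
    tokens = (lemma(t) for t in re.findall(r"[a-z]+", text.lower()))
    counts = Counter(t for t in tokens if t in NOUNS)
    return sorted(noun for noun, n in counts.items() if n >= min_count)


def review(candidates: list[str], discarded: set[str]) -> list[str]:
    """Human-in-the-loop step: keep only candidates the modeler did not discard."""
    return [c for c in candidates if c not in discarded]


text = ("A customer places an order. Each order lists products. "
        "The system sends an invoice to the customer.")
print(suggest_classes(text))                # ['customer', 'order']
hints = suggest_classes(text, min_count=1)
print(review(hints, discarded={"system"}))  # ['customer', 'invoice', 'order', 'product']
```

Discarding `system` mirrors the kind of judgment that the approach leaves to the human modeler: the tool only proposes, the modeler decides.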
Montes, Pacheco, Estrada, and Pastor (2008)
presented a natural language processing based
method to generate UML diagrams from plain text
input. The method analyses the given script or text
to extract relevant information, from which UML
diagrams are drawn. The UML diagrams are created,
arranged, labeled, and finalized in the following
steps: 1. text input acquisition, which reads and
obtains the input text scenario; 2. syntactic analysis,
which categorizes words into classes such as verbs,
helping verbs, nouns, pronouns, adjectives,
prepositions, and conjunctions; 3. text
understanding, which infers the meaning of the given
text using semantic rules (Malaisé, Zweigenbaum, &
Bachimont, 2005); 4. knowledge extraction, which
extracts the required data attributes using a set of
rules (Van Rijsbergen, 1977); and 5. UML diagram
generation, which uses UML notation symbols to draw
the UML diagram.
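The five steps above can be sketched as a toy Python pipeline. This is not the implementation of Montes et al. (2008): the hypothetical `LEXICON` replaces a real tagger, a single noun-verb-noun rule stands in for their semantic rules, and the mapping of nouns to classes and verbs to associations, as well as the PlantUML output format, are our assumptions. All function names are hypothetical.

```python
import re

# Hypothetical lexicon standing in for a real part-of-speech tagger (step 2).
LEXICON = {
    "a": "DET", "an": "DET", "the": "DET",
    "customer": "NOUN", "order": "NOUN", "product": "NOUN",
    "places": "VERB", "contains": "VERB",
}


def acquire_text(raw: str) -> list[str]:
    """Step 1: read the input scenario and split it into sentences."""
    return [s.strip() for s in re.split(r"[.!?]", raw) if s.strip()]


def syntactic_analysis(sentence: str) -> list[tuple[str, str]]:
    """Step 2: tag each word with a word class; unknown words get 'X'."""
    return [(w, LEXICON.get(w, "X")) for w in re.findall(r"[a-z]+", sentence.lower())]


def understand(tagged: list[tuple[str, str]]):
    """Step 3 (toy semantic rule): first noun = subject, first verb = predicate,
    second noun = object. Returns None if the sentence does not fit the rule."""
    nouns = [w for w, tag in tagged if tag == "NOUN"]
    verbs = [w for w, tag in tagged if tag == "VERB"]
    if len(nouns) >= 2 and verbs:
        return nouns[0], verbs[0], nouns[1]
    return None


def extract_knowledge(triples):
    """Step 4 (assumed rules): nouns become classes, verbs become associations."""
    classes = sorted({s.capitalize() for s, _, _ in triples}
                     | {o.capitalize() for _, _, o in triples})
    associations = [(s.capitalize(), v, o.capitalize()) for s, v, o in triples]
    return classes, associations


def generate_uml(classes, associations) -> str:
    """Step 5: emit PlantUML class-diagram text that a renderer can draw."""
    lines = ["@startuml"]
    lines += [f"class {c}" for c in classes]
    lines += [f"{s} --> {o} : {v}" for s, v, o in associations]
    lines.append("@enduml")
    return "\n".join(lines)


def text_to_uml(raw: str) -> str:
    triples = [understand(syntactic_analysis(s)) for s in acquire_text(raw)]
    return generate_uml(*extract_knowledge([t for t in triples if t is not None]))


print(text_to_uml("A customer places an order. The order contains a product."))
```

For this two-sentence scenario the sketch prints `class Customer`, `class Order`, and `class Product`, followed by the associations `Customer --> Order : places` and `Order --> Product : contains`, wrapped in `@startuml`/`@enduml`. Sentences that do not match the single rule are silently skipped, which is exactly where the richer semantic rules of the original method would be needed.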
Shahzadi, Ahmad, Fatima, Sarwar, and Mahmood
(2013) proposed a method to identify domain entities
and their relationships from text documents, which
can then be transformed into a UML diagram. The
method performs linguistic processing on a given text
using the open source tool GATE (Cunningham et al.,
2009), which allows marking entities and the
relationships between them. Their system comprises
the following steps: 1. document acquisition, 2.
document processing, and 3. XML modeling. The
document acquisition step obtains input from a
textual document. The document processing step
applies linguistic processing (e.g., sentence splitter,
tokenizer, part-of-speech tagger). XML modeling
converts the textual data into a formal data model.
Harmain and Gaizauskas (2000) developed a
method that produces an object-oriented model
from textual documents. Montes et al. (2008) present
a method for generating an object-oriented conceptual
model (e.g., UML class diagrams) from natural
language text. Hasegawa, Kitamura, Kaiya, and Saeki
(2009) also introduced a tool that extracts
requirements models from natural language texts.
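The XML modeling step described for Shahzadi et al. (2013) can be illustrated with Python's standard `xml.etree.ElementTree` module. The sketch assumes that entities and relationships have already been marked by a linguistic pipeline such as GATE; the sample annotations, the element and attribute names (`model`, `entity`, `relationship`, `source`, `label`, `target`), and the function `to_xml_model` are hypothetical and do not reproduce their actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotations, standing in for the entities and relationships a
# linguistic pipeline (e.g., GATE) might mark in a requirements text.
entities = ["Student", "Course", "Lecturer"]
relationships = [("Student", "enrolls in", "Course"),
                 ("Lecturer", "teaches", "Course")]


def to_xml_model(entities, relationships) -> str:
    """Convert annotated entities and relationships into an XML data model."""
    root = ET.Element("model")
    for name in entities:
        ET.SubElement(root, "entity", name=name)
    for source, label, target in relationships:
        ET.SubElement(root, "relationship", source=source, label=label, target=target)
    return ET.tostring(root, encoding="unicode")


xml_model = to_xml_model(entities, relationships)
print(xml_model)

# The structured output can be parsed back, e.g., by a diagram generator.
parsed = ET.fromstring(xml_model)
print([e.get("name") for e in parsed.iter("entity")])  # ['Student', 'Course', 'Lecturer']
```

Once the text is in this structured form, a later stage only has to walk the `entity` and `relationship` elements to emit diagram elements, rather than re-analysing the natural language.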
Gelhausen, Derre, and Geiß (2008) and Gelhausen
and Tichy (2007) presented a method to create UML
domain models directly from a textual document. The
authors use a graph as an intermediate
representation of the text. The nodes in the graph
represent sentences and words. Edges indicate
thematic roles and are the core component of the
method, as they represent the semantic information in
the text. Graph transformation rules are then used to
build a UML representation. Mala and Uma (2006)
use an NLP pipeline to create a model without the
intervention of a domain expert. The authors claim
that the resulting class diagrams are at least as good
as human-made ones. Bajwa and
Choudhary (2006) extract noun and verb
combinations from input texts and map the nouns and
verbs to UML class elements and relations,
respectively. However, to our knowledge, little is
known about how these text mining techniques and
applications can support the task of deriving business
requirements and automatically generating formalized
requirements documents, such as UML models, from
Text-To-Model (TeToMo) Transformation Framework to Support Requirements Analysis and Modeling
131