Text-To-Model (TeToMo) Transformation Framework to Support

Requirements Analysis and Modeling

Gayane Sedrakyan

1,2

, Asad Abdi

, Stéphanie M. Van Den Berg

, Bernard Veldkamp

and

Jos Van Hillegersberg

Faculty of Behavioral, Management and Social Sciences, Section Industrial Engineering and Business Information

Systems (IEBIS), University of Twente, Enschede, The Netherlands

Faculty of Behavioral, Management and Social Sciences, Section Cognition, Data & Education (CODE),

University of Twente, Enschede, The Netherlands

{g.sedrakyan, s.abdiesfandani, stephanie.vandenberg, b.p.veldkamp, J.vanHillegersberg}@ utwente.nl

Keywords: Requirements Engineering, Requirements Analysis, Conceptual Modeling, Text Mining, Natural Language

Processing, Requirement Analysis Automation, Model Generation.

Abstract: Requirements analysis and modeling is a challenging task involving complex knowledge of the domain to be

engineered, modeling notation, modelling knowledge, etc. When constructing architectural artefacts experts

rely largely on the tacit knowledge that they have built based on previous experiences. Such implicit

knowledge is difficult to teach to novices, and the cost of the gap between classroom knowledge and real

business situations is thus reflected in further needs for post-graduate extensive trainings for novice and junior

analysts. This research aims to explore the state-of-the art natural language processing techniques that can be

adopted in the domain of requirements engineering to assist novices in their task of knowledge construction

when learning requirements analysis and modeling. The outcome includes a method called Text-To-Model

(TeToMo) that combines the state-of-the-art natural language processing approaches and techniques for

identifying potential architecture element candidates out of textual descriptions (business requirements). A

subsequent prototype is implemented that can assist a knowledge construction process through (semi-)

automatic generation and validation of Unified Modeling Lnaguage (UML) models. In addition, to the best

of our knowledge, a method that integrates machine learning based method has not been thoroughly studied

for solving requirements analysis and modeling problem. The results of this study suggest that integrating

machine learning methods, word embedding, heuristic rules, statistical and linguistic knowledge can result in

increased number of automated detection of model constructs and thus also better semantic quality of outcome

models.

1 INTRODUCTION

Our current ICT-driven world of economies depends

hugely on information systems. A tiny failure in an

interconnected web of information systems of various

organizations may result in significant and sometimes

disastrous consequences. The quality of information

systems and thus the process of designing and

producing high quality systems becomes critical. The

process of designing information systems is a highly

complex and challenging task that requires rigorous

analytical skills and experience. The first step in this

process includes externalizing business requirements

into formal model representations that serve the first

artefact for formal evaluation of quality. With the

growing importance of compliance between business

strategy and ICT realizations, as well as

popularization of low/no code platforms that allow

business and citizen developers to design and produce

business applications, conceptual models gain

relevance. Conceptual models use highly abstract

representations and can significantly reduce the

complexity of a problem domain, thus making it

easier to integrate business domain and ICT expertise

in the system design process. Conceptual models also

contain the critical information for designing and

applying effective organizational strategies and a

necessary foundation for constructing an

organization’s information system. Important key

factors affecting the quality of a conceptual model are

knowledge of modeling concepts, of the modeling

language and of the domain to be modelled. Teaching

Sedrakyan, G., Abdi, A., Van Den Berg, S., Veldkamp, B. and Van Hillegersberg, J.

Text-To-Model (TeToMo) Transformation Framework to Support Requirements Analysis and Modeling.

DOI: 10.5220/0010771600003119

In Proceedings of the 10th International Conference on Model-Driven Engineering and Software Development (MODELSWARD 2022), pages 129-136

ISBN: 978-989-758-550-0; ISSN: 2184-4348

129

such knowledge and skills to novice modelers is a

challenging task considering that system analysis is

by nature an inexact skill (Sedrakyan, 2016).

Good

modelers rely mainly on their personal experience,

and the tacit knowledge that they have developed

over time, which is difficult to transfer to junior

modelers. Transferring the academic knowledge and

skills to real world businesses is yet another concern

as the classroom and real-world situations are not

identical. As stated by Schenk, Vitalari, and Davis

(1998), in their early careers novice modelers produce

incomplete, inaccurate, ambiguous, and/or incorrect

information requirements. Several reasons make

conceptual modeling skills very difficult to teach. For

instance, studies on comparing model quality

checking approaches of novices and experts indicate

the poorly adapted cognitive schemata of novice

modelers to identify relevant triggers for verifying the

quality of models. Previous research on observing the

differences in modeling process indicates the linear

problem-solving pattern of novices focusing on one

task at a time vis-a-vis experts’ frequent switches

between modeling activities and simultaneous cross-

validation cycles (80% of their design activity)

(Wang & Brooks, 2007). Tools that assist a modeling

and verification process include techniques that

provide support at the level of modeling outcome,

such simulation for verifying and validating the

semantic quality of models as well as automated

feedback facilitating the interpretation of the results

and/or model verification. However, there is little

support in the process of making design choices when

transferring requirements into models, such as the

choice of candidate components and their

relationships in a model. Our research aims to explore

state-of-the-art text mining techniques and

capabilities in supporting requirements analysis and

modeling task for novices. We start with mapping

existing text mining concepts and techniques with the

modeling technique (such as diagram type) and the

constructs (e.g. elements supported by the diagram

type) to come up with a framework that can guide the

design and development of text-to-model (semi-)

automated generation instrument. Testing the

capabilities with respect to the most standard

modeling techniques used in designing information

systems constitutes a future research direction,

however initial results (as a demo) from a pilot study

is included which compares model constructs

extracted by an experienced human modeler and

those produced by using different text mining

approaches. The remainder of the paper is structured

as follows. Section 2 describes the research

methodology used for the work. Section 3 gives an

overview of related work. Section 4 describes the

proposed method for text-to-model construction

support. Our system evaluation is explained in

Section 5. Finally, section 6 concludes the work with

a discussion of results as well as proposing some

future research directions.

2 RESEARCH METHOD

Theories are based on a systematic view of

phenomena. Kerlinger (1979) proposes a method

based on specifying relations among variables that

use a set of interrelated constructs, variables,

definitions, and propositions. Conceptualizations in

the form of a framework guide research by providing

a visual representation of theoretical constructs,

variables of interest, and their relationships as

suggested by Creswell (1994). Based on such

frameworks, concrete applications can be developed

Morgan (2018). In this paper, we aim to derive a

framework based on earlier literature and previous

empirical research, to guide the design and

development of text-to-model transformation

application to support (semi-)automatic requirements

analysis and conceptual modeling process. This is

achieved by conducting a literature study of existing

text mining and/or machine learning techniques and

further mapping these techniques with corresponding

concepts in a modeling process, such as diagram

types used for conceptual modeling task and their

corresponding constructs such as class, attribute,

association.

3 LITERATURE REVIEW

3.1 Teaching and Learning Context

In a teaching context, model comprehension

difficulties, in addition to the lack of modeling

knowledge, are also associated with the insufficient

level of experience of novices and as a result their

limited cognitive resources to identify relevant

triggers for model verification. Unified Modeling

Language (UML) is a standard language that is used

to document and model business requirements.

According to a complexity analysis by Siau and Cao

(2001) UML (Unified Modeling Language) class

diagram ranks the highest in complexity among the

structural diagrams followed by state chart among the

dynamic diagrams because of their high cognitive

and structural complexity (Cruz-Lemus, Maes,

MODELSWARD 2022 - 10th International Conference on Model-Driven Engineering and Software Development

130

Genero, Poels, & Piattini, 2010). In a Delphi study by

Erickson and Siau (2007) identifying the kernel of

“essential” UML class diagram use cases, sequence

and state chart are found to have the highest usability

ranks by practitioners and educators from software

industry and academic field. Furthermore, these are

also among the top used diagrams present in the

context of educational material such as books, tools,

courses and tutorials, with percentages of 100% (class

diagram, use case, sequence) and over 96% (State

chart), while also being among the top diagramming

techniques that support conceptual modelling.

3.2 Text Mining Techniques Applied

for Requirements Engineering

Requirements analysis (RA) has a key role in the

process of development of information systems. It is

a challenging task that requires complex knowledge

and skills. The quality of a requirements analysis

affects the quality of the information systems to be

engineered. Nowadays models are largely used for

constructing an organization’s information system

thus also serving the first artefact the quality of which

can be formally tested. UML uses a graphical notation

that can represent both the structure and business

logic of the system to be engineered. Usually,

requirements documents and domain descriptions are

provided in natural language. Processing these

documents using text mining applications (TMA) can

help to extract information that may be useful in a

learning context (e.g. by hinting candidate model

constructs) to assist novices, or in the context of

processing large requirement documents by experts

(e.g. by identifying potential constructs that can be

confirmed or discarded by a human modeler). For

instance, TMA can assist transforming an

unstructured data set into a structured format or a

medium that can be used to generate business model

constructs. Finally, a formal diagram can be drawn

for the system from business requirements

automatically. Earlier research proposed various

methods to draw UML diagrams from using text

mining approach. We will briefly discuss some of

those we consider to be of particular interest.

Montes, Pacheco, Estrada, and Pastor (2008)

presented a natural language processing based

method to generate UML diagrams using a plain text

as an input. The method analyses the given script/text

to extract relevant information, based on which UML

diagrams are drawn. The process of creating,

arranging, labeling and finalizing the UML diagrams

is performed using the following steps: 1. Text input

acquisition, to read and obtain input text scenario, 2.

Syntactic Analysist, to categorize words into various

classes as verbs, helping verbs, nouns, pronouns,

adjectives, prepositions, conjunctions, etc. 3. Text

understanding to infer the meanings of the given text

by using semantic rules (Malaisé, Zweigenbaum, &

Bachimont, 2005), 4. Knowledge extraction, which

extracts required data attributes using a set of rules

(Van Rijsbergen, 1977), 5. UML diagram generation

uses UML notation symbols to draw a UML diagram.

Shahzadi, Ahmad, Fatima, Sarwar, and Mahmood

(2013) proposed a method to identify domain entities

and their relationships from text documents that can

be transformed into a UML diagram. The method

performs a linguists processing on a given text using

open source tool named GATE (Cunningham et al.,

2009) It allows marking entities and relationships

between entities. Their system includes the following

steps: 1. document acquisition, 2. document

processing, 3. XML modeling. Document

Acquisition step obtains an input from a textual

document. Document processing step applies

linguistic processing (e.g., sentence splitter,

tokenizer, part-of-speech tagger). XML modeling is

used to convert a textual data into a formal data-

model. Harmain and Gaizauskas (2000) developed a

method which produces an object-oriented model

from textual documents. Montes et al. (2008) present

a method of generating an object-oriented conceptual

model (e.g., UML class diagrams) from natural

language text. Hasegawa, Kitamura, Kaiya, and Saeki

(2009

) also introduced a tool that extracts

requirements models from natural language texts.

Gelhausen, Derre, and Geiß (2008) and Gelhausen

and Tichy (2007) presented a method to create UML

domain models directly from a textual document. The

authors employ a graph technique as an intermediate

representation of a text. The nodes in the graph

represent sentences and words. Edges indicates

thematic roles and are the core component of the

method as they represent the semantic information in

the text. Graph transformation rules are then used to

build a UML representation. Mala and Uma (2006)

use an NLP pipeline to create a model without the

intervention of a domain expert. The authors claim

that the yielded results are at least as good as or

exceeding human-made class diagrams. Bajwa and

Choudhary (2006) extract nouns and verb

combinations from input texts and map the nouns and

verbs to UML class elements and relations,

respectively. However, to our knowledge, little is

known about how these text mining techniques and

applications can support the task of deriving business

requirements and automatic generation of formalized

requirements documents, such as UML models, from

Text-To-Model (TeToMo) Transformation Framework to Support Requirements Analysis and Modeling

131

a theoretical point of view that builds on evidences

from prior studies in the domain. In addition, methods

that integrate machine learning-based method, word

embedding, heuristic rules, statistical and linguistic

knowledge also included in this method, have not

been thoroughly studied for solving requirements

analysis and modeling problem. This research aims to

propose a method that 1. builds on the analysis and

summary of the literature on the applications of text

mining in the domain of requirements analysis and

modeling and 2. generalizes to the next level by

introducing the gaps in this domain for future

research needs. We argue that, while the application

of text mining in the business domain is still not

mature enough to produce requirements documents

and models that are of high semantic quality (such as

accurate, complete, etc.) there is a good potential for

these techniques to assist a human analyst/modeler to

facilitate some steps in the process, but also assist a

learning/training process of a novice analyst. The

latter can be achieved not only by automated

discovery and proposal of candidate constructs such

as entities, attributes, (mandatory and optional)

relationships, activities, sequences, etc., but also

learning from expert analysis and modeling

process/behavior, or a business domain document

generate recommendations and feedback for a

learning context.

Summing up, the problem in the domain discussed

in the literature so far deals with the fact that there is

a need for supporting techniques that can assist the

analysts for routine tasks of systematically producing

a model appropriately representing the expected

structure and functionality of a system with less

effort. To achieve this, most of the earlier works used

shallow linguistic techniques, while the method

proposed in this work aims to in addition solve and

address some of the complex linguistic problems

applied within the domain of business requirements.

The contribution of this work can be thus addressed

as follows:

1) The work is the first attempt of integrating a

machine learning-based method, word embedding,

heuristic rules, statistical and linguistic knowledge,

which has not been thoroughly studied for solving

software requirements analysis.

2) The TeToMo based implementation employs

several meaningful resource-information latent in a

sentence to a) learn a better sentence representation;

b) create augmented vector; c) obtain significant

performance. A hybrid vector is created to represent

each sentence using the statistical, linguistic

knowledge-based and word embedding-based feature

vectors.

3) The TeToMo based implementation generates an

XMI file (Specification, 2006) as output (currently

limited to modeling constructs of class and state chart

models) that is generic enough to allow to be

imported and visualized in any UML modeling tool.

Furthermore, a human analyst can interfere in the

generation process and/or further refine and extend

them.

4) Finally, we conduct experiments for performance

evaluation and comparison. We report our results on

the benchmark dataset. In the other words, we

measure the performance of the TeToMo against

human judgment to confirm the suitability of the

proposed method for analysing software

requirements documents.

4 TeToMo FRAMEWORK

Models of a system-to-be often combine structural

and behavioral views. This can be achieved by

combining different diagrams such as class diagram,

process flows represented as business process

models, flowcharts, activity diagrams, state chart, etc.

Potential elements that a modeler may need to

tag/derive from business requirements during early

stages of analysis (which we will further refer to as

model constructs) include for instance:

- Entities (class diagram).

- Attributes (class diagram).

- Relationships/associations between entities (class

diagram).

- Type of relationship/association, e.g. mandatory vs.

optional (class diagram).

o wording like may, can, … show modality,

wording like must, should,… may denote

mandatory relationship.

- Events (activity diagram, state chart).

- Sequences of events (activity diagram, state chart).

o event A may occur after or before event B, etc.,

event X is a starting event, event Y is an ending

event.

- States (state chart) that are usually verbs in passive

form (e.g. approved, blocked, ended, started, etc.).

The architecture of TeToMo based implementation of

model extraction is displayed in Figure 1. It illustrates

the steps and functionality of the system to transform

user requirements such as user stories into UML

diagrams. There are six main modules in the TeToMo

system. It is worth noting, because of space limitation,

we explain each of them in summary: Input document

includes the document with business requirements for

processing. The Natural Language Analysis (NLA)

MODELSWARD 2022 - 10th International Conference on Model-Driven Engineering and Software Development

132

module is applied to the document and the processed

information is sent to the Relationship, Object,

Method, Attribute (ROMA) module to extract

modeling constructs such as the attributes,

object/concepts and relationships between objects

using both NLP module, heuristic rules and machine

learning-based method. The findings are further

proposed to the user as candidate model constructs

through the Interference Human Interface which can

be used to approve, discard, make duplicate constructs

based on e.g. synonyms, etc. Model Generator Module

is used to compose a model and subsequent structural

and dynamic views such as a class diagram and state

charts. XMI format is employed to store the

information of finalized model constructs and

relationships. Finally an XMI file can be imported to

a UML modeling tool to draw the required diagram.

Detailed explanation of these modules is as follows:

Figure 1: The Architecture of the method.

Input Document: The goal of this module is to collect

a textual document (business requirements articulated

in natural language either in unstructured or

structured medium format) as an input to the system.

Natural Language Analysis (NLA) Module: This

module uses basic linguistic functions to analyse the

text syntactically and semantically, and stores all the

intermediate processing results for further analysis.

This stage includes the following tasks: The raw text

is first processed. Then, the document is decomposed

into several paragraphs. Next, the paragraphs are

further decomposed into sentences. Secondly, the

tokenization, a basic approach of the text pre-

processing, splits the sentences into words. We also

employ the stemming procedure to reduce a word to

its root form. Part of Speech (POS) tagging is also

used to classify the words of text on the basis of part

of speech category (‘noun’, ‘verbs’, ‘adverb’,

‘adjectives’) they belong. The POS tagging provides

useful lexical information.

The syntactic analysis is used to identify subject,

verbs, objects, adverbs, adjectives and various other

complements. The NLA also uses other basic

linguistic functions such as co-reference resolution,

stemming process, semantic role labeling (active &

passive sentences), semantic word similarity (word

embedding & statistical method), WordNet

dictionary to enhance the semantic analysis. The

results generated by the NLA is passed to the ROMA

module (see Figure 1).

Sentences segmentation function is used to split a

paragraph into sentences. Stemming technique is used

to reduce word to its stem form (e.g., ‘went’---> ‘go’).

Stanford Parser is used to produce a parse tree for

each sentence to identify the ‘phrases’, the object or

subject of a verb.

WordNet is an English language lexical database. It

can be employed to obtain semantically similar terms

and acquire synonyms. The synonyms are used to

extract words that are semantically related to each

other (e.g., term frequency (TF) method uses

synonym to find a word with high frequency in a text

file). Co-reference Resolution process is used to

determine linguistic expressions that refer to the same

entity in a text. This is useful to link pronominal

references (e.g., she, it) to earlier referent in the text.

ROMA module has two sub-modules for

identification of model constructs.: a) Objects,

attributes and methods identification: The main task

of the current sub-module is extracting the important

words that can be considered as a model construct

such as entity/concept, corresponding attributes

related to each concept, the method/operations and

involvement of entity instances using the calculation

of various measure parameters of the words (e.g., TF

(term frequency), TF*IDF (term frequency × inverse

document frequency), Entropy, C-value (Hasegawa

et al., 2009)), Heuristic rules and the basic linguistic

functions (e.g., parser). b) Relationship

identification: A machine learning-based method

employs word embedding method, various heuristic

rules (Narawita, 2017), specific words, various

measure parameters, statistical and linguistic

knowledge as input features to identify different types

of relationships.

Interference Human Interface: TeToMo provides an

interactive user interface that allows an end-user to

manage and analyze business requirements. S/he can

choose to interfere in the process of initial tagging as

Text-To-Model (TeToMo) Transformation Framework to Support Requirements Analysis and Modeling

133

well as the selection of the final model constructs out

of those proposed by the ROMA module. The system

presents a list of candidate constructs a user can make

a decision whether or not to include or exclude a

particular item (e.g., ‘redundant classes’, ‘irrelevant

classes’).

Model Generator Module: This module generates

different UML diagrams using pre-processed

requirements. It uses the extracted Information of

‘ROMA’ module and Interference Human interface

to extract model views such as class diagram and state

chart diagram. Algorithms are used to identify and

determine the class and State chart models elements

(e.g., concepts/objects/entities, attributes,

signal/operation/method, associations, relationships,

Is-a relationship (generalization), Has-a

(aggregation), multiplicities on relationships among

the objects, states, events, sequences).

Heuristic rules include a set of rules to identify

objects, attributes, method, the multiplicity of roles in

associations and relationship between the objects. To

do this, we collected several heuristic rules from

previous studies (Deeptimahanti & Sanyal, 2011)

such as syntactic reconstruction rules to split a

complex sentence into simple sentences to extract all

possible information from the requirements

document. An XMI file is produced as the final output

of the TeToMo. This file includes the information

about the identified concept/entity/object, their

attributes, and the relationships among them, their

typology and relations linked to UML notation at

meta-data level. It is used to visualize generated

models in any UML modeling tool which supports the

XMI import facility or XML compatible.

5 SYSTEM EVALUATION

The evaluation of TeToMo will be conducted through

the performance evaluation by comparing the output

of the TeToMo with the model generated manually

(by an expert). For this purpose different case studies

from different domains have been used. The aim of

the system evaluation is to assess the extracted

constructs and model views with respect to its

semantic quality measured by semantic conformance

with the business requirements, completeness, and

accuracy. Furthermore, this section aims to answer

the following question: how close the model

generated by the TeToMo is to the one produced by

an expert analyst.

Standard Data Set — to analyze the performance of

TeToMo, we also need a gold standard model (a set

of all correct results). For this purpose, an expert with

sufficient skill, experience and domain knowledge is

asked to produce the corresponding UML diagram for

each use case. The produced models are then

compared with the model generated by TeToMo at

the level of model constructs such as attribute,

relationship, object/concept. In other words, we

compare which parts of models generated by the

expert and TeToMo are the similar, complementary

or conflicting.

Evaluation Metrics — we use three standard metrics

to evaluate the performance of the TeToMo. These

metrics known as Recall (R), Precision (P) and

F_measure. Precision is a set of selected items that

are true, while recall is a set of correct items that are

selected. In this study, the model elements (e.g.,

attribute, relationship, object/concept) identified by a

human refers to a set of ideal items, and the model

elements identified by the TeToMo refers to a set of

system items. In other words, precision is used to

assess the fraction of the system items that the

TeToMo correctly identified and recall is used to

assess the fraction of the ideal items that the algorithm

identified. The precision is calculated using Eq. (1):

the division of identified model elements by TeToMo

and human expert over the number of model elements

identified by TeToMo only. The recall is calculated

using Eq. (2): the division of identified model

elements by TeToMo and human expert intersection

over the number of model elements identified by a

human expert.

 =



 + 

(1)

 =



+

(2)

Where, M = The number of model elements identified

by TeToMo and Human expert. N = The number of

model elements identified by TeToMo only. K = The

number of model elements identified by Human

expert only. Furthermore, the F-measure is used to

merge both precision and recall. It is computed as

follows:

_measure =

2 ×  × 

 + 

(3)

Sample Case Study: Figure 2 presents a

conceptual model of simplified a ticket sale system

generated by an expert. “A company sells airline

tickets, for which customers can make reservations

either on a company’s online booking system or by

calling a phone operator. To meet customer demand

the company uses a flexible pricing policy based on

which a seat prices is decided during the reservation

by a manager. Furthermore, the price for a seat is not

MODELSWARD 2022 - 10th International Conference on Model-Driven Engineering and Software Development

134

fixed and can change depending on the changes of

several conditions, e.g. after a reservation is

cancelled, and there are not many seats available for

the flight while having a high demand (e.g. due to

popular vacation destination), the price for the seat

can increase”.

Figure 2: Sample intermediate diagram produced by

TeToMo (missing to detect elements noted in red).

Table 1: Precision, Recall and F-score results.

The model shows classes/concepts (e.g., ‘Customer’,

‘Seat’, ‘Reservation’, etc.). These concepts are linked

to each other through relationships. In addition, each

class includes several attributes, methods/operations.

On the other hand, a TeToMo produced sample of the

corresponding conceptual model from the above

problem statement produced the same output (with

the missed elements compared to human modeler

noted in red). To evaluate the method, we measure the

performance of the TeToMo against human judgment

on the level of the extracted model elements. For

instance, Table 1 presents the three main columns:

number of concepts/class name derived from Figures

2 (e.g., the expert extracted 7 concepts and the

TeToMo extracted 6 concepts). To calculate the P, R,

and F-measure, we determine the values of M, N, K.

Then, the equations of (1), (2) and (3) are applied to

get the value of each evaluation metric. The averages

of precision, recall and F-measure are considered as a

performance of the TeToMo. We also consider using

the same format presented in Table 1 to evaluate the

detection of other elements such as attributes,

methods, relationships. In this case, the column name

“Concepts/Class Name” is replaced by the

“Attribute”, “Method” or “Relationship”, etc.

6 CONCLUSION

The work proposes a method that can be used to

design and develop a text mining based support for

requirements analysis and modeling process. A

prototype solution was designed for a conceptual

modeling case using well-known standard diagrams

for designing structural and behavioral aspects of a

system. In this paper, we present a novel method that

integrates machine learning based-method, word

embedding, heuristic rules, statistical and linguistic

knowledge to solve requirements analysis and

modeling problem (called TeToMo).

The effectiveness will be evaluated by comparing

model solutions designed by human modeler and

those constructed with the help of the prototype, e.g.

based on the candidate constructs proposed by text

mining approach. In addition, we also aim to compare

the performance of TeToMo with those of the other

existing and well-known methods used for generating

UML diagram such as (Landhäußer, Körner, &

Tichy, 2014). While this work outlines some findings

on the method performance from a pilot study,

evaluating the effectiveness of the method with an

enhanced prototype and replication studies using a

larger corpus constitutes a further research direction.

Bi-directional verification of requirements from

models to text is yet another direction for further

work. Finally, discourse analysis module can be

added to the NLA to allow determining contextual

information. Addressing this points will lead to

improved versions of the TeToMo framework and

applications. While in this work the framework is

limited to two specific types of diagrams, expanding

the scope to experiment with more UML diagram

types will be a step towards a more generic approach.

REFERENCES

Bajwa, Imran Sarwar, & Choudhary, M Abbas. (2006).

Natural language processing based automated system

for uml diagrams generation. Paper presented at the

The 18th Saudi National Computer Conf. on computer

science (NCC18). Riyadh, Saudi Arabia: The Saudi

Computer Society (SCS).

Creswell, John W. (1994). Research design: Qualitative and

quantitative approach. London: Publications.

Class name

The value of Evaluation metrics

Extracted By

Case study

Human

TeToMo

M N K P R

F_measure

0.45

0.50

0.47

0.44

0.50

0.47

…

0.53

0.45

0.49

Average:

0.47

0.48

0.47

Text-To-Model (TeToMo) Transformation Framework to Support Requirements Analysis and Modeling

135

Cruz-Lemus, José A, Maes, Ann, Genero, Marcela, Poels,

Geert, & Piattini, Mario. (2010). The impact of

structural complexity on the understandability of UML

statechart diagrams. Information Sciences, 180(11),

2209-2220.

Cunningham, Hamish, Maynard, Diana, Bontcheva, Kalina,

Tablan, Valentin, Ursu, Cristian, Dimitrov, Marin, . . .

Li, Yaoyong. (2009). Developing Language Processing

Components with GATE Version 5:(a User Guide):

University of Sheffield.

Deeptimahanti, Deva Kumar, & Sanyal, Ratna. (2011).

Semi-automatic generation of UML models from

natural language requirements. Paper presented at the

Proceedings of the 4th India Software Engineering

Conference.

Erickson, John, & Siau, Keng. (2007). Can UML be

simplified? Practitioner use of UML in separate

domains. Paper presented at the proceedings EMMSAD.

Gelhausen, Tom, Derre, Bugra, & Geiß, Rubino. (2008).

Customizing grgen. net for model transformation.

Paper presented at the Proceedings of the third

international workshop on Graph and model

transformations.

Gelhausen, Tom, & Tichy, Walter F. (2007). Thematic role

based generation of UML models from real world

requirements. Paper presented at the International

Conference on Semantic Computing (ICSC 2007).

Harmain, Harmain Mohamed, & Gaizauskas, R. (2000).

CM-Builder: an automated NL-based CASE tool. Paper

presented at the Proceedings ASE 2000. Fifteenth IEEE

International Conference on Automated Software

Engineering.

Hasegawa, Ryo, Kitamura, Motohiro, Kaiya, Haruhiko, &

Saeki, Motoshi. (2009). Extracting conceptual graphs

from Japanese documents for software requirements

modeling. Paper presented at the Proceedings of the

Sixth Asia-Pacific Conference on Conceptual

Modeling-Volume 96.

Kerlinger, Fred N. (1979). Behavioral research: A

conceptual approach: Holt, Rinehart and Winston New

York.

Landhäußer, Mathias, Körner, Sven J, & Tichy, Walter F.

(2014). From requirements to UML models and back:

how automatic processing of text can support

requirements engineering. Software Quality Journal,

22(1), 121-149.

Mala, GS Anandha, & Uma, GV. (2006). Automatic

construction of object oriented design models [UML

diagrams] from natural language requirements

specification. Paper presented at the Pacific Rim

International Conference on Artificial Intelligence.

Malaisé, Véronique, Zweigenbaum, Pierre, & Bachimont,

Bruno. (2005). Mining defining contexts to help

structuring differential ontologies. Terminology.

International Journal of Theoretical and Applied Issues

in Specialized Communication, 11(1), 21-53.

Montes, Azucena, Pacheco, Hasdai, Estrada, Hugo, &

Pastor, Oscar. (2008). Conceptual model generation

from requirements model: A natural language

processing approach. Paper presented at the

International Conference on Application of Natural

Language to Information Systems.

Morgan, David L. (2018). Themes, theories, and models.

Qualitative health research, 28(3), 339-345.

Narawita, Chamitha Ramal. (2017). UML Generator-Use

Case and Class Diagram Generation from Text

Requirements. ICTer, 10(1).

Schenk, Karen D, Vitalari, Nicholas P, & Davis, K Shannon.

(1998). Differences between novice and expert systems

analysts: What do we know and what do we do? Journal

of management information systems, 15(1), 9-50.

Sedrakyan, Gayane. (2016). Process-oriented feedback

perspectives based on feedback-enabled simulation and

learning process data analytics, PhD Thesis, KU

Leuven.

Shahzadi, Iram, Ahmad, Qanita, Fatima, Kiran, Sarwar,

Imran, & Mahmood, Waqar. (2013). UMagic! THE

UML modeler for text documents. International

Journal of Computer Theory and Engineering, 5(1),

166.

Siau, Keng, & Cao, Qing. (2001). Unified modeling

language: A complexity analysis. Journal of Database

Management (JDM), 12(1), 26-34.

Specification, OMG Business Process Modeling Notation.

(2006). Object management group. Needham, MA, USA,

2(2).

Van Rijsbergen, Cornelis Joost. (1977). A theoretical basis

for the use of co‐ occurrence data in information

retrieval. Journal of documentation.

Wang, Wang, & Brooks, Roger J. (2007). Empirical

investigations of conceptual modeling and the modeling

process. Paper presented at the 2007 Winter Simulation

Conference.

MODELSWARD 2022 - 10th International Conference on Model-Driven Engineering and Software Development

136