over lower levels and improving the quality of project development. Patterns are fundamental reuse components that identify common characteristics among the elements of a domain; they can be incorporated into models or defined structures that represent that knowledge more effectively.
B.  Natural Language Processing 
The need for Natural Language Processing techniques arises in the field of human-machine interaction, in use cases such as text mining, information extraction, language recognition, language translation, and text generation; all of these require lexical, syntactic, and semantic analysis for a computer to interpret the text (Cowie et al., 2000). Natural language processing consists of several stages that draw on the different analysis and classification techniques supported by current computer systems (Dale, 2000).
1)  Tokenization: Tokenization is a preliminary step in natural language processing whose objective is to demarcate words by their character sequences, grouped by their dependencies, using separators such as spaces and punctuation (Moreno, 2009). Tokens are then standardized to simplify later analysis and to reduce ambiguities in vocabulary and verb tenses.
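As an illustration, a minimal sketch of separator-based tokenization in the spirit of this step could look as follows; the regular expression and the lowercasing used for standardization are assumptions of this sketch, not the project's actual implementation.

```python
import re

def tokenize(text):
    # \w+ matches runs of word characters; [^\w\s] matches any single
    # punctuation mark, so spaces and punctuation delimit the tokens.
    return re.findall(r"\w+|[^\w\s]", text)

def standardize(tokens):
    # Simple standardization step: lowercase every token.
    return [token.lower() for token in tokens]

print(standardize(tokenize("The patterns, once identified, are reused.")))
# ['the', 'patterns', ',', 'once', 'identified', ',', 'are', 'reused', '.']
```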
2)  Lexical Analysis: Lexical analysis aims to obtain a standard tag for each word or token through a study that identifies vocabulary inflections, such as gender, number, and verbal irregularities of the candidate words. An efficient way to perform this analysis is with a finite automaton that draws on a repository of terms, together with relationships and equivalences between terms, to convert a token into a standard format (Hopcroft et al., 1979). Other approaches use decision trees and database unification for lexical analysis, but they are not covered by this project's implementation (Trivino et al., 2000).
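A dictionary lookup can stand in for the automaton's transition table in a minimal sketch of this normalization; the repository entries below are hypothetical examples, not the project's actual term repository.

```python
# Hypothetical term repository: inflected or irregular forms mapped
# to a standard form, playing the role of the automaton's transitions.
LEXICON = {
    "is": "be", "are": "be", "was": "be",
    "children": "child",
    "identified": "identify",
}

def lexical_normalize(token):
    # Tokens unknown to the repository fall back to their lowercased
    # surface form.
    return LEXICON.get(token.lower(), token.lower())

print([lexical_normalize(t) for t in ["Children", "are", "identified"]])
# ['child', 'be', 'identify']
```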
3)  Syntactic Analysis: The goal of syntactic analysis is to make explicit the syntactic relations in a text in order to support a subsequent semantic interpretation (Martí et al., 2002), so that the relationships between terms can be used in their proper context for adequate normalization and standardization. To combine lexical and syntactic analysis, this project used deductive term-standardization techniques that convert texts, within a context defined by sentences, through a special function or finite automaton.
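One way to picture such a deductive, sentence-scoped standardization is a single left-to-right pass that rewrites token patterns, as in the sketch below; the splitting heuristic and rewrite rules are illustrative assumptions only.

```python
import re

# Hypothetical rewrite rules: each maps a contiguous token pattern to
# a standardized form, applied within sentence boundaries only.
RULES = [
    (("do", "not"), ("not",)),
    (("is", "a"), ("be", "a")),
]

def sentences(text):
    # Naive sentence splitter on '.', '!' and '?'.
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]

def standardize_sentence(tokens):
    # Single left-to-right pass, in the style of a finite automaton,
    # replacing rule patterns with their standard forms.
    out, i = [], 0
    while i < len(tokens):
        for pattern, replacement in RULES:
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

for s in sentences("Tools do not fail. A pattern is a reusable unit."):
    print(standardize_sentence(s.lower().split()))
# ['tools', 'not', 'fail']
# ['a', 'pattern', 'be', 'a', 'reusable', 'unit']
```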
4)  Grammatical Tagging: Tagging is the process of assigning grammatical categories to the terms of a text or corpus. Tags are defined in a dictionary of standard terms linked to grammatical categories (nouns, verbs, adverbs, etc.), so it is important to normalize the terms before tagging to avoid non-standard terms. The most common issues in this process are poor system performance on large corpora, terms unknown to the dictionary, and word ambiguity (same form but different meaning) (Weischedel et al., 2006). Grammatical tagging is a key factor in the identification and generation of semantic index patterns, where the patterns consist of the categories rather than the terms themselves. The accuracy of this technique across texts depends on the completeness and richness of the dictionary of grammatical tags.
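A minimal dictionary-based tagger consistent with this description might look as follows; the tag dictionary and tag names are hypothetical, and the last line shows how a category sequence, rather than the terms, could serve as a candidate index pattern.

```python
# Hypothetical dictionary of standard terms linked to grammatical
# categories; a real system would use a far larger tag dictionary.
TAG_DICT = {
    "the": "DET", "a": "DET",
    "system": "NOUN", "pattern": "NOUN",
    "identify": "VERB", "be": "VERB",
    "quickly": "ADV",
}

def tag(tokens):
    # Terms unknown to the dictionary get 'UNK', one of the common
    # issues noted above.
    return [(t, TAG_DICT.get(t, "UNK")) for t in tokens]

tagged = tag(["the", "system", "identify", "a", "pattern", "quickly"])
print(tagged)
# The candidate index pattern is the category sequence, not the terms:
print(" ".join(category for _, category in tagged))
# DET NOUN VERB DET NOUN ADV
```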
5)  Semantic and Pragmatic Analysis: Semantic analysis aims to interpret the meaning of expressions, building on the results of the lexical and syntactic analysis. It considers not only the semantics of the analyzed term but also the semantics of the contiguous terms within the same context. At this stage, the automatic generation of index patterns in this project does not consider pragmatic analysis.
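To illustrate how contiguous terms can inform meaning, the following sketch disambiguates a term by overlapping its neighbors with cue words per sense; the sense inventory, cue sets, and window size are assumptions of the example, not part of the project.

```python
# Hypothetical sense inventory: each sense of an ambiguous term lists
# cue words that suggest it.
SENSES = {
    "bank": {
        "financial-institution": {"money", "loan", "account"},
        "river-side": {"river", "water", "shore"},
    },
}

def disambiguate(term, context_tokens):
    # Choose the sense whose cue words overlap most with the
    # contiguous terms around the target.
    senses = SENSES.get(term)
    if not senses:
        return None
    context = set(context_tokens)
    return max(senses, key=lambda sense: len(senses[sense] & context))

tokens = ["the", "boat", "drifted", "to", "the", "river", "bank"]
position = tokens.index("bank")
print(disambiguate("bank", tokens[max(0, position - 3):position]))
# river-side
```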
C. RSHP Model 
RSHP is an information representation model based on relationships that handles all types of artifacts (models, texts, code, databases, etc.) under a single scheme. In this work, the model is used to store and link the generated pattern lists so that they can later be analyzed with specialized knowledge representation tools (Llorens et al., 2004). Within the Knowledge Reuse Group at the University Carlos III of Madrid, the RSHP model is used in projects related to natural language processing (Gomez-Perez et al., 2004; Thomason, 2012; Amsler, 1981; Suarez et al., 2013). The information model is presented in Figure 1. An analysis of sentences and basic patterns is shown in Figure 2.
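A minimal sketch of this idea, assuming a simplified reading of RSHP in which artifacts are linked by typed relationships, is shown below; the class names, fields, and relation labels are illustrative and do not reproduce the published RSHP schema.

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    # Any information element (text, model, code, ...) handled under
    # the same scheme.
    identifier: str
    kind: str  # e.g. 'text', 'model', 'code'

@dataclass
class Relationship:
    # Typed links between artifacts are the core of the representation.
    source: Artifact
    relation: str
    target: Artifact

repository = []
requirement = Artifact("req-12", "text")
pattern = Artifact("pattern-DET-NOUN-VERB", "pattern")
repository.append(Relationship(requirement, "indexed-by", pattern))

for r in repository:
    print(f"{r.source.identifier} --{r.relation}--> {r.target.identifier}")
# req-12 --indexed-by--> pattern-DET-NOUN-VERB
```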