Gemma Bel-Enguix, Veronica Dahl, M. Dolores Jimenez-Lopez



We present, discuss and exemplify a fully implemented model of text mining that can be applied to spoken languages as well as to molecular biology languages. It is based on the model presented in (Zahariev et al., 2009), which is oriented to discovering DNA barcodes for sequences. The novelty of our methodology is the use of constraint-based reasoning to detect string repetitions through unification, by introducing a new general rule for matching. We claim that the same method can be successfully applied to mining natural language texts.
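To make the notion of string repetition concrete, the following is a minimal sketch in plain Python. It is not the constraint-based model described here and does not use unification; it only indexes fixed-length windows to show what a repeated segment looks like in both a DNA sequence and a tokenized sentence. The function name `find_repeats` and the window length `k` are assumptions introduced for illustration.

```python
from collections import defaultdict


def find_repeats(seq, k):
    """Return every length-k window that occurs more than once in seq.

    seq may be a string (e.g. nucleotides) or a list of tokens (e.g. words).
    The result maps each repeated window, stored as a tuple, to the list of
    start positions at which it occurs.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        window = tuple(seq[i:i + k])  # tuple so that the window is hashable
        positions[window].append(i)
    # Keep only windows seen at least twice.
    return {w: p for w, p in positions.items() if len(p) > 1}


# Hypothetical inputs: a short DNA string and a tokenized sentence.
dna_repeats = find_repeats("ACGTACGTTT", 4)
word_repeats = find_repeats("the cat saw the cat".split(), 2)
```

The same function handles both inputs because it treats a text as a sequence of symbols, whether those symbols are nucleotides or words.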


