a more compressed index. The second technique can
therefore offer higher efficiency, especially when
handling large amounts of data. Moreover, this new
approach of treating DNA sequences as a
geometrical problem could lead in the future to
new and efficient ideas for DNA algorithms.