Handling Weighted Sequences Employing Inverted Files and Suffix Trees

Klev Diamanti, Andreas Kanavos, Christos Makris, Thodoris Tokis

2014

Abstract

In this paper, we address the problem of handling weighted sequences by taking advantage of the inverted file machinery. We target text processing applications where the documents cannot be separated into words (such as texts representing biological sequences) or where word separation is difficult and requires extra linguistic knowledge (such as texts in Asian languages). Besides handling weighted sequences using n-grams, we also study the construction of space-efficient n-gram inverted indexes. The proposed techniques combine classic, straightforward n-gram indexing with the recently proposed two-level n-gram inverted file technique. The outcomes are new data structures for n-gram indexing that consume less space than existing ones. Our experimental results are encouraging and show that these techniques can store n-gram indexes more space-efficiently than existing methods.
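As a rough illustration of the ideas above, the following Python sketch builds a classic (one-level) n-gram inverted index over a weighted sequence. It assumes the common definition of a weighted sequence, in which every position carries a probability distribution over the alphabet, and it keeps only the n-grams whose occurrence probability (the product of the per-position symbol probabilities) reaches a chosen threshold. The names `WeightedSeq`, `valid_ngrams` and `build_index`, and the threshold rule itself, are hypothetical choices for this example. This is not the authors' construction, and it does not implement the two-level n-gram inverted file.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: position i maps each symbol to its probability at i.
WeightedSeq = List[Dict[str, float]]
Posting = Tuple[int, float]  # (start position, occurrence probability)


def valid_ngrams(ws: WeightedSeq, n: int, threshold: float) -> List[Tuple[str, int, float]]:
    """Return (ngram, start, probability) for every length-n factor of ws
    whose probability is at least `threshold` (assumed validity rule)."""
    out: List[Tuple[str, int, float]] = []
    for start in range(len(ws) - n + 1):
        # Partial factors being extended: (string so far, probability so far).
        partial: List[Tuple[str, float]] = [("", 1.0)]
        for pos in range(start, start + n):
            extended: List[Tuple[str, float]] = []
            for prefix, p in partial:
                for sym, q in ws[pos].items():
                    # Probabilities are <= 1, so a product can only shrink;
                    # a prefix below the threshold can never recover.
                    if p * q >= threshold:
                        extended.append((prefix + sym, p * q))
            partial = extended
        out.extend((gram, start, p) for gram, p in partial)
    return out


def build_index(ws: WeightedSeq, n: int, threshold: float) -> Dict[str, List[Posting]]:
    """Classic one-level n-gram inverted index: n-gram -> postings list."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for gram, start, p in valid_ngrams(ws, n, threshold):
        index[gram].append((start, p))
    return dict(index)


# Example weighted sequence over {A, C, G, T}.
example: WeightedSeq = [{"A": 1.0}, {"C": 0.5, "G": 0.5}, {"T": 1.0}, {"A": 0.75, "C": 0.25}]
```

For instance, `build_index(example, 2, 0.25)` yields postings for the 2-grams `AC`, `AG`, `CT`, `GT`, `TA` and `TC`, with `TC` recorded as `[(2, 0.25)]`; raising the threshold to `0.3` drops `TC`. Pruning during extension keeps the enumeration from materializing every combination of symbols, which would otherwise grow exponentially with `n`.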



Paper Citation


in Harvard Style

Diamanti K., Kanavos A., Makris C. and Tokis T. (2014). Handling Weighted Sequences Employing Inverted Files and Suffix Trees. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST, ISBN 978-989-758-024-6, pages 231-238. DOI: 10.5220/0004788502310238


in Bibtex Style

@conference{webist14,
author={Klev Diamanti and Andreas Kanavos and Christos Makris and Thodoris Tokis},
title={Handling Weighted Sequences Employing Inverted Files and Suffix Trees},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2014},
pages={231-238},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004788502310238},
isbn={978-989-758-024-6},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - Handling Weighted Sequences Employing Inverted Files and Suffix Trees
SN - 978-989-758-024-6
AU - Diamanti K.
AU - Kanavos A.
AU - Makris C.
AU - Tokis T.
PY - 2014
SP - 231
EP - 238
DO - 10.5220/0004788502310238