Syllabification with Frequent Sequence Patterns - A Language Independent Approach

Adrian Bona, Camelia Lemnaru, Rodica Potolea

Abstract

In this paper we show how words represented as sequences of syllables can provide valuable patterns for achieving language-independent syllabification. We present a novel approach to word syllabification based on frequent pattern mining, together with a more general framework for syllabification. Preliminary evaluations indicated a word-level accuracy of around 77% for Romanian words and around 70% for English words. We believe the method can be refined to improve performance.
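
The two ideas named in the abstract, frequent patterns over syllable sequences and word-level accuracy, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not detail. The sketch assumes that a pattern is a contiguous run of syllables, that support is the number of words containing the pattern, and that a word counts as correct only when its whole syllabification matches the reference. The function names, the `max_len` and `min_support` parameters, and the toy word list are all hypothetical.

```python
from collections import Counter


def frequent_syllable_patterns(words, max_len=2, min_support=2):
    """Return contiguous syllable patterns found in at least `min_support` words.

    `words` is a list of syllable lists, e.g. [["ma", "sa"], ["ca", "sa"]].
    Assumption: support = number of words containing the pattern.
    """
    support = Counter()
    for syllables in words:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(syllables) - n + 1):
                seen.add(tuple(syllables[i:i + n]))
        # Each word contributes at most once to a pattern's support.
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


def word_accuracy(predicted, gold):
    """Fraction of words whose predicted syllabification equals the gold one."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy input (hypothetical, hyphen-separated syllables, diacritics omitted).
gold = ["ma-sa", "ca-sa", "ma-ma", "ca-na"]
predicted = ["ma-sa", "cas-a", "ma-ma", "ca-na"]

patterns = frequent_syllable_patterns([w.split("-") for w in gold])
accuracy = word_accuracy(predicted, gold)
print(patterns)
print(accuracy)
```

On this toy input only the single-syllable patterns `ma`, `sa`, and `ca` reach the support threshold, and three of the four predictions match exactly. The exact-match criterion is why word-level accuracy is stricter than scoring each syllable boundary separately.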



Paper Citation


in Harvard Style

Bona A., Lemnaru C. and Potolea R. (2016). Syllabification with Frequent Sequence Patterns - A Language Independent Approach. In Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016) ISBN 978-989-758-203-5, pages 352-359. DOI: 10.5220/0006069703520359


in Bibtex Style

@conference{kdir16,
author={Adrian Bona and Camelia Lemnaru and Rodica Potolea},
title={Syllabification with Frequent Sequence Patterns - A Language Independent Approach},
booktitle={Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)},
year={2016},
pages={352-359},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006069703520359},
isbn={978-989-758-203-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR, (IC3K 2016)
TI - Syllabification with Frequent Sequence Patterns - A Language Independent Approach
SN - 978-989-758-203-5
AU - Bona A.
AU - Lemnaru C.
AU - Potolea R.
PY - 2016
SP - 352
EP - 359
DO - 10.5220/0006069703520359