NEW TIME-FREQUENCY VOWEL QUANTIZATION ENHANCED BY SUBBAND HIERARCHY

Fraihat Salam, Glotin Hervé

Abstract

Conventional speech processing may not address speech dynamics well. We analyse here a new quantization paradigm for vowel coding. It is based on the simple Allen temporal interval algebra applied to subband voicing levels, yielding a compressed speech representation of only 21 integers for a speech window of up to 32 ms. Experiments show that the representation benefits from the ranking of the average voicing-interval values across the subbands. These new features are evaluated for vowel recognition (1 hour, 6 vowels) on a reference multispeaker radio broadcast news corpus used during the ESTER evaluation campaign. We work on the subset of the most frequent French vowels. We obtain a 62% class error rate when adding the ranking information to Allen's relations, compared with 70% using Allen's relations alone and 57% using the full set of 48 raw floats. We then discuss the advantage of using more subbands, and we finally propose a strategy to tackle the combinatorial complexity of Allen relations.
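As an illustration of the kind of coding the abstract describes, the following minimal sketch classifies the Allen relation between two time intervals, encodes every pair of subband intervals as an integer, and ranks subbands by their mean voicing value. The function names, the integer codes, the (start, end) interval format, and the choice of 7 intervals (which happens to give 21 pairs) are assumptions made for this example; the abstract does not state how the 21 integers are derived.

```python
from itertools import combinations

# Allen's 13 relations; the integer code of a relation is its index here.
# (This particular ordering is an arbitrary choice for the sketch.)
RELATIONS = [
    "before", "after", "meets", "met-by", "overlaps", "overlapped-by",
    "starts", "started-by", "during", "contains",
    "finishes", "finished-by", "equals",
]


def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only partial overlap remains at this point.
    return "overlaps" if a0 < b0 else "overlapped-by"


def encode_window(intervals):
    """Encode all unordered interval pairs as integer relation codes."""
    return [
        RELATIONS.index(allen_relation(intervals[i], intervals[j]))
        for i, j in combinations(range(len(intervals)), 2)
    ]


def rank_subbands(mean_voicing):
    """Return the rank of each subband (0 = highest mean voicing)."""
    order = sorted(range(len(mean_voicing)),
                   key=lambda k: mean_voicing[k], reverse=True)
    ranks = [0] * len(mean_voicing)
    for rank, k in enumerate(order):
        ranks[k] = rank
    return ranks


if __name__ == "__main__":
    # Hypothetical voicing intervals (in ms) for 7 subbands of one window.
    window = [(0, 10), (5, 20), (10, 32), (0, 32), (12, 18), (20, 32), (3, 9)]
    print(encode_window(window))
    print(rank_subbands([0.2, 0.9, 0.5]))
```

With n intervals the pairwise encoding yields n(n-1)/2 integers, which is one way to see why the number of relations grows quickly as subbands are added.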



Paper Citation


in Harvard Style

Fraihat S. and Glotin H. (2008). NEW TIME-FREQUENCY VOWEL QUANTIZATION ENHANCED BY SUBBAND HIERARCHY. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008) ISBN 978-989-8111-60-9, pages 189-192. DOI: 10.5220/0001933601890192


in Bibtex Style

@conference{sigmap08,
author={Fraihat, Salam and Glotin, Hervé},
title={NEW TIME-FREQUENCY VOWEL QUANTIZATION ENHANCED BY SUBBAND HIERARCHY},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)},
year={2008},
pages={189-192},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001933601890192},
isbn={978-989-8111-60-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)
TI - NEW TIME-FREQUENCY VOWEL QUANTIZATION ENHANCED BY SUBBAND HIERARCHY
SN - 978-989-8111-60-9
AU - Fraihat S.
AU - Glotin H.
PY - 2008
SP - 189
EP - 192
DO - 10.5220/0001933601890192