A NON-UNIFORM REAL-TIME SPEECH TIME-SCALE STRETCHING METHOD

Adam Kupryjanow, Andrzej Czyzewski

2011

Abstract

An algorithm for non-uniform real-time speech stretching is presented. It combines the standard SOLA (Synchronous Overlap and Add) algorithm with vowel, consonant and silence detectors. Based on the detected content and the estimated rate of speech (ROS), the algorithm adapts the scaling factor. The real-time capability of the method and the resulting voice quality were analysed. Subjective tests were performed to compare the quality of the proposed method with the output of the standard SOLA algorithm. The accuracy of the ROS estimation was assessed to verify its robustness.
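The abstract does not give the detector logic or the rule that maps content and ROS to a scaling factor, so the following is only a minimal sketch of plain SOLA driven by a per-frame scaling factor. The function names (`best_offset`, `sola_stretch`), the frame and hop sizes, and the `CLASS_SCALE` values are illustrative assumptions, not the authors' implementation.

```python
import math


def best_offset(out, frame, target, k_max):
    """Pick the shift k in [-k_max, k_max] whose overlap region has the
    highest normalized cross-correlation with the start of `frame`."""
    best_k, best_score = None, -math.inf
    for k in range(-k_max, k_max + 1):
        start = target + k
        overlap = len(out) - start
        if start < 0 or not 0 < overlap <= len(frame):
            continue  # this shift leaves no usable overlap
        tail, head = out[start:], frame[:overlap]
        num = sum(a * b for a, b in zip(tail, head))
        den = math.sqrt(sum(a * a for a in tail) * sum(b * b for b in head))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def sola_stretch(x, scales, frame_len=400, hop=100, k_max=40):
    """Stretch the sample list `x`; scales[m-1] is the scaling factor
    applied to the hop that leads to analysis frame m (> 1 slows down)."""
    out = list(x[:frame_len])
    pos = 0.0  # nominal synthesis position of the current frame
    for m, alpha in enumerate(scales, start=1):
        frame = x[m * hop : m * hop + frame_len]
        if len(frame) < frame_len:
            break  # not enough input left for a full frame
        if not 1 <= alpha * hop < frame_len - 1:
            raise ValueError("scale factor leaves no overlap between frames")
        pos += alpha * hop
        target = round(pos)
        k = best_offset(out, frame, target, k_max)
        if k is None:
            raise ValueError("no valid overlap position found")
        start = target + k
        overlap = len(out) - start
        # Linear cross-fade over the overlap, then append the rest.
        for i in range(overlap):
            w = i / overlap
            out[start + i] = (1 - w) * out[start + i] + w * frame[i]
        out.extend(frame[overlap:])
    return out


# Hypothetical mapping from detector output to scaling factor.
CLASS_SCALE = {"vowel": 1.5, "consonant": 1.2, "silence": 1.0}
tone = [math.sin(2 * math.pi * n / 80) for n in range(4000)]
labels = ["vowel"] * 36  # stand-in for real detector decisions
stretched = sola_stretch(tone, [CLASS_SCALE[c] for c in labels])
print(len(tone), len(stretched))
```

Passing the scaling factors as a list keeps the stretching loop independent of how they are chosen, so a detector and ROS estimator of any kind could supply them frame by frame.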


Paper Citation


in Harvard Style

Kupryjanow A. and Czyzewski A. (2011). A NON-UNIFORM REAL-TIME SPEECH TIME-SCALE STRETCHING METHOD. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2011) ISBN 978-989-8425-72-0, pages 27-33. DOI: 10.5220/0003456300270033


in Bibtex Style

@conference{sigmap11,
author={Adam Kupryjanow and Andrzej Czyzewski},
title={A NON-UNIFORM REAL-TIME SPEECH TIME-SCALE STRETCHING METHOD},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2011)},
year={2011},
pages={27-33},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003456300270033},
isbn={978-989-8425-72-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2011)
TI - A NON-UNIFORM REAL-TIME SPEECH TIME-SCALE STRETCHING METHOD
SN - 978-989-8425-72-0
AU - Kupryjanow A.
AU - Czyzewski A.
PY - 2011
SP - 27
EP - 33
DO - 10.5220/0003456300270033

