Auditory Features Analysis for BIC-based Audio Segmentation

Tomasz Maka

2014

Abstract

Audio segmentation is a stage in the audio processing chain whose accuracy strongly affects the final performance of audio recognition and processing tasks. This paper presents an analysis of auditory features for audio segmentation. The features are derived from a time-frequency representation of the input signal and are computed based on properties of the human auditory system. The efficiency of several feature sets for audio segmentation based on the Bayesian Information Criterion (BIC) has been analysed. The results show that auditory features derived from different frequency scales are competitive with the widely used MFCC feature in terms of accuracy and the number of detected points.
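
For readers unfamiliar with BIC-based segmentation, the following is a minimal sketch of the commonly used ΔBIC change-point test on a window of feature vectors. It is not taken from the paper: the function names `delta_bic` and `find_change_point`, the `margin` and `penalty_weight` parameters, and the single full-covariance Gaussian per segment are all assumptions made for illustration. The feature matrix could hold MFCCs or any auditory feature set, with one row per frame.

```python
import numpy as np


def delta_bic(features, i, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (frames x dims) at frame index i.

    Assumption: each segment is modelled by one full-covariance Gaussian.
    A positive value favours the two-segment model, i.e. a change at i.
    """
    n, d = features.shape

    def logdet(x):
        # Maximum-likelihood covariance (bias=True divides by the frame count).
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, value = np.linalg.slogdet(cov)
        return value

    left, right = features[:i], features[i:]
    gain = 0.5 * (n * logdet(features)
                  - i * logdet(left)
                  - (n - i) * logdet(right))
    # Extra free parameters of the second Gaussian: mean plus covariance.
    n_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def find_change_point(features, margin=20, penalty_weight=1.0):
    """Return (index, score) of the best split, or (None, score) if none.

    `margin` (hypothetical parameter) keeps both segments long enough
    for a stable covariance estimate.
    """
    n = features.shape[0]
    candidates = list(range(margin, n - margin))
    scores = [delta_bic(features, i, penalty_weight) for i in candidates]
    best = int(np.argmax(scores))
    if scores[best] > 0:
        return candidates[best], scores[best]
    return None, scores[best]
```

In this sketch a window is searched for a single change point; practical systems typically slide or grow the window over the whole recording and tune `penalty_weight` on held-out data.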



Paper Citation


in Harvard Style

Maka T. (2014). Auditory Features Analysis for BIC-based Audio Segmentation. In Proceedings of the 11th International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2014) ISBN 978-989-758-046-8, pages 48-53. DOI: 10.5220/0005063800480053


in Bibtex Style

@conference{sigmap14,
author={Tomasz Maka},
title={Auditory Features Analysis for BIC-based Audio Segmentation},
booktitle={Proceedings of the 11th International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2014)},
year={2014},
pages={48-53},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005063800480053},
isbn={978-989-758-046-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2014)
TI - Auditory Features Analysis for BIC-based Audio Segmentation
SN - 978-989-758-046-8
AU - Maka T.
PY - 2014
SP - 48
EP - 53
DO - 10.5220/0005063800480053