An Approach to Off-talk Detection based on Text Classification within an Automatic Spoken Dialogue System

Oleg Akhtiamov, Roman Sergienko, Wolfgang Minker

2016

Abstract

This paper addresses the problem of off-talk detection within an automatic spoken dialogue system (SDS). The corpus under consideration contains realistic conversations between two users and an SDS. Two problem statements are investigated using a speaker-independent approach to cross-validation: a two-class one (on-talk and off-talk) and a three-class one (on-talk, problem-related off-talk, and irrelevant off-talk). A novel off-talk detection approach based on text classification is proposed. Seven different term weighting methods and two classification algorithms are considered. For dimensionality reduction, a feature transformation based on the belonging of terms to classes is applied. The proposed approach is compared with a baseline approach, and the best combinations of text pre-processing methods and classification algorithms are identified for both problem statements. The novel approach achieves significantly better classification effectiveness than the baseline on the same task.
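The abstract names two technical ingredients without spelling them out: speaker-independent cross-validation and a feature transformation based on the belonging of terms to classes. The sketch below illustrates one plausible reading of each; it is not the authors' implementation. Assumptions: the supervised term weight is a hypothetical one (the share of a term's training occurrences that fall in each class), not necessarily any of the seven weighting methods studied; each term "belongs" to the class where that weight is highest; an utterance is reduced to one feature per class by summing the weights of its terms that belong to that class; and speakers are assigned to folds round-robin. The function names `speaker_folds`, `term_class_relevance`, and `class_features` are illustrative, and no classifier is shown; the resulting vectors would be passed to whichever classification algorithm is being evaluated.

```python
from collections import Counter, defaultdict


def speaker_folds(speakers, n_folds):
    """Speaker-independent folds: no speaker appears in both train and test.

    Speakers are assigned to folds round-robin in sorted order (an assumption;
    any speaker-disjoint partition would serve the same purpose).
    """
    unique = sorted(set(speakers))
    fold_of = {spk: i % n_folds for i, spk in enumerate(unique)}
    folds = []
    for k in range(n_folds):
        train = [i for i, s in enumerate(speakers) if fold_of[s] != k]
        test = [i for i, s in enumerate(speakers) if fold_of[s] == k]
        folds.append((train, test))
    return folds


def term_class_relevance(docs, labels, classes):
    """Hypothetical supervised weight: share of a term's occurrences per class."""
    counts = defaultdict(Counter)  # term -> Counter(class -> occurrences)
    for tokens, y in zip(docs, labels):
        for t in tokens:
            counts[t][y] += 1
    relevance = {}
    for t, per_class in counts.items():
        total = sum(per_class.values())
        relevance[t] = {c: per_class[c] / total for c in classes}
    return relevance


def class_features(tokens, relevance, classes):
    """Reduce an utterance to one feature per class.

    Each known term belongs to the class with its highest weight (ties go to
    the class listed first) and adds that weight to the class's feature.
    Terms unseen in training are ignored.
    """
    feats = {c: 0.0 for c in classes}
    for t in tokens:
        if t in relevance:
            best = max(classes, key=lambda c: relevance[t][c])
            feats[best] += relevance[t][best]
    return [feats[c] for c in classes]


if __name__ == "__main__":
    # Toy two-class data (invented for illustration, not from the corpus).
    classes = ["on-talk", "off-talk"]
    docs = [["show", "route"], ["what", "do", "you", "think"],
            ["show", "map"], ["you", "think", "so"]]
    labels = ["on-talk", "off-talk", "on-talk", "off-talk"]
    speakers = ["A", "A", "B", "B"]

    for train, test in speaker_folds(speakers, n_folds=2):
        rel = term_class_relevance([docs[i] for i in train],
                                   [labels[i] for i in train], classes)
        for i in test:
            print(labels[i], class_features(docs[i], rel, classes))
```

With K classes, every utterance becomes a K-dimensional vector regardless of vocabulary size, which is the dimensionality-reduction effect the abstract refers to; the speaker-disjoint folds keep a speaker's idiosyncratic wording from leaking between training and test data.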



Paper Citation


in Harvard Style

Akhtiamov O., Sergienko R. and Minker W. (2016). An Approach to Off-talk Detection based on Text Classification within an Automatic Spoken Dialogue System. In Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO, ISBN 978-989-758-198-4, pages 288-293. DOI: 10.5220/0005977802880293


in Bibtex Style

@conference{icinco16,
author={Oleg Akhtiamov and Roman Sergienko and Wolfgang Minker},
title={An Approach to Off-talk Detection based on Text Classification within an Automatic Spoken Dialogue System},
booktitle={Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2016},
pages={288-293},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005977802880293},
isbn={978-989-758-198-4},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI  - An Approach to Off-talk Detection based on Text Classification within an Automatic Spoken Dialogue System
SN  - 978-989-758-198-4
AU  - Akhtiamov O.
AU  - Sergienko R.
AU  - Minker W.
PY  - 2016
SP  - 288
EP  - 293
DO  - 10.5220/0005977802880293
ER  - 