Toward a Guide Agent who Actively Intervene Inter-user Conversation – Timing Definition and Trial of Automatic Detection using Low-level Nonverbal Features

Hung-Hsuan Huang, Shochi Otogi, Ryo Hotta, Kyoji Kawagoe

Abstract

With the advance of embodied conversational agent (ECA) technologies, more and more ECA applications are being deployed in the real world. Guides in museums or exhibitions are typical examples. In these settings, however, the agent usually needs to engage groups of visitors rather than individuals. Such multi-user situations are considerably more complex than single-user ones and require specialized capabilities, one of which is the ability for the agent to smoothly intervene in user-user conversation. To realize this, a Wizard-of-Oz (WOZ) experiment was first conducted to collect human interaction data. By analyzing the resulting corpus, four kinds of timings that potentially allow the agent to intervene were identified. The corpus was then annotated with these defined timings by recruited evaluators using a dedicated, intuitive tool. Finally, as a trial of automatically detecting these timings, classifiers using low-level nonverbal features achieved moderate accuracy.
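As a rough illustration of the detection idea described in the abstract, the sketch below trains a simple nearest-centroid classifier on low-level nonverbal features extracted per time window (here, two users' speech-activity ratios and a mutual-gaze ratio). The feature layout, the synthetic data, and the classifier choice are all illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch: flag conversation windows where the agent could
# intervene, using low-level nonverbal features. Features and data are
# invented for illustration only.
from statistics import mean

# Each window: (speech_ratio_user1, speech_ratio_user2, mutual_gaze_ratio), label
# label 1 = intervenable (lull in user-user talk), 0 = do not interrupt
TRAIN = [
    ((0.05, 0.04, 0.10), 1),  # long silence, little mutual gaze
    ((0.02, 0.08, 0.05), 1),
    ((0.60, 0.55, 0.70), 0),  # active user-user conversation
    ((0.45, 0.50, 0.65), 0),
]

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return tuple(mean(col) for col in zip(*rows))

def train(data):
    """Compute one centroid per class label."""
    by_label = {}
    for x, y in data:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}

def predict(model, x):
    """Return the label whose centroid is nearest (squared Euclidean)."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda y: sqdist(model[y], x))

model = train(TRAIN)
print(predict(model, (0.03, 0.05, 0.08)))  # quiet window -> prints 1
```

In practice one would use a richer feature set and a standard toolkit classifier, but the window-level features-to-label mapping follows the same shape.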



Paper Citation


in Harvard Style

Huang H., Otogi S., Hotta R. and Kawagoe K. (2016). Toward a Guide Agent who Actively Intervene Inter-user Conversation – Timing Definition and Trial of Automatic Detection using Low-level Nonverbal Features. In Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-172-4, pages 459-464. DOI: 10.5220/0005759704590464


in Bibtex Style

@conference{icaart16,
author={Hung-Hsuan Huang and Shochi Otogi and Ryo Hotta and Kyoji Kawagoe},
title={Toward a Guide Agent who Actively Intervene Inter-user Conversation – Timing Definition and Trial of Automatic Detection using Low-level Nonverbal Features},
booktitle={Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2016},
pages={459-464},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005759704590464},
isbn={978-989-758-172-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Toward a Guide Agent who Actively Intervene Inter-user Conversation – Timing Definition and Trial of Automatic Detection using Low-level Nonverbal Features
SN - 978-989-758-172-4
AU - Huang H.
AU - Otogi S.
AU - Hotta R.
AU - Kawagoe K.
PY - 2016
SP - 459
EP - 464
DO - 10.5220/0005759704590464