Transcript Segmentation Using Utterance Cosine Similarity Measure

Caroline Chibelushi, Bernadette Sharp, Andy Salter

Abstract

One of the problems addressed by the Tracker project is the extraction of the key issues discussed at meetings through the analysis of transcripts. Whilst topic extraction is an easy task for humans, it has proven difficult to automate given the unstructured nature of our transcripts. This paper proposes a new approach to transcript segmentation based on the Utterance Cosine Similarity (UCS) method. Our segmentation approach is based on the notion of semantic similarity of utterances within the transcripts: it measures content similarity and semantic relationships, and uses distance to differentiate the same topic when it appears in different contexts. The method is illustrated using one of the 17 transcripts in our study.
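The abstract does not give the UCS formulas, so the following is only a minimal, generic sketch of cosine-similarity segmentation, not the authors' method. It shows the two ideas the abstract names: content similarity between utterances (cosine of bag-of-words vectors) and distance (similarity to earlier utterances is discounted the further back they are). The stop-word list, the `window`, `decay` and `threshold` parameters, and all function names are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (not from the paper).
STOPWORDS = {"a", "an", "the", "and", "of", "to", "is", "we",
             "for", "what", "about", "in", "on", "it"}


def tokenize(utterance: str) -> list[str]:
    """Lower-case, split into word tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", utterance.lower())
            if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def boundary_score(vectors: list[Counter], i: int,
                   window: int, decay: float) -> float:
    """Best similarity of utterance i to the previous `window` utterances.

    Hypothetical distance weighting: a match k utterances back is
    multiplied by decay ** (k - 1), so distant repeats count for less.
    """
    best = 0.0
    for k in range(1, window + 1):
        if i - k < 0:
            break
        best = max(best, cosine(vectors[i], vectors[i - k]) * decay ** (k - 1))
    return best


def segment(utterances: list[str], window: int = 2,
            decay: float = 0.5, threshold: float = 0.1) -> list[int]:
    """Return indices of utterances that start a new segment."""
    vectors = [Counter(tokenize(u)) for u in utterances]
    return [i for i in range(1, len(vectors))
            if boundary_score(vectors, i, window, decay) < threshold]


example = [
    "we need to agree the budget for the new road",
    "the road budget is too high",
    "what about the environmental survey",
    "the survey found protected bats near the site",
]
print(segment(example))  # → [2]
```

Here utterance 2 shares no content words with either of the two preceding utterances, so its score falls below the threshold and a boundary is placed before it; utterance 3 shares "survey" with utterance 2 and stays in the same segment.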



Paper Citation


in Harvard Style

Chibelushi C., Sharp B. and Salter A. (2005). Transcript Segmentation Using Utterance Cosine Similarity Measure. In Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005) ISBN 972-8865-23-6X, pages 78-90. DOI: 10.5220/0002560900780090


in Bibtex Style

@conference{nlucs05,
author={Caroline Chibelushi and Bernadette Sharp and Andy Salter},
title={Transcript Segmentation Using Utterance Cosine Similarity Measure},
booktitle={Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)},
year={2005},
pages={78-90},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002560900780090},
isbn={972-8865-23-6X},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 2nd International Workshop on Natural Language Understanding and Cognitive Science - Volume 1: NLUCS, (ICEIS 2005)
TI  - Transcript Segmentation Using Utterance Cosine Similarity Measure
SN  - 972-8865-23-6X
AU  - Chibelushi C.
AU  - Sharp B.
AU  - Salter A.
PY  - 2005
SP  - 78
EP  - 90
DO  - 10.5220/0002560900780090
ER  - 