Authors:
György Szaszák 1; Máté Ákos Tündik 1 and András Beke 2
Affiliations:
1 Budapest University of Technology and Economics, Hungary; 2 Research Institute for Linguistics of the Hungarian Academy of Sciences, Hungary
Keyword(s):
Audio, Speech, Summarization, Tokenization, Speech Recognition, Latent Semantic Indexing.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Pre-Processing and Post-Processing for Data Mining; Soft Computing; Symbolic Systems
Abstract:
This paper addresses the summarization of highly spontaneous speech. The audio signal is transcribed using
an Automatic Speech Recognizer, which operates at relatively high word error rates due to the complexity
of the recognition task and the high spontaneity of the speech. An analysis is carried out to assess how
speech recognition errors propagate into syntactic parsing. We also propose an automatic, prosody-based audio
tokenization approach and compare it to human performance. The resulting sentence-like tokens are
analysed by the syntactic parser to support ranking based on thematic terms and sentence position. The thematic
term score is computed in two ways: with TF-IDF and with Latent Semantic Indexing. The sentence scores are calculated as
a linear combination of the thematic term score and a positional score. The summary is generated from the
top 10 candidates. Results show that prosody-based tokenization reaches average human performance and that
speech recognition errors propagate moderately into syntactic parsing (POS tagging and dependency parsing).
Nouns prove to be quite error resistant. Audio summarization achieves 0.62 recall and 0.79 precision, with an
F-measure of 0.68, compared to the human reference. A subjective test is also carried out on a Likert scale. All
results apply to spontaneous Hungarian.
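The ranking step described in the abstract (a linear combination of a thematic term score and a positional score, followed by selection of the top 10 candidates) can be illustrated with a minimal sketch. This is not the authors' implementation. The weight `alpha`, the linearly decaying positional score, the whitespace tokenization, and all function names are assumptions made for illustration only. Only the TF-IDF variant is shown, and the parser-based analysis mentioned in the abstract is omitted.

```python
import math
from collections import Counter


def tfidf_scores(sentences):
    """Thematic score per sentence: sum of TF-IDF weights of its terms.

    Assumption: each sentence-like token is treated as one document and
    terms are obtained by lowercasing and splitting on whitespace.
    """
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(sum((count / len(doc)) * math.log(n / df[term])
                          for term, count in tf.items()))
    return scores


def positional_scores(n):
    """Hypothetical positional score: decays linearly from 1 to 1/n."""
    return [(n - i) / n for i in range(n)]


def normalize(values):
    """Scale values into [0, 1] by the maximum (all zeros if the maximum is 0)."""
    top = max(values, default=0.0)
    return [v / top if top > 0 else 0.0 for v in values]


def summarize(sentences, alpha=0.7, top_k=10):
    """Rank by alpha * thematic + (1 - alpha) * positional and keep top_k.

    alpha is an assumed mixing weight; the abstract does not state one.
    Selected sentences are returned in their original order.
    """
    thematic = normalize(tfidf_scores(sentences))
    position = positional_scores(len(sentences))
    combined = [alpha * t + (1 - alpha) * p
                for t, p in zip(thematic, position)]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: combined[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked)]
```

Calling `summarize(tokens)` on a list of sentence-like tokens returns at most 10 of them in their original order. Setting `alpha=0.0` ranks by position alone, which is a convenient sanity check.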