Automatic Attendance Rating of Movie Content using Bag of Audio Words Representation

Avi Bleiweiss

2013

Abstract

The sensory experience of watching a movie links input from both the sight and hearing modalities. Yet traditionally, the motion picture rating system relies largely on the visual content of a film to inform its guidance to parents. The current rating process is fairly elaborate: it requires a group of parents to attend a full screening, manually prepare and submit their opinions, and vote on the appropriate audience age for viewing. Instead, our work explores the feasibility of classifying the attendance age of a movie automatically, by analyzing its auditory data alone. Our high-performance software records the audio content of the shorter movie trailer and builds a labeled training set of original and artificially distorted clips. We use a bag of audio words to represent the film soundtrack effectively, and demonstrate robust and closely correlated classification accuracy when exploiting both boolean discrimination and ranked retrieval methods.
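As a rough illustration of the bag-of-audio-words pipeline the abstract describes, the sketch below extracts MFCC frames from each clip, clusters the frames into an audio-word codebook, and quantizes every clip into a normalized term-frequency histogram. This is a minimal sketch assuming librosa and scikit-learn; the function names, parameters, and codebook size are illustrative assumptions, not the paper's implementation.

# Minimal bag-of-audio-words sketch (illustrative only; not the paper's code).
# Assumes librosa for MFCC extraction and scikit-learn for k-means codebooks.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def mfcc_frames(path, n_mfcc=13):
    """Load an audio clip and return its MFCC frames, one row per frame."""
    signal, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # shape: (num_frames, n_mfcc)

def build_codebook(training_paths, n_words=256):
    """Cluster MFCC frames from all training clips into an audio-word codebook."""
    frames = np.vstack([mfcc_frames(p) for p in training_paths])
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(frames)

def bag_of_audio_words(path, codebook):
    """Quantize a clip's frames against the codebook and return a tf histogram."""
    words = codebook.predict(mfcc_frames(path))
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalize so clips of any length compare

# The resulting fixed-length vectors can feed either a boolean classifier
# (e.g. an SVM) or a cosine-similarity ranked-retrieval scheme over rating labels.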



Paper Citation


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Signal Processing and Multimedia Applications and 10th International Conference on Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2013)
TI - Automatic Attendance Rating of Movie Content using Bag of Audio Words Representation
SN - 978-989-8565-74-7
AU - Bleiweiss A.
PY - 2013
SP - 142
EP - 150
DO - 10.5220/0004604601420150


in Harvard Style

Bleiweiss A. (2013). Automatic Attendance Rating of Movie Content using Bag of Audio Words Representation. In Proceedings of the 10th International Conference on Signal Processing and Multimedia Applications and 10th International Conference on Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2013), ISBN 978-989-8565-74-7, pages 142-150. DOI: 10.5220/0004604601420150


in BibTeX Style

@conference{sigmap13,
author={Avi Bleiweiss},
title={Automatic Attendance Rating of Movie Content using Bag of Audio Words Representation},
booktitle={Proceedings of the 10th International Conference on Signal Processing and Multimedia Applications and 10th International Conference on Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2013)},
year={2013},
pages={142-150},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004604601420150},
isbn={978-989-8565-74-7},
}