Comparing Topic Models for a Movie Recommendation System

Sonia Bergamaschi, Laura Po, Serena Sorrentino

2014

Abstract

Recommendation systems have become successful at suggesting content that is likely to be of interest to the user; however, their performance suffers greatly when little information about the user's preferences is available. In this paper we propose an automated movie recommendation system based on the similarity of movies: given a target movie selected by the user, the goal of the system is to provide a list of the movies that are most similar to the target one, without knowing any user preferences. The topic models Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) have been applied and extensively compared on a movie database of two hundred thousand plots. Experiments are an important part of the paper: we examined the topic models' behaviour based on standard metrics and on user evaluations, and we conducted performance assessments with 30 users to compare our approach with a commercial system. The outcome was that LSA outperformed LDA in supporting the selection of similar plots. Even though our system does not outperform commercial systems, it does not rely on human effort and can therefore be ported to any domain where natural language descriptions exist. Since it is independent of the number of user ratings, it is able to suggest famous movies as well as old or little-known movies that are still strongly related to the content of the video the user has watched.
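
The core idea of the abstract, ranking movies by the similarity of their plots to a target movie without any user ratings, can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses a plain TF-IDF vector space with cosine similarity, which is only the representation a topic model such as LSA would start from (LSA would additionally apply a truncated SVD, and LDA would infer topic distributions instead). The four plots, the movie titles, and the function names below are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper works on about two hundred thousand plots.
PLOTS = {
    "Space Rescue": "astronaut crew stranded on a distant planet awaits rescue",
    "Lost Orbit": "an astronaut drifts through orbit after the crew abandons the station",
    "Paris Hearts": "two strangers fall in love during a rainy week in paris",
    "Love Letters": "a widow finds old love letters and travels to paris",
}


def tokenize(text):
    # Assumption: lowercase whitespace splitting, with no stop-word removal
    # or lemmatization, to keep the sketch short.
    return text.lower().split()


def tfidf_vectors(docs):
    """Map each title to a sparse {term: weight} TF-IDF vector."""
    tokens = {title: tokenize(text) for title, text in docs.items()}
    # Document frequency: number of plots in which each term occurs.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    n_docs = len(docs)
    vectors = {}
    for title, toks in tokens.items():
        tf = Counter(toks)
        vectors[title] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(target, docs, top_n=3):
    """Return up to top_n (title, score) pairs, most similar plot first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (title, cosine(vectors[target], vec))
        for title, vec in vectors.items()
        if title != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for title, score in most_similar("Space Rescue", PLOTS):
        print(f"{title}: {score:.3f}")
```

In this sketch the ranking depends only on the plot text, so a movie with no ratings at all can still be recommended, which is the property the abstract emphasizes. Exact word overlap is a weak signal, however; topic models are meant to relate plots that share themes without sharing the same words.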



Paper Citation


in Harvard Style

Bergamaschi S., Po L. and Sorrentino S. (2014). Comparing Topic Models for a Movie Recommendation System. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-758-024-6, pages 172-183. DOI: 10.5220/0004835601720183


in Bibtex Style

@conference{webist14,
author={Sonia Bergamaschi and Laura Po and Serena Sorrentino},
title={Comparing Topic Models for a Movie Recommendation System},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2014},
pages={172-183},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004835601720183},
isbn={978-989-758-024-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - Comparing Topic Models for a Movie Recommendation System
SN - 978-989-758-024-6
AU - Bergamaschi S.
AU - Po L.
AU - Sorrentino S.
PY - 2014
SP - 172
EP - 183
DO - 10.5220/0004835601720183