SPEEDING UP LATENT SEMANTIC ANALYSIS - A Streamed Distributed Algorithm for SVD Updates

Radim Řehůřek

Abstract

Since its inception 20 years ago, Latent Semantic Analysis (LSA) has become a standard tool for robust, unsupervised inference of semantic structure from text corpora. At the core of LSA is the Singular Value Decomposition (SVD), a linear algebra routine for matrix factorization. This paper introduces a streamed, distributed algorithm for incremental SVD updates, which allows the factorization to be computed rapidly in a single pass over the input matrix on a cluster of autonomous computers.
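
To make the idea of a single-pass, incremental factorization concrete, here is a minimal sketch in Python with NumPy. It is not the algorithm from the paper and it does no distribution across machines: it only illustrates one simple, generic way to fold successive column blocks into a truncated SVD, keeping just the left singular vectors and singular values. The function names `update_svd` and `streamed_svd`, the chunk size, and the test matrix are all assumptions made for illustration.

```python
import numpy as np


def update_svd(u, s, chunk, rank):
    """Fold a new block of columns into a truncated SVD.

    Only the left singular vectors `u` and singular values `s` are tracked.
    Stacking `u * s` next to the new columns preserves the Gram matrix
    A @ A.T of everything seen so far (up to earlier truncation error).
    """
    if u is None:
        stacked = chunk                      # first block: nothing to merge yet
    else:
        stacked = np.hstack([u * s, chunk])  # scale each column of u by its singular value
    u_new, s_new, _ = np.linalg.svd(stacked, full_matrices=False)
    return u_new[:, :rank], s_new[:rank]     # keep only the top `rank` factors


def streamed_svd(chunks, rank):
    """Visit each column block exactly once and return the final (u, s)."""
    u, s = None, None
    for chunk in chunks:
        u, s = update_svd(u, s, chunk, rank)
    return u, s


# Hypothetical input: an exactly rank-3 matrix, so truncating to rank 3 loses nothing.
rng = np.random.default_rng(0)
matrix = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
chunks = [matrix[:, i:i + 10] for i in range(0, 40, 10)]

u, s = streamed_svd(chunks, rank=3)
s_batch = np.linalg.svd(matrix, compute_uv=False)[:3]  # reference: one batch SVD
```

On this exactly low-rank input the streamed singular values `s` should agree with `s_batch` up to floating-point error. On real corpora, truncating after every block introduces an approximation error that this sketch does not analyze.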


Paper Citation


in Harvard Style

Řehůřek R. (2011). SPEEDING UP LATENT SEMANTIC ANALYSIS - A Streamed Distributed Algorithm for SVD Updates. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8425-40-9, pages 446-451. DOI: 10.5220/0003191304460451


in Bibtex Style

@conference{icaart11,
author={Radim Řehůřek},
title={SPEEDING UP LATENT SEMANTIC ANALYSIS - A Streamed Distributed Algorithm for SVD Updates},
booktitle={Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2011},
pages={446-451},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003191304460451},
isbn={978-989-8425-40-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - SPEEDING UP LATENT SEMANTIC ANALYSIS - A Streamed Distributed Algorithm for SVD Updates
SN - 978-989-8425-40-9
AU - Řehůřek R.
PY - 2011
SP - 446
EP - 451
DO - 10.5220/0003191304460451