# Random Projections with Control Variates

### Keegan Kang, Giles Hooker

#### Abstract

Random projections are used to estimate parameters of interest in large-scale data sets by projecting the data into a lower-dimensional space. Parameters of interest between pairs of vectors include the Euclidean distance and the inner product, while parameters of interest for the whole data set include its singular values or singular vectors. We show how to borrow an idea from Monte Carlo integration, control variates, to reduce the variance of estimates of Euclidean distances and inner products by storing marginal information about the data set. We demonstrate this variance reduction through experiments on synthetic data as well as the colon and kos data sets. We hope that this inspires future work that incorporates control variates in further random projection applications.
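To make the idea concrete, the following is a minimal sketch of a control-variate estimate of an inner product under a Gaussian random projection. It is an illustration under stated assumptions, not necessarily the construction used in the paper: the function names `inner_product_estimates` and `compare_mse` are hypothetical, the control variate is taken to be the sum of squared projected coordinates (whose mean is the stored marginal quantity, the sum of the squared norms of u and v), and the control-variate coefficient is estimated empirically from the projected samples.

```python
import numpy as np


def inner_product_estimates(u, v, k, rng):
    """Return (plain, control_variate) estimates of <u, v> from k projections.

    Assumption: the projection matrix has i.i.d. N(0, 1) entries, so each
    product a_j * b_j is an unbiased estimate of <u, v>.
    """
    p = u.shape[0]
    R = rng.standard_normal((p, k))   # random projection matrix
    a = R.T @ u                       # projected u, length k
    b = R.T @ v                       # projected v, length k

    x = a * b                         # E[x_j] = <u, v>
    y = a**2 + b**2                   # E[y_j] = ||u||^2 + ||v||^2
    mu_y = u @ u + v @ v              # stored marginal information

    # Empirical control-variate coefficient c = -Cov(x, y) / Var(y).
    cov = np.cov(x, y)
    c = -cov[0, 1] / cov[1, 1]

    plain = x.mean()
    with_cv = plain + c * (y.mean() - mu_y)
    return plain, with_cv


def compare_mse(u, v, k, trials, rng):
    """Mean squared error of both estimators over repeated projections."""
    truth = u @ v
    errs = np.array([inner_product_estimates(u, v, k, rng) for _ in range(trials)])
    mse = ((errs - truth) ** 2).mean(axis=0)
    return mse[0], mse[1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal(100)
    v = u + 0.3 * rng.standard_normal(100)   # strongly correlated pair
    mse_plain, mse_cv = compare_mse(u, v, k=50, trials=200, rng=rng)
    print(f"plain MSE: {mse_plain:.3f}  control-variate MSE: {mse_cv:.3f}")
```

The gain in this sketch depends on how strongly the product terms correlate with the squared terms, so it is largest for vector pairs that are close to parallel. Estimating the coefficient from the same samples introduces a small bias, which is a common trade-off in practice.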

#### Paper Citation

#### in Harvard Style

Kang K. and Hooker G. (2017). **Random Projections with Control Variates**. In *Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM*, ISBN 978-989-758-222-6, pages 138-147. DOI: 10.5220/0006188801380147

#### in Bibtex Style

@conference{icpram17,

author={Keegan Kang and Giles Hooker},

title={Random Projections with Control Variates},

booktitle={Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},

year={2017},

pages={138-147},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0006188801380147},

isbn={978-989-758-222-6},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM

TI - Random Projections with Control Variates

SN - 978-989-758-222-6

AU - Kang K.

AU - Hooker G.

PY - 2017

SP - 138

EP - 147

DO - 10.5220/0006188801380147