Post Lasso Stability Selection for High Dimensional Linear Models

Niharika Gauraha, Tatyana Pavlenko, Swapan K. Parui

Abstract

The Lasso and sub-sampling-based techniques (e.g., Stability Selection) are now among the most commonly used methods for detecting the set of active predictors in high-dimensional linear models. Consistency of Lasso-based variable selection requires the design matrix to satisfy the strong irrepresentable condition, and repeated sampling over a large feature set makes Stability Selection computationally slow. Alternatively, two-stage procedures (e.g., thresholding or the adaptive Lasso) achieve consistent variable selection under weaker conditions (sparse eigenvalue). Such two-step procedures, however, involve choosing several tuning parameters, which seems easy in principle but is difficult in practice. To address these problems efficiently, we propose a new two-step procedure called Post Lasso Stability Selection (PLSS). In the first step, Lasso screening is applied with a small regularization parameter to generate a candidate subset of active features. In the second step, Stability Selection using the weighted Lasso is applied to recover the most stable features from the candidate subset. We show that under a mild (generalized irrepresentable) condition, this approach yields a consistent variable selection method that is computationally fast even for a very large number of variables. The promising performance of the proposed PLSS technique is also demonstrated numerically on both simulated and real data examples.
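
The two steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the weights of the weighted Lasso, so the adaptive-Lasso-style weights 1/|beta_j| used here are an assumption, as are the single fixed second-stage penalty (rather than a path of penalties), the half-size subsamples, the 0.6 selection threshold, and all function and parameter names (`weighted_lasso`, `plss_sketch`, `lam_screen`, `lam_stab`).

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(cols, y, lam, weights=None, n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.

    cols is a list of feature columns, each a list of length n.
    """
    n, p = len(y), len(cols)
    if weights is None:
        weights = [1.0] * p
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    norms = [sum(v * v for v in c) / n for c in cols]
    for _ in range(n_iter):
        for j, c in enumerate(cols):
            if norms[j] == 0.0:
                continue
            # Covariance of column j with the partial residual.
            rho = sum(ci * ri for ci, ri in zip(c, resid)) / n + norms[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / norms[j]
            if new != beta[j]:
                delta = new - beta[j]
                resid = [ri - delta * ci for ri, ci in zip(resid, c)]
                beta[j] = new
    return beta


def plss_sketch(cols, y, lam_screen, lam_stab,
                n_subsamples=50, threshold=0.6, seed=0):
    """Two-step sketch: Lasso screening, then subsampled weighted Lasso."""
    rng = random.Random(seed)
    n = len(y)
    # Step 1: Lasso screening with a small penalty gives the candidate set.
    beta0 = weighted_lasso(cols, y, lam_screen)
    candidates = [j for j, b in enumerate(beta0) if b != 0.0]
    # ASSUMPTION: adaptive-Lasso-style weights; the abstract does not give them.
    weights = [1.0 / abs(beta0[j]) for j in candidates]
    counts = {j: 0 for j in candidates}
    # Step 2: refit on random half-samples and count how often each
    # candidate keeps a nonzero coefficient.
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        sub_cols = [[cols[j][i] for i in idx] for j in candidates]
        sub_y = [y[i] for i in idx]
        beta = weighted_lasso(sub_cols, sub_y, lam_stab, weights)
        for j, b in zip(candidates, beta):
            if b != 0.0:
                counts[j] += 1
    freq = {j: counts[j] / n_subsamples for j in candidates}
    stable = sorted(j for j in candidates if freq[j] >= threshold)
    return stable, freq


# Synthetic example (hypothetical data): features 0, 1 and 2 are active.
data_rng = random.Random(1)
n, p = 80, 20
cols = [[data_rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3.0 * cols[0][i] - 2.0 * cols[1][i] + 2.5 * cols[2][i]
     + data_rng.gauss(0, 0.5) for i in range(n)]
stable, freq = plss_sketch(cols, y, lam_screen=0.05, lam_stab=0.1)
print("stable features:", stable)
```

Because the second stage refits only the screened candidates, each subsample fit works on a much smaller problem than the full feature set, which is the source of the computational saving the abstract claims over plain Stability Selection.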


Paper Citation


in Harvard Style

Gauraha N., Pavlenko T. and Parui S. (2017). Post Lasso Stability Selection for High Dimensional Linear Models. In Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-222-6, pages 638-646. DOI: 10.5220/0006244306380646


in Bibtex Style

@conference{icpram17,
author={Niharika Gauraha and Tatyana Pavlenko and Swapan K. Parui},
title={Post Lasso Stability Selection for High Dimensional Linear Models},
booktitle={Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2017},
pages={638-646},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006244306380646},
isbn={978-989-758-222-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Post Lasso Stability Selection for High Dimensional Linear Models
SN - 978-989-758-222-6
AU - Gauraha N.
AU - Pavlenko T.
AU - Parui S.
PY - 2017
SP - 638
EP - 646
DO - 10.5220/0006244306380646