Towards Large-scale Gaussian Process Models for Efficient Bayesian Machine Learning

Fabian Berns, Christian Beecks

Abstract

Gaussian Process Models (GPMs) are applicable to a wide variety of data analysis tasks, such as time series interpolation, regression, and classification. Frequently, these Bayesian machine learning models instantiate a Gaussian Process with a zero-mean function and the well-known Gaussian kernel. While these default instantiations yield acceptable analytical quality for many use cases, GPM retrieval algorithms can automatically search for an application-specific model suited to a particular dataset. State-of-the-art GPM retrieval algorithms have only been applied to small datasets, as their cubic runtime complexity impedes analyzing datasets beyond a few thousand data records. Even though global approximations of Gaussian Processes extend the applicability of those models to medium-sized datasets, datasets with millions of records are still far beyond their reach. Therefore, we develop a new large-scale GPM structure, which incorporates a divide-and-conquer paradigm and thus enables efficient GPM retrieval for large-scale data. We outline challenges concerning this newly developed GPM structure regarding its algorithmic retrieval, its integration with existing data platforms and technologies, as well as cross-model comparability and interpretability.
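
To make the cost argument concrete, the following is a minimal sketch of the default instantiation (zero mean, Gaussian kernel) next to a naive divide-and-conquer variant. It is not the GPM structure proposed in the paper: the fixed-size partitioning, the nearest-chunk prediction rule, and all names (`fit_gp`, `fit_local_gps`, `predict_local`, `chunk_size`) are assumptions chosen for illustration. The global fit factorizes an n x n kernel matrix, which costs O(n^3); fitting k chunks of size m costs O(k * m^3) instead.

```python
import math

def gaussian_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Gaussian (squared-exponential) kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix: O(n^3)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def solve_cholesky(l, b):
    """Solve (L L^T) alpha = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(l[i][k] * z[k] for k in range(i))) / l[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(l[k][i] * alpha[k] for k in range(i + 1, n))) / l[i][i]
    return alpha

def fit_gp(xs, ys, noise=1e-2):
    """Zero-mean GP regression: precompute alpha = (K + noise * I)^-1 y."""
    n = len(xs)
    k = [[gaussian_kernel(xs[i], xs[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return xs, solve_cholesky(cholesky(k), ys)

def predict_mean(model, x_star):
    """Posterior mean at x_star: k(x_star, X) @ alpha."""
    xs, alpha = model
    return sum(gaussian_kernel(x_star, xi) * a for xi, a in zip(xs, alpha))

def fit_local_gps(xs, ys, chunk_size):
    """Assumed partitioning scheme: sort by input, fit one independent GP per chunk."""
    pairs = sorted(zip(xs, ys))
    models = []
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        cx = [p[0] for p in chunk]
        cy = [p[1] for p in chunk]
        models.append((cx[0], cx[-1], fit_gp(cx, cy)))
    return models

def predict_local(models, x_star):
    """Predict with the local GP whose input interval is closest to x_star."""
    def distance(model):
        lo, hi, _ = model
        if lo <= x_star <= hi:
            return 0.0
        return min(abs(x_star - lo), abs(x_star - hi))
    return predict_mean(min(models, key=distance)[2], x_star)
```

With 60 training points and `chunk_size=20`, the local variant factorizes three 20 x 20 matrices rather than one 60 x 60 matrix. The price is that each local model sees only its own chunk, so predictions near chunk boundaries can be less accurate, and comparing or interpreting the resulting set of models becomes a separate problem.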
