A Gradient Descent based Heuristic for Solving Regression Clustering Problems

Enis Kayış

Abstract

Regression analysis is the method of quantifying the effects of a set of independent variables on a dependent variable. In regression clustering problems, data points with similar regression estimates are grouped into the same cluster, either because of a business need or to increase the statistical significance of the resulting regression estimates. In this paper, we consider an extension of this problem in which data points belonging to the same level of another partitioning categorical variable must belong to the same partition. Due to the combinatorial nature of this problem, an exact solution is computationally prohibitive. We provide an integer programming formulation and offer a gradient descent based heuristic to solve this problem. Using simulated datasets, we analyze the performance of our heuristic across a variety of settings. In our computational study, we find that our heuristic provides remarkably better solutions than the benchmark method within a reasonable time. Despite a slight decrease in performance as the number of levels increases, our heuristic provides good solutions when each of the true underlying partitions has a similar number of levels.
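To make the problem statement concrete, the following minimal sketch evaluates one candidate partition of levels. It is not the heuristic proposed in the paper. It assumes, since the abstract does not state the objective, that a partition is scored by the total sum of squared residuals of separate ordinary least squares fits, one per cluster. The function name `partition_sse`, the `assignment` dictionary, and the toy data are hypothetical and used for illustration only.

```python
import numpy as np

def partition_sse(X, y, levels, assignment):
    """Total squared error when each cluster of levels gets its own OLS fit.

    Assumed objective (not taken from the paper): sum of squared residuals.
    `assignment` maps every level of the categorical variable to a cluster id,
    so all data points sharing a level land in the same cluster.
    """
    clusters = np.array([assignment[lv] for lv in levels])
    total = 0.0
    for c in np.unique(clusters):
        mask = clusters == c
        # Add an intercept column and fit least squares on this cluster only.
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        total += float(np.sum((y[mask] - A @ beta) ** 2))
    return total

# Toy data: four levels, two true regimes ({a, b} and {c, d}), no noise.
rng = np.random.default_rng(0)
levels = np.repeat(["a", "b", "c", "d"], 25)
X = rng.normal(size=(100, 1))
slope = np.where(np.isin(levels, ["a", "b"]), 2.0, -1.0)
y = slope * X[:, 0]

true_sse = partition_sse(X, y, levels, {"a": 0, "b": 0, "c": 1, "d": 1})
mixed_sse = partition_sse(X, y, levels, {"a": 0, "c": 0, "b": 1, "d": 1})
```

Under these assumptions the true grouping fits the noiseless toy data essentially exactly, while the mixed grouping leaves a large residual. Searching over all such assignments is what makes the problem combinatorial as the number of levels grows.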
