points to the centres of the clusters to which they
belong). The smaller the inertia, the better the
clustering. However, as k increases, the reduction in
inertia becomes progressively smaller (Kodinariya &
Makwana, 2013). By analysing Figure 1, k=2 is
chosen.
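The elbow criterion above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it uses a hand-rolled Lloyd's algorithm (standing in for scikit-learn's KMeans and its inertia_ attribute) on synthetic two-group data, so the function name, seed, and data are all hypothetical:

```python
import numpy as np

def kmeans_inertia(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm; returns the inertia (sum of squared
    distances from each point to its nearest cluster centre)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest centre
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    # final inertia under the converged centres
    d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

# synthetic data with two well-separated "customer" groups (hypothetical)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])

inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
```

Plotting inertias against k yields the elbow curve of Figure 1: the drop from k=1 to k=2 is large, while further increases in k give only marginal reductions, which is why k=2 is selected.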
Figure 2 shows the results of customer clustering
based on K-Means (k=2). The high-dimensional data
are reduced to two dimensions by PCA for ease of
visualisation, with points coloured by cluster label.
Each point in the graph represents a data sample, and
each colour represents a cluster. There are two main
clusters, shown in yellow and purple. The X axis and
Y axis represent the first two principal components of
the PCA, respectively. These two components capture
the directions of largest variance in the original
high-dimensional data, which allows different
customer groups to be distinguished in the
reduced-dimensionality space. PCA
Component 1 primarily reflects differences in
characteristics associated with age or annual income,
while PCA Component 2 is associated with
membership duration or fitness frequency. The purple
cluster represents a group of customers whose
projections on PCA Components 1 and 2 are centred
on the left side. Based on the properties of PCA, these
customers are similar in being older or having a low
fitness frequency. The yellow cluster is located on the
right side; these customers are younger, exercise
more frequently, or have a higher annual income than
the purple-cluster customers. Two recommendations
can be made for advisers and gym managers. 1. As
purple-cluster customers are older and exercise
infrequently, the gym can provide more suitable
low-intensity fitness programmes such as yoga, tai
chi, or health-management courses. This can satisfy
their fitness needs while improving their participation
and loyalty. For the younger yellow-cluster
customers, who exercise frequently, the gym can add
high-intensity classes or introduce more challenging
programmes such as CrossFit and HIIT training to
keep them interested and active. 2. For yellow-cluster
customers with higher annual incomes, the gym can
provide higher-end membership services such as
personalised fitness coaching, nutritional advice, or
premium facility access. The gym can also consider
launching a high-end membership plan to increase the
value-added experience for customers. For customers
with relatively low annual incomes, the gym can offer
affordable membership programmes or special
promotions to attract and retain them and ensure their
continued use of the gym's services.
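The pipeline behind Figure 2 (K-Means with k=2, then projection onto the first two principal components) can be sketched as below. This is a NumPy-only illustration, not the paper's code: the customer table is a synthetic stand-in with assumed columns (age, annual income, weekly visits, tenure), and the hand-rolled Lloyd's algorithm and SVD-based PCA would normally be replaced by scikit-learn's KMeans and PCA:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150
# hypothetical stand-in for the customer table:
# columns = age, annual income (k$), visits per week, tenure (months)
older = np.column_stack([rng.normal(58, 5, n), rng.normal(45, 8, n),
                         rng.normal(1.5, 0.5, n), rng.normal(40, 10, n)])
younger = np.column_stack([rng.normal(28, 4, n), rng.normal(70, 10, n),
                           rng.normal(4.5, 1.0, n), rng.normal(15, 6, n)])
X = np.vstack([older, younger])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise the features

# K-Means with k=2: minimal Lloyd's algorithm, farthest-point initialisation
c0 = Xs[0]
c1 = Xs[np.argmax(((Xs - c0) ** 2).sum(axis=1))]
centres = np.stack([c0, c1])
for _ in range(50):
    labels = ((Xs[:, None] - centres[None]) ** 2).sum(axis=2).argmin(axis=1)
    centres = np.array([Xs[labels == j].mean(axis=0) for j in range(2)])

# PCA to two dimensions via SVD of the (already centred) data
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
coords = Xs @ Vt[:2].T  # each row: (PCA Component 1, PCA Component 2)
```

Scattering coords[:, 0] against coords[:, 1], coloured by labels, reproduces the style of Figure 2: the dominant between-group variation loads onto PCA Component 1, separating the two clusters along the horizontal axis.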
4 CONCLUSIONS
This study successfully applied machine learning
algorithms to segment gym customers. By using K-
Means clustering and PCA for visualization, this
paper divided the customer data into distinct groups,
providing valuable insights for gym management.
The Elbow Method determined that the optimal
number of clusters was 2. The analysis highlighted
that while K-Means and PCA are effective tools for
customer segmentation, the limitations of these
techniques should be considered, especially when
dealing with complex high-dimensional data. Future
research should explore more advanced clustering
algorithms and dimensionality reduction techniques
to enhance the accuracy of clustering and the clarity
of visualizations. Additionally, incorporating other
customer data, such as behavioral patterns and
preferences, could further refine the segmentation
process and lead to more tailored marketing
strategies.
REFERENCES
Ahmed, M., Seraj, R., & Islam, S. M. S. 2020. The k-means
algorithm: A comprehensive survey and performance
evaluation. Electronics, 9(8), 1295.
Golub, T. R., et al. 1999. Molecular classification of
cancer: Class discovery and class prediction by gene
expression monitoring. Science, 286(5439), 531-537.
Hamerly, G., & Elkan, C. 2003. Learning the k in k-
means. Advances in neural information processing
systems, 16.
Han, J., Pei, J., & Kamber, M. 2011. Data mining: concepts
and techniques.
Kodinariya, T. M., & Makwana, P. R. 2013. Review on
determining the number of cluster in K-means
clustering. International Journal of Advance Research
in Computer Science and Management Studies, 1(6),
90-95.
Maćkiewicz, A., & Ratajczak, W. 1993. Principal
components analysis (PCA). Computers &
Geosciences, 19(3), 303-342.
MacQueen, J. 1967. Some methods for classification and
analysis of multivariate observations. Proceedings of
the 5th Berkeley Symposium on Mathematical
Statistics and Probability.
Moro, S., Cortez, P., & Rita, P. 2014. A data-driven
approach to predict the success of bank telemarketing.
Decision Support Systems, 62, 22-31.
Rigatti, S. J. 2017. Random forest. Journal of Insurance
Medicine, 47(1), 31-39.
Thorndike, R. L. 1953. Who belongs in the family?
Psychometrika, 18(4), 267–276.