ClustSize: An Algorithmic Framework for Size-Constrained Clustering
Diego Vallejo-Huanga, Cèsar Ferri, Fernando Martínez-Plumed
2025
Abstract
Size-constrained clustering addresses a fundamental need in many real-world applications by ensuring that clusters adhere to user-specified size limits, whether to balance groups or to satisfy domain-specific requirements. In this paper, we present ClustSize, an interactive web platform that implements two advanced algorithms, K-MedoidsSC and CSCLP, to perform real-time clustering of tabular data under strict size constraints. Developed in RStudio using the Shiny framework and deployed on Shinyapps.io, ClustSize not only enforces precise cluster cardinalities but also facilitates dynamic parameter tuning and visualisation for enhanced user exploration. We validate its performance through comprehensive benchmarking, evaluating runtime, RAM usage, and behaviour under load and stress conditions, and we gather usability insights via user surveys. Post-deployment evaluations confirm that both algorithms consistently produce clusters that exactly meet the specified size limits, that the system reliably supports up to 50 concurrent users, and that it maintains functionality under stress, processing approximately 90 requests in 5 seconds. These results highlight the potential of integrating advanced size-constrained clustering into interactive web platforms for practical data analysis.
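The abstract does not spell out how K-MedoidsSC or CSCLP enforce exact cluster sizes. As an illustration only, the Python sketch below (not the authors' R implementation) shows one standard way to obtain exact cardinalities around a set of medoids: each medoid is replicated into as many "slots" as its target size, and points are matched to slots by a minimum-cost assignment using SciPy's `linear_sum_assignment`. This step is alternated with a k-medoids-style medoid update. The function names `assign_with_sizes` and `size_constrained_kmedoids`, and the requirement that the target sizes sum to the number of rows, are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def assign_with_sizes(D, medoids, sizes):
    """Hypothetical helper: give cluster j exactly sizes[j] points, minimising total distance."""
    # Replicate medoid j into sizes[j] identical slots (slot_owner[s] = cluster of slot s).
    slot_owner = np.repeat(np.arange(len(medoids)), sizes)
    # cost[i, s] = distance from point i to the medoid that owns slot s (square n x n matrix).
    cost = D[:, medoids][:, slot_owner]
    rows, cols = linear_sum_assignment(cost)
    labels = np.empty(D.shape[0], dtype=int)
    labels[rows] = slot_owner[cols]
    return labels


def size_constrained_kmedoids(X, sizes, n_iter=20, seed=0):
    """Hypothetical sketch: alternate exact size-constrained assignment and medoid updates."""
    X = np.asarray(X, dtype=float)
    sizes = np.asarray(sizes, dtype=int)
    if sizes.sum() != len(X) or (sizes < 1).any():
        raise ValueError("sizes must be positive and sum to the number of points")
    D = cdist(X, X)  # pairwise Euclidean distances
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), size=len(sizes), replace=False)
    for _ in range(n_iter):
        labels = assign_with_sizes(D, medoids, sizes)
        new_medoids = medoids.copy()
        for j in range(len(sizes)):
            members = np.flatnonzero(labels == j)
            # New medoid: the member with the smallest total distance to its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged: labels already match the current medoids
        medoids = new_medoids
    else:
        # Iteration budget exhausted: reassign so labels match the final medoids.
        labels = assign_with_sizes(D, medoids, sizes)
    return labels, medoids
```

For example, `size_constrained_kmedoids(X, [3, 5])` on an eight-row table always yields exactly three points in cluster 0 and five in cluster 1, whatever the data geometry. The slot expansion builds an n × n cost matrix, so the assignment step grows roughly cubically with the number of rows. This is fine for small interactive tables, but a linear-programming or flow formulation would scale better.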
Paper Citation
in Harvard Style
Vallejo-Huanga D., Ferri C. and Martínez-Plumed F. (2025). ClustSize: An Algorithmic Framework for Size-Constrained Clustering. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0, SciTePress, pages 481-490. DOI: 10.5220/0013558900003967
in Bibtex Style
@conference{data25,
author={Diego Vallejo-Huanga and Cèsar Ferri and Fernando Martínez-Plumed},
title={ClustSize: An Algorithmic Framework for Size-Constrained Clustering},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={481-490},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013558900003967},
isbn={978-989-758-758-0},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI  - ClustSize: An Algorithmic Framework for Size-Constrained Clustering
SN  - 978-989-758-758-0
AU  - Vallejo-Huanga D.
AU  - Ferri C.
AU  - Martínez-Plumed F.
PY  - 2025
SP  - 481
EP  - 490
DO  - 10.5220/0013558900003967
PB  - SciTePress
ER  - 