ClustSize: An Algorithmic Framework for Size-Constrained Clustering

Diego Vallejo-Huanga, Cèsar Ferri, Fernando Martínez-Plumed

2025

Abstract

Size-constrained clustering addresses a fundamental need in many real-world applications by ensuring that clusters adhere to user-specified size limits, whether to balance groups or to satisfy domain-specific requirements. In this paper, we present ClustSize, an interactive web platform that implements two advanced algorithms, K-MedoidsSC and CSCLP, to perform real-time clustering of tabular data under strict size constraints. Developed in RStudio using the Shiny framework and deployed on Shinyapps.io, ClustSize not only enforces precise cluster cardinalities but also facilitates dynamic parameter tuning and visualisation for enhanced user exploration. We validate its performance through comprehensive benchmarking that covers runtime, RAM usage, and load and stress conditions, and we gather usability insights via user surveys. Post-deployment evaluations confirm that both algorithms consistently produce clusters that exactly meet the specified size limits, and that the system reliably supports up to 50 concurrent users and remains functional under stress, processing approximately 90 requests in 5 seconds. These results highlight the potential of integrating advanced size-constrained clustering into interactive web platforms for practical data analysis.
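
To make the notion of an exact size constraint concrete, the following minimal Python sketch assigns points to fixed cluster centres so that each cluster receives exactly the requested number of members. It is an illustrative greedy heuristic only, not the K-MedoidsSC or CSCLP algorithm from the paper; the function name `assign_with_sizes` and the toy data are assumptions introduced here.

```python
import math


def assign_with_sizes(points, centers, sizes):
    """Greedy size-constrained assignment (illustrative, hypothetical helper).

    Each point is assigned to one center so that center j ends up with
    exactly sizes[j] points. This is NOT K-MedoidsSC or CSCLP.
    """
    if len(sizes) != len(centers):
        raise ValueError("one target size is required per center")
    if sum(sizes) != len(points):
        raise ValueError("target sizes must sum to the number of points")

    # Every (distance, point index, center index) pair, closest first.
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )

    labels = [None] * len(points)
    remaining = list(sizes)  # free slots left in each cluster
    for _, i, j in pairs:
        # Take the closest still-valid pairing: point unassigned, cluster not full.
        if labels[i] is None and remaining[j] > 0:
            labels[i] = j
            remaining[j] -= 1
    return labels


# Toy data (assumed): three points near each center, but sizes of 2 and 4 are requested.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
centers = [(0, 0), (5, 5)]
sizes = [2, 4]
labels = assign_with_sizes(points, centers, sizes)
cluster_counts = [labels.count(j) for j in range(len(centers))]
```

Because the sizes sum to the number of points, every point is placed and `cluster_counts` equals `sizes`. The price of the constraint is visible in the toy data: one point near the first centre is pushed into the second cluster. A greedy pass like this gives no optimality guarantee on total distance.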

Paper Citation


in Harvard Style

Vallejo-Huanga D., Ferri C. and Martínez-Plumed F. (2025). ClustSize: An Algorithmic Framework for Size-Constrained Clustering. In Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-758-0, SciTePress, pages 481-490. DOI: 10.5220/0013558900003967


in Bibtex Style

@conference{data25,
author={Diego Vallejo-Huanga and Cèsar Ferri and Fernando Martínez-Plumed},
title={ClustSize: An Algorithmic Framework for Size-Constrained Clustering},
booktitle={Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2025},
pages={481-490},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013558900003967},
isbn={978-989-758-758-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - ClustSize: An Algorithmic Framework for Size-Constrained Clustering
SN - 978-989-758-758-0
AU - Vallejo-Huanga D.
AU - Ferri C.
AU - Martínez-Plumed F.
PY - 2025
SP - 481
EP - 490
DO - 10.5220/0013558900003967
PB - SciTePress