A Machine-Learning, Predictive-Analytical Model for Thyroid-Cancer Risk Assessment

Sanjay Manda, Manohar Adapa, Harsha Sai Jasty, Rishma Sree Pathakamuri, Siddhartha Vinnakota, Bonaventure Chidube Molokwu

2025

Abstract

Thyroid cancer is a significant global health problem because of the rising number of diagnoses, yet existing diagnostic methods rely heavily on invasive biopsies and imaging and fail to account for the range of patient risk factors. This research aims to develop a comprehensive and precise model that predicts thyroid cancer risk using machine learning techniques. Preprocessing comprised imputation of missing values, outlier detection, categorical feature encoding, and the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance. Feature engineering comprised polynomial transformation, logarithmic scaling, and clinical risk scoring to extract important predictive patterns. We evaluated the CatBoost (Categorical Boosting) algorithm against Logistic Regression, Random Forest, XGBoost, and LightGBM. The CatBoost model achieved 88% accuracy, 93% precision, 78% recall, an 85% F1-score, and a ROC-AUC of 90%. These findings suggest that CatBoost can differentiate well between high-risk and low-risk thyroid cancer cases. Such a model could identify at-risk individuals early and accurately, support informed clinical decisions, reduce healthcare expenditure, and prevent unnecessary treatment, thereby improving patient quality of life.
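The reported F1-score can be checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that check, not the authors' evaluation code; the function name `f1_score` and the variable names are illustrative assumptions, and the inputs are the rounded percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract for the CatBoost model.
reported_precision = 0.93
reported_recall = 0.78

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.848"
```

Rounded to two decimal places, the result is 0.85, which agrees with the 85% F1-score stated above. The gap between precision (93%) and recall (78%) indicates that the model, at its operating threshold, misses more high-risk cases than it raises false alarms.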


Paper Citation


in Harvard Style

Manda S., Adapa M., Jasty H., Pathakamuri R., Vinnakota S. and Molokwu B. (2025). A Machine-Learning, Predictive-Analytical Model for Thyroid-Cancer Risk Assessment. In Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST; ISBN 978-989-758-772-6, SciTePress, pages 350-357. DOI: 10.5220/0013692200003985


in Bibtex Style

@conference{webist25,
author={Sanjay Manda and Manohar Adapa and Harsha Jasty and Rishma Pathakamuri and Siddhartha Vinnakota and Bonaventure Molokwu},
title={A Machine-Learning, Predictive-Analytical Model for Thyroid-Cancer Risk Assessment},
booktitle={Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST},
year={2025},
pages={350-357},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013692200003985},
isbn={978-989-758-772-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 21st International Conference on Web Information Systems and Technologies - Volume 1: WEBIST
TI - A Machine-Learning, Predictive-Analytical Model for Thyroid-Cancer Risk Assessment
SN - 978-989-758-772-6
AU - Manda S.
AU - Adapa M.
AU - Jasty H.
AU - Pathakamuri R.
AU - Vinnakota S.
AU - Molokwu B.
PY - 2025
SP - 350
EP - 357
DO - 10.5220/0013692200003985
PB - SciTePress